[jira] [Commented] (HDFS-12684) Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212196#comment-16212196 ] Eric Yang commented on HDFS-12684: -- [~cheersyang] Thank you for the clarification. > Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean > > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > 
}, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
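The overlap in the JMX dump above can be reduced to a tiny illustration. The interfaces below are hypothetical, simplified stand-ins for the two MXBeans named in the report (the real Ozone interfaces have more attributes and live in other packages); the reflection check merely confirms that both declare a {{getNodeCount}} getter, which is the duplication the patch proposes removing from {{SCMMXBean}}.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical, simplified stand-ins for the two MXBean interfaces in the
// JMX dump above; the names mirror the report but these are NOT the real classes.
interface ScmMxBeanSketch {
    Map<String, Integer> getNodeCount();   // duplicated attribute
    String getClientRpcPort();
}

interface NodeManagerMxBeanSketch {
    Map<String, Integer> getNodeCount();   // canonical home of the attribute
    boolean isOutOfChillMode();
}

public class DuplicateAttributeCheck {
    // True when both interfaces declare a getter with the given name.
    static boolean bothExpose(String getter) {
        return Arrays.stream(ScmMxBeanSketch.class.getMethods())
                .anyMatch(m -> m.getName().equals(getter))
            && Arrays.stream(NodeManagerMxBeanSketch.class.getMethods())
                .anyMatch(m -> m.getName().equals(getter));
    }

    public static void main(String[] args) {
        System.out.println(bothExpose("getNodeCount"));      // true: the overlap
        System.out.println(bothExpose("getClientRpcPort"));  // false
    }
}
```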
[jira] [Updated] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-11468: - Status: Patch Available (was: Reopened) Re-trigger Jenkins. > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, > HDFS-11468-HDFS-7240.004.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
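As a rough picture of what "node metrics based on heartbeat, node report and container report" could look like, here is a hedged sketch. The class and method names are invented for illustration; real Hadoop code would register these counters with the metrics2 framework rather than hold raw atomics.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ScmNodeMetricsSketch {
    // Hypothetical shape of the SCM node metrics this ticket asks for:
    // counters bumped as heartbeats, node reports and container reports
    // arrive from datanodes. Illustrative only, not the committed patch.
    private final AtomicLong heartbeats = new AtomicLong();
    private final AtomicLong nodeReports = new AtomicLong();
    private final AtomicLong containerReports = new AtomicLong();

    void onHeartbeat()       { heartbeats.incrementAndGet(); }
    void onNodeReport()      { nodeReports.incrementAndGet(); }
    void onContainerReport() { containerReports.incrementAndGet(); }

    long getNumHeartbeats()       { return heartbeats.get(); }
    long getNumNodeReports()      { return nodeReports.get(); }
    long getNumContainerReports() { return containerReports.get(); }

    public static void main(String[] args) {
        ScmNodeMetricsSketch metrics = new ScmNodeMetricsSketch();
        metrics.onHeartbeat();
        metrics.onHeartbeat();
        metrics.onNodeReport();
        System.out.println(metrics.getNumHeartbeats()); // 2
    }
}
```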
[jira] [Reopened] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin reopened HDFS-11468: -- > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, > HDFS-11468-HDFS-7240.004.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-11468: - Resolution: Fixed Status: Resolved (was: Patch Available) > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, > HDFS-11468-HDFS-7240.004.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212125#comment-16212125 ] Hadoop QA commented on HDFS-11468: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 15s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11468 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893167/HDFS-11468-HDFS-7240.004.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21757/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, > HDFS-11468-HDFS-7240.004.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212120#comment-16212120 ] Hadoop QA commented on HDFS-12544: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 468 unchanged - 5 fixed = 470 total (was 473) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 52s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}141m 8s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | HDFS-12544 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893147/HDFS-12544.04.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux fd96f11bdc8a 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ce7cf66 | | Default Java | 1.8.0_131 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21756/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/21756/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/21756/testReport/ | | modules | C:
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212118#comment-16212118 ] Hadoop QA commented on HDFS-12544: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 466 unchanged - 5 fixed = 468 total (was 471) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 35s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 22s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReplication | | | hadoop.hdfs.server.federation.router.TestNamenodeHeartbeat | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | HDFS-12544 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893147/HDFS-12544.04.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux f3dba8225884 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0f1c037 | | Default Java | 1.8.0_131 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21755/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/21755/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/21755/testReport/
[jira] [Updated] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-11468: - Attachment: HDFS-11468-HDFS-7240.004.patch Attached a new patch that removes the node-metrics-related change. Thanks [~cheersyang]. > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, > HDFS-11468-HDFS-7240.004.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery
[ https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212087#comment-16212087 ] Xiao Chen commented on HDFS-12482: -- Thanks [~eddyxu] for revving. Looks good in general, a few comments: - let's add input validation for {{xmitWeight}}. Although {{Math.max}} will floor the result at 1, negative values and 0 look invalid. Any thoughts on whether we should have an upper bound limit? - in the .md documentation, I suggest adding an example so people can know what to set without checking code. Maybe something like 'For example, if there are 2 read streams and 1 write stream, xmits weight of 0.5 means recovery will be scheduled for 1 EC block and 1 replicated block. xmits weight of 1.0 means xxx' (Or better words) - Test looks great. One trivial suggestion: bump the timeout to at least 3 minutes, since it took 20 seconds on my local machine. > Provide a configuration to adjust the weight of EC recovery tasks to adjust > the speed of recovery > - > > Key: HDFS-12482 > URL: https://issues.apache.org/jira/browse/HDFS-12482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch > > > The relative speed of EC recovery compared to 3x replica recovery is a > function of the EC codec, number of sources, NIC speed, CPU speed, etc. > Currently the EC recovery has a fixed {{xmitsInProgress}} of {{max(# of > sources, # of targets)}} compared to {{1}} for 3x replica recovery, and the NN > uses {{xmitsInProgress}} to decide how many recovery tasks to schedule to the > DataNode, thus we can add a coefficient for the user to tune the weight of EC > recovery tasks. 
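The weighting arithmetic discussed in the comment above can be sketched in a few lines. This is an assumption-laden illustration, not the actual patch: it supposes the EC xmit count is {{max(# of sources, # of targets)}} scaled by the configurable weight, floored at 1 as the {{Math.max}} remark implies.

```java
public class EcXmitWeight {
    // Hypothetical sketch of the weighting under discussion: EC recovery
    // currently counts max(sources, targets) xmits, and a configurable
    // weight coefficient scales that, with a floor of 1 so a task is
    // never free. Names and exact rounding are assumptions.
    static int ecXmits(int numSources, int numTargets, double xmitWeight) {
        int base = Math.max(numSources, numTargets);
        return Math.max(1, (int) (base * xmitWeight));
    }

    public static void main(String[] args) {
        // 6 sources, 1 target, weight 0.5 -> counted as 3 xmits instead of 6.
        System.out.println(ecXmits(6, 1, 0.5)); // 3
        // Even weight 0 floors the count at 1, matching replicated recovery.
        System.out.println(ecXmits(6, 1, 0.0)); // 1
    }
}
```

Under this reading, a weight below 1.0 lets more EC tasks run concurrently on a DataNode, while a weight above 1.0 throttles them relative to replicated-block recovery.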
[jira] [Commented] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow
[ https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212085#comment-16212085 ] Hudson commented on HDFS-12448: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13114 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13114/]) HDFS-12448. Make sure user defined erasure coding policy ID will not (kai.zheng: rev ce7cf66e5ed74c124afdb5a6902fbf297211cc04) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/ErasureCodeConstants.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestErasureCodingPolicies.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ErasureCodingPolicyManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSErasureCoding.md > Make sure user defined erasure coding policy ID will not overflow > - > > Key: HDFS-12448 > URL: https://issues.apache.org/jira/browse/HDFS-12448 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0 > > Attachments: HDFS-12448.001.patch, HDFS-12448.002.patch > > > Current policy ID is of type "byte". 1~63 is reserved for built-in erasure > coding policy. 64 above is for user defined erasure coding policy. Make sure > user policy ID will not overflow when addErasureCodingPolicy API is called. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
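The overflow guard the ticket describes can be illustrated with a small sketch. The method name and allocation strategy here are hypothetical (the real logic lives in ErasureCodingPolicyManager); the point is that a byte-typed ID space starting user-defined policies at 64 must refuse to allocate past Byte.MAX_VALUE (127) instead of wrapping to a negative value.

```java
public class PolicyIdCheck {
    // Built-in EC policy IDs occupy 1..63; user-defined policies start at 64.
    // Since the ID is a byte (max 127), allocation must stop before overflow.
    // Illustrative sketch only, not the actual ErasureCodingPolicyManager code.
    static final byte USER_ID_START = 64;

    static byte nextUserPolicyId(byte currentMax) {
        if (currentMax >= Byte.MAX_VALUE) {
            // Without this check, (byte)(127 + 1) would silently wrap to -128.
            throw new IllegalStateException("No erasure coding policy ID available");
        }
        return (byte) (Math.max(currentMax, USER_ID_START - 1) + 1);
    }

    public static void main(String[] args) {
        System.out.println(nextUserPolicyId((byte) 63));  // 64: first user ID
        System.out.println(nextUserPolicyId((byte) 100)); // 101
        // nextUserPolicyId((byte) 127) would throw instead of overflowing.
    }
}
```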
[jira] [Commented] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message
[ https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212035#comment-16212035 ] Shriya Gupta commented on HDFS-12688: - I suspected the same asynchronous behaviour when this issue was brought to my attention by a user. However, at no point am I running two copies of the job at the same time. Each time, I launched my script only after the previous execution had ended. I suggested running it multiple times only because the error occurs randomly -- it has also shown up on the very first run of the script sometimes. > HDFS File Not Removed Despite Successful "Moved to .Trash" Message > -- > > Key: HDFS-12688 > URL: https://issues.apache.org/jira/browse/HDFS-12688 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.6.0 >Reporter: Shriya Gupta >Priority: Critical > > Wrote a simple script to delete and create a file and ran it multiple times. > However, some executions of the script randomly threw a FileAlreadyExists > error while the others succeeded, despite a successful hdfs dfs -rm command. The > script is as below; I have reproduced it in two different environments -- > hdfs dfs -ls /user/shriya/shell_test/ > echo "starting hdfs remove **" > hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput > echo "hdfs compeleted!" 
> hdfs dfs -ls /user/shriya/shell_test/ > echo "starting mapReduce***" > mapred job -libjars > /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar > -submit /data/home/shriya/shell_test/wordcountJob.xml > The message confirming successful move -- > 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: > hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728 > The contents of subsequent -ls after -rm also showed that the file still > existed) > The error I got when my MapReduce job tried to create the file -- > 17/10/19 14:50:00 WARN security.UserGroupInformation: > PriviledgedActionException as: (auth:KERBEROS) > cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory > hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists > Exception in thread "main" > org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory > hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists > at > org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) > at > org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304) > at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277) -- 
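If the delete really is completing asynchronously, one defensive pattern is to poll until the output path is actually gone before submitting the job that recreates it. The sketch below is illustrative only and runs against the local filesystem for brevity; on HDFS the same loop would be written against the Hadoop FileSystem API (exists()/delete()) instead of java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WaitForDelete {
    // Illustrative workaround for the race described above: after issuing a
    // delete, poll until the path no longer exists before starting the job
    // that recreates it. Returns false if the path is still visible after
    // maxTries polls, so the caller can fail fast instead of racing.
    static boolean deleteAndWait(Path p, int maxTries, long sleepMillis)
            throws IOException, InterruptedException {
        Files.deleteIfExists(p);
        for (int i = 0; i < maxTries; i++) {
            if (!Files.exists(p)) {
                return true; // safe to recreate the output directory now
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("wordcountOutput", ".tmp");
        System.out.println(deleteAndWait(p, 10, 100)); // true
    }
}
```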
[jira] [Updated] (HDFS-12684) Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12684: --- Summary: Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean (was: Ozone: SCM metrics NodeCount is overlapping with node manager metrics) > Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean > > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > 
"key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow
[ https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-12448: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to trunk and 3.0 branch. Thanks [~HuafengWang] for the contribution! > Make sure user defined erasure coding policy ID will not overflow > - > > Key: HDFS-12448 > URL: https://issues.apache.org/jira/browse/HDFS-12448 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0 > > Attachments: HDFS-12448.001.patch, HDFS-12448.002.patch > > > Current policy ID is of type "byte". 1~63 is reserved for built-in erasure > coding policy. 64 above is for user defined erasure coding policy. Make sure > user policy ID will not overflow when addErasureCodingPolicy API is called. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212033#comment-16212033 ] Weiwei Yang commented on HDFS-12684: Hi [~eyang], sorry I did not give a clear title; this is actually the SCMNodeManager in the Ozone branch. It has a similar name to the YARN NodeManager, but they are two different services. I just updated the JIRA title to avoid confusion. > Ozone: SCM metrics NodeCount is overlapping with node manager metrics > - > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager have {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > 
"value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow
[ https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212021#comment-16212021 ] Kai Zheng commented on HDFS-12448: -- +1 on the latest patch. > Make sure user defined erasure coding policy ID will not overflow > - > > Key: HDFS-12448 > URL: https://issues.apache.org/jira/browse/HDFS-12448 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12448.001.patch, HDFS-12448.002.patch > > > The current policy ID is of type "byte". IDs 1~63 are reserved for built-in erasure > coding policies. IDs 64 and above are for user-defined erasure coding policies. Make sure > the user policy ID will not overflow when the addErasureCodingPolicy API is called. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
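The overflow concern above is concrete: policy IDs live in a Java {{byte}}, IDs 1~63 are reserved for built-in policies, and user policies start at 64, so only 64..127 remain before the byte wraps negative. A hedged sketch of the kind of guard the patch calls for (hypothetical names, not the actual Hadoop code):

```java
public class PolicyIdCheck {
    // Assumed constant mirroring the reserved range described in HDFS-12448:
    // 1..63 built-in, 64..127 user-defined.
    static final byte FIRST_USER_POLICY_ID = 64;

    // Hands out the next user-defined policy ID, refusing to wrap past 127.
    static byte nextPolicyId(byte currentMaxId) {
        byte next = (byte) (Math.max(currentMaxId, (byte) 63) + 1);
        // A byte wraps to -128 past 127; reject instead of returning a bogus ID.
        if (next < FIRST_USER_POLICY_ID) {
            throw new IllegalStateException("erasure coding policy ID overflow");
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(nextPolicyId((byte) 63));  // 64, the first user ID
        System.out.println(nextPolicyId((byte) 100)); // 101
        try {
            nextPolicyId(Byte.MAX_VALUE);
        } catch (IllegalStateException e) {
            System.out.println("overflow rejected"); // 127 was the last usable ID
        }
    }
}
```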
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Issue Type: Sub-task (was: Improvement) Parent: HDFS-10467 > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, > HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, > HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, > HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, > HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, > HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, > HDFS-12620.000.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Resolution: Fixed Hadoop Flags: Reviewed Target Version/s: 2.9.0, 3.0.0 Status: Resolved (was: Patch Available) > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, > HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, > HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, > HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, > HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, > HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, > HDFS-12620.000.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212000#comment-16212000 ] Íñigo Goiri edited comment on HDFS-12620 at 10/20/17 1:11 AM: -- Thanks [~subru], [~asuresh], and [~chris.douglas] for the review. I did the cherry pick to branch-2 and pushed the fixes. [^HDFS-10467-branch-2.004.patch] contains the full HDFS-10467 backporting to branch-2 including the fixes required for branch-2. [^HDFS-12620-branch-2.012.patch] is just the changes required for branch-2. Finally, one of the fixes needed to be ported to trunk, so I added [^HDFS-12620.000.patch] to trunk and branch-3.0. I ran the unit tests before pushing and everything passed so it should be fine. was (Author: elgoiri): Thanks [~subru], [~asuresh], and [~chris.douglas] for the review. I did the cherry pick to branch-2 and pushed the fixes. [^HDFS-10467-branch-2.004.patch] contains the full HDFS-10467 backporting to branch-2 including the fixes required for branch-2. [^HDFS-12620-branch-2.012.patch] is just the changes required for branch-2. Finally, one of the fixes needed to be ported to trunk, so I added [^HDFS-12620.000.patch] to trunk and branch-3. I ran the unit tests before pushing and everything passed so it should be fine. 
> Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, > HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, > HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, > HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, > HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, > HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, > HDFS-12620.000.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Attachment: HDFS-12620.000.patch > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, > HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, > HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, > HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, > HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, > HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, > HDFS-12620.000.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Attachment: HDFS-12620-branch-2.012.patch HDFS-10467-branch-2.004.patch Thanks [~subru], [~asuresh], and [~chris.douglas] for the review. I did the cherry pick to branch-2 and pushed the fixes. [^HDFS-10467-branch-2.004.patch] contains the full HDFS-10467 backporting to branch-2 including the fixes required for branch-2. [^HDFS-12620-branch-2.012.patch] is just the changes required for branch-2. Finally, one of the fixes needed to be ported to trunk, so I added [^HDFS-12620.000.patch] to trunk and branch-3. I ran the unit tests before pushing and everything passed so it should be fine. > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, > HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, > HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, > HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, > HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, > HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12544: -- Attachment: HDFS-12544.04.patch Attached v04 patch to address the following: 1. Handled file rename/move case for the snapshot scope directory. 2. New unit test for the file rename. 3. Added more comments in the test and snapshot manager. 4. Fixed typos pointed out by Yongjun in the previous comment. [~yzhangal], can you please take a look at the patch? > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. 
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Attachment: (was: HDFS-12620-branch-2.fixes.patch) > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, > HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, > HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, > HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, > HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12671) [READ] Test NameNode restarts when PROVIDED is configured
[ https://issues.apache.org/jira/browse/HDFS-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HDFS-12671: -- Summary: [READ] Test NameNode restarts when PROVIDED is configured (was: [READ] Test NameNode restarts) > [READ] Test NameNode restarts when PROVIDED is configured > - > > Key: HDFS-12671 > URL: https://issues.apache.org/jira/browse/HDFS-12671 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Virajith Jalaparti > Attachments: HDFS-12671-HDFS-9806.001.patch > > > Add test case to ensure namenode restarts can be handled with provided > storage. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211876#comment-16211876 ] Manoj Govindassamy commented on HDFS-12544: --- Thanks for the review comments [~yzhangal]. Good discussion on the file rename behavior w.r.t. snapshot diff for a descendant directory. That's right, the renamed files still show up in the diff report as an "R" entry even though they are moved out of the scope (descendant) directory. To get the same behavior as the normal snapshot diff report, these renamed files whose target is not under the scoped directory should be shown as "D" deleted entries in the report. Will post a new patch to handle this case. > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. 
So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
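The behavior discussed in the comments above — a rename whose source is inside the scoped directory but whose target is outside should surface as a "D" (delete) entry, and the reverse as a create — can be sketched as a simple path-scope classification (hypothetical names, not the actual snapshot-diff code):

```java
public class ScopedDiff {
    enum EntryType { RENAME, DELETE, CREATE }

    // Classifies a rename relative to the scoped (descendant) directory,
    // mirroring the semantics agreed on for HDFS-12544.
    static EntryType classifyRename(String src, String dst, String scope) {
        boolean srcIn = isUnder(src, scope);
        boolean dstIn = isUnder(dst, scope);
        if (srcIn && dstIn) return EntryType.RENAME; // both ends visible in scope
        if (srcIn) return EntryType.DELETE;          // moved out of scope: "D" entry
        return EntryType.CREATE;                     // moved into scope: new entry
    }

    static boolean isUnder(String path, String dir) {
        return path.equals(dir)
            || path.startsWith(dir.endsWith("/") ? dir : dir + "/");
    }

    public static void main(String[] args) {
        System.out.println(classifyRename("/root/a/x", "/root/b/y", "/root/a")); // DELETE
        System.out.println(classifyRename("/root/a/x", "/root/a/y", "/root/a")); // RENAME
        System.out.println(classifyRename("/root/b/x", "/root/a/y", "/root/a")); // CREATE
    }
}
```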
[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-12682: --- Priority: Blocker (was: Major) > ECAdmin -listPolicies will always show policy state as DISABLED > --- > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of [SystemErasureCodingPolicies > 
class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
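The root-cause pattern described in HDFS-12682 is worth seeing in isolation: a deserializer that first consults a static cache of policy objects silently discards whatever state arrived on the wire, so the client always sees the cached default. A minimal, hypothetical Java sketch (assumed names, not the real {{PBHelperClient}} or {{SystemErasureCodingPolicies}}):

```java
import java.util.Map;

public class CachedPolicyPitfall {
    enum State { DISABLED, ENABLED }

    static class Policy {
        final int id;
        State state = State.DISABLED; // default on the cached instances
        Policy(int id) { this.id = id; }
    }

    // Static cache of built-in policies, analogous to the system policy list.
    static final Map<Integer, Policy> CACHE = Map.of(1, new Policy(1));

    // Buggy deserializer: returns the cached object, ignoring wire state.
    static Policy fromWire(int id, State wireState) {
        Policy cached = CACHE.get(id);
        if (cached != null) {
            return cached; // wireState is silently dropped here
        }
        Policy p = new Policy(id);
        p.state = wireState;
        return p;
    }

    public static void main(String[] args) {
        // The NN enabled policy 1, but the client still sees DISABLED.
        Policy p = fromWire(1, State.ENABLED);
        System.out.println(p.state); // DISABLED — the symptom in HDFS-12682
    }
}
```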
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211833#comment-16211833 ] Yongjun Zhang commented on HDFS-12544: -- Hi [~manojg], One thought about snapshotDiff on a scoped dir: if we have a move operation "mv x y", where x is in the scoped dir and y is not, but both are in the snapshot root hierarchy, there are two issues: # If we do snapshotDiff on the snapshot root, we will get a "rename" operation, which helps distcp do a "sync" without copying the renamed file. With the change of this JIRA, we will lose that optimization. If the different sub-dirs are very independent, then we are fine. We can document this, though. # With this change, it's expected that we get a "delete" entry when doing snapshotDiff at the source dir and a "new" entry when doing snapshotDiff at the target dir. Would you please confirm that this is the case? # If the result of 2 is not as expected, we need a new patch rev to fix it. Thanks. > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. 
But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11821) BlockManager.getMissingReplOneBlocksCount() does not report correct value if corrupt file with replication factor of 1 gets deleted
[ https://issues.apache.org/jira/browse/HDFS-11821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211803#comment-16211803 ] Ravi Prakash commented on HDFS-11821: - Hi Wellington! Thanks for your explanation. I'm sorry I've been tardy on this issue. Thank you for the ping. I took some time to step through the debugger and understand what's going on. If you set a breakpoint [here|https://github.com/apache/hadoop/blob/4ab0c8f96a41c573cc1f1e71c18871d243f952b9/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java#L378], you'll see that remove is being called with priority level 5. Hence the check on [line 385|https://github.com/apache/hadoop/blob/4ab0c8f96a41c573cc1f1e71c18871d243f952b9/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java#L385] is failing (which would correctly allow corruptReplicationOneBlocks to be decremented). I just tested this small fix which doesn't increase the time taken to delete the blocks : {code} $ git diff diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java index 347d606a04e..e3f228d2947 100644 --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java @@ -365,7 +365,7 @@ boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas) { NameNode.blockStateChangeLog.debug( "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" + " {} from priority queue {}", block, i); - decrementBlockStat(block, priLevel, oldExpectedReplicas); + decrementBlockStat(block, i, oldExpectedReplicas); return true; } } 
{code} Could you please test this too? > BlockManager.getMissingReplOneBlocksCount() does not report correct value if > corrupt file with replication factor of 1 gets deleted > --- > > Key: HDFS-11821 > URL: https://issues.apache.org/jira/browse/HDFS-11821 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.6.0, 3.0.0-alpha2 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: HDFS-11821-1.patch, HDFS-11821-2.patch > > > *BlockManager* keeps a separate metric for number of missing blocks with > replication factor of 1. This is returned by > *BlockManager.getMissingReplOneBlocksCount()* method currently, and that's > what is displayed on below attribute for *dfsadmin -report* (in below > example, there's one corrupt block that relates to a file with replication > factor of 1): > {noformat} > ... > Missing blocks (with replication factor 1): 1 > ... > {noformat} > However, if the related file gets deleted, (for instance, using hdfs fsck > -delete option), this metric never gets updated, and *dfsadmin -report* will > keep reporting a missing block, even though the file does not exist anymore. > The only workaround available is to restart the NN, so that this metric will > be cleared. > This can be easily reproduced by forcing a replication factor 1 file > corruption such as follows: > 1) Put a file into hdfs with replication factor 1: > {noformat} > $ hdfs dfs -Ddfs.replication=1 -put test_corrupt / > $ hdfs dfs -ls / > -rw-r--r-- 1 hdfs supergroup 19 2017-05-10 09:21 /test_corrupt > {noformat} > 2) Find related block for the file and delete it from DN: > {noformat} > $ hdfs fsck /test_corrupt -files -blocks -locations > ... > /test_corrupt 19 bytes, 1 block(s): OK > 0. BP-782213640-172.31.113.82-1494420317936:blk_1073742742_1918 len=19 > Live_repl=1 > [DatanodeInfoWithStorage[172.31.112.178:20002,DS-a0dc0b30-a323-4087-8c36-26ffdfe44f46,DISK]] > Status: HEALTHY > ... 
> $ find /dfs/dn/ -name blk_1073742742* > /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742 > /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta > $ rm -rf > /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742 > $ rm -rf > /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta > {noformat} > 3) Running fsck will report the corruption as expected: > {noformat} > $ hdfs fsck
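The one-character fix in the diff above ({{priLevel}} → {{i}}) is the whole bug: the block is removed from whichever priority queue actually contains it, but the statistics were decremented using the caller-supplied priority, so the check guarding the replication-one counter compared against the wrong level. A simplified, hypothetical sketch of the fixed pattern (not the real {{LowRedundancyBlocks}} class):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PriorityQueues {
    static final int LEVELS = 6;
    final List<Set<String>> queues = new ArrayList<>();
    int replOneCount = 0; // stat tied to one specific level, e.g. level 5

    PriorityQueues() {
        for (int i = 0; i < LEVELS; i++) queues.add(new HashSet<>());
    }

    void add(String block, int level) {
        queues.get(level).add(block);
        if (level == 5) replOneCount++;
    }

    boolean remove(String block, int priLevel) {
        // Fast path: the caller's hint is correct.
        if (priLevel < LEVELS && queues.get(priLevel).remove(block)) {
            decrement(block, priLevel);
            return true;
        }
        // Slow path: scan every level, as the real code does.
        for (int i = 0; i < LEVELS; i++) {
            if (queues.get(i).remove(block)) {
                decrement(block, i); // the fix: use i, not the stale priLevel
                return true;
            }
        }
        return false;
    }

    void decrement(String block, int level) {
        if (level == 5) replOneCount--;
    }

    public static void main(String[] args) {
        PriorityQueues q = new PriorityQueues();
        q.add("blk_1", 5); // queued at the replication-one level
        q.remove("blk_1", 2); // caller passes a stale priority; scan finds it at 5
        System.out.println(q.replOneCount); // 0 with the fix; the old bug left it at 1
    }
}
```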
[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12682: - Attachment: HDFS-12682.01.patch Uploading patch 1 to show the idea. It's not easy to split the metadata part into HDFS-12686, because the protobuf definition is changed. Some considerations in this patch: - Modifying the protobuf definition and the Public class {{ErasureCodingPolicy}} is not compatible. But this fixes the previous bug, so I think it is the right thing to do here. - For the get {{ErasureCodingPolicy}} methods, I went without state for all. File attributes only care about the policy, not its state; at the file system level (e.g. {{hdfs ec -getPolicy -path }}), since we always require -path, the focus is also on the policy rather than its state. If there is a requirement to get the state of a policy without using {{-listPolicies}} in the future, we could add new APIs for that. Will check tests and think about coverage. Appreciate early reviews / feedback on the patch. > ECAdmin -listPolicies will always show policy state as DISABLED > --- > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. 
> {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of [SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211784#comment-16211784 ] Hadoop QA commented on HDFS-12683: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 2m 39s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12683 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893125/HDFS-12683.04.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21754/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, > HDFS-12683.03.patch, HDFS-12683.04.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-12683: -- Attachment: HDFS-12683.04.patch > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, > HDFS-12683.03.patch, HDFS-12683.04.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211747#comment-16211747 ] Hadoop QA commented on HDFS-12683: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 11s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12683 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893119/HDFS-12683.02.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21753/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, > HDFS-12683.03.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-12683: -- Attachment: HDFS-12683.03.patch > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, > HDFS-12683.03.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-12683: -- Attachment: HDFS-12683.02.patch > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211736#comment-16211736 ] Subru Krishnan commented on HDFS-12620: --- [~elgoiri], thanks for working through this. Please go ahead with the cherry-picks to branch-2 and post the diff patch afterwards. > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, > HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, > HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, > HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, > HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch, > HDFS-12620-branch-2.fixes.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
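The "Java 7 not supporting referencing functions" item in the change list above is the classic backport friction: trunk code that passes a Java 8 method reference has to be rewritten as an anonymous class on branch-2. A hypothetical before/after, not taken from the actual patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Java7Backport {
    // On trunk (Java 8) this could be a one-liner:
    //   names.sort(String::compareToIgnoreCase);
    // On branch-2 (Java 7) the method reference must become an explicit
    // anonymous Comparator.
    public static void sortIgnoreCase(List<String> names) {
        Collections.sort(names, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("b", "A", "c");
        sortIgnoreCase(names);
        System.out.println(names); // prints [A, b, c]
    }
}
```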
[jira] [Commented] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message
[ https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211729#comment-16211729 ] Jason Lowe commented on HDFS-12688: --- Have you checked the HDFS audit logs? That should give you clues about what is happening here and who is re-creating the directory. I suspect what's happening here is that the job is executing asynchronously, and you're actually running multiple copies of the job at the same time when you run the script multiple times. If the job is still running then it is going to re-create the output directory when its tasks need to write output. > HDFS File Not Removed Despite Successful "Moved to .Trash" Message > -- > > Key: HDFS-12688 > URL: https://issues.apache.org/jira/browse/HDFS-12688 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.6.0 >Reporter: Shriya Gupta >Priority: Critical > > Wrote a simple script to delete and create a file and ran it multiple times. > However, some executions of the script randomly threw a FileAlreadyExists > error while the others succeeded despite successful hdfs dfs -rm command. The > script is as below, I have reproduced it in two different environments -- > hdfs dfs -ls /user/shriya/shell_test/ > echo "starting hdfs remove **" > hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput > echo "hdfs compeleted!" 
> hdfs dfs -ls /user/shriya/shell_test/ > echo "starting mapReduce***" > mapred job -libjars > /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar > -submit /data/home/shriya/shell_test/wordcountJob.xml > The message confirming successful move -- > 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: > hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728 > The contents of subsequent -ls after -rm also showed that the file still > existed) > The error I got when my MapReduce job tried to create the file -- > 17/10/19 14:50:00 WARN security.UserGroupInformation: > PriviledgedActionException as: (auth:KERBEROS) > cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory > hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists > Exception in thread "main" > org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory > hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists > at > org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) > at > org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304) > at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277) -- 
This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
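Jason Lowe's diagnosis above — `mapred job -submit` returns before the job finishes, so a second run of the script races the first, whose tasks then re-create the output directory — suggests a script-level guard. The helper below is a sketch under assumptions: that a job listing command is available and that the job can be recognized by some pattern in its output (whether `mapred job -list` prints the job name varies by Hadoop version, so verify against your cluster).

```shell
# Sketch: block until no matching job appears in the listing before deleting
# the output dir and resubmitting. The lister command and the pattern match
# are assumptions -- check what `mapred job -list` prints on your cluster.
wait_for_no_job() {
  lister="$1"; pattern="$2"
  while $lister 2>/dev/null | grep -q "$pattern"; do
    sleep 10
  done
}

# Intended usage (hypothetical pattern; requires a running cluster):
#   wait_for_no_job "mapred job -list" wordcount
#   hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
#   mapred job -submit /data/home/shriya/shell_test/wordcountJob.xml
```

The HDFS audit log remains the authoritative way to confirm who re-created the directory, as suggested above.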
[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211724#comment-16211724 ] Hadoop QA commented on HDFS-12620: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-12620 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893115/HDFS-12620-branch-2.fixes.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21752/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, > HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, > HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, > HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, > HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch, > HDFS-12620-branch-2.fixes.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Attachment: HDFS-12620-branch-2.fixes.patch HDFS-12620-branch-2.fixes.patch contains the fixes after the cherry pick is done. This would actually be the new commit. I can upload it again after the cherry-pick to get a clean report. > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, > HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, > HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, > HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, > HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch, > HDFS-12620-branch-2.fixes.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-12683: -- Attachment: HDFS-12683.01.patch > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-12683: -- Status: Patch Available (was: In Progress) > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > Attachments: HDFS-12683.01.patch > > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211677#comment-16211677 ] Arun Suresh commented on HDFS-12620: bq. I'm trying one last time with jenkins with version 011 but if it doesn't go through, I would stick to this. Agreed. [~subru] / [~chris.douglas] - are you ok with it ? > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, > HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, > HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, > HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, > HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211660#comment-16211660 ] Íñigo Goiri commented on HDFS-12620: I got the following:
{code}
Total Elapsed time: 925m 35s

-1 overall

| Vote | Subsystem  | Runtime   | Comment |
|  0   | shellcheck |   0m  0s  | Shellcheck was not available. |
|  0   | findbugs   |   0m  0s  | Findbugs executables are not available. |
| +1   | @author    |   0m  0s  | The patch does not contain any @author tags. |
| +1   | test4tests |   0m  0s  | The patch appears to include 25 new or modified test files. |
| +1   | mvninstall |   5m 53s  | branch-2 passed |
| +1   | compile    |   0m 37s  | branch-2 passed |
| +1   | checkstyle |   0m 28s  | branch-2 passed |
| +1   | mvnsite    |   0m 45s  | branch-2 passed |
| +1   | mvneclipse |   0m 14s  | branch-2 passed |
| +1   | javadoc    |   0m 50s  | branch-2 passed |
| +1   | mvninstall |   0m 38s  | the patch passed |
| +1   | compile    |   0m 36s  | the patch passed |
| +1   | cc         |   0m 36s  | the patch passed |
| +1   | javac      |   0m 36s  | the patch passed |
| -1   | checkstyle |   0m 27s  | hadoop-hdfs-project/hadoop-hdfs: The patch generated 11 new + 624 unchanged - 0 fixed = 635 total (was 624) |
| +1   | mvnsite    |   0m 44s  | the patch passed |
| +1   | mvneclipse |   0m 11s  | the patch passed |
| +1   | shelldocs  |   0m  3s  | There were no new shelldocs issues. |
| +1   | whitespace |   0m  0s  | The patch has no whitespace issues. |
| +1   | xml        |   0m  1s  | The patch has no ill-formed XML file. |
| +1   | javadoc    |   0m 54s  | the patch passed |
| -1   | unit       | 891m 25s  | hadoop-hdfs in the patch failed. |
| +1   | asflicense |  20m 29s  | The patch does not generate ASF License warnings. |
|      |            | 925m 35s  | |

|| Reason || Tests ||
Failed junit tests:
  hadoop.hdfs.server.datanode.TestDataNodePeerMetrics
  hadoop.hdfs.server.datanode.TestDataNodeUUID
  hadoop.hdfs.server.namenode.startupprogress.TestStartupProgress
  hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement
Timed out junit tests:
  org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy
  org.apache.hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
  org.apache.hadoop.hdfs.TestRestartDFS
  org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
  org.apache.hadoop.hdfs.server.datanode.TestBlockCountersInPendingIBR
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
  org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetricsLogger
  org.apache.hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeFaultInjector
  org.apache.hadoop.hdfs.server.namenode.TestNestedEncryptionZones
  org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean
  org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant
  org.apache.hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold
  org.apache.hadoop.hdfs.server.datanode.TestIncrementalBlockReports
  org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
{code}
[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)
[ https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs updated HDFS-12665: -- Parent Issue: HDFS-9806 (was: HDFS-12090) > [AliasMap] Create a version of the AliasMap that runs in memory in the > Namenode (leveldb) > - > > Key: HDFS-12665 > URL: https://issues.apache.org/jira/browse/HDFS-12665 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs >Assignee: Ewan Higgs > Attachments: HDFS-12665-HDFS-9806.001.patch > > > The design of Provided Storage requires the use of an AliasMap to manage the > mapping between blocks of files on the local HDFS and ranges of files on a > remote storage system. To reduce load from the Namenode, this can be done > using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). > However, to aide adoption and ease of deployment, we propose an in memory > version. > This AliasMap will be a wrapper around LevelDB (already a dependency from the > Timeline Service) and use protobuf for the key (blockpool, blockid, and > genstamp) and the value (url, offset, length, nonce). The in memory service > will also have a configurable port on which it will listen for updates from > Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
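The key/value layout described above — key (blockpool, blockid, genstamp), value (url, offset, length, nonce) — can be pictured with a small sketch. Everything here is illustrative: class and field names are hypothetical, and a plain HashMap stands in for the LevelDB store and the protobuf serialization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the AliasMap key/value described in the issue;
// names are illustrative, not the ones used in the actual patch.
public class AliasMapSketch {
    static final class BlockKey {
        final String blockPool; final long blockId; final long genStamp;
        BlockKey(String bp, long id, long gs) { blockPool = bp; blockId = id; genStamp = gs; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof BlockKey)) return false;
            BlockKey k = (BlockKey) o;
            return blockId == k.blockId && genStamp == k.genStamp
                && blockPool.equals(k.blockPool);
        }
        @Override public int hashCode() { return Objects.hash(blockPool, blockId, genStamp); }
    }

    static final class ProvidedLocation {
        final String url; final long offset; final long length; final byte[] nonce;
        ProvidedLocation(String url, long off, long len, byte[] nonce) {
            this.url = url; offset = off; length = len; this.nonce = nonce;
        }
    }

    // A HashMap stands in for the LevelDB store; in the design above both key
    // and value would be protobuf-encoded before hitting LevelDB.
    private final Map<BlockKey, ProvidedLocation> store = new HashMap<>();
    public void put(BlockKey k, ProvidedLocation v) { store.put(k, v); }
    public ProvidedLocation get(BlockKey k) { return store.get(k); }
}
```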
[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation
[ https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211639#comment-16211639 ] Nanda kumar commented on HDFS-12680: HDFS-12689 is created to track the deletion of containers in {{DELETING}} state in {{ContainerStateManager}} > Ozone: SCM: Lease support for container creation > > > Key: HDFS-12680 > URL: https://issues.apache.org/jira/browse/HDFS-12680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12680-HDFS-7240.000.patch, > HDFS-12680-HDFS-7240.001.patch > > > This brings in lease support for container creation. > A lease should be given to a container that is moved to {{CREATING}} state when > the {{BEGIN_CREATE}} event happens; {{LeaseException}} should be thrown if the > container already holds a lease. The lease must be released during the > {{COMPLETE_CREATE}} event. If the lease times out, the container should be moved > to {{DELETING}} state, and an exception should be thrown if a {{COMPLETE_CREATE}} > event is received on that container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
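The transitions described above (lease on {{BEGIN_CREATE}}, release on {{COMPLETE_CREATE}}, {{DELETING}} on timeout) can be sketched as a tiny state machine. Class, method, and exception names here are illustrative; the patch itself builds on SCM's {{LeaseManager}} and {{LeaseException}}.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the lease-backed container lifecycle described in
// the issue; names are hypothetical and time is passed in explicitly so the
// timeout path is easy to exercise.
public class ContainerLeaseSketch {
    enum State { CREATING, OPEN, DELETING }

    private final Map<String, State> states = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();
    private final long timeoutMs;

    ContainerLeaseSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // BEGIN_CREATE: move to CREATING and take a lease; a second lease on the
    // same container is rejected (stand-in for LeaseException).
    void beginCreate(String container, long nowMs) {
        if (deadlines.containsKey(container)) {
            throw new IllegalStateException("container already holds a lease");
        }
        states.put(container, State.CREATING);
        deadlines.put(container, nowMs + timeoutMs);
    }

    // COMPLETE_CREATE: release the lease; if it already timed out, the
    // container goes to DELETING and the event is rejected.
    void completeCreate(String container, long nowMs) {
        Long deadline = deadlines.remove(container);
        if (deadline == null || nowMs > deadline) {
            states.put(container, State.DELETING);
            throw new IllegalStateException("lease expired for " + container);
        }
        states.put(container, State.OPEN);
    }

    State stateOf(String container) { return states.get(container); }
}
```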
[jira] [Created] (HDFS-12689) Ozone: SCM: Clean up of container in DELETING state
Nanda kumar created HDFS-12689: -- Summary: Ozone: SCM: Clean up of container in DELETING state Key: HDFS-12689 URL: https://issues.apache.org/jira/browse/HDFS-12689 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Nanda kumar When creating container times out, the container is moved to {{DELETING}} state. Once the container is in DELETING state {{ContainerStateManager}} should do cleanup and delete the containers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12663) Ozone: OzoneClient: Remove protobuf classes exposed to clients through OzoneBucket
[ https://issues.apache.org/jira/browse/HDFS-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211630#comment-16211630 ] Hadoop QA commented on HDFS-12663: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 11s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12663 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893101/HDFS-12663-HDFS-7240.002.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21750/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: OzoneClient: Remove protobuf classes exposed to clients through > OzoneBucket > -- > > Key: HDFS-12663 > URL: https://issues.apache.org/jira/browse/HDFS-12663 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12663-HDFS-7240.000.patch, > HDFS-12663-HDFS-7240.001.patch, HDFS-12663-HDFS-7240.002.patch > > > As part of {{OzoneBucket#createKey}} we are currently exposing protobuf enums > {{OzoneProtos.ReplicationType}} and {{OzoneProtos.ReplicationFactor}} through > client, this can be replaced with client side enums and conversion can be > done internally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
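The change described above follows a standard pattern: expose a client-side enum and do the conversion to the protobuf enum internally, so {{OzoneProtos}} never leaks into the public API. A hedged sketch — an int models the protobuf constant here, since {{OzoneProtos}} is not available in a standalone snippet, and the names are illustrative.

```java
// Hypothetical sketch of the client-side enum pattern described in the
// issue. Clients only ever see ReplicationFactor; the protobuf constant
// (modeled as an int here) is derived at the internal conversion points.
public class ClientEnumSketch {
    public enum ReplicationFactor {
        ONE(1), THREE(3);

        private final int value;
        ReplicationFactor(int value) { this.value = value; }

        // Internal conversion point; a real client library would return
        // OzoneProtos.ReplicationFactor here instead of an int.
        int toProtoValue() { return value; }

        static ReplicationFactor fromProtoValue(int v) {
            for (ReplicationFactor f : values()) {
                if (f.value == v) return f;
            }
            throw new IllegalArgumentException("Unknown factor: " + v);
        }
    }
}
```

The conversion methods stay package-private, so callers of an API like {{OzoneBucket#createKey}} can only pass the client enum.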
[jira] [Updated] (HDFS-12663) Ozone: OzoneClient: Remove protobuf classes exposed to clients through OzoneBucket
[ https://issues.apache.org/jira/browse/HDFS-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-12663: --- Attachment: HDFS-12663-HDFS-7240.002.patch > Ozone: OzoneClient: Remove protobuf classes exposed to clients through > OzoneBucket > -- > > Key: HDFS-12663 > URL: https://issues.apache.org/jira/browse/HDFS-12663 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12663-HDFS-7240.000.patch, > HDFS-12663-HDFS-7240.001.patch, HDFS-12663-HDFS-7240.002.patch > > > As part of {{OzoneBucket#createKey}} we are currently exposing protobuf enums > {{OzoneProtos.ReplicationType}} and {{OzoneProtos.ReplicationFactor}} through > client, this can be replaced with client side enums and conversion can be > done internally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-9808) Combine READ_ONLY_SHARED DatanodeStorages with the same ID
[ https://issues.apache.org/jira/browse/HDFS-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs resolved HDFS-9808. -- Resolution: Won't Fix This was part of HDFS-11190. > Combine READ_ONLY_SHARED DatanodeStorages with the same ID > -- > > Key: HDFS-9808 > URL: https://issues.apache.org/jira/browse/HDFS-9808 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chris Douglas > > In HDFS-5318, each datanode that can reach a (read only) block reports itself > as a valid location for the block. While accurate, this increases (redundant) > block report traffic and- without partitioning on the backend- may return an > overwhelming number of replica locations for each block. > Instead, a DN could report only that the shared storage is reachable. The > contents of the storage could be reported separately/synthetically to the > block manager, which can collapse all instances into a single storage. A > subset of locations- closest to the client, etc.- can be returned, rather > than all possible locations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message
Shriya Gupta created HDFS-12688: --- Summary: HDFS File Not Removed Despite Successful "Moved to .Trash" Message Key: HDFS-12688 URL: https://issues.apache.org/jira/browse/HDFS-12688 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.6.0 Reporter: Shriya Gupta Priority: Critical Wrote a simple script to delete and create a file and ran it multiple times. However, some executions of the script randomly threw a FileAlreadyExists error while the others succeeded despite successful hdfs dfs -rm command. The script is as below, I have reproduced it in two different environments -- hdfs dfs -ls /user/shriya/shell_test/ echo "starting hdfs remove **" hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput echo "hdfs compeleted!" hdfs dfs -ls /user/shriya/shell_test/ echo "starting mapReduce***" mapred job -libjars /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar -submit /data/home/shriya/shell_test/wordcountJob.xml The message confirming successful move -- 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728 The contents of subsequent -ls after -rm also showed that the file still existed) The error I got when my MapReduce job tried to create the file -- 17/10/19 14:50:00 WARN security.UserGroupInformation: PriviledgedActionException as: (auth:KERBEROS) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272) at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
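The failure mode above — the job submitted while the output directory is still visible — can be guarded against by confirming the deletion before submitting, instead of trusting the -rm exit status. The sketch below models that check with java.nio.file against a local directory; against HDFS the analogous calls would be FileSystem#delete(path, true) and FileSystem#exists(path). Class and method names here are illustrative, not part of the reported script:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteThenVerify {
    // Deletes 'target' and re-checks until it is confirmed gone, or gives up.
    // Returning false lets the caller fail fast instead of letting the job
    // die later with FileAlreadyExistsException.
    static boolean deleteAndConfirm(Path target, int maxAttempts)
            throws IOException, InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            Files.deleteIfExists(target);
            if (!Files.exists(target)) {
                return true; // confirmed gone: safe to submit the job now
            }
            Thread.sleep(200); // brief back-off before re-checking
        }
        return false; // still present after all attempts
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the wordcountOutput directory from the report:
        Path dir = Files.createTempDirectory("wordcountOutput");
        System.out.println(deleteAndConfirm(dir, 5)); // prints true once confirmed deleted
    }
}
```

The same verify-before-submit step could be added to the original shell script with `hdfs dfs -test -e` in a retry loop.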
[jira] [Commented] (HDFS-12675) Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails
[ https://issues.apache.org/jira/browse/HDFS-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211564#comment-16211564 ] Nanda kumar commented on HDFS-12675: Thanks for the contribution [~linyiqun]. I have committed the code to the feature branch. > Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails > -- > > Key: HDFS-12675 > URL: https://issues.apache.org/jira/browse/HDFS-12675 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12675-HDFS-7240.001.patch > > > Caught one UT failure > {{TestLeaseManager#testLeaseCallbackWithMultipleLeases}}: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.ozone.lease.TestLeaseManager.testLeaseCallbackWithMultipleLeases(TestLeaseManager.java:293) > {noformat} > The reason for this error is that lease {{leaseFive}} didn't expire in the test.
[jira] [Updated] (HDFS-12675) Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails
[ https://issues.apache.org/jira/browse/HDFS-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-12675: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails > -- > > Key: HDFS-12675 > URL: https://issues.apache.org/jira/browse/HDFS-12675 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12675-HDFS-7240.001.patch > > > Caught one UT failure > {{TestLeaseManager#testLeaseCallbackWithMultipleLeases}}: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.ozone.lease.TestLeaseManager.testLeaseCallbackWithMultipleLeases(TestLeaseManager.java:293) > {noformat} > The reason for this error is that lease {{leaseFive}} didn't expire in the test.
[jira] [Commented] (HDFS-12675) Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails
[ https://issues.apache.org/jira/browse/HDFS-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211551#comment-16211551 ] Nanda kumar commented on HDFS-12675: Thanks [~linyiqun] for working on this. +1, the patch looks good to me; I will commit it shortly. > Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails > -- > > Key: HDFS-12675 > URL: https://issues.apache.org/jira/browse/HDFS-12675 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12675-HDFS-7240.001.patch > > > Caught one UT failure > {{TestLeaseManager#testLeaseCallbackWithMultipleLeases}}: > {noformat} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.ozone.lease.TestLeaseManager.testLeaseCallbackWithMultipleLeases(TestLeaseManager.java:293) > {noformat} > The reason for this error is that lease {{leaseFive}} didn't expire in the test.
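The patch itself is not shown here, but a test failing because a lease "didn't expire in test" is typically fixed by polling for the condition with a deadline rather than asserting after a fixed delay (Hadoop tests commonly use GenericTestUtils#waitFor for this). Below is a minimal, stdlib-only sketch of that pattern; it illustrates the general technique, not the actual HDFS-12675 change:

```java
import java.util.function.BooleanSupplier;

public class WaitFor {
    // Polls 'condition' every 'intervalMs' until it holds or 'timeoutMs'
    // elapses. Deadline-based waiting makes time-driven assertions (such as
    // lease expiry) robust to scheduling jitter on loaded build machines.
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean(); // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a lease that expires ~300 ms from now:
        long expiry = System.currentTimeMillis() + 300;
        boolean expired = waitFor(() -> System.currentTimeMillis() >= expiry, 50, 2000);
        System.out.println(expired); // the expiry is observed reliably, not raced
    }
}
```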
[jira] [Commented] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery
[ https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211538#comment-16211538 ] Hadoop QA commented on HDFS-12482: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 2m 51s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12482 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893089/HDFS-12482.01.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21749/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Provide a configuration to adjust the weight of EC recovery tasks to adjust > the speed of recovery > - > > Key: HDFS-12482 > URL: https://issues.apache.org/jira/browse/HDFS-12482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch > > > The relative speed of EC recovery comparing to 3x replica recovery is a > function of (EC codec, number of sources, NIC speed, and CPU speed, and etc). > Currently the EC recovery has a fixed {{xmitsInProgress}} of {{max(# of > sources, # of targets)}} comparing to {{1}} for 3x replica recovery, and NN > uses {{xmitsInProgress}} to decide how much recovery tasks to schedule to the > DataNode this we can add a coefficient for user to tune the weight of EC > recovery tasks. 
[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception
[ https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-12683: - Description: The ZKFC should log fatal exceptions before closing the connections and terminating server. Occasionally we have seen DFSZKFailOver shutdown, but no exception or no error being logged. was: Log the exception before closing the connections and terminating server. As some times occasionally we have seen DFSZKFailOver shutdown, but no exception or no error being logged. > DFSZKFailOverController re-order logic for logging Exception > > > Key: HDFS-12683 > URL: https://issues.apache.org/jira/browse/HDFS-12683 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham > > The ZKFC should log fatal exceptions before closing the connections and > terminating server. > Occasionally we have seen DFSZKFailOver shutdown, but no exception or no > error being logged. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery
[ https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211500#comment-16211500 ] Lei (Eddy) Xu edited comment on HDFS-12482 at 10/19/17 6:25 PM: Thanks for the reviews, [~andrew.wang] Updated the patch to add documents. Empirically, an value between {{(0, 1.0]}} seems can achieve similar recovery speed and network / cpu overhead between replicated blocks and ec block recovery _on my testing cluster_. But it highly depends on HW. I will set this value to {{0.5}} initially in this patch. [~xiaochen], [~manojg] mind to give a review? was (Author: eddyxu): Thanks for the reviews, [~andrew.wang] Updated the patch to add documents. Empirically, an value between {{(0, 1.0]}} seems can achieve similar recovery speed and network / cpu overhead between replicated blocks and ec block recovery _on my testing cluster_. But it highly depends on HW. I will set this value to {{0.5}} initially in this patch. > Provide a configuration to adjust the weight of EC recovery tasks to adjust > the speed of recovery > - > > Key: HDFS-12482 > URL: https://issues.apache.org/jira/browse/HDFS-12482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch > > > The relative speed of EC recovery comparing to 3x replica recovery is a > function of (EC codec, number of sources, NIC speed, and CPU speed, and etc). > Currently the EC recovery has a fixed {{xmitsInProgress}} of {{max(# of > sources, # of targets)}} comparing to {{1}} for 3x replica recovery, and NN > uses {{xmitsInProgress}} to decide how much recovery tasks to schedule to the > DataNode this we can add a coefficient for user to tune the weight of EC > recovery tasks. 
[jira] [Updated] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery
[ https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-12482: - Attachment: HDFS-12482.01.patch Thanks for the reviews, [~andrew.wang] Updated the patch to add documents. Empirically, an value between {{(0, 1.0]}} seems can achieve similar recovery speed and network / cpu overhead between replicated blocks and ec block recovery _on my testing cluster_. But it highly depends on HW. I will set this value to {{0.5}} initially in this patch. > Provide a configuration to adjust the weight of EC recovery tasks to adjust > the speed of recovery > - > > Key: HDFS-12482 > URL: https://issues.apache.org/jira/browse/HDFS-12482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch > > > The relative speed of EC recovery comparing to 3x replica recovery is a > function of (EC codec, number of sources, NIC speed, and CPU speed, and etc). > Currently the EC recovery has a fixed {{xmitsInProgress}} of {{max(# of > sources, # of targets)}} comparing to {{1}} for 3x replica recovery, and NN > uses {{xmitsInProgress}} to decide how much recovery tasks to schedule to the > DataNode this we can add a coefficient for user to tune the weight of EC > recovery tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
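The coefficient proposed in this issue reduces to a small computation over the existing fixed cost: scale {{max(# of sources, # of targets)}} by a configurable weight. The sketch below assumes the weight is clamped so a task never counts for less than one xmit; the method name and the clamping are assumptions for illustration, not the patch's actual code:

```java
public class EcXmitsWeight {
    // Weighted xmitsInProgress for an EC recovery task. With weight = 1.0 this
    // degenerates to the current behavior (max of sources/targets); smaller
    // weights let the NameNode schedule more EC recovery work per DataNode.
    static int weightedXmits(double weight, int numSources, int numTargets) {
        int raw = Math.max(numSources, numTargets); // today's fixed cost
        return Math.max(1, (int) (weight * raw));   // never drop below 1 xmit
    }

    public static void main(String[] args) {
        // An RS-6-3 recovery with 6 sources and 1 target, using the 0.5
        // default discussed above, counts as 3 xmits instead of 6:
        System.out.println(weightedXmits(0.5, 6, 1)); // prints 3
    }
}
```

As Eddy notes, the right weight depends heavily on codec, NIC, and CPU, so a tunable default of 0.5 is a starting point rather than a universal answer.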
[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
[ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211484#comment-16211484 ] Xiao Chen commented on HDFS-12667: -- bq. will lock individual queue. Sounds good to me. Thanks for the heads up guys. :) > KMSClientProvider#ValueQueue does synchronous fetch of edeks in background > async thread. > > > Key: HDFS-12667 > URL: https://issues.apache.org/jira/browse/HDFS-12667 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, kms >Affects Versions: 3.0.0-alpha4 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > There are couple of issues in KMSClientProvider#ValueQueue. > 1. > {code:title=ValueQueue.java|borderStyle=solid} > private final LoadingCachekeyQueues; > // Stripped rwlocks based on key name to synchronize the queue from > // the sync'ed rw-thread and the background async refill thread. > private final List lockArray = > new ArrayList<>(LOCK_ARRAY_SIZE); > {code} > It hashes the key name into 16 buckets. > In the code chunk below, > {code:title=ValueQueue.java|borderStyle=solid} > public List getAtMost(String keyName, int num) throws IOException, > ExecutionException { > ... > ... > readLock(keyName); > E val = keyQueue.poll(); > readUnlock(keyName); > ... > } > private void submitRefillTask(final String keyName, > final Queue keyQueue) throws InterruptedException { > ... > ... > writeLock(keyName); // It holds the write lock while the key is > being asynchronously fetched. So the read requests for all the keys that > hashes to this bucket will essentially be blocked. > try { > if (keyQueue.size() < threshold && !isCanceled()) { > refiller.fillQueueForKey(name, keyQueue, > cacheSize - keyQueue.size()); > } > ... 
> } finally { > writeUnlock(keyName); > } > } > } > {code} > According to above code chunk, if two keys (lets say key1 and key2) hashes to > the same bucket (between 1 and 16), then if key1 is asynchronously being > refetched then all the getKey for key2 will be blocked. > 2. Due to stripped rw locks, the asynchronous behavior of refill keys is now > synchronous to other handler threads. > I understand that locks were added so that we don't kick off multiple > asynchronous refilling thread for the same key. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211473#comment-16211473 ] Xiao Chen commented on HDFS-12682: -- Thanks for the response Sammi, good find on HDFS-12686! I propose we fix the problem by: - Remove the state from {{ErasureCodingPolicy}}. The motivation is, {{ErasureCodingPolicy}} is returned with {{HdfsFileStatus}}, which impacts all clients listing hdfs. We want to make it as lightweight as possible, and keep Andrew's work on HDFS-11565 for performance. - Add a new class {{ErasureCodingPolicyInfo}} (or whatever name people feel intuitive), that contains the policy and its state. This will be used by the ECAdmin-purpose APIs, as well as internally HDFS persistency. Will prepare a patch toward this direction for demonstration. If you or any watchers have concerns, please feel free to speak up. > ECAdmin -listPolicies will always show policy state as DISABLED > --- > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen > Labels: hdfs-ec-3.0-must-do > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. 
> {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of [SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
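The caching pitfall Xiao describes can be reproduced in miniature: a deserializer that checks a static table of shared, mutable policy objects first will silently drop the state that actually arrived over the wire, so every client sees the construction-time default. The classes below are illustrative stand-ins for {{ErasureCodingPolicy}}, {{SystemErasureCodingPolicies}}, and {{PBHelperClient}}, not the real HDFS types:

```java
import java.util.HashMap;
import java.util.Map;

public class CachedPolicyPitfall {
    // Minimal model: a mutable policy with a state field defaulting to DISABLED.
    static final class Policy {
        final String name;
        String state = "DISABLED"; // default state on construction
        Policy(String name) { this.name = name; }
    }

    // Static table of shared instances, analogous to SystemErasureCodingPolicies.
    static final Map<String, Policy> SYSTEM_POLICIES = new HashMap<>();
    static {
        SYSTEM_POLICIES.put("RS-6-3-1024k", new Policy("RS-6-3-1024k"));
    }

    // Mirrors the deserialization path described above: the cached instance is
    // preferred, so 'wireState' (what the NameNode actually sent) is discarded.
    static Policy deserialize(String name, String wireState) {
        Policy cached = SYSTEM_POLICIES.get(name);
        if (cached != null) {
            return cached; // bug: wireState silently dropped
        }
        Policy fresh = new Policy(name);
        fresh.state = wireState;
        return fresh;
    }

    public static void main(String[] args) {
        Policy p = deserialize("RS-6-3-1024k", "ENABLED");
        System.out.println(p.state); // prints DISABLED despite ENABLED on the wire
    }
}
```

This also shows why the proposed split helps: a separate, state-carrying {{ErasureCodingPolicyInfo}} never takes the cached-instance shortcut.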
[jira] [Comment Edited] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
[ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211463#comment-16211463 ] Rushabh S Shah edited comment on HDFS-12667 at 10/19/17 6:06 PM: - bq. Sorry for breaking this, I'll investigate about the locking as well. hi [~xiaochen], thanks for commenting. We (I and Daryn) have an implementation idea which will remove the stripped locking and will lock individual queue. Give me couple of days to post 1st draft. Just to clarify I am just working on 1st point and not the second. was (Author: shahrs87): bq. Sorry for breaking this, I'll investigate about the locking as well. hi [~xiaochen], thanks for commenting. We (I and Daryn) have an implementation idea which will remove the stripped locking and will lock individual queue. Give me couple of days to post 1st draft. > KMSClientProvider#ValueQueue does synchronous fetch of edeks in background > async thread. > > > Key: HDFS-12667 > URL: https://issues.apache.org/jira/browse/HDFS-12667 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, kms >Affects Versions: 3.0.0-alpha4 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > There are couple of issues in KMSClientProvider#ValueQueue. > 1. > {code:title=ValueQueue.java|borderStyle=solid} > private final LoadingCachekeyQueues; > // Stripped rwlocks based on key name to synchronize the queue from > // the sync'ed rw-thread and the background async refill thread. > private final List lockArray = > new ArrayList<>(LOCK_ARRAY_SIZE); > {code} > It hashes the key name into 16 buckets. > In the code chunk below, > {code:title=ValueQueue.java|borderStyle=solid} > public List getAtMost(String keyName, int num) throws IOException, > ExecutionException { > ... > ... > readLock(keyName); > E val = keyQueue.poll(); > readUnlock(keyName); > ... > } > private void submitRefillTask(final String keyName, > final Queue keyQueue) throws InterruptedException { > ... > ... 
> writeLock(keyName); // It holds the write lock while the key is > being asynchronously fetched. So the read requests for all the keys that > hashes to this bucket will essentially be blocked. > try { > if (keyQueue.size() < threshold && !isCanceled()) { > refiller.fillQueueForKey(name, keyQueue, > cacheSize - keyQueue.size()); > } > ... > } finally { > writeUnlock(keyName); > } > } > } > {code} > According to above code chunk, if two keys (lets say key1 and key2) hashes to > the same bucket (between 1 and 16), then if key1 is asynchronously being > refetched then all the getKey for key2 will be blocked. > 2. Due to stripped rw locks, the asynchronous behavior of refill keys is now > synchronous to other handler threads. > I understand that locks were added so that we don't kick off multiple > asynchronous refilling thread for the same key. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
[ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211464#comment-16211464 ] Daryn Sharp commented on HDFS-12667: I was originally going to comment on another key roll occurring during the re-encrypt but deleted it because I already wrote too much. :). The inability to numerically compare is indeed unfortunate because I too thought we could take advantage of a version check. > KMSClientProvider#ValueQueue does synchronous fetch of edeks in background > async thread. > > > Key: HDFS-12667 > URL: https://issues.apache.org/jira/browse/HDFS-12667 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, kms >Affects Versions: 3.0.0-alpha4 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > There are couple of issues in KMSClientProvider#ValueQueue. > 1. > {code:title=ValueQueue.java|borderStyle=solid} > private final LoadingCachekeyQueues; > // Stripped rwlocks based on key name to synchronize the queue from > // the sync'ed rw-thread and the background async refill thread. > private final List lockArray = > new ArrayList<>(LOCK_ARRAY_SIZE); > {code} > It hashes the key name into 16 buckets. > In the code chunk below, > {code:title=ValueQueue.java|borderStyle=solid} > public List getAtMost(String keyName, int num) throws IOException, > ExecutionException { > ... > ... > readLock(keyName); > E val = keyQueue.poll(); > readUnlock(keyName); > ... > } > private void submitRefillTask(final String keyName, > final Queue keyQueue) throws InterruptedException { > ... > ... > writeLock(keyName); // It holds the write lock while the key is > being asynchronously fetched. So the read requests for all the keys that > hashes to this bucket will essentially be blocked. > try { > if (keyQueue.size() < threshold && !isCanceled()) { > refiller.fillQueueForKey(name, keyQueue, > cacheSize - keyQueue.size()); > } > ... 
> } finally { > writeUnlock(keyName); > } > } > } > {code} > According to above code chunk, if two keys (lets say key1 and key2) hashes to > the same bucket (between 1 and 16), then if key1 is asynchronously being > refetched then all the getKey for key2 will be blocked. > 2. Due to stripped rw locks, the asynchronous behavior of refill keys is now > synchronous to other handler threads. > I understand that locks were added so that we don't kick off multiple > asynchronous refilling thread for the same key. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
[ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211463#comment-16211463 ] Rushabh S Shah commented on HDFS-12667: --- bq. Sorry for breaking this, I'll investigate about the locking as well. hi [~xiaochen], thanks for commenting. We (I and Daryn) have an implementation idea which will remove the stripped locking and will lock individual queue. Give me couple of days to post 1st draft. > KMSClientProvider#ValueQueue does synchronous fetch of edeks in background > async thread. > > > Key: HDFS-12667 > URL: https://issues.apache.org/jira/browse/HDFS-12667 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, kms >Affects Versions: 3.0.0-alpha4 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > There are couple of issues in KMSClientProvider#ValueQueue. > 1. > {code:title=ValueQueue.java|borderStyle=solid} > private final LoadingCachekeyQueues; > // Stripped rwlocks based on key name to synchronize the queue from > // the sync'ed rw-thread and the background async refill thread. > private final List lockArray = > new ArrayList<>(LOCK_ARRAY_SIZE); > {code} > It hashes the key name into 16 buckets. > In the code chunk below, > {code:title=ValueQueue.java|borderStyle=solid} > public List getAtMost(String keyName, int num) throws IOException, > ExecutionException { > ... > ... > readLock(keyName); > E val = keyQueue.poll(); > readUnlock(keyName); > ... > } > private void submitRefillTask(final String keyName, > final Queue keyQueue) throws InterruptedException { > ... > ... > writeLock(keyName); // It holds the write lock while the key is > being asynchronously fetched. So the read requests for all the keys that > hashes to this bucket will essentially be blocked. > try { > if (keyQueue.size() < threshold && !isCanceled()) { > refiller.fillQueueForKey(name, keyQueue, > cacheSize - keyQueue.size()); > } > ... 
> } finally { > writeUnlock(keyName); > } > } > } > {code} > According to above code chunk, if two keys (lets say key1 and key2) hashes to > the same bucket (between 1 and 16), then if key1 is asynchronously being > refetched then all the getKey for key2 will be blocked. > 2. Due to stripped rw locks, the asynchronous behavior of refill keys is now > synchronous to other handler threads. > I understand that locks were added so that we don't kick off multiple > asynchronous refilling thread for the same key. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
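The "lock individual queue" direction Rushabh mentions — one lock per key name instead of 16 striped locks, so a slow refill of key1 can no longer block reads of an unrelated key2 that merely hashes to the same stripe — can be sketched with {{ConcurrentHashMap#computeIfAbsent}}. This is an illustration of the idea only; the actual draft patch may look quite different:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PerKeyLocks {
    // One ReadWriteLock per key name, created lazily on first use. Unlike the
    // 16-bucket striped array, distinct keys can never contend with each other.
    private static final ConcurrentHashMap<String, ReentrantReadWriteLock> LOCKS =
            new ConcurrentHashMap<>();

    static ReentrantReadWriteLock lockFor(String keyName) {
        // computeIfAbsent is atomic: concurrent callers for the same key
        // always observe the same lock instance.
        return LOCKS.computeIfAbsent(keyName, k -> new ReentrantReadWriteLock());
    }

    public static void main(String[] args) {
        // Distinct keys get distinct locks, so refilling key1 under its write
        // lock leaves key2's readers completely unblocked:
        System.out.println(lockFor("key1") != lockFor("key2")); // true
        // The same key always maps back to the same lock:
        System.out.println(lockFor("key1") == lockFor("key1")); // true
    }
}
```

The remaining design question (which the per-key lock still answers) is preventing two refill tasks for the same key, since both would acquire that key's single write lock.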
[jira] [Commented] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211436#comment-16211436 ] Eric Yang commented on HDFS-12684: -- [~cheersyang] Would it make more sense to remove node manager metrics removed from HDFS project instead of StorageContainerManager node count? I am not sure if this is an overlap of YARN terminology with something in HDFS. It would be nice to keep NodeManager as a YARN terminology. > Ozone: SCM metrics NodeCount is overlapping with node manager metrics > - > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : 
"DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
[ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211418#comment-16211418 ] Xiao Chen commented on HDFS-12667: -- Thanks [~daryn] for elaborating. I agree the implementation can cause the problem. Updated the link to HDFS-11210 as a broken by. Sorry for breaking this, I'll investigate about the locking as well. Just want to extend the discussion on the second point: It's theoretically true that a create may release the lock, spend significant amount of time during generate, long enough that it only comes back after the re-encryption is issued and has gone past this file... In that case it sounds like we have to check the returned edeks with re-encryption (if any), after the create op reacquire that lock, right? That's a pretty head scratching scenario - generate again may solve it, but keyversion, being a String, can only be compared in an equalsTo fashion (rather than greaterThan / lessThan), so if someone roll the key on the KMS again during re-encryption (say v1->v2, and the re-encryption was submitted for the v0->v1 roll), every create after that point will have to generate twice - because re-encrypt isn't aware of the roll and still compares with v1, while the new creates are on v2, which for equalsTo comparison is indifferent than v0 v.s. v1. > KMSClientProvider#ValueQueue does synchronous fetch of edeks in background > async thread. > > > Key: HDFS-12667 > URL: https://issues.apache.org/jira/browse/HDFS-12667 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, kms >Affects Versions: 3.0.0-alpha4 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > There are couple of issues in KMSClientProvider#ValueQueue. > 1. > {code:title=ValueQueue.java|borderStyle=solid} > private final LoadingCachekeyQueues; > // Stripped rwlocks based on key name to synchronize the queue from > // the sync'ed rw-thread and the background async refill thread. 
> private final List lockArray = > new ArrayList<>(LOCK_ARRAY_SIZE); > {code} > It hashes the key name into 16 buckets. > In the code chunk below, > {code:title=ValueQueue.java|borderStyle=solid} > public List getAtMost(String keyName, int num) throws IOException, > ExecutionException { > ... > ... > readLock(keyName); > E val = keyQueue.poll(); > readUnlock(keyName); > ... > } > private void submitRefillTask(final String keyName, > final Queue keyQueue) throws InterruptedException { > ... > ... > writeLock(keyName); // It holds the write lock while the key is > being asynchronously fetched. So the read requests for all the keys that > hashes to this bucket will essentially be blocked. > try { > if (keyQueue.size() < threshold && !isCanceled()) { > refiller.fillQueueForKey(name, keyQueue, > cacheSize - keyQueue.size()); > } > ... > } finally { > writeUnlock(keyName); > } > } > } > {code} > According to above code chunk, if two keys (lets say key1 and key2) hashes to > the same bucket (between 1 and 16), then if key1 is asynchronously being > refetched then all the getKey for key2 will be blocked. > 2. Due to stripped rw locks, the asynchronous behavior of refill keys is now > synchronous to other handler threads. > I understand that locks were added so that we don't kick off multiple > asynchronous refilling thread for the same key. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
[ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211413#comment-16211413 ] Daryn Sharp commented on HDFS-12638: bq. Yes, I think our code should bear with such orphan blocks, instead of failing the NN with NPE like this. At least. See below, they aren't really orphaned. I think it's correct for the NN to crash if the namesystem data structures are corrupted. bq. I assume when the snapshot gets deleted, these blocks will be also removed from the blocks map. But before that, we need to live with such orphaned blocks To the block manager, replication monitor, etc., these copy-on-truncate blocks are not (supposed to be) special. My prior point, stated another way: the block is not orphaned if it's in a snapshot diff. INodes are not orphaned when only referenced via a snapshot diff. A block in the blocks map should not be referencing an inode not in the inodes map. Direct namespace accessibility is irrelevant to the block/inode/map linkages being correct. We need to fix the bug, not mask it. > NameNode exits due to ReplicationMonitor thread received Runtime exception in > ReplicationWork#chooseTargets > --- > > Key: HDFS-12638 > URL: https://issues.apache.org/jira/browse/HDFS-12638 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.2 >Reporter: Jiandan Yang > Attachments: HDFS-12638-branch-2.8.2.001.patch > > > The active NameNode exits due to an NPE. I can confirm that the BlockCollection passed > in when creating ReplicationWork is null, but I do not know why > BlockCollection is null. By reviewing history I found that > [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check for > whether BlockCollection is null. > NN logs are as follows: > {code:java} > 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > ReplicationMonitor thread received Runtime exception. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744) > at java.lang.Thread.run(Thread.java:834) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
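For context, the guard whose removal is debated above amounts to a null check before scheduling replication work. A rough sketch only — simplified stand-ins, not the real `BlockManager`/`ReplicationWork` classes — showing the two positions in the thread (skip vs. crash):

```java
// Illustrative sketch: the debate above is whether to re-add a guard like
// this (masking the inconsistency) or let the NN fail fast and fix the root
// cause that leaves a block in the blocks map without its collection.
class ReplicationWorkSketch {
  static String chooseTargets(Object blockCollection) {
    if (blockCollection == null) {
      // Skipping here keeps the ReplicationMonitor alive but hides the
      // corrupted block/inode linkage.
      return "skipped";
    }
    return "scheduled";
  }
}
```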
[jira] [Updated] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart
[ https://issues.apache.org/jira/browse/HDFS-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-12686: - Priority: Critical (was: Major) > Erasure coding system policy state is not correctly saved and loaded during > real cluster restart > > > Key: HDFS-12686 > URL: https://issues.apache.org/jira/browse/HDFS-12686 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: SammiChen >Assignee: SammiChen >Priority: Critical > Labels: hdfs-ec-3.0-must-do > > Inspired by HDFS-12682, I found the system erasure coding policy state will > not be correctly saved and loaded in a real cluster. Though there are such > unit tests and they all pass with MiniCluster. It's because the > MiniCluster keeps the same static system erasure coding policy object after > the NN restart operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
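The MiniCluster pitfall described here — a static object surviving an in-JVM "restart" and thereby hiding a save/load bug — can be reproduced in miniature. Illustrative names only, not the real erasure coding policy manager:

```java
// Sketch: a "manager" holding policy state in a static field. An in-process
// restart (new instance, same JVM) still sees the mutated static state, so a
// MiniCluster-style test that toggles a policy and "restarts" passes even if
// nothing was ever persisted. A real cluster restart is a fresh JVM, where
// the un-persisted change is lost.
class PolicyManagerSketch {
  private static boolean policyEnabled = false; // static: shared across "restarts"

  void enablePolicy() { policyEnabled = true; }
  boolean isPolicyEnabled() { return policyEnabled; }
}
```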
[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation
[ https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211373#comment-16211373 ] Hadoop QA commented on HDFS-12680: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 15s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12680 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893072/HDFS-12680-HDFS-7240.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21748/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: SCM: Lease support for container creation > > > Key: HDFS-12680 > URL: https://issues.apache.org/jira/browse/HDFS-12680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12680-HDFS-7240.000.patch, > HDFS-12680-HDFS-7240.001.patch > > > This brings in lease support for container creation. > A lease should be given to a container that is moved to {{CREATING}} state when > the {{BEGIN_CREATE}} event happens; {{LeaseException}} should be thrown if the > container already holds a lease. The lease must be released during the > {{COMPLETE_CREATE}} event. If the lease times out, the container should be moved > to {{DELETING}} state, and an exception should be thrown if a {{COMPLETE_CREATE}} > event is received on that container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation
[ https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211372#comment-16211372 ] Nanda kumar commented on HDFS-12680: Thanks [~anu] & [~linyiqun] for the review. Please find my response below, also updated the patch (v001) based on the review comments. bq. Not that it is going to make any difference, just want to make sure that this is a conscious decision. Yeah, this was a conscious decision. If we don’t move the container to {{CREATING}} state during allocate block, we might give the same container to multiple clients with the create flag set. This will again bring in the issue of two clients trying to create the same container. bq. Is there a use case where this is needed. This change was required because the state can change for a container, but that should not result in a different hashCode for the same container. In particular, when we acquire a lease, LeaseManager uses the resource as key in a HashMap (ContainerInfo is the resource here), and while trying to release (after a state change), if we don’t get the same hashCode we will not be able to find it. Since state is used in {{ContainerInfo#equals()}}, we should not get any unexpected behavior. bq. should we add this to ozone-conf.xml? done bq. but what happens after the timeOut (OzoneProtos.LifeCycleEvent.TIMEOUT) The container is moved to {{DELETING}} state; we have to do cleanup after that. I will create another jira to track it. bq. conf.setInt? done bq. It would be better to define a var TIMEOUT=1 and reuse this var in the test method. done bq. We should increase sleep time done bq. Can you add an additional check Since {{LeaseNotFoundException}} is wrapped with {{IOException}}, added {{thrown.expect(IOException.class)}} bq. The following lines don't executes in test. 
Fixed > Ozone: SCM: Lease support for container creation > > > Key: HDFS-12680 > URL: https://issues.apache.org/jira/browse/HDFS-12680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12680-HDFS-7240.000.patch, > HDFS-12680-HDFS-7240.001.patch > > > This brings in lease support for container creation. > A lease should be given to a container that is moved to {{CREATING}} state when > the {{BEGIN_CREATE}} event happens; {{LeaseException}} should be thrown if the > container already holds a lease. The lease must be released during the > {{COMPLETE_CREATE}} event. If the lease times out, the container should be moved > to {{DELETING}} state, and an exception should be thrown if a {{COMPLETE_CREATE}} > event is received on that container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
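The hashCode/equals point above — a lease keyed on `ContainerInfo` becoming unfindable in the LeaseManager's HashMap after a state change — is the classic mutable-key pitfall. A sketch with invented names (not the actual ContainerInfo), showing why the patch excludes state from `hashCode()`:

```java
// Sketch: if hashCode() included mutable state, a map entry inserted before
// a state change could not be found after it. Excluding the state (as the
// patch does for ContainerInfo) keeps the lease lookup stable across
// CREATING -> OPEN style transitions.
class ContainerSketch {
  final String name;
  String state; // mutable: changes between lease acquire and release

  ContainerSketch(String name, String state) { this.name = name; this.state = state; }

  @Override public boolean equals(Object o) {
    return o instanceof ContainerSketch && ((ContainerSketch) o).name.equals(name);
  }

  // A buggy variant would be: name.hashCode() * 31 + state.hashCode()
  @Override public int hashCode() { return name.hashCode(); } // state deliberately excluded
}
```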
[jira] [Updated] (HDFS-12680) Ozone: SCM: Lease support for container creation
[ https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-12680: --- Attachment: HDFS-12680-HDFS-7240.001.patch > Ozone: SCM: Lease support for container creation > > > Key: HDFS-12680 > URL: https://issues.apache.org/jira/browse/HDFS-12680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12680-HDFS-7240.000.patch, > HDFS-12680-HDFS-7240.001.patch > > > This brings in lease support for container creation. > Lease should be give for a container that is moved to {{CREATING}} state when > {{BEGIN_CREATE}} event happens, {{LeaseException}} should be thrown if the > container already holds a lease. Lease must be released during > {{COMPLETE_CREATE}} event. If the lease times out container should be moved > to {{DELETING}} state, and exception should be thrown if {{COMPLETE_CREATE}} > event is received on that container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12619) Do not catch and throw unchecked exceptions if IBRs fail to process
[ https://issues.apache.org/jira/browse/HDFS-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211355#comment-16211355 ] Xiao Chen commented on HDFS-12619: -- Thanks [~jojochuang], trunk maps to 3.1.0 now. I think you should also cherry-pick to branch-3.0. :) > Do not catch and throw unchecked exceptions if IBRs fail to process > --- > > Key: HDFS-12619 > URL: https://issues.apache.org/jira/browse/HDFS-12619 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Fix For: 2.9.0, 2.8.3, 3.0.0 > > Attachments: HDFS-12619.001.patch > > > HDFS-9198 added the following code > {code:title=BlockManager#processIncrementalBlockReport} > public void processIncrementalBlockReport(final DatanodeID nodeID, > final StorageReceivedDeletedBlocks srdb) throws IOException { > ... > try { > processIncrementalBlockReport(node, srdb); > } catch (Exception ex) { > node.setForceRegistration(true); > throw ex; > } > } > {code} > In Apache Hadoop 2.7.x ~ 3.0, the code snippet is accepted by Java compiler. > However, when I attempted to backport it to a CDH5.3 release (based on Apache > Hadoop 2.5.0), the compiler complains the exception is unhandled, because the > method defines it throws IOException instead of Exception. > While the code compiles for Apache Hadoop 2.7.x ~ 3.0, I feel it is not a > good practice to catch an unchecked exception and then rethrow it. How about > rewriting it with a finally block and a conditional variable? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
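The "finally block and a conditional variable" rewrite suggested in the description would look roughly like this. A sketch only — the real method takes datanode/block-report arguments, which a `Runnable` stands in for here:

```java
// Sketch of the suggested rewrite: no catch-and-rethrow of an unchecked
// exception. A success flag decides in finally whether to force
// re-registration, which compiles the same whether the body declares
// IOException or plain Exception.
class IbrSketch {
  boolean forceRegistration = false;

  void processIncrementalBlockReport(Runnable body) {
    boolean success = false;
    try {
      body.run();
      success = true;
    } finally {
      if (!success) {
        forceRegistration = true; // stands in for node.setForceRegistration(true)
      }
    }
  }
}
```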
[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
[ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211324#comment-16211324 ] Daryn Sharp commented on HDFS-12667: The requirement to ensure new edeks are used is fine. The problem is the implementation. It may severely penalize the common case performance for a rare event. We are investigating alternate designs because the locking has to go. First, the new locks protecting access to a key's blocking queue negate concurrency. When the queue is being refilled, all edek requests are blocked until the refill is done – even if there are still edeks available. Furthermore, the striped locking causes edek requests for other keys to unnecessarily block too. This is an unacceptable penalty to the common case. Second, the base requirement motivating the locks was to ensure after a reencrypt starts that no new creates will use old edeks. However it appears not to be correct. A create op may release the fsn lock, fetch an old edek, a reencrypt is issued which drains the queue and now expects new edeks, the in-progress create reacquires the lock and uses the old edek. The race condition is tight, probably only an issue if the waiting creates should have been in the first batch, but it's wrong. We are trying to integrate an internal patch for an async policy to prevent blocking a handler – doesn't matter that the fsn lock is released, blocking a handler is unacceptable. The new locking model is incompatible. Namely it forces the poll to be protected by the lock so the edek fetch cannot short out. > KMSClientProvider#ValueQueue does synchronous fetch of edeks in background > async thread. > > > Key: HDFS-12667 > URL: https://issues.apache.org/jira/browse/HDFS-12667 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, kms >Affects Versions: 3.0.0-alpha4 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > There are a couple of issues in KMSClientProvider#ValueQueue. > 1. 
> {code:title=ValueQueue.java|borderStyle=solid} > private final LoadingCache<String, LinkedBlockingQueue<E>> keyQueues; > // Striped rwlocks based on key name to synchronize the queue from > // the sync'ed rw-thread and the background async refill thread. > private final List<ReadWriteLock> lockArray = > new ArrayList<>(LOCK_ARRAY_SIZE); > {code} > It hashes the key name into 16 buckets. > In the code chunk below, > {code:title=ValueQueue.java|borderStyle=solid} > public List<E> getAtMost(String keyName, int num) throws IOException, > ExecutionException { > ... > ... > readLock(keyName); > E val = keyQueue.poll(); > readUnlock(keyName); > ... > } > private void submitRefillTask(final String keyName, > final Queue<E> keyQueue) throws InterruptedException { > ... > ... > writeLock(keyName); // It holds the write lock while the key is > being asynchronously fetched. So the read requests for all the keys that > hash to this bucket will essentially be blocked. > try { > if (keyQueue.size() < threshold && !isCanceled()) { > refiller.fillQueueForKey(name, keyQueue, > cacheSize - keyQueue.size()); > } > ... > } finally { > writeUnlock(keyName); > } > } > } > {code} > According to the above code chunk, if two keys (let's say key1 and key2) hash to > the same bucket (one of the 16), then if key1 is asynchronously being > refetched, all the getKey calls for key2 will be blocked. > 2. Due to striped rw locks, the asynchronous behavior of refilling keys is now > synchronous to other handler threads. > I understand that locks were added so that we don't kick off multiple > asynchronous refilling threads for the same key. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
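One way to keep the read path from blocking during a refill, in the spirit of the async policy described above, is a lock-free per-key queue with a flag that only guards refill submission. A sketch under those assumptions — not the actual patch, all names invented:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: poll() never takes a lock, so available edeks are handed out even
// while a background refill is running; the AtomicBoolean only prevents
// kicking off two refills for the same key (the stated reason the striped
// locks were added).
class NonBlockingKeyQueue<E> {
  private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean refillInProgress = new AtomicBoolean(false);

  E poll() { return queue.poll(); } // lock-free read path; null when empty

  // Returns true iff this caller won the race and should schedule the async refill.
  boolean tryStartRefill() { return refillInProgress.compareAndSet(false, true); }

  void finishRefill(Iterable<E> fresh) {
    for (E e : fresh) { queue.add(e); }
    refillInProgress.set(false);
  }
}
```

The trade-off versus the striped-lock design: a caller that finds the queue empty mid-refill gets a miss (and can fetch synchronously or wait), but no caller for this or any other key is ever blocked by the refill itself.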
[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-12620: --- Attachment: HDFS-12620-branch-2.011.patch > Backporting HDFS-10467 to branch-2 > -- > > Key: HDFS-12620 > URL: https://issues.apache.org/jira/browse/HDFS-12620 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: HDFS-10467-branch-2.001.patch, > HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, > HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, > HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, > HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, > HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, > HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch > > > When backporting HDFS-10467, there are a few things that changed: > * {{bin\hdfs}} > * {{ClientProtocol}} > * Java 7 not supporting referencing functions > * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is > {{org.mortbay.util.ajax.JSON}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind
[ https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211232#comment-16211232 ] James Clampffer commented on HDFS-11807: This seems to hang forever in libhdfs_mini_stress_valgrind_hdfspp_test_shim_static - I don't see memcheck/valgrind running and the test isn't using any CPU. During the build the compiler complains a lot about not checking results from the read() and write() calls to the IPC socket which makes me think the main process is stuck waiting on the side process to say it's done. > libhdfs++: Get minidfscluster tests running under valgrind > -- > > Key: HDFS-11807 > URL: https://issues.apache.org/jira/browse/HDFS-11807 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Anatoli Shein > Attachments: HDFS-11807.HDFS-8707.000.patch > > > The gmock based unit tests generally don't expose race conditions and memory > stomps. A good way to expose these is running libhdfs++ stress tests and > tools under valgrind and pointing them at a real cluster. Right now the CI > tools don't do that so bugs occasionally slip in and aren't caught until they > cause trouble in applications that use libhdfs++ for HDFS access. > The reason the minidfscluster tests don't run under valgrind is because the > GC and JIT compiler in the embedded JVM do things that look like errors to > valgrind. I'd like to have these tests do some basic setup and then fork > into two processes: one for the minidfscluster stuff and one for the > libhdfs++ client test. A small amount of shared memory can be used to > provide a place for the minidfscluster to stick the hdfsBuilder object that > the client needs to get info about which port to connect to. Can also stick > a condition variable there to let the minidfscluster know when it can shut > down. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12687) Recovered DN will not be removed from the client's “failed” list
xuzq created HDFS-12687: --- Summary: Recovered DN will not be removed from the client's “failed” list Key: HDFS-12687 URL: https://issues.apache.org/jira/browse/HDFS-12687 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.8.1 Reporter: xuzq When the client is writing a pipeline such as Client=>DN1=>DN2=>DN3 and DN2 crashes at some point, the client executes the recovery process. The failed DN2 is added to "failed". The client then asks the NN for a new DN (passing "failed") and replaces DN2 in the pipeline, e.g. Client=>DN1=>DN4=>DN3. After running for a long time, the client is still writing data for the file and has of course gone through many pipelines, e.g. Client => DN-1 => DN-2 => DN-3. When DN-2 crashes, the failed DN-2 is added to "failed" and the client executes the recovery process as before. It gets a new DN from the NN (passing "failed"), and {color:red}the NN selects a DN from all DNs excluding "failed", even if DN-2 has restarted{color}. Questions: Why not remove DN2 (restarted) from "failed"? Why isn't the collection of error nodes used in the recovery process shared with the one used when getting the next block, i.e.: private final List<DatanodeInfo> failed = new ArrayList<>(); private final LoadingCache<DatanodeInfo, DatanodeInfo> excludedNodes; As before, when DN2 crashes the client recovers the pipeline after a timeout (in the worst case about 490s by default). When the client finishes writing this block and asks for the next block, the NN may return a block whose pipeline contains the failed datanode 'DN2'. When the client creates a new pipeline for the new block, {color:red}it has to go through a connection timeout{color} (60s by default). If "failed" and "excludedNodes" were one collection, the connection timeout would be avoided. Because "excludedNodes" entries are removed dynamically, it would also avoid the first problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
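The distinction the report draws — a permanent `failed` list versus a time-expiring `excludedNodes` cache — can be sketched like this. Illustrative only; the real client code uses different types (the expiry here is a plain timestamp map standing in for a cache with expire-after-write semantics):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: "failed" is append-only for the life of the stream, so a restarted
// datanode stays excluded forever; "excludedNodes" forgets entries after a
// timeout, which is the behavior the reporter suggests unifying on.
class ExclusionSketch {
  final List<String> failed = new ArrayList<>();          // never pruned
  final Map<String, Long> excludedNodes = new HashMap<>(); // expiring entries
  final long expiryMillis;

  ExclusionSketch(long expiryMillis) { this.expiryMillis = expiryMillis; }

  void markFailed(String dn, long nowMillis) {
    failed.add(dn);
    excludedNodes.put(dn, nowMillis);
  }

  boolean isExcluded(String dn, long nowMillis) {
    Long addedAt = excludedNodes.get(dn);
    if (addedAt != null && nowMillis - addedAt >= expiryMillis) {
      excludedNodes.remove(dn); // entry expired: the DN may have recovered
      addedAt = null;
    }
    return addedAt != null;
  }
}
```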
[jira] [Updated] (HDFS-12619) Do not catch and throw unchecked exceptions if IBRs fail to process
[ https://issues.apache.org/jira/browse/HDFS-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12619: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 2.8.3 2.9.0 Status: Resolved (was: Patch Available) Thanks [~hanishakoneru] and [~xiaochen] for the review. Patch 001 was committed in trunk (3.0.0), branch-2 (2.9.0) and branch-2.8 (2.8.3) > Do not catch and throw unchecked exceptions if IBRs fail to process > --- > > Key: HDFS-12619 > URL: https://issues.apache.org/jira/browse/HDFS-12619 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Fix For: 2.9.0, 2.8.3, 3.0.0 > > Attachments: HDFS-12619.001.patch > > > HDFS-9198 added the following code > {code:title=BlockManager#processIncrementalBlockReport} > public void processIncrementalBlockReport(final DatanodeID nodeID, > final StorageReceivedDeletedBlocks srdb) throws IOException { > ... > try { > processIncrementalBlockReport(node, srdb); > } catch (Exception ex) { > node.setForceRegistration(true); > throw ex; > } > } > {code} > In Apache Hadoop 2.7.x ~ 3.0, the code snippet is accepted by Java compiler. > However, when I attempted to backport it to a CDH5.3 release (based on Apache > Hadoop 2.5.0), the compiler complains the exception is unhandled, because the > method defines it throws IOException instead of Exception. > While the code compiles for Apache Hadoop 2.7.x ~ 3.0, I feel it is not a > good practice to catch an unchecked exception and then rethrow it. How about > rewriting it with a finally block and a conditional variable? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211004#comment-16211004 ] Hadoop QA commented on HDFS-12620: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 54s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 0s{color} | {color:blue} Shelldocs was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 25 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 37s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} 
compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 45s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 11 new + 624 unchanged - 0 fixed = 635 total (was 624) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 2s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}713m 27s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 16m 16s{color} | {color:red} The patch generated 68 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}769m 5s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotManager | | | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | hadoop.hdfs.server.namenode.TestSecondaryWebUi | | | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd | | | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions | | | hadoop.hdfs.server.datanode.TestRefreshNamenodes | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.TestTrashWithSecureEncryptionZones | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN | | | hadoop.hdfs.server.datanode.TestLargeBlockReport | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.server.datanode.TestDataNodeFaultInjector | | | hadoop.hdfs.server.namenode.TestCommitBlockWithInvalidGenStamp | | | hadoop.hdfs.server.namenode.TestAuditLogger | | |
[jira] [Commented] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)
[ https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210890#comment-16210890 ] Hadoop QA commented on HDFS-12665: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 11s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12665 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893034/HDFS-12665-HDFS-9806.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21747/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [AliasMap] Create a version of the AliasMap that runs in memory in the > Namenode (leveldb) > - > > Key: HDFS-12665 > URL: https://issues.apache.org/jira/browse/HDFS-12665 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs >Assignee: Ewan Higgs > Attachments: HDFS-12665-HDFS-9806.001.patch > > > The design of Provided Storage requires the use of an AliasMap to manage the > mapping between blocks of files on the local HDFS and ranges of files on a > remote storage system. To reduce load from the Namenode, this can be done > using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). > However, to aid adoption and ease of deployment, we propose an in memory > version. > This AliasMap will be a wrapper around LevelDB (already a dependency from the > Timeline Service) and use protobuf for the key (blockpool, blockid, and > genstamp) and the value (url, offset, length, nonce). 
The in memory service > will also have a configurable port on which it will listen for updates from > Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
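The (blockpool, blockid, genstamp) key described above could, for illustration, be packed into a byte key so a LevelDB-style sorted store keeps all blocks of one pool contiguous and ordered by block id. A sketch only — the actual patch uses protobuf for the key, and these names are invented:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: encode (blockPool, blockId, genStamp) as bytes for a sorted KV
// store. A NUL-terminated pool prefix followed by fixed-width big-endian
// longs makes lexicographic byte order group keys by pool and then sort
// them by block id.
class AliasMapKeySketch {
  static byte[] encode(String blockPool, long blockId, long genStamp) {
    byte[] pool = blockPool.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(pool.length + 1 + 8 + 8);
    buf.put(pool).put((byte) 0)            // NUL-terminated pool prefix
       .putLong(blockId).putLong(genStamp); // ByteBuffer is big-endian by default
    return buf.array();
  }
}
```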
[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)
[ https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs updated HDFS-12665: -- Status: Patch Available (was: Open) > [AliasMap] Create a version of the AliasMap that runs in memory in the > Namenode (leveldb) > - > > Key: HDFS-12665 > URL: https://issues.apache.org/jira/browse/HDFS-12665 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs >Assignee: Ewan Higgs > Attachments: HDFS-12665-HDFS-9806.001.patch > > > The design of Provided Storage requires the use of an AliasMap to manage the > mapping between blocks of files on the local HDFS and ranges of files on a > remote storage system. To reduce load from the Namenode, this can be done > using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). > However, to aide adoption and ease of deployment, we propose an in memory > version. > This AliasMap will be a wrapper around LevelDB (already a dependency from the > Timeline Service) and use protobuf for the key (blockpool, blockid, and > genstamp) and the value (url, offset, length, nonce). The in memory service > will also have a configurable port on which it will listen for updates from > Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)
[ https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs updated HDFS-12665: -- Attachment: HDFS-12665-HDFS-9806.001.patch Attaching work from WDC implementing the In Memory AliasMap. This work is rebased on top of HDFS-11902. > [AliasMap] Create a version of the AliasMap that runs in memory in the > Namenode (leveldb) > - > > Key: HDFS-12665 > URL: https://issues.apache.org/jira/browse/HDFS-12665 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs >Assignee: Ewan Higgs > Attachments: HDFS-12665-HDFS-9806.001.patch > > > The design of Provided Storage requires the use of an AliasMap to manage the > mapping between blocks of files on the local HDFS and ranges of files on a > remote storage system. To reduce load from the Namenode, this can be done > using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). > However, to aide adoption and ease of deployment, we propose an in memory > version. > This AliasMap will be a wrapper around LevelDB (already a dependency from the > Timeline Service) and use protobuf for the key (blockpool, blockid, and > genstamp) and the value (url, offset, length, nonce). The in memory service > will also have a configurable port on which it will listen for updates from > Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)
[ https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs reassigned HDFS-12665: - Assignee: Ewan Higgs > [AliasMap] Create a version of the AliasMap that runs in memory in the > Namenode (leveldb) > - > > Key: HDFS-12665 > URL: https://issues.apache.org/jira/browse/HDFS-12665 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs >Assignee: Ewan Higgs > > The design of Provided Storage requires the use of an AliasMap to manage the > mapping between blocks of files on the local HDFS and ranges of files on a > remote storage system. To reduce load from the Namenode, this can be done > using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). > However, to aide adoption and ease of deployment, we propose an in memory > version. > This AliasMap will be a wrapper around LevelDB (already a dependency from the > Timeline Service) and use protobuf for the key (blockpool, blockid, and > genstamp) and the value (url, offset, length, nonce). The in memory service > will also have a configurable port on which it will listen for updates from > Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210870#comment-16210870 ] Weiwei Yang commented on HDFS-11468: Hi [~linyiqun] sounds good to me, please go ahead submitting a patch, thanks a lot! > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12677) Extend TestReconstructStripedFile with a random EC policy
[ https://issues.apache.org/jira/browse/HDFS-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210802#comment-16210802 ] Hadoop QA commented on HDFS-12677: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 11s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12677 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893020/HDFS-12677.1.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21746/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Extend TestReconstructStripedFile with a random EC policy > - > > Key: HDFS-12677 > URL: https://issues.apache.org/jira/browse/HDFS-12677 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-12677.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12677) Extend TestReconstructStripedFile with a random EC policy
[ https://issues.apache.org/jira/browse/HDFS-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-12677: Status: Patch Available (was: Open) > Extend TestReconstructStripedFile with a random EC policy > - > > Key: HDFS-12677 > URL: https://issues.apache.org/jira/browse/HDFS-12677 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-12677.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12677) Extend TestReconstructStripedFile with a random EC policy
[ https://issues.apache.org/jira/browse/HDFS-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-12677: Attachment: HDFS-12677.1.patch Uploaded the 1st patch. The new test class with a random EC policy extends {{TestReconstructStripedFile}} with a few changes. When the EC policy is {{XOR-2-1-1024k}}, this assertion fails. {code:title=TestReconstructStripedFile#testNNSendsErasureCodingTasks|borderStyle=solid} assertTrue(policy.getNumParityUnits() >= deadDN); {code} I checked the code and this assertion does not seem to make sense, so the 1st patch removes it. I confirmed that all EC policies pass all the tests with the patch on my local machine. > Extend TestReconstructStripedFile with a random EC policy > - > > Key: HDFS-12677 > URL: https://issues.apache.org/jira/browse/HDFS-12677 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, test >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-12677.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
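For context on why {{XOR-2-1-1024k}} trips the removed assertion: that policy has numDataUnits=2 and numParityUnits=1 (see the policy listing quoted in HDFS-12682), so any scenario that takes down more datanodes than parity units makes the check false. A toy sketch of that arithmetic (the dead-datanode count of 2 is hypothetical, chosen only to illustrate the failure):

```java
public class EcAssertionSketch {
    // The removed assertion, factored out: parity units must cover dead DNs.
    static boolean assertionHolds(int numParityUnits, int deadDN) {
        return numParityUnits >= deadDN;
    }

    public static void main(String[] args) {
        // RS-6-3-1024k: 3 parity units can cover 2 dead datanodes.
        System.out.println(assertionHolds(3, 2)); // true
        // XOR-2-1-1024k: only 1 parity unit, so 2 dead datanodes fail it.
        System.out.println(assertionHolds(1, 2)); // false
    }
}
```

This presumably explains why the assertion survived until a 1-parity policy was run through the test.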
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210790#comment-16210790 ] SammiChen commented on HDFS-12682: -- Hi [~xiaochen], thanks for reporting this issue. Inspired by your discovery, I found the same issue exists when system EC policies are persisted into and loaded from the fsImage (HDFS-12686). The current convertErasureCodingPolicy function works well in most cases. For special cases, like getting all erasure coding policies and persisting policies into the fsImage, I think we need a new variant that does a full conversion. {quote} The problem I see from HDFS-12258's implementation though, is the mutable ECPS is saved on the immutable ECP, breaking assumptions such as shared single instance policy. At the same time the policy is still not persisted independently. I think ECPS is highly dependent on the missing piece from HDFS-7337: policies are not persisted to NN metadata. The state of whether a policy is enabled could be persisted together with the policy, without impacting HDFSFileStatus. {quote} Persisting EC policies is implemented in HDFS-7337. {quote} I think this bug (HDFS-12682) and HDFS-12258 would make more sense if we could first persist policies to NN metadata. Would also be helpful to separate out something like ErasureCodingPolicyAndState for the policy-specific APIs, so the state isn't deserialized onto HDFSFileStatus. {quote} For HDFS-12258, [~zhouwei], [~drankye] and I discussed two different approaches when we first thought about how to implement it. One is the currently implemented approach, which adds one extra "state" field to the existing ECP definition. The other is to define a new class, something like {{ErasureCodingPolicyWithState}}, to hold the ECP and the new policy state field. They are almost equally good. The only concern is that introducing the new {{ErasureCodingPolicyWithState}} may add complexity to the API interfaces, and to end users. There are multiple EC related APIs.
If we return {{ErasureCodingPolicyWithState}} from {{getAllErasureCodingPolicies}}, should we return {{ErasureCodingPolicyWithState}} or {{ErasureCodingPolicy}} from {{getErasureCodingPolicy}}? And so on. Also, is it worth introducing a new class definition in Hadoop that only adds one extra field? After all these considerations, the current approach was chosen to leverage the existing ECP. Please let me know if you have other concerns. Thanks! > ECAdmin -listPolicies will always show policy state as DISABLED > --- > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen > Labels: hdfs-ec-3.0-must-do > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], >
the static instance of [SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
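The deserialization bug described above boils down to a cache-first lookup discarding state received on the wire. A minimal sketch of the pattern (illustrative names only, not the actual PBHelperClient/SystemErasureCodingPolicies code; the policy id 4 for XOR-2-1-1024k is taken from the listing above):

```java
import java.util.HashMap;
import java.util.Map;

public class PolicyCacheSketch {
    enum State { DISABLED, ENABLED }

    static final class Policy {
        final int id;
        State state = State.DISABLED; // default on construction
        Policy(int id) { this.id = id; }
    }

    // Stand-in for a static system-policy registry built once at class load.
    static final Map<Integer, Policy> CACHE = new HashMap<>();
    static { CACHE.put(4, new Policy(4)); } // e.g. XOR-2-1-1024k has Id=4

    // Stand-in for a protobuf deserializer that checks the cache first.
    static Policy convert(int id, State wireState) {
        Policy cached = CACHE.get(id);
        if (cached != null) {
            return cached; // BUG: wireState (e.g. ENABLED) is silently dropped
        }
        Policy p = new Policy(id);
        p.state = wireState;
        return p;
    }

    public static void main(String[] args) {
        Policy p = convert(4, State.ENABLED);
        System.out.println(p.state); // DISABLED, despite ENABLED on the wire
    }
}
```

Persisting the enabled/disabled state alongside the policy, or returning a copy rather than the cached instance, would avoid the silent drop.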
[jira] [Comment Edited] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210765#comment-16210765 ] Yiqun Lin edited comment on HDFS-11468 at 10/19/17 9:27 AM: Thanks for the response, [~cheersyang]. I have thought about this again. I will update the metric name for the container stat metrics. bq. These are SCM level metrics right? Is it a bit overlapping with SCMMXBean? Yes, it looks like the node metrics in SCMMetrics overlap with those in SCMMXBean. I looked into two similar classes, NameNodeMXBean and NameNodeMetrics. The node count info is intended to be shown in the JMX interface rather than as metrics. So I'd like to remove the node metrics and only keep the container stat metrics in the SCMMetrics class. But I insist on one point: we should keep these two classes separate and not merge all the metrics. That means the info exposed in *Metrics and *MXBean is a little different, like we have KSMMetrics and KSMMxBean, NameNodeMetrics and NameNodeMXBean. Does it look good to you now, [~cheersyang]? Will attach the new patch after your response. was (Author: linyiqun): Thanks for the response, [~cheersyang]. I have thought about this again. I will update the metric name for the container stat metrics. bq. These are SCM level metrics right? Is it a bit overlapping with SCMMXBean? Yes, it looks like the node metrics in SCMMetrics overlap with those in SCMMXBean. I looked into two similar classes, NameNodeMXBean and NameNodeMetrics. The node count info is intended to be shown in the JMX interface rather than as metrics. So I'd like to remove the node metrics and only keep the container stat metrics in the SCMMetrics class. But I insist on one point: we should keep these two classes separate and not merge all the metrics. That means the info exposed in *Metrics and *MXBean is a little different, like we have KSMMetrics and KSMMxBean, NameNodeMetrics and NameNodeMXBean. Does it look good to you now, [~cheersyang]?
> Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM
[ https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210765#comment-16210765 ] Yiqun Lin commented on HDFS-11468: -- Thanks for the response, [~cheersyang]. I have thought about this again. I will update the metric name for the container stat metrics. bq. These are SCM level metrics right? Is it a bit overlapping with SCMMXBean? Yes, it looks like the node metrics in SCMMetrics overlap with those in SCMMXBean. I looked into two similar classes, NameNodeMXBean and NameNodeMetrics. The node count info is intended to be shown in the JMX interface rather than as metrics. So I'd like to remove the node metrics and only keep the container stat metrics in the SCMMetrics class. But I insist on one point: we should keep these two classes separate and not merge all the metrics. That means the info exposed in *Metrics and *MXBean is a little different, like we have KSMMetrics and KSMMxBean, NameNodeMetrics and NameNodeMXBean. Does it look good to you now, [~cheersyang]? > Ozone: SCM: Add Node Metrics for SCM > > > Key: HDFS-11468 > URL: https://issues.apache.org/jira/browse/HDFS-11468 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin >Priority: Critical > Labels: OzonePostMerge > Attachments: HDFS-11468-HDFS-7240.001.patch, > HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch > > > This ticket is opened to add node metrics in SCM based on heartbeat, node > report and container report from datanodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart
SammiChen created HDFS-12686: Summary: Erasure coding system policy state is not correctly saved and loaded during real cluster restart Key: HDFS-12686 URL: https://issues.apache.org/jira/browse/HDFS-12686 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0-beta1 Reporter: SammiChen Assignee: SammiChen Inspired by HDFS-12682, I found that the system erasure coding policy state will not be correctly saved and loaded in a real cluster, though there are unit tests for this and all of them pass with MiniCluster. That is because MiniCluster keeps the same static system erasure coding policy object across the NN restart operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12685) [READ] FsVolumeImpl exception when scanning Provided storage volume
[ https://issues.apache.org/jira/browse/HDFS-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs updated HDFS-12685: -- Description: I left a Datanode running overnight and found this in the logs in the morning: {code} 2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29 java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file" at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: URI scheme is not "file" at java.io.File.(File.java:421) at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.(FsVolumeSpi.java:319) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155) at
[jira] [Updated] (HDFS-12685) [READ] FsVolumeImpl exception when scanning Provided storage volume
[ https://issues.apache.org/jira/browse/HDFS-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewan Higgs updated HDFS-12685: -- Summary: [READ] FsVolumeImpl exception when scanning Provided storage volume (was: FsVolumeImpl exception when scanning Provided storage volume) > [READ] FsVolumeImpl exception when scanning Provided storage volume > --- > > Key: HDFS-12685 > URL: https://issues.apache.org/jira/browse/HDFS-12685 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ewan Higgs > > I left a Datanode running overnight and found this in the logs in the morning: > {code} > 2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling > report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29 > > > java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: > URI scheme is not "file" > > > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > > > > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > > > > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544) > > > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393) > > > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375) > > > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320) > > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > > > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > > at java.lang.Thread.run(Thread.java:748) > > > > Caused by: java.lang.IllegalArgumentException: URI scheme is not "file" > > > > at java.io.File.(File.java:421) > >
[jira] [Created] (HDFS-12685) FsVolumeImpl exception when scanning Provided storage volume
Ewan Higgs created HDFS-12685: - Summary: FsVolumeImpl exception when scanning Provided storage volume Key: HDFS-12685 URL: https://issues.apache.org/jira/browse/HDFS-12685 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ewan Higgs I left a Datanode running overnight and found this in the logs in the morning: {code} 2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29 java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file" at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375) at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: URI scheme is not "file" at java.io.File.(File.java:421) at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.(FsVolumeSpi.java:319) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)
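The root cause of the trace above is a documented precondition of the JDK: the {{java.io.File(URI)}} constructor only accepts absolute, hierarchical URIs whose scheme is "file", so building a ScanInfo File from a Provided-storage URI throws exactly this IllegalArgumentException. A minimal reproduction (the s3a bucket name is made up):

```java
import java.io.File;
import java.net.URI;

public class FileUriSketch {
    public static void main(String[] args) {
        // Works: a file:// URI satisfies File's scheme precondition.
        File local = new File(URI.create("file:///tmp/blk_1001"));
        System.out.println(local.getPath());

        // Throws: any non-"file" scheme, as a Provided volume would supply.
        try {
            new File(URI.create("s3a://bucket/blk_1001"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // URI scheme is not "file"
        }
    }
}
```

So any code path that funnels a Provided volume's URI through new File(URI) will hit this; the scan-report path needs to handle non-file URIs without constructing a File.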
[jira] [Updated] (HDFS-12502) nntop should support a category based on FilesInGetListingOps
[ https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-12502: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 3.0.0 2.7.5 2.8.3 2.9.0 Status: Resolved (was: Patch Available) Thanks for the review [~shv]. I just committed the patch to trunk~branch-2.7. > nntop should support a category based on FilesInGetListingOps > - > > Key: HDFS-12502 > URL: https://issues.apache.org/jira/browse/HDFS-12502 > Project: Hadoop HDFS > Issue Type: Improvement > Components: metrics >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Fix For: 2.9.0, 2.8.3, 2.7.5, 3.0.0, 3.1.0 > > Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, > HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch > > > Large listing ops can oftentimes be the main contributor to NameNode > slowness. The aggregate cost of listing ops is proportional to the > {{FilesInGetListingOps}} rather than the number of listing ops. Therefore > it'd be very useful for nntop to support this category. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210685#comment-16210685 ] Hadoop QA commented on HDFS-12684: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 11s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12684 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893001/HDFS-12684-HDFS-7240.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21745/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: SCM metrics NodeCount is overlapping with node manager metrics > - > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" 
: "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
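The proposed fix, dropping the duplicated attribute from the SCM-level bean so that node state counts are published by SCMNodeManagerInfo alone, can be sketched with simplified interfaces (these are not the actual Ozone MXBean definitions):

```java
import java.util.Map;

public class MxBeanSketch {
    // Node state counts stay on the node-manager bean only...
    interface NodeManagerMXBean {
        Map<String, Integer> getNodeCount();
    }

    // ...while the SCM bean keeps its server-level attributes.
    interface SCMMXBean {
        String getClientRpcPort();
        String getDatanodeRpcPort();
        // getNodeCount() removed here: it duplicated NodeManagerMXBean's view
    }

    public static void main(String[] args) {
        // Reflection check: the SCM bean no longer declares getNodeCount.
        boolean dup = false;
        for (java.lang.reflect.Method m : SCMMXBean.class.getMethods()) {
            if (m.getName().equals("getNodeCount")) dup = true;
        }
        System.out.println("overlap: " + dup); // overlap: false
    }
}
```

With the getter gone from the SCM bean, /jmx would list NodeCount only under the SCMNodeManagerInfo bean.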
[jira] [Updated] (HDFS-10984) Expose nntop output as metrics
[ https://issues.apache.org/jira/browse/HDFS-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-10984: - Fix Version/s: 2.7.5 Thanks [~swagle] and [~xyao]. I backported this to branch-2.7 as well. > Expose nntop output as metrics > - > > Key: HDFS-10984 > URL: https://issues.apache.org/jira/browse/HDFS-10984 > Project: Hadoop HDFS > Issue Type: Task > Components: namenode >Affects Versions: 2.7.0 >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle > Fix For: 2.8.0, 3.0.0-alpha2, 2.7.5 > > Attachments: HDFS-10984.patch, HDFS-10984.v1.patch, > HDFS-10984.v2.patch, HDFS-10984.v3.patch, HDFS-10984.v4.patch > > > The nntop output is already exposed via JMX with HDFS-6982. > However external metrics systems do not get this data. It would be valuable > to track this as a timeseries as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12684: --- Status: Patch Available (was: Open) > Ozone: SCM metrics NodeCount is overlapping with node manager metrics > - > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 
0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
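The duplication reported in HDFS-12684 can be seen mechanically: two MXBeans publish an attribute with the same name, so the same metric appears twice under /jmx. The sketch below is a hypothetical helper, not Ozone code — only the attribute names are taken from the /jmx dump quoted above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Flags attributes exposed by more than one bean. Bean and attribute
// names come from the /jmx output in the issue description; the helper
// itself is illustrative.
public class DuplicateMetricCheck {

    // Attribute names exposed by each bean (subset of the dump above).
    static Map<String, List<String>> beanAttributes() {
        Map<String, List<String>> beans = new LinkedHashMap<>();
        beans.put("StorageContainerManagerInfo",
            Arrays.asList("ClientRpcPort", "DatanodeRpcPort", "NodeCount"));
        beans.put("SCMNodeManagerInfo",
            Arrays.asList("NodeCount", "OutOfChillMode", "MinimumChillModeNodes"));
        return beans;
    }

    // Attributes published by more than one bean.
    static List<String> duplicatedAttributes(Map<String, List<String>> beans) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        for (List<String> attrs : beans.values()) {
            for (String attr : attrs) {
                seen.merge(attr, 1, Integer::sum);
            }
        }
        List<String> dups = new ArrayList<>();
        seen.forEach((attr, count) -> {
            if (count > 1) {
                dups.add(attr);
            }
        });
        return dups;
    }

    public static void main(String[] args) {
        // With NodeCount on both beans the check flags it; dropping it
        // from SCMMXBean (the proposed fix) would leave no duplicates.
        System.out.println(duplicatedAttributes(beanAttributes()));
    }
}
```

Removing {{NodeCount}} from {{SCMMXBean}}, as the patch proposes, leaves the node manager as the single publisher of that metric.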
[jira] [Updated] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12684: --- Attachment: HDFS-12684-HDFS-7240.001.patch > Ozone: SCM metrics NodeCount is overlapping with node manager metrics > - > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12684-HDFS-7240.001.patch > > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > 
"value" : 0 > } ], > "OutOfChillMode" : false, > "MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12502) nntop should support a category based on FilesInGetListingOps
[ https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210677#comment-16210677 ] Hudson commented on HDFS-12502: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13105 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13105/]) HDFS-12502. nntop should support a category based on (zhz: rev 60bfee270ed3a653c44c0bc92396167b5022df6e) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/top/metrics/TopMetrics.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestTopMetrics.java > nntop should support a category based on FilesInGetListingOps > - > > Key: HDFS-12502 > URL: https://issues.apache.org/jira/browse/HDFS-12502 > Project: Hadoop HDFS > Issue Type: Improvement > Components: metrics >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, > HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch > > > Large listing ops can oftentimes be the main contributor to NameNode > slowness. The aggregate cost of listing ops is proportional to the > {{FilesInGetListingOps}} rather than the number of listing ops. Therefore > it'd be very useful for nntop to support this category. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
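The argument in HDFS-12502 — that aggregate listing cost tracks the number of files returned, not the number of listing ops — can be illustrated with a small sketch. The data and class below are invented for illustration; nntop's real windowed implementation lives in TopMetrics.java:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Ranking users by listing-op *count* hides the user whose few listings
// return huge directories; weighting by files returned (the proposed
// FilesInGetListingOps category) surfaces them.
public class ListingCostSketch {

    // (user, filesReturned) for individual getListing calls; made-up data.
    static final Object[][] OPS = {
        {"alice", 2L}, {"alice", 3L}, {"alice", 1L}, // many small listings
        {"bob", 50_000L},                            // one huge listing
    };

    static Map<String, Long> rank(boolean weighByFiles) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Object[] op : OPS) {
            long weight = weighByFiles ? (Long) op[1] : 1L;
            totals.merge((String) op[0], weight, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(rank(false)); // by op count: alice dominates
        System.out.println(rank(true));  // by files listed: bob dominates
    }
}
```

By op count alice looks like the heavy user (3 ops vs 1), but by files listed bob accounts for nearly all of the NameNode's listing work — which is why the new category matters.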
[jira] [Commented] (HDFS-12521) Ozone: SCM should read all Container info into memory when booting up
[ https://issues.apache.org/jira/browse/HDFS-12521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210674#comment-16210674 ] Yiqun Lin commented on HDFS-12521: -- Thanks [~ljain] for updating the patch. The change in the v03 patch looks good to me. Please address the comments from [~anu]. Thanks > Ozone: SCM should read all Container info into memory when booting up > - > > Key: HDFS-12521 > URL: https://issues.apache.org/jira/browse/HDFS-12521 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Lokesh Jain > Labels: ozoneMerge > Attachments: HDFS-12521-HDFS-7240.001.patch, > HDFS-12521-HDFS-7240.002.patch, HDFS-12521-HDFS-7240.003.patch > > > When SCM boots up it should read all containers into memory. This is a > performance optimization that avoids delays on the SCM side. This JIRA tracks > that issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
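A minimal sketch of the boot-time load proposed in HDFS-12521, assuming a simple name-to-state cache; all class and method names below are illustrative stand-ins, not the actual SCM code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Scan the persisted container records once at startup and cache them,
// so later container lookups are served from memory instead of the
// metadata store.
public class ContainerBootLoad {

    // In-memory view, keyed by container name -> lifecycle state.
    private final Map<String, String> containerCache = new ConcurrentHashMap<>();

    // Stand-in for iterating SCM's persisted container metadata.
    static Map<String, String> scanMetadataStore() {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("container-1", "OPEN");
        store.put("container-2", "CLOSED");
        return store;
    }

    // Called once when SCM boots.
    void loadAllContainers() {
        containerCache.putAll(scanMetadataStore());
    }

    String getContainerState(String name) {
        return containerCache.get(name); // no store access on the hot path
    }

    public static void main(String[] args) {
        ContainerBootLoad scm = new ContainerBootLoad();
        scm.loadAllContainers();
        System.out.println(scm.getContainerState("container-1")); // OPEN
    }
}
```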
[jira] [Assigned] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
[ https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned HDFS-12684: -- Assignee: Weiwei Yang > Ozone: SCM metrics NodeCount is overlapping with node manager metrics > - > > Key: HDFS-12684 > URL: https://issues.apache.org/jira/browse/HDFS-12684 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > > I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, > both SCM and SCMNodeManager has {{NodeCount}} metrics > {noformat} > { > "name" : > "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", > "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", > "ClientRpcPort" : "9860", > "DatanodeRpcPort" : "9861", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "CompileInfo" : "2017-10-17T06:47Z xxx", > "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", > "SoftwareVersion" : "3.1.0-SNAPSHOT", > "StartedTimeInMillis" : 1508393551065 > }, { > "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", > "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", > "NodeCount" : [ { > "key" : "STALE", > "value" : 0 > }, { > "key" : "DECOMMISSIONING", > "value" : 0 > }, { > "key" : "DECOMMISSIONED", > "value" : 0 > }, { > "key" : "FREE_NODE", > "value" : 0 > }, { > "key" : "RAFT_MEMBER", > "value" : 0 > }, { > "key" : "HEALTHY", > "value" : 0 > }, { > "key" : "DEAD", > "value" : 0 > }, { > "key" : "UNKNOWN", > "value" : 0 > } ], > "OutOfChillMode" : false, > 
"MinimumChillModeNodes" : 1, > "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. > 0 nodes reported, minimal 1 nodes required." > } > {noformat} > hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
Weiwei Yang created HDFS-12684: -- Summary: Ozone: SCM metrics NodeCount is overlapping with node manager metrics Key: HDFS-12684 URL: https://issues.apache.org/jira/browse/HDFS-12684 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, scm Reporter: Weiwei Yang Priority: Minor I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, both SCM and SCMNodeManager has {{NodeCount}} metrics {noformat} { "name" : "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime", "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager", "ClientRpcPort" : "9860", "DatanodeRpcPort" : "9861", "NodeCount" : [ { "key" : "STALE", "value" : 0 }, { "key" : "DECOMMISSIONING", "value" : 0 }, { "key" : "DECOMMISSIONED", "value" : 0 }, { "key" : "FREE_NODE", "value" : 0 }, { "key" : "RAFT_MEMBER", "value" : 0 }, { "key" : "HEALTHY", "value" : 0 }, { "key" : "DEAD", "value" : 0 }, { "key" : "UNKNOWN", "value" : 0 } ], "CompileInfo" : "2017-10-17T06:47Z xxx", "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461", "SoftwareVersion" : "3.1.0-SNAPSHOT", "StartedTimeInMillis" : 1508393551065 }, { "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo", "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager", "NodeCount" : [ { "key" : "STALE", "value" : 0 }, { "key" : "DECOMMISSIONING", "value" : 0 }, { "key" : "DECOMMISSIONED", "value" : 0 }, { "key" : "FREE_NODE", "value" : 0 }, { "key" : "RAFT_MEMBER", "value" : 0 }, { "key" : "HEALTHY", "value" : 0 }, { "key" : "DEAD", "value" : 0 }, { "key" : "UNKNOWN", "value" : 0 } ], "OutOfChillMode" : false, "MinimumChillModeNodes" : 1, "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 0 nodes reported, minimal 1 nodes required." } {noformat} hence, propose to remove {{NodeCount}} from {{SCMMXBean}}. 
[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation
[ https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210662#comment-16210662 ] Yiqun Lin commented on HDFS-12680: -- Hi [~nandakumar131], besides [~anu]'s review comments, some additional comments on the unit test: # It would be better to define a variable {{TIMEOUT=1}} and reuse it in the test method. # We should increase the sleep time ({{Thread.sleep(1);}}) in the test, for example use {{1 + 1000}}. There will be some delay before the lease manager starts the monitor runnable and sleeps in preparation for releasing the lease. This was actually the problem I found in HDFS-12675. # Can you add an additional check {{thrown.expect(LeaseNotFoundException.class);}} ? # The following lines don't execute in the test. {code} BlockContainerInfo deletingContainer = mapping.getStateManager() .getMatchingContainer( 0, containerInfo.getOwner(), containerInfo.getPipeline().getType(), containerInfo.getPipeline().getFactor(), OzoneProtos.LifeCycleState.DELETING); Assert.assertEquals(containerInfo.getContainerName(), deletingContainer.getContainerName()); {code} > Ozone: SCM: Lease support for container creation > > > Key: HDFS-12680 > URL: https://issues.apache.org/jira/browse/HDFS-12680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Nanda kumar > Labels: ozoneMerge > Attachments: HDFS-12680-HDFS-7240.000.patch > > > This brings in lease support for container creation. > A lease should be given for a container that is moved to {{CREATING}} state when > the {{BEGIN_CREATE}} event happens; {{LeaseException}} should be thrown if the > container already holds a lease. The lease must be released during the > {{COMPLETE_CREATE}} event. If the lease times out, the container should be moved > to the {{DELETING}} state, and an exception should be thrown if a {{COMPLETE_CREATE}} > event is received on that container.
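The lease lifecycle described in HDFS-12680 can be sketched as a small state machine. Everything below is illustrative (the real patch uses Ozone's lease manager and {{LeaseException}}; {{IllegalStateException}} stands in here), and the current time is passed in explicitly to keep the example deterministic:

```java
// BEGIN_CREATE grants a lease to a container entering CREATING;
// COMPLETE_CREATE must release it before the timeout, otherwise the
// container is moved to DELETING and COMPLETE_CREATE fails.
public class ContainerLeaseSketch {

    enum State { CREATING, OPEN, DELETING }

    State state = State.CREATING;
    private long leaseExpiryMillis = -1; // -1 means no lease held

    // BEGIN_CREATE: grant a lease; a second grant while one is held is
    // an error (stands in for LeaseException).
    void beginCreate(long nowMillis, long timeoutMillis) {
        if (leaseExpiryMillis >= 0) {
            throw new IllegalStateException("container already holds a lease");
        }
        state = State.CREATING;
        leaseExpiryMillis = nowMillis + timeoutMillis;
    }

    // COMPLETE_CREATE: release the lease if still valid, else fail and
    // schedule the half-created container for deletion.
    void completeCreate(long nowMillis) {
        if (leaseExpiryMillis < 0 || nowMillis >= leaseExpiryMillis) {
            state = State.DELETING;
            throw new IllegalStateException("lease not found or timed out");
        }
        leaseExpiryMillis = -1;
        state = State.OPEN;
    }

    public static void main(String[] args) {
        ContainerLeaseSketch ok = new ContainerLeaseSketch();
        ok.beginCreate(0, 1000);
        ok.completeCreate(500);          // within the lease window
        System.out.println(ok.state);    // OPEN

        ContainerLeaseSketch late = new ContainerLeaseSketch();
        late.beginCreate(0, 1000);
        try {
            late.completeCreate(2000);   // lease timed out
        } catch (IllegalStateException e) {
            System.out.println(late.state); // DELETING
        }
    }
}
```

Passing the clock in explicitly also sidesteps the {{Thread.sleep}} timing fragility raised in the review comment above.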