[jira] [Resolved] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma resolved HDFS-7877. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 3.0.0-beta1 2.9.0 All sub-tasks have been resolved. Thanks [~ctrezzo] [~eddyxu] [~manojg] [~elek] [~linyiqun] and others for the contribution and discussion. > Support maintenance state for datanodes > --- > > Key: HDFS-7877 > URL: https://issues.apache.org/jira/browse/HDFS-7877 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode > Reporter: Ming Ma > Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-beta1, 3.1.0 > > Attachments: HDFS-7877-2.patch, HDFS-7877.patch, Supportmaintenancestatefordatanodes-2.pdf, Supportmaintenancestatefordatanodes.pdf > > > This requirement came up during the design for HDFS-7541. Given this feature is mostly independent of the upgrade domain feature, it is better to track it under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11035: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 3.0.0-beta1 2.9.0 Status: Resolved (was: Patch Available) Thanks [~ctrezzo]. I have committed the patch to trunk, branch-3.0 and branch-2. > Better documentation for maintenance mode and upgrade domain > --- > > Key: HDFS-11035 > URL: https://issues.apache.org/jira/browse/HDFS-11035 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, documentation > Affects Versions: 2.9.0 > Reporter: Wei-Chiu Chuang > Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-beta1, 3.1.0 > > Attachments: HDFS-11035-2.patch, HDFS-11035.patch > > > HDFS-7541 added upgrade domain and HDFS-7877 added maintenance mode. Existing documentation about these two features is scarce, and the implementation has evolved from the original design doc. Looking at the code and Javadoc, I still don't quite get how I can put datanodes into maintenance mode or set up an upgrade domain. > Filing this jira to propose that we write an up-to-date description of these two features.
[jira] [Updated] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 2.8.3 2.8.2 3.0.0-beta1 2.9.0 Status: Resolved (was: Patch Available) Thanks [~manojg] and [~zhz]. I have committed the patch to trunk, branch-3.0, branch-2, branch-2.8 and branch-2.8.2. Besides the branch-2 diff mentioned above, the patch for branch-2.8/branch-2.8.2 is slightly different in the unit tests, as the maintenance state only exists in branch-2 and above.
> Change hosts JSON file format
> -
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2, 2.8.3, 3.1.0
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, HDFS-12473-4.patch, HDFS-12473-5.patch, HDFS-12473-6.patch, HDFS-12473-branch-2.patch, HDFS-12473.patch
>
> The existing hosts JSON file format doesn't have a top-level token.
> {noformat}
> {"hostName": "host1"}
> {"hostName": "host2", "upgradeDomain": "ud0"}
> {"hostName": "host3", "adminState": "DECOMMISSIONED"}
> {"hostName": "host4", "upgradeDomain": "ud2", "adminState": "DECOMMISSIONED"}
> {"hostName": "host5", "port": 8090}
> {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
> {"hostName": "host7", "adminState": "IN_MAINTENANCE", "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}
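The migration described above is mechanical: each line of the legacy file is a complete JSON object, so the whole file can be re-emitted as one JSON array. The sketch below illustrates the idea; it is written in Python for brevity (the actual Hadoop code is Java), and `convert_legacy_hosts_file` is a hypothetical helper, not part of the HDFS-12473 patch.

```python
import json

def convert_legacy_hosts_file(text: str) -> str:
    """Convert a legacy hosts file (one JSON object per line, no
    top-level token) into a standard JSON array, as HDFS-12473 proposes.
    Illustrative sketch only; not the actual Hadoop implementation."""
    # Each non-blank line is parsed independently, then the collected
    # entries are serialized as a single well-formed JSON array.
    entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(entries, indent=2)

legacy = (
    '{"hostName": "host1"}\n'
    '{"hostName": "host2", "upgradeDomain": "ud0"}\n'
    '{"hostName": "host6", "adminState": "IN_MAINTENANCE"}'
)
print(convert_legacy_hosts_file(legacy))
```

Running the sketch on a legacy fragment yields a JSON array that any standards-compliant parser can read in one pass.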
[jira] [Updated] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Attachment: HDFS-12473-branch-2.patch Compared to the patch for trunk, branch-2's version is slightly different because it depends on a different version of Jackson. Specifically, trunk can use {{com.fasterxml.jackson.databind.ObjectMapper}} while branch-2 should use {{org.codehaus.jackson.map.ObjectMapper}}. {{CombinedHostsFileReader.java}} also needs to be modified slightly due to a different exception being thrown in the case of an empty file.
[jira] [Updated] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Attachment: HDFS-12473-6.patch Thanks [~manojg]. Here is the new patch to address your comments, except for the exception handling, where I prefer that the code swallow as few exceptions as possible.
[jira] [Commented] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170944#comment-16170944 ] Ming Ma commented on HDFS-11035: Thanks [~ctrezzo]. I will commit it to trunk, branch-3.0 and branch-2 by EOD tomorrow in case [~jojochuang] [~manojg] [~eddyxu] have any additional comments.
[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11035: --- Attachment: HDFS-11035-2.patch The new patch adds the new docs to site.xml and fixes a couple of nits. Thanks [~ctrezzo] for the review.
[jira] [Commented] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170746#comment-16170746 ] Ming Ma commented on HDFS-12473: Ah, got it. So the assumption that "backward compatibility isn't an issue as long as the feature hasn't been officially released" isn't true all the time. While it is generally better to keep the code clean and free of unnecessary handling, for this specific issue it seems OK to include backward compatibility for an unreleased feature, given it doesn't complicate the code much. Can you check if 4.patch is ready for commit?
[jira] [Updated] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Attachment: HDFS-12473-5.patch From discussion with [~zhz], 2.8.2 hasn't been released yet. Thus we don't need to deal with the backward-compatibility issue of the old JSON format used in HDFS-7541, assuming we can get the patch into 2.8.2 and branch-3.0. [~manojg], here is the latest patch.
[jira] [Comment Edited] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169365#comment-16169365 ] Ming Ma edited comment on HDFS-11035 at 9/17/17 5:47 PM: - Here is the draft patch including one doc for upgrade domain and another one for datanode administration in general (decommission and maintenance). cc [~jojochuang] [~manojg] [~eddyxu] [~ctrezzo] was (Author: mingma): Here is the draft patch including one doc for upgrade domain and another one for datanode administration in general (decommission and maintenance). cc [~jojochuang] [~manojg] [~eddyxu]
[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11035: --- Attachment: HDFS-11035.patch Here is the draft patch including one doc for upgrade domain and another one for datanode administration in general (decommission and maintenance). cc [~jojochuang] [~manojg] [~eddyxu]
[jira] [Assigned] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma reassigned HDFS-11035: -- Assignee: Ming Ma
[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11035: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Attachment: HDFS-12473-4.patch
[jira] [Updated] (HDFS-12473) Change hosts JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Attachment: HDFS-12473-2.patch Thanks Manoj. Here is the updated patch to address your comments. bq. What happens when the hosts file has improper json format? I was hoping we could get this in before the 3.0 beta release and thus avoid worrying about compatibility issues. But it looks like the upgrade domain feature has been backported to 2.8.2. Unfortunately, that means we have to support the old format. bq. #readFile can now return null object The updated patch returns an empty array instead. bq. If MAPPER is no more used, can be removed. It was removed. Maybe you were referring to the existing file. bq. CombinedHostsFileReader.readFile() can return null if the input hosts file has no entries. The new test case testEmptyCombinedHostsFileReader covers this.
[jira] [Comment Edited] (HDFS-12473) Change host JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168521#comment-16168521 ] Ming Ma edited comment on HDFS-12473 at 9/15/17 9:09 PM: - Here is the draft patch. cc [~eddyxu] and [~manojg]. was (Author: mingma): Here is the draft patch.
[jira] [Updated] (HDFS-12473) Change host JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Issue Type: Sub-task (was: Bug) Parent: HDFS-7877
[jira] [Updated] (HDFS-12473) Change host JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Assignee: Ming Ma Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-12473) Change host JSON file format
[ https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12473: --- Attachment: HDFS-12473.patch Here is the draft patch.
[jira] [Created] (HDFS-12473) Change host JSON file format
Ming Ma created HDFS-12473: -- Summary: Change host JSON file format Key: HDFS-12473 URL: https://issues.apache.org/jira/browse/HDFS-12473 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma The existing host JSON file format doesn't have a top-level token. {noformat} {"hostName": "host1"} {"hostName": "host2", "upgradeDomain": "ud0"} {"hostName": "host3", "adminState": "DECOMMISSIONED"} {"hostName": "host4", "upgradeDomain": "ud2", "adminState": "DECOMMISSIONED"} {"hostName": "host5", "port": 8090} {"hostName": "host6", "adminState": "IN_MAINTENANCE"} {"hostName": "host7", "adminState": "IN_MAINTENANCE", "maintenanceExpireTimeInMS": "112233"} {noformat} Instead, to conform with the JSON standard it should be like {noformat} [ {"hostName": "host1"}, {"hostName": "host2", "upgradeDomain": "ud0"}, {"hostName": "host3", "adminState": "DECOMMISSIONED"}, {"hostName": "host4", "upgradeDomain": "ud2", "adminState": "DECOMMISSIONED"}, {"hostName": "host5", "port": 8090}, {"hostName": "host6", "adminState": "IN_MAINTENANCE"}, {"hostName": "host7", "adminState": "IN_MAINTENANCE", "maintenanceExpireTimeInMS": "112233"} ] {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
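The migration from the legacy one-object-per-line host file to the proposed JSON array can be sketched with a small conversion script. This is not part of the patch; it is a hedged illustration that assumes each legacy entry occupies a single line:

```python
import json

def convert_hosts_file(text):
    """Convert the legacy one-JSON-object-per-line host file into a
    standard JSON array, preserving entry order."""
    entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(entries, indent=2)

# Legacy format: each line parses on its own, but the file as a whole
# is not valid JSON (no top-level element).
legacy = "\n".join([
    '{"hostName": "host1"}',
    '{"hostName": "host6", "adminState": "IN_MAINTENANCE"}',
])
converted = convert_hosts_file(legacy)
parsed = json.loads(converted)  # the converted file parses as standard JSON
```

With the array form, any standards-conforming JSON parser can load the whole file in one call, which is the point of the proposed change.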
[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164158#comment-16164158 ] Ming Ma commented on HDFS-10702: [~csun], the long checkpoint duration could cause the following issue: * Checkpointer holding {{cpLock}} takes a long time to finish for a large namespace. * The edit log tailer, blocked by {{cpLock}}, can't update the namespace. Thus the namespace becomes stale. * An application deletes a file. StandbyNN receives an incremental block report indicating the blocks have been removed and updates its block map. * StandbyNN's stale namespace still has the file but without block locations. > Add a Client API and Proxy Provider to enable stale read from Standby > - > > Key: HDFS-10702 > URL: https://issues.apache.org/jira/browse/HDFS-10702 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jiayi Zhou >Assignee: Sean Mackrory >Priority: Minor > Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, > HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, > HDFS-10702.006.patch, HDFS-10702.007.patch, HDFS-10702.008.patch, > StaleReadfromStandbyNN.pdf > > > Currently, clients must always talk to the active NameNode when performing > any metadata operation, which means the active NameNode could be a bottleneck for > scalability. One way to solve this problem is to send read-only operations to > the Standby NameNode. The disadvantage is that it might be a stale read. > Here, I'm thinking of adding a Client API to enable/disable stale read from > Standby which gives the Client the power to set the staleness restriction. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12285) Better handling of namenode ip address change
[ https://issues.apache.org/jira/browse/HDFS-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122746#comment-16122746 ] Ming Ma commented on HDFS-12285: Thanks [~shahrs87]. Yeah indeed related, although the exception and the scenario look different from the other jiras. Even if it is the same, let us keep this jira around for validation when we resolve the issue. > Better handling of namenode ip address change > - > > Key: HDFS-12285 > URL: https://issues.apache.org/jira/browse/HDFS-12285 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma > > RPC client layer provides functionality to detect ip address change: > {noformat} > Client.java > private synchronized boolean updateAddress() throws IOException { > // Do a fresh lookup with the old host name. > InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost( >server.getHostName(), server.getPort()); > .. > } > {noformat} > To use this feature, we need to enable retry via > {{dfs.client.retry.policy.enabled}}. Otherwise {{TryOnceThenFail}} > RetryPolicy will be used; which caused {{handleConnectionFailure}} to throw > {{ConnectException}} exception without retrying with the new ip address. > {noformat} > private void handleConnectionFailure(int curRetries, IOException ioe > ) throws IOException { > closeConnection(); > final RetryAction action; > try { > action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true); > } catch(Exception e) { > throw e instanceof IOException? (IOException)e: new IOException(e); > } > .. > } > {noformat} > However, using such configuration isn't ideal. What happens is DFSClient > still holds onto the cached old ip address created by {{namenode = > proxyInfo.getProxy();}}. Thus when a new rpc connection is created, it starts > with the old ip followed by retry with the new ip. It will be nice if > DFSClient can update namenode proxy automatically upon ip address change. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12285) Better handling of namenode ip address change
[ https://issues.apache.org/jira/browse/HDFS-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-12285: --- Description: RPC client layer provides functionality to detect ip address change: {noformat} Client.java private synchronized boolean updateAddress() throws IOException { // Do a fresh lookup with the old host name. InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost( server.getHostName(), server.getPort()); .. } {noformat} To use this feature, we need to enable retry via {{dfs.client.retry.policy.enabled}}. Otherwise {{TryOnceThenFail}} RetryPolicy will be used; which caused {{handleConnectionFailure}} to throw {{ConnectException}} exception without retrying with the new ip address. {noformat} private void handleConnectionFailure(int curRetries, IOException ioe ) throws IOException { closeConnection(); final RetryAction action; try { action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true); } catch(Exception e) { throw e instanceof IOException? (IOException)e: new IOException(e); } .. } {noformat} However, using such configuration isn't ideal. What happens is DFSClient still holds onto the cached old ip address created by {{namenode = proxyInfo.getProxy();}}. Thus when a new rpc connection is created, it starts with the old ip followed by retry with the new ip. It will be nice if DFSClient can update namenode proxy automatically upon ip address change. was: RPC client layer provides functionality to detect ip address change: {noformat} Client.java private synchronized boolean updateAddress() throws IOException { // Do a fresh lookup with the old host name. InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost( server.getHostName(), server.getPort()); .. } {noformat} To use this feature, we need to enable retry via {{dfs.client.retry.policy.enabled}}. 
Otherwise {{TryOnceThenFail}} RetryPolicy will be used; which caused {{handleConnectionFailure}} to throw {{ConnectException}} exception without retrying with the new ip address. {noformat} private void handleConnectionFailure(int curRetries, IOException ioe ) throws IOException { closeConnection(); final RetryAction action; try { action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true); } catch(Exception e) { throw e instanceof IOException? (IOException)e: new IOException(e); } .. } {noformat} However, using such configuration isn't ideal. What happens is DFSClient still has the cached old ip address created by {{namenode = proxyInfo.getProxy();}}. Then when a new rpc connection is created, it starts with the old ip followed by retry with the new ip. It will be nice if DFSClient can refresh namenode proxy automatically. > Better handling of namenode ip address change > - > > Key: HDFS-12285 > URL: https://issues.apache.org/jira/browse/HDFS-12285 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma > > RPC client layer provides functionality to detect ip address change: > {noformat} > Client.java > private synchronized boolean updateAddress() throws IOException { > // Do a fresh lookup with the old host name. > InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost( >server.getHostName(), server.getPort()); > .. > } > {noformat} > To use this feature, we need to enable retry via > {{dfs.client.retry.policy.enabled}}. Otherwise {{TryOnceThenFail}} > RetryPolicy will be used; which caused {{handleConnectionFailure}} to throw > {{ConnectException}} exception without retrying with the new ip address. > {noformat} > private void handleConnectionFailure(int curRetries, IOException ioe > ) throws IOException { > closeConnection(); > final RetryAction action; > try { > action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true); > } catch(Exception e) { > throw e instanceof IOException? (IOException)e: new IOException(e); > } > ..
> } > {noformat} > However, using such configuration isn't ideal. What happens is DFSClient > still holds onto the cached old ip address created by {{namenode = > proxyInfo.getProxy();}}. Thus when a new rpc connection is created, it starts > with the old ip followed by retry with the new ip. It will be nice if > DFSClient can update namenode proxy automatically upon ip address change. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail:
[jira] [Created] (HDFS-12285) Better handling of namenode ip address change
Ming Ma created HDFS-12285: -- Summary: Better handling of namenode ip address change Key: HDFS-12285 URL: https://issues.apache.org/jira/browse/HDFS-12285 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma RPC client layer provides functionality to detect ip address change: {noformat} Client.java private synchronized boolean updateAddress() throws IOException { // Do a fresh lookup with the old host name. InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost( server.getHostName(), server.getPort()); .. } {noformat} To use this feature, we need to enable retry via {{dfs.client.retry.policy.enabled}}. Otherwise {{TryOnceThenFail}} RetryPolicy will be used; which caused {{handleConnectionFailure}} to throw {{ConnectException}} exception without retrying with the new ip address. {noformat} private void handleConnectionFailure(int curRetries, IOException ioe ) throws IOException { closeConnection(); final RetryAction action; try { action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true); } catch(Exception e) { throw e instanceof IOException? (IOException)e: new IOException(e); } .. } {noformat} However, using such configuration isn't ideal. What happens is DFSClient still has the cached old ip address created by {{namenode = proxyInfo.getProxy();}}. Then when a new rpc connection is created, it starts with the old ip followed by retry with the new ip. It will be nice if DFSClient can refresh namenode proxy automatically. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
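The re-resolve-on-failure idea behind {{updateAddress()}} can be sketched outside Hadoop. This is a hypothetical Python sketch, not the RPC client's actual code; the function and parameter names are invented for illustration:

```python
def call_with_address_refresh(host, port, resolve, attempt_call, max_retries=2):
    """On connection failure, re-resolve the hostname (as the RPC client's
    updateAddress() does) and retry against the fresh address."""
    addr = resolve(host, port)  # initial, possibly stale, lookup
    for retry in range(max_retries + 1):
        try:
            return attempt_call(addr)
        except ConnectionError:
            new_addr = resolve(host, port)  # fresh DNS lookup with the old host name
            if retry == max_retries or new_addr == addr:
                raise  # no new address to try, or out of retries
            addr = new_addr
```

The jira's point is that even with this retry in the RPC layer, DFSClient keeps the stale address cached in its namenode proxy, so each new connection still pays the failed first attempt.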
[jira] [Commented] (HDFS-11035) Better documentation for maintenace mode and upgrade domain
[ https://issues-test.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090130#comment-16090130 ] Ming Ma commented on HDFS-11035: Given these features are related to existing concepts such as decommission and block placement, we can include descriptions of these features in the relevant sections of the existing *.md files. > Better documentation for maintenace mode and upgrade domain > --- > > Key: HDFS-11035 > URL: https://issues-test.apache.org/jira/browse/HDFS-11035 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, documentation >Affects Versions: 2.9.0 >Reporter: Wei-Chiu Chuang > > HDFS-7541 added upgrade domain and HDFS-7877 added maintenance mode. Existing > documentation about these two features is scarce and the implementation has > evolved from the original design doc. Looking at the code and Javadoc, I still > don't quite get how I can get datanodes into maintenance mode or set up an > upgrade domain. > File this jira to propose that we write an up-to-date description of these > two features. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes
[ https://issues-test.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077367#comment-16077367 ] Ming Ma edited comment on HDFS-9388 at 8/8/17 1:06 AM: --- Thanks [~manojg]. was (Author: mingma): 1. Thanks [~manojg]. > Refactor decommission related code to support maintenance state for datanodes > - > > Key: HDFS-9388 > URL: https://issues-test.apache.org/jira/browse/HDFS-9388 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9388.01.patch, HDFS-9388.02.patch > > > Lots of code can be shared between the existing decommission functionality > and to-be-added maintenance state support for datanodes. To make it easier to > add maintenance state support, let us first modify the existing code to make > it more general. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6939) Support path-based filtering of inotify events
[ https://issues.apache.org/jira/browse/HDFS-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117482#comment-16117482 ] Ming Ma commented on HDFS-6939: --- Yeah we can include this feature if it provides value. A couple of questions: * Each RPC getEditsFromTxid call ends up sending the filter over the wire; so a filter with many paths has a perf impact. Do we need to support a large number of paths per call? * In the future there could be other types of filters, e.g. a) filters based on FsEditLogOp type; b) filters supporting different logical operators such as OR, AND, etc. To make it extensible, perhaps we can define an interface with signature shouldNotify(FsEditLogOp) and provide the path-based PathBasedInotifyFilter for now. Then InotifyFSEditLogOpTranslator will be simpler by checking shouldNotify upfront; if we need to add path-and-editop-based filtering, we can just add PathAndOpBasedInotifyFilter without changing InotifyFSEditLogOpTranslator. * DFSClient's existing getInotifyEventStream methods are only used by DistributedFileSystem. So you don't need to keep these old methods on DFSClient; instead have DistributedFileSystem's old getInotifyEventStream methods call DFSClient's new methods. Also maybe we can consider deprecating DistributedFileSystem's old getInotifyEventStream methods. > Support path-based filtering of inotify events > -- > > Key: HDFS-6939 > URL: https://issues.apache.org/jira/browse/HDFS-6939 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode, qjm >Reporter: James Thomas >Assignee: Surendra Singh Lilhore > Attachments: HDFS-6939-001.patch > > > Users should be able to specify that they only want events involving > particular paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
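The extensible filter interface proposed in the comment above can be sketched as follows. This is a hedged Python sketch, not Hadoop code; the class names mirror the suggested shouldNotify(FsEditLogOp) design and the event-dict shape is hypothetical:

```python
class InotifyFilter:
    """Sketch of the proposed filter interface: a single predicate the
    translator can check upfront for each edit-log op."""
    def should_notify(self, op):
        raise NotImplementedError

class PathBasedInotifyFilter(InotifyFilter):
    """Path-prefix filtering, the only concrete filter proposed for now."""
    def __init__(self, prefixes):
        self.prefixes = tuple(prefixes)

    def should_notify(self, op):
        # str.startswith accepts a tuple of prefixes
        return op.get("path", "").startswith(self.prefixes)
```

New filter kinds (op-type based, AND/OR combinators) would then be new implementations of the same predicate, leaving the translator unchanged.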
[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077367#comment-16077367 ] Ming Ma commented on HDFS-9388: --- 1. Thanks [~manojg]. > Refactor decommission related code to support maintenance state for datanodes > - > > Key: HDFS-9388 > URL: https://issues.apache.org/jira/browse/HDFS-9388 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9388.01.patch, HDFS-9388.02.patch > > > Lots of code can be shared between the existing decommission functionality > and to-be-added maintenance state support for datanodes. To make it easier to > add maintenance state support, let us first modify the existing code to make > it more general. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070937#comment-16070937 ] Ming Ma commented on HDFS-9388: --- Thanks [~manojg]. Looks good overall. A couple of nits: * Configuration keys DFS_NAMENODE_DECOMMISSION_* only mention decommission in hdfs-default.xml. Better to use a general term like admin, or to include maintenance. * Comments in functions like handleInsufficientlyStored and processBlocksInternal refer to decommission only; it would be useful to update the comments. * The checkstyle and whitespace issues might not be related to the change. Still, it would be nice to fix them if it isn't too much effort. > Refactor decommission related code to support maintenance state for datanodes > - > > Key: HDFS-9388 > URL: https://issues.apache.org/jira/browse/HDFS-9388 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9388.01.patch > > > Lots of code can be shared between the existing decommission functionality > and to-be-added maintenance state support for datanodes. To make it easier to > add maintenance state support, let us first modify the existing code to make > it more general. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11446) TestMaintenanceState#testWithNNAndDNRestart fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027081#comment-16027081 ] Ming Ma commented on HDFS-11446: +1. > TestMaintenanceState#testWithNNAndDNRestart fails intermittently > > > Key: HDFS-11446 > URL: https://issues.apache.org/jira/browse/HDFS-11446 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-11446.001.patch, HDFS-11446.002.patch, > HDFS-11446.003.patch, HDFS-11446.004.patch, HDFS-11446-branch-2.002.patch, > HDFS-11446-branch-2.patch > > > The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The > stack info( > https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/ ): > {code} > java.lang.AssertionError: expected null, but was: for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: > expected 3, got 2 > ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending > block # 1 ,under replicated # 0 ,> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731) > {code} > The failure seems due to pending block has not been replicated. We can bump > the retry times since sometimes the cluster would be busy. Also we can use > {{GenericTestUtils#waitFor}} to simplified the current compared logic. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11790) Decommissioning of a DataNode after MaintenanceState takes a very long time to complete
[ https://issues.apache.org/jira/browse/HDFS-11790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022321#comment-16022321 ] Ming Ma commented on HDFS-11790: Thanks [~manojg] for reporting this. Hmm, the existing code should take care of this. Wonder if it is due to some corner cases where the following functions don't skip maintenance nodes properly. * BlockManager#createLocatedBlock should skip IN_MAINTENANCE nodes. * BlockManager#chooseSourceDatanodes should skip MAINTENANCE_NOT_FOR_READ nodes set for IN_MAINTENANCE nodes. > Decommissioning of a DataNode after MaintenanceState takes a very long time > to complete > --- > > Key: HDFS-11790 > URL: https://issues.apache.org/jira/browse/HDFS-11790 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11790-test.01.patch > > > *Problem:* > When a DataNode is requested for Decommissioning after it successfully > transitioned to MaintenanceState (HDFS-7877), the decommissioning state > transition is stuck for a long time even for a very small number of blocks in > the cluster. > *Details:* > * A DataNode DN1 was requested for MaintenanceState and it successfully > transitioned from the ENTERING_MAINTENANCE state to the IN_MAINTENANCE state as there > was sufficient replication for all its blocks. > * As DN1 was now in maintenance state, the DataNode process was stopped on > DN1. Later the same DN1 was requested for Decommissioning. > * As part of Decommissioning, all the blocks residing on DN1 were requested > to be re-replicated to other DataNodes, so that DN1 could transition from > ENTERING_DECOMMISSION to DECOMMISSIONED. > * But, re-replication for a few blocks was stuck for a long time. Eventually it > got completed. > * Digging the code and logs, found that the IN_MAINTENANCE DN1 was chosen as > a source datanode for re-replication of a few of the blocks. 
Since the DataNode > process on DN1 was already stopped, the re-replication was stuck for a long > time. > * Eventually PendingReplicationMonitor timed out, and those re-replications > were re-scheduled for the timed-out blocks. Again, during the > re-replication also, the IN_MAINT DN1 was chosen as a source datanode for a few > of the blocks, leading to a timeout again. This iteration continued a few > times until all blocks got re-replicated. > * By design, IN_MAINT datanodes should not be chosen for any read or write > operations. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
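The intended source-selection behavior from the comment above can be sketched as a simple filter. This is a hypothetical Python sketch of the rule, not BlockManager code, and the replica-dict shape is invented for illustration:

```python
def choose_source_datanodes(replicas):
    """Sketch of the rule: never pick IN_MAINTENANCE replicas as
    re-replication sources, since their DataNode process may be stopped."""
    return [r for r in replicas if r["adminState"] != "IN_MAINTENANCE"]
```

Applied to the scenario above, DN1 would be excluded from the candidate sources, avoiding the PendingReplicationMonitor timeouts.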
[jira] [Commented] (HDFS-7541) Upgrade Domains in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993607#comment-15993607 ] Ming Ma commented on HDFS-7541: --- Sounds good. HDFS-9005, HDFS-9016 and HDFS-9922 have been committed to 2.8.2. > Upgrade Domains in HDFS > --- > > Key: HDFS-7541 > URL: https://issues.apache.org/jira/browse/HDFS-7541 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0-alpha1, 2.8.2 > > Attachments: HDFS-7541-2.patch, HDFS-7541.patch, > SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf, > UpgradeDomains_Design_v3.pdf > > > Current HDFS DN rolling upgrade step requires sequential DN restart to > minimize the impact on data availability and read/write operations. The side > effect is longer upgrade duration for large clusters. This might be > acceptable for DN JVM quick restart to update hadoop code/configuration. > However, for OS upgrade that requires machine reboot, the overall upgrade > duration will be too long if we continue to do sequential DN rolling restart. > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-7541) Upgrade Domains in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7541: -- Fix Version/s: (was: 2.9.0) 2.8.2 > Upgrade Domains in HDFS > --- > > Key: HDFS-7541 > URL: https://issues.apache.org/jira/browse/HDFS-7541 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0-alpha1, 2.8.2 > > Attachments: HDFS-7541-2.patch, HDFS-7541.patch, > SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf, > UpgradeDomains_Design_v3.pdf > > > Current HDFS DN rolling upgrade step requires sequential DN restart to > minimize the impact on data availability and read/write operations. The side > effect is longer upgrade duration for large clusters. This might be > acceptable for DN JVM quick restart to update hadoop code/configuration. > However, for OS upgrade that requires machine reboot, the overall upgrade > duration will be too long if we continue to do sequential DN rolling restart. > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9922: -- Resolution: Fixed Fix Version/s: (was: 2.9.0) 2.8.2 Status: Resolved (was: Patch Available) Backported to branch-2.8 per the discussion in the umbrella jira. The failed tests pass locally. > Upgrade Domain placement policy status marks a good block in violation when > there are decommissioned nodes > -- > > Key: HDFS-9922 > URL: https://issues.apache.org/jira/browse/HDFS-9922 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Fix For: 2.8.2, 3.0.0-alpha1 > > Attachments: HDFS-9922.branch-2.8.001.patch, > HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, HDFS-9922-trunk-v3.patch, > HDFS-9922-trunk-v4.patch > > > When there are replicas of a block on a decommissioned node, > BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns > false when it should return true. This is because numberOfReplicas is the > number of in-service replicas for the block and upgradeDomains.size() is the > number of upgrade domains across all replicas of the block. Specifically, we > hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and > upgradeDomains.size() is greater than numberOfReplicas. > {code} > private boolean isUpgradeDomainPolicySatisfied() { > if (numberOfReplicas <= upgradeDomainFactor) { > return (numberOfReplicas == upgradeDomains.size()); > } else { > return upgradeDomains.size() >= upgradeDomainFactor; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
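The fix described above can be illustrated by computing both the replica count and the upgrade-domain set over in-service replicas only. This is a hedged Python sketch of the corrected logic, not the committed Java; the replica-dict shape is hypothetical:

```python
def is_upgrade_domain_policy_satisfied(replicas, upgrade_domain_factor):
    """Corrected check: a decommissioned replica's extra upgrade domain
    no longer makes a well-placed block look like a violation, because
    decommissioned replicas are excluded from both counts."""
    in_service = [r for r in replicas if r["adminState"] == "IN_SERVICE"]
    domains = {r["upgradeDomain"] for r in in_service}
    if len(in_service) <= upgrade_domain_factor:
        return len(in_service) == len(domains)
    return len(domains) >= upgrade_domain_factor
```

In the bug scenario, three in-service replicas span ud0-ud2 while a decommissioned replica sits in ud3; the original code compared 3 replicas against 4 domains and reported a violation, whereas the in-service-only comparison is 3 against 3.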
[jira] [Updated] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9922: -- Attachment: HDFS-9922.branch-2.8.001.patch > Upgrade Domain placement policy status marks a good block in violation when > there are decommissioned nodes > -- > > Key: HDFS-9922 > URL: https://issues.apache.org/jira/browse/HDFS-9922 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9922.branch-2.8.001.patch, > HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, HDFS-9922-trunk-v3.patch, > HDFS-9922-trunk-v4.patch > > > When there are replicas of a block on a decommissioned node, > BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns > false when it should return true. This is because numberOfReplicas is the > number of in-service replicas for the block and upgradeDomains.size() is the > number of upgrade domains across all replicas of the block. Specifically, we > hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and > upgradeDomains.size() is greater than numberOfReplicas. > {code} > private boolean isUpgradeDomainPolicySatisfied() { > if (numberOfReplicas <= upgradeDomainFactor) { > return (numberOfReplicas == upgradeDomains.size()); > } else { > return upgradeDomains.size() >= upgradeDomainFactor; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9922: -- Status: Patch Available (was: Reopened) > Upgrade Domain placement policy status marks a good block in violation when > there are decommissioned nodes > -- > > Key: HDFS-9922 > URL: https://issues.apache.org/jira/browse/HDFS-9922 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9922.branch-2.8.001.patch, > HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, HDFS-9922-trunk-v3.patch, > HDFS-9922-trunk-v4.patch > > > When there are replicas of a block on a decommissioned node, > BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns > false when it should return true. This is because numberOfReplicas is the > number of in-service replicas for the block and upgradeDomains.size() is the > number of upgrade domains across all replicas of the block. Specifically, we > hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and > upgradeDomains.size() is greater than numberOfReplicas. > {code} > private boolean isUpgradeDomainPolicySatisfied() { > if (numberOfReplicas <= upgradeDomainFactor) { > return (numberOfReplicas == upgradeDomains.size()); > } else { > return upgradeDomains.size() >= upgradeDomainFactor; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma reopened HDFS-9922: --- > Upgrade Domain placement policy status marks a good block in violation when > there are decommissioned nodes > -- > > Key: HDFS-9922 > URL: https://issues.apache.org/jira/browse/HDFS-9922 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, > HDFS-9922-trunk-v3.patch, HDFS-9922-trunk-v4.patch > > > When there are replicas of a block on a decommissioned node, > BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns > false when it should return true. This is because numberOfReplicas is the > number of in-service replicas for the block and upgradeDomains.size() is the > number of upgrade domains across all replicas of the block. Specifically, we > hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and > upgradeDomains.size() is greater than numberOfReplicas. > {code} > private boolean isUpgradeDomainPolicySatisfied() { > if (numberOfReplicas <= upgradeDomainFactor) { > return (numberOfReplicas == upgradeDomains.size()); > } else { > return upgradeDomains.size() >= upgradeDomainFactor; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9016: -- Resolution: Fixed Fix Version/s: (was: 2.9.0) 2.8.2 Status: Resolved (was: Patch Available) Backported to branch-2.8 per the discussion in the umbrella jira. > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.8.2, 3.0.0-alpha1 > > Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, > HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, > HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9016: -- Status: Patch Available (was: Reopened) > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, > HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, > HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9016: -- Attachment: HDFS-9016.branch-2.8.001.patch > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, > HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, > HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-9016) Display upgrade domain information in fsck
[ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma reopened HDFS-9016: --- > Display upgrade domain information in fsck > -- > > Key: HDFS-9016 > URL: https://issues.apache.org/jira/browse/HDFS-9016 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, > HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, HDFS-9016-branch-2.patch, > HDFS-9016.patch > > > This will make it easy for people to use fsck to check block placement when > upgrade domain is enabled. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9005) Provide configuration support for upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9005: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: (was: 2.9.0) 2.8.2 Status: Resolved (was: Patch Available) Backported to branch-2.8. > Provide configuration support for upgrade domain > > > Key: HDFS-9005 > URL: https://issues.apache.org/jira/browse/HDFS-9005 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.8.2, 3.0.0-alpha1 > > Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, > HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch > > > As part of the upgrade domain feature, we need to provide a mechanism to > specify upgrade domain for each datanode. One way to accomplish that is to > allow admins specify an upgrade domain script that takes DN ip or hostname as > input and return the upgrade domain. Then namenode will use it at run time to > set {{DatanodeInfo}}'s upgrade domain string. The configuration can be > something like: > {noformat} > <property> > <name>dfs.namenode.upgrade.domain.script.file.name</name> > <value>/etc/hadoop/conf/upgrade-domain.sh</value> > </property> > {noformat} > just like topology script, -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9005) Provide configuration support for upgrade domain
[ https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9005: -- Attachment: HDFS-9005.branch-2.8.001.patch Per the discussion in the umbrella jira, we want the feature to be in 2.8. Here is the patch for branch-2.8. It requires some manual effort. All HDFS tests passed locally. > Provide configuration support for upgrade domain > > > Key: HDFS-9005 > URL: https://issues.apache.org/jira/browse/HDFS-9005 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, > HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch > > > As part of the upgrade domain feature, we need to provide a mechanism to > specify upgrade domain for each datanode. One way to accomplish that is to > allow admins specify an upgrade domain script that takes DN ip or hostname as > input and return the upgrade domain. Then namenode will use it at run time to > set {{DatanodeInfo}}'s upgrade domain string. The configuration can be > something like: > {noformat} > <property> > <name>dfs.namenode.upgrade.domain.script.file.name</name> > <value>/etc/hadoop/conf/upgrade-domain.sh</value> > </property> > {noformat} > just like topology script, -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
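For readers looking for concrete configuration: the script-based property shown in the issue description was the original proposal. The mechanism that was ultimately documented for upgrade domains (see the HdfsUpgradeDomain documentation) configures datanode attributes through a JSON hosts file instead. A hedged sketch of that wiring in hdfs-site.xml, with example paths:

```xml
<!-- Sketch of JSON-based upgrade domain configuration; paths are examples. -->
<property>
  <name>dfs.namenode.hosts.provider.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager</value>
</property>
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/datanodes.json</value>
</property>
```

The referenced JSON file then holds one entry per datanode, along the lines of `[{"hostName": "dn1.example.com", "upgradeDomain": "ud_0"}]` — treat the exact file location and hostnames here as illustrative assumptions, not values from this thread.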
[jira] [Commented] (HDFS-7541) Upgrade Domains in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990559#comment-15990559 ] Ming Ma commented on HDFS-7541: --- Sure I can backport HDFS-9005, HDFS-9016 and HDFS-9922 to 2.8. Which 2.8 release do we want, 2.8.1 or 2.8.2? Pushing the feature to 2.7 requires much more work though. Regarding the production quality, yes it has been pretty reliable. The only feature we don't use in our production is HDFS-9005. We used the script-based configuration approach while the feature was developed and tested and haven't spent time changing the configuration mechanism. > Upgrade Domains in HDFS > --- > > Key: HDFS-7541 > URL: https://issues.apache.org/jira/browse/HDFS-7541 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Kihwal Lee > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7541-2.patch, HDFS-7541.patch, > SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf, > UpgradeDomains_Design_v3.pdf > > > Current HDFS DN rolling upgrade step requires sequential DN restart to > minimize the impact on data availability and read/write operations. The side > effect is longer upgrade duration for large clusters. This might be > acceptable for DN JVM quick restart to update hadoop code/configuration. > However, for OS upgrade that requires machine reboot, the overall upgrade > duration will be too long if we continue to do sequential DN rolling restart. > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979473#comment-15979473 ] Ming Ma commented on HDFS-9922: --- Currently upgrade domain isn't considered available in 2.8 due to these changes. If we want the feature to be in 2.8, the major backport item is HDFS-9005. > Upgrade Domain placement policy status marks a good block in violation when > there are decommissioned nodes > -- > > Key: HDFS-9922 > URL: https://issues.apache.org/jira/browse/HDFS-9922 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Minor > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, > HDFS-9922-trunk-v3.patch, HDFS-9922-trunk-v4.patch > > > When there are replicas of a block on a decommissioned node, > BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns > false when it should return true. This is because numberOfReplicas is the > number of in-service replicas for the block and upgradeDomains.size() is the > number of upgrade domains across all replicas of the block. Specifically, we > hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and > upgradeDomains.size() is greater than numberOfReplicas. > {code} > private boolean isUpgradeDomainPolicySatisfied() { > if (numberOfReplicas <= upgradeDomainFactor) { > return (numberOfReplicas == upgradeDomains.size()); > } else { > return upgradeDomains.size() >= upgradeDomainFactor; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11446) TestMaintenanceState#testWithNNAndDNRestart fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895823#comment-15895823 ] Ming Ma commented on HDFS-11446: Thanks. Got the following compile error for branch-2 which uses java 7. {noformat} local variable ... is accessed from within inner class; needs to be declared final {noformat} > TestMaintenanceState#testWithNNAndDNRestart fails intermittently > > > Key: HDFS-11446 > URL: https://issues.apache.org/jira/browse/HDFS-11446 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-11446.001.patch, HDFS-11446.002.patch > > > The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The > stack info( > https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/ ): > {code} > java.lang.AssertionError: expected null, but was: for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: > expected 3, got 2 > ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending > block # 1 ,under replicated # 0 ,> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731) > {code} > The failure seems due to pending block has not been replicated. We can bump > the retry times since sometimes the cluster would be busy. Also we can use > {{GenericTestUtils#waitFor}} to simplified the current compared logic. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
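The branch-2 compile error quoted above comes from Java 7's rule that an anonymous inner class may only capture local variables declared final; Java 8's "effectively final" relaxation is why the same patch compiles on trunk. A minimal illustration, unrelated to the actual patch code (the helper name here is invented):

```java
import java.util.concurrent.Callable;

/** Minimal illustration of the Java 7 capture rule behind the compile error. */
public class FinalCaptureDemo {

  /** Stand-in for a poll helper such as GenericTestUtils#waitFor. */
  static boolean await(Callable<Boolean> check) throws Exception {
    return check.call();
  }

  public static void main(String[] args) throws Exception {
    // Must be declared 'final' to compile on Java 7; without it, javac 7
    // fails with: "local variable expectedReplicas is accessed from within
    // inner class; needs to be declared final".
    final int expectedReplicas = 3;
    boolean ok = await(new Callable<Boolean>() {
      @Override
      public Boolean call() {
        return expectedReplicas == 3; // captures the final local variable
      }
    });
    System.out.println(ok);
  }
}
```

So the branch-2 variant of such a patch typically just adds the `final` keyword to the captured locals, with no behavioral change.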
[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894786#comment-15894786 ] Ming Ma commented on HDFS-9388: --- Or DatanodeAdminManager? DatanodeMaintenance seems good. DatanodeService and DatanodeProvision are too general. > Refactor decommission related code to support maintenance state for datanodes > - > > Key: HDFS-9388 > URL: https://issues.apache.org/jira/browse/HDFS-9388 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Manoj Govindassamy > > Lots of code can be shared between the existing decommission functionality > and to-be-added maintenance state support for datanodes. To make it easier to > add maintenance state support, let us first modify the existing code to make > it more general. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11446) TestMaintenanceState#testWithNNAndDNRestart fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894772#comment-15894772 ] Ming Ma commented on HDFS-11446: Thanks [~linyiqun]. The patch looks good. Do you get compile issue when applied to branch-2? > TestMaintenanceState#testWithNNAndDNRestart fails intermittently > > > Key: HDFS-11446 > URL: https://issues.apache.org/jira/browse/HDFS-11446 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-11446.001.patch > > > The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The > stack info( > https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/ ): > {code} > java.lang.AssertionError: expected null, but was: for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: > expected 3, got 2 > ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending > block # 1 ,under replicated # 0 ,> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731) > {code} > The failure seems due to pending block has not been replicated. We can bump > the retry times since sometimes the cluster would be busy. Also we can use > {{GenericTestUtils#waitFor}} to simplified the current compared logic. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
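The description's suggestion to replace the hand-rolled retry/compare logic with {{GenericTestUtils#waitFor}} boils down to the polling pattern below. This is an illustrative re-implementation, not Hadoop's class (the real helper takes a Guava Supplier plus check-interval and timeout arguments in milliseconds):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Sketch of the polling pattern GenericTestUtils#waitFor provides. */
public class WaitForSketch {

  /** Re-check the condition every intervalMs until it holds or timeoutMs elapses. */
  static void waitFor(Callable<Boolean> check, long intervalMs, long timeoutMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.call()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    final long start = System.currentTimeMillis();
    // In the real test, the condition would be "the pending block has been
    // replicated"; here we just wait until a little time has passed.
    waitFor(new Callable<Boolean>() {
      @Override
      public Boolean call() {
        return System.currentTimeMillis() - start >= 20;
      }
    }, 5, 1000);
    System.out.println("condition met");
  }
}
```

The win over a fixed retry count is that a busy Jenkins host gets the full timeout budget, while a fast run returns as soon as the condition holds.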
[jira] [Updated] (HDFS-11412) Maintenance minimum replication config value allowable range should be [0, DefaultReplication]
[ https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11412: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 2.9.0 Status: Resolved (was: Patch Available) Committed to branch-2. Thanks [~manojg] for the contribution. > Maintenance minimum replication config value allowable range should be [0, > DefaultReplication] > -- > > Key: HDFS-11412 > URL: https://issues.apache.org/jira/browse/HDFS-11412 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HDFS-11412.01.patch, HDFS-11412.02.patch, > HDFS-11412-branch-2.01.patch > > > Currently the allowed value range for Maintenance Min Replication > {{dfs.namenode.maintenance.replication.min}} is 0 to > {{dfs.namenode.replication.min}} (default=1). Users wanting not to affect the > performance of the cluster would wish to have the Maintenance Min Replication > number greater than 1, say 2. In the current design, it is possible to have > this Maintenance Min Replication configuration, but only after changing the > NameNode level Block Min Replication to 2, and which could slowdown the > overall latency for client writes. > Technically speaking we should be allowing Maintenance Min Replication to be > in range 0 to dfs.replication.max. > * There is always config value of 0 for users not wanting any > availability/performance during maintenance. > * And, performance centric workloads can still get maintenance done without > major disruptions by having a bigger Maintenance Min Replication. Setting the > upper limit as dfs.replication.max could be an overkill as it could trigger > re-replication which Maintenance State is trying to avoid. 
So, we could allow > the {{dfs.namenode.maintenance.replication.min}} in the range {{0 to > dfs.replication}} > {noformat} > if (minMaintenanceR < 0) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " < 0"); > } > if (minMaintenanceR > minR) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " > " > + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY > + " = " + minR); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11412) Maintenance minimum replication config value allowable range should be [0, DefaultReplication]
[ https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891612#comment-15891612 ] Ming Ma commented on HDFS-11412: +1. Committed to trunk. [~manojg], could you please provide another patch for branch-2 as it doesn't apply? Thanks. > Maintenance minimum replication config value allowable range should be [0, > DefaultReplication] > -- > > Key: HDFS-11412 > URL: https://issues.apache.org/jira/browse/HDFS-11412 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11412.01.patch, HDFS-11412.02.patch > > > Currently the allowed value range for Maintenance Min Replication > {{dfs.namenode.maintenance.replication.min}} is 0 to > {{dfs.namenode.replication.min}} (default=1). Users wanting not to affect the > performance of the cluster would wish to have the Maintenance Min Replication > number greater than 1, say 2. In the current design, it is possible to have > this Maintenance Min Replication configuration, but only after changing the > NameNode level Block Min Replication to 2, and which could slowdown the > overall latency for client writes. > Technically speaking we should be allowing Maintenance Min Replication to be > in range 0 to dfs.replication.max. > * There is always config value of 0 for users not wanting any > availability/performance during maintenance. > * And, performance centric workloads can still get maintenance done without > major disruptions by having a bigger Maintenance Min Replication. Setting the > upper limit as dfs.replication.max could be an overkill as it could trigger > re-replication which Maintenance State is trying to avoid. 
So, we could allow > the {{dfs.namenode.maintenance.replication.min}} in the range {{0 to > dfs.replication}} > {noformat} > if (minMaintenanceR < 0) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " < 0"); > } > if (minMaintenanceR > minR) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " > " > + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY > + " = " + minR); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11412) Maintenance minimum replication config value allowable range should be [0, DefaultReplication]
[ https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11412: --- Summary: Maintenance minimum replication config value allowable range should be [0, DefaultReplication] (was: Maintenance minimum replication config value allowable range should be {0 - DefaultReplication}) > Maintenance minimum replication config value allowable range should be [0, > DefaultReplication] > -- > > Key: HDFS-11412 > URL: https://issues.apache.org/jira/browse/HDFS-11412 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11412.01.patch, HDFS-11412.02.patch > > > Currently the allowed value range for Maintenance Min Replication > {{dfs.namenode.maintenance.replication.min}} is 0 to > {{dfs.namenode.replication.min}} (default=1). Users wanting not to affect the > performance of the cluster would wish to have the Maintenance Min Replication > number greater than 1, say 2. In the current design, it is possible to have > this Maintenance Min Replication configuration, but only after changing the > NameNode level Block Min Replication to 2, and which could slowdown the > overall latency for client writes. > Technically speaking we should be allowing Maintenance Min Replication to be > in range 0 to dfs.replication.max. > * There is always config value of 0 for users not wanting any > availability/performance during maintenance. > * And, performance centric workloads can still get maintenance done without > major disruptions by having a bigger Maintenance Min Replication. Setting the > upper limit as dfs.replication.max could be an overkill as it could trigger > re-replication which Maintenance State is trying to avoid. 
So, we could allow > the {{dfs.namenode.maintenance.replication.min}} in the range {{0 to > dfs.replication}} > {noformat} > if (minMaintenanceR < 0) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " < 0"); > } > if (minMaintenanceR > minR) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " > " > + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY > + " = " + minR); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11412) Maintenance minimum replication config value allowable range should be {0 - DefaultReplication}
[ https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890705#comment-15890705 ] Ming Ma commented on HDFS-11412: bq. this particular range can have adverse effects as it can force replicate to larger number of blocks (to honor minReplicationToBeInMaintenance) even for the files that aren't created with higher replication factor. It can choose to only force the replication up to the replication factor of those files. So for most files which have the default replication factor, the lesser of {default replication factor, minReplicationToBeInMaintenance} will be used as the min replication value during maintenance. So the impact should be similar to setting minReplicationToBeInMaintenance to the default replication factor. Also this is similar to how the following case will be handled. Set minReplicationToBeInMaintenance to the default replication factor. For files with a replication factor of 2, 2 will be used as the min replication value. bq. May be we need to return the max or min value based on how the block replication is set compared to the default replication Maybe we can modify getMinReplicationToBeInMaintenance to return the lesser of {file replication factor, minReplicationToBeInMaintenance}. > Maintenance minimum replication config value allowable range should be {0 - > DefaultReplication} > --- > > Key: HDFS-11412 > URL: https://issues.apache.org/jira/browse/HDFS-11412 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11412.01.patch > > > Currently the allowed value range for Maintenance Min Replication > {{dfs.namenode.maintenance.replication.min}} is 0 to > {{dfs.namenode.replication.min}} (default=1). Users wanting not to affect the > performance of the cluster would wish to have the Maintenance Min Replication > number greater than 1, say 2.
In the current design, it is possible to have > this Maintenance Min Replication configuration, but only after changing the > NameNode level Block Min Replication to 2, and which could slowdown the > overall latency for client writes. > Technically speaking we should be allowing Maintenance Min Replication to be > in range 0 to dfs.replication.max. > * There is always config value of 0 for users not wanting any > availability/performance during maintenance. > * And, performance centric workloads can still get maintenance done without > major disruptions by having a bigger Maintenance Min Replication. Setting the > upper limit as dfs.replication.max could be an overkill as it could trigger > re-replication which Maintenance State is trying to avoid. So, we could allow > the {{dfs.namenode.maintenance.replication.min}} in the range {{0 to > dfs.replication}} > {noformat} > if (minMaintenanceR < 0) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " < 0"); > } > if (minMaintenanceR > minR) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " > " > + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY > + " = " + minR); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
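The clamping idea discussed in this thread — never demand more maintenance-time replicas than a file itself carries — is a one-line computation. A sketch with illustrative names (not the committed getMinReplicationToBeInMaintenance code):

```java
/** Sketch of clamping the maintenance minimum by a file's replication factor. */
public class MaintenanceMinSketch {

  /**
   * Effective minimum replication while a replica's datanode is in
   * maintenance: the configured minimum, capped by the file's own
   * replication factor.
   */
  static short effectiveMinMaintenanceReplication(short fileReplication,
      short configuredMinMaintenance) {
    return (short) Math.min(fileReplication, configuredMinMaintenance);
  }

  public static void main(String[] args) {
    // Configured minimum 2: a replication-3 file needs 2 live replicas during
    // maintenance, while a replication-1 file needs only its single replica.
    System.out.println(effectiveMinMaintenanceReplication((short) 3, (short) 2)); // 2
    System.out.println(effectiveMinMaintenanceReplication((short) 1, (short) 2)); // 1
  }
}
```

This is why raising the configured minimum above 1 does not force extra replication for files created with a lower replication factor, which is the concern the comment addresses.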
[jira] [Updated] (HDFS-11411) Avoid OutOfMemoryError in TestMaintenanceState test runs
[ https://issues.apache.org/jira/browse/HDFS-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11411: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 2.9.0 Status: Resolved (was: Patch Available) +1. Thanks [~manojg] for the contribution. Committed to trunk and branch-2. > Avoid OutOfMemoryError in TestMaintenanceState test runs > > > Key: HDFS-11411 > URL: https://issues.apache.org/jira/browse/HDFS-11411 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HDFS-11411.01.patch, HDFS-11411.02.patch > > > TestMainteananceState test runs are seeing OutOfMemoryError issues quite > frequently now. Need to fix tests that are consuming lots of memory/threads. > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hdfs.TestMaintenanceState > Tests run: 21, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 219.479 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.Te > testTransitionFromDecommissioned(org.apache.hadoop.hdfs.TestMaintenanceState) > Time elapsed: 0.64 sec <<< ERROR! > java.lang.OutOfMemoryError: unable to create new native thread > testTakeDeadNodeOutOfMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState) > Time elapsed: 0.031 sec <<< ERROR! > java.lang.OutOfMemoryError: unable to create new native thread > testWithNNAndDNRestart(org.apache.hadoop.hdfs.TestMaintenanceState) Time > elapsed: 0.03 sec <<< ERROR! > java.lang.OutOfMemoryError: unable to create new native thread > testMultipleNodesMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState) > Time elapsed: 60.127 sec <<< ERROR! 
> java.io.IOException: Problem starting http server > Results : > Tests in error: > > TestMaintenanceState.testTransitionFromDecommissioned:225->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s > > TestMaintenanceState.testTakeDeadNodeOutOfMaintenance:636->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s > > TestMaintenanceState.testWithNNAndDNRestart:692->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.startCluste > > TestMaintenanceState.testMultipleNodesMaintenance:532->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.start > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869184#comment-15869184 ] Ming Ma commented on HDFS-9388: --- [~manojg], most of the work has been done by other jiras. Some specific items left include the rename of DecommissionManager and if comments about decommission should be updated. Please feel free to assign it to yourself. Thank you! > Refactor decommission related code to support maintenance state for datanodes > - > > Key: HDFS-9388 > URL: https://issues.apache.org/jira/browse/HDFS-9388 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma > > Lots of code can be shared between the existing decommission functionality > and to-be-added maintenance state support for datanodes. To make it easier to > add maintenance state support, let us first modify the existing code to make > it more general. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11411) Avoid OutOfMemoryError in TestMaintenanceState test runs
[ https://issues.apache.org/jira/browse/HDFS-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869170#comment-15869170 ] Ming Ma commented on HDFS-11411: Looks good. Nits: * For {{testExpectedReplication}} case, should we move the setup() call into the function that calls startCluster? * Maybe at the end of the function, call teardown first, then setup for the next iteration. Otherwise, setup will be called twice (one from the test case setup and another one from the added explicit call) for the first iteration of test case. > Avoid OutOfMemoryError in TestMaintenanceState test runs > > > Key: HDFS-11411 > URL: https://issues.apache.org/jira/browse/HDFS-11411 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11411.01.patch > > > TestMainteananceState test runs are seeing OutOfMemoryError issues quite > frequently now. Need to fix tests that are consuming lots of memory/threads. > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hdfs.TestMaintenanceState > Tests run: 21, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 219.479 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.Te > testTransitionFromDecommissioned(org.apache.hadoop.hdfs.TestMaintenanceState) > Time elapsed: 0.64 sec <<< ERROR! > java.lang.OutOfMemoryError: unable to create new native thread > testTakeDeadNodeOutOfMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState) > Time elapsed: 0.031 sec <<< ERROR! > java.lang.OutOfMemoryError: unable to create new native thread > testWithNNAndDNRestart(org.apache.hadoop.hdfs.TestMaintenanceState) Time > elapsed: 0.03 sec <<< ERROR! > java.lang.OutOfMemoryError: unable to create new native thread > testMultipleNodesMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState) > Time elapsed: 60.127 sec <<< ERROR! 
> java.io.IOException: Problem starting http server > Results : > Tests in error: > > TestMaintenanceState.testTransitionFromDecommissioned:225->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s > > TestMaintenanceState.testTakeDeadNodeOutOfMaintenance:636->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s > > TestMaintenanceState.testWithNNAndDNRestart:692->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.startCluste > > TestMaintenanceState.testMultipleNodesMaintenance:532->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.start > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
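The teardown-then-setup ordering suggested in the review nit above can be sketched as follows. This is a toy model only: the real AdminStatesBaseTest setup() starts a mini cluster and does much more, and the counters here exist purely to show that each iteration gets exactly one setup/teardown pair.

```java
// Toy sketch of the review suggestion: tear down first, then set up for the
// next iteration, so setup() is never invoked twice for the same iteration.
// AdminStatesBaseTest/setup()/teardown() names follow the classes mentioned
// in the JIRA; the bookkeeping below is illustrative, not the real test code.
public class IterationOrder {
    static int setupCalls = 0;
    static int teardownCalls = 0;

    static void setup()    { setupCalls++; }    // stands in for the @Before hook
    static void teardown() { teardownCalls++; } // stands in for the @After hook

    public static void runTestBody(int iterations) {
        setup(); // JUnit would have called this once before the test body
        for (int i = 0; i < iterations; i++) {
            // ... start mini cluster, transition nodes, assert ...
            if (i < iterations - 1) {
                // Tear down the previous cluster before setting up the next,
                // rather than calling setup() back to back.
                teardown();
                setup();
            }
        }
        teardown(); // final teardown, as the @After hook would do
    }

    public static void main(String[] args) {
        runTestBody(3);
        // Each of the 3 iterations got exactly one setup and one teardown.
        System.out.println(setupCalls + " " + teardownCalls); // prints "3 3"
    }
}
```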
[jira] [Updated] (HDFS-11265) Extend visualization for Maintenance Mode under Datanode tab in the NameNode UI
[ https://issues.apache.org/jira/browse/HDFS-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11265: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 2.9.0 Status: Resolved (was: Patch Available) Thanks [~elektrobank] for the contribution and [~manojg] for the review. Committed to trunk and branch-2. > Extend visualization for Maintenance Mode under Datanode tab in the NameNode > UI > --- > > Key: HDFS-11265 > URL: https://issues.apache.org/jira/browse/HDFS-11265 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Elek, Marton > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: ex.png, HDFS-11265.001.patch, icons.png, x.png > > > With HDFS-9391, DataNodes in Maintenance Mode states are shown under DataNode > page in NameNode UI, but they are lacking icon visualization like the ones > shown for other node states. Need to extend the icon visualization to cover > Maintenance Mode.
[jira] [Commented] (HDFS-11265) Extend visualization for Maintenance Mode under Datanode tab in the NameNode UI
[ https://issues.apache.org/jira/browse/HDFS-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869107#comment-15869107 ] Ming Ma commented on HDFS-11265: Strictly speaking, live decommissioned nodes can serve read requests as the least preferred replicas. But even with that, the existing patch LGTM. +1. > Extend visualization for Maintenance Mode under Datanode tab in the NameNode > UI > --- > > Key: HDFS-11265 > URL: https://issues.apache.org/jira/browse/HDFS-11265 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Elek, Marton > Attachments: ex.png, HDFS-11265.001.patch, icons.png, x.png > > > With HDFS-9391, DataNodes in Maintenance Mode states are shown under DataNode > page in NameNode UI, but they are lacking icon visualization like the ones > shown for other node states. Need to extend the icon visualization to cover > Maintenance Mode.
[jira] [Commented] (HDFS-11412) Maintenance minimum replication config value allowable range should be {0 - DefaultReplication}
[ https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868977#comment-15868977 ] Ming Ma commented on HDFS-11412: Thanks [~manojg]. * Regarding whether to use the default replication factor or the max replication factor, do you care about the following use case? default == 3, max == 30. Block A has a large replication factor of 30 and would like to keep at least 20 live replicas around during maintenance. Then put 20 nodes with replicas of Block A into maintenance at the same time. To make sure there are at least 20 live replicas after maintenance, the system needs to honor minReplicationToBeInMaintenance == 20. * Impact on {{getExpectedLiveRedundancyNum}} calculation. Set minReplicationToBeInMaintenance to 3. Block B's replication factor is 2. Put one of its replicas into maintenance. Inside function {{getExpectedLiveRedundancyNum}}, {{Math.max(expectedRedundancy - numberReplicas.maintenanceReplicas(), getMinReplicationToBeInMaintenance())}} == 3. Ideally the function should return 2. > Maintenance minimum replication config value allowable range should be {0 - > DefaultReplication} > --- > > Key: HDFS-11412 > URL: https://issues.apache.org/jira/browse/HDFS-11412 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11412.01.patch > > > Currently the allowed value range for Maintenance Min Replication > {{dfs.namenode.maintenance.replication.min}} is 0 to > {{dfs.namenode.replication.min}} (default=1). Users not wanting to affect the > performance of the cluster would wish to have the Maintenance Min Replication > number greater than 1, say 2. In the current design, it is possible to have > this Maintenance Min Replication configuration, but only after changing the > NameNode level Block Min Replication to 2, which could increase the > overall latency for client writes.
> Technically speaking, we should be allowing Maintenance Min Replication to be > in the range 0 to dfs.replication.max. > * There is always the config value of 0 for users not wanting any > availability/performance during maintenance. > * And, performance-centric workloads can still get maintenance done without > major disruptions by having a bigger Maintenance Min Replication. Setting the > upper limit as dfs.replication.max could be overkill, as it could trigger > re-replication, which Maintenance State is trying to avoid. So, we could allow > {{dfs.namenode.maintenance.replication.min}} in the range {{0 to > dfs.replication}}. > {noformat} > if (minMaintenanceR < 0) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " < 0"); > } > if (minMaintenanceR > minR) { > throw new IOException("Unexpected configuration parameters: " > + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY > + " = " + minMaintenanceR + " > " > + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY > + " = " + minR); > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
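The {{getExpectedLiveRedundancyNum}} concern raised in the comment above can be reproduced with a one-line model of the quoted {{Math.max(...)}} expression. The method below is a simplified stand-in, not the actual BlockManager code; only the arithmetic is taken from the comment.

```java
// Models the clamping behavior discussed in HDFS-11412: when
// minReplicationToBeInMaintenance exceeds a block's replication factor,
// the max() clamps the expected live redundancy above what the block can
// ever have. Names are illustrative stand-ins for the NameNode internals.
public class RedundancyClamp {
    static int expectedLiveRedundancy(int expectedRedundancy,
                                      int maintenanceReplicas,
                                      int minReplicationToBeInMaintenance) {
        // Mirrors: Math.max(expectedRedundancy - maintenanceReplicas,
        //                   getMinReplicationToBeInMaintenance())
        return Math.max(expectedRedundancy - maintenanceReplicas,
                        minReplicationToBeInMaintenance);
    }

    public static void main(String[] args) {
        // Block B: replication factor 2, one replica in maintenance,
        // minReplicationToBeInMaintenance configured to 3.
        int result = expectedLiveRedundancy(2, 1, 3);
        // The clamp yields 3, although the block only has 2 replicas;
        // per the comment, the function should ideally return 2 here.
        System.out.println(result); // prints 3
    }
}
```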
[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868948#comment-15868948 ] Ming Ma commented on HDFS-7877: --- ok. Will follow up the discussion in HDFS-11412. > Support maintenance state for datanodes > --- > > Key: HDFS-7877 > URL: https://issues.apache.org/jira/browse/HDFS-7877 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7877-2.patch, HDFS-7877.patch, > Supportmaintenancestatefordatanodes-2.pdf, > Supportmaintenancestatefordatanodes.pdf > > > This requirement came up during the design for HDFS-7541. Given this feature > is mostly independent of upgrade domain feature, it is better to track it > under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862549#comment-15862549 ] Ming Ma edited comment on HDFS-7877 at 2/11/17 10:23 PM: - Thanks [~manojg] and [~dilaver] for the good point. What you suggested makes sense. The reason we don't have this requirement so far is probably because when we put nodes into maintenance, we often do it one upgrade domain at a time, thus no two replicas will be put to maintenance at the same time. To confirm, given we still allow applications to create blocks with smaller replication factor than {{dfs.namenode.maintenance.replication.min}}, the transition policy from {{ENTERING_MAINTENANCE}} to {{IN_MAINTENANCE}} will become the # of live replicas >= min({{dfs.namenode.maintenance.replication.min}}, replication factor). was (Author: mingma): Thanks [~manojg]. Good point. What you suggested makes sense. The reason we don't have this requirement in our production is probably because we only put nodes in one upgrade domain into maintenance at a time; after one batch is done, move to the next upgrade domain. Thus no two replicas will be put to maintenance at the same time. To confirm, given we will still allow applications to create blocks with smaller replication factor than {{dfs.namenode.maintenance.replication.min}}, the transition policy from {{ENTERING_MAINTENANCE}} to {{IN_MAINTENANCE}} becomes the # of live replicas >= min({{dfs.namenode.maintenance.replication.min}}, replication factor). > Support maintenance state for datanodes > --- > > Key: HDFS-7877 > URL: https://issues.apache.org/jira/browse/HDFS-7877 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7877-2.patch, HDFS-7877.patch, > Supportmaintenancestatefordatanodes-2.pdf, > Supportmaintenancestatefordatanodes.pdf > > > This requirement came up during the design for HDFS-7541. 
Given this feature > is mostly independent of upgrade domain feature, it is better to track it > under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862549#comment-15862549 ] Ming Ma commented on HDFS-7877: --- Thanks [~manojg]. Good point. What you suggested makes sense. The reason we don't have this requirement in our production is probably because we only put nodes in one upgrade domain into maintenance at a time; after one batch is done, move to the next upgrade domain. Thus no two replicas will be put to maintenance at the same time. To confirm, given we will still allow applications to create blocks with smaller replication factor than {{dfs.namenode.maintenance.replication.min}}, the transition policy from {{ENTERING_MAINTENANCE}} to {{IN_MAINTENANCE}} becomes the # of live replicas >= min({{dfs.namenode.maintenance.replication.min}}, replication factor). > Support maintenance state for datanodes > --- > > Key: HDFS-7877 > URL: https://issues.apache.org/jira/browse/HDFS-7877 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-7877-2.patch, HDFS-7877.patch, > Supportmaintenancestatefordatanodes-2.pdf, > Supportmaintenancestatefordatanodes.pdf > > > This requirement came up during the design for HDFS-7541. Given this feature > is mostly independent of upgrade domain feature, it is better to track it > under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
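The transition policy spelled out in the comment above reduces to one comparison. The sketch below is an assumption-laden simplification (a single predicate with made-up names), not the actual DecommissionManager logic; it only encodes the rule "# of live replicas >= min(dfs.namenode.maintenance.replication.min, replication factor)".

```java
// Sketch of the ENTERING_MAINTENANCE -> IN_MAINTENANCE rule agreed in the
// HDFS-7877 discussion. Method and parameter names are illustrative only.
public class MaintenanceTransition {
    static boolean canTransition(int liveReplicas,
                                 int maintenanceMinReplication,
                                 int replicationFactor) {
        // Blocks may be created with a replication factor smaller than
        // dfs.namenode.maintenance.replication.min, so the smaller of the
        // two bounds the requirement.
        return liveReplicas >= Math.min(maintenanceMinReplication,
                                        replicationFactor);
    }

    public static void main(String[] args) {
        // Replication factor 1, maintenance min replication 2: the
        // replication factor wins, so one live replica is sufficient.
        System.out.println(canTransition(1, 2, 1)); // prints true
        // Replication factor 3, maintenance min replication 2: two live
        // replicas are required, so one is not enough.
        System.out.println(canTransition(1, 2, 3)); // prints false
    }
}
```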
[jira] [Updated] (HDFS-11378) Verify multiple DataNodes can be decommissioned/maintenance at the same time
[ https://issues.apache.org/jira/browse/HDFS-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11378: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 2.9.0 Status: Resolved (was: Patch Available) +1. Thanks [~manojg] for the contribution. I have committed it to trunk and branch-2. > Verify multiple DataNodes can be decommissioned/maintenance at the same time > > > Key: HDFS-11378 > URL: https://issues.apache.org/jira/browse/HDFS-11378 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HDFS-11378.01.patch > > > DecommissionManager is capable of transitioning multiple DataNodes to > Decommission/Maintenance states. Current tests under TestDecommission and > TestMaintenanceState only request for one DataNode for > Decommission/Maintenance. Better if we can simulate real world cases whereby > multiple DataNodes can be taken out of service and verify the resulting block > replication factor for the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11378) Verify multiple DataNodes can be decommissioned/maintenance at the same time
[ https://issues.apache.org/jira/browse/HDFS-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843195#comment-15843195 ] Ming Ma commented on HDFS-11378: The patch LGTM. Thanks [~manojg] for the useful test cases! We might want to add test cases of putting some nodes to decommission and other nodes to maintenance at the same time. But that can be done in a separate jira unless it is your intention to do it here. > Verify multiple DataNodes can be decommissioned/maintenance at the same time > > > Key: HDFS-11378 > URL: https://issues.apache.org/jira/browse/HDFS-11378 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11378.01.patch > > > DecommissionManager is capable of transitioning multiple DataNodes to > Decommission/Maintenance states. Current tests under TestDecommission and > TestMaintenanceState only request for one DataNode for > Decommission/Maintenance. Better if we can simulate real world cases whereby > multiple DataNodes can be taken out of service and verify the resulting block > replication factor for the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11296) Maintenance state expiry should be an epoch time and not jvm monotonic
[ https://issues.apache.org/jira/browse/HDFS-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-11296: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 2.9.0 Status: Resolved (was: Patch Available) +1. The failed tests aren't related. Committed to trunk and branch-2. Thanks [~manojg] for the contribution. Thanks [~eddyxu] for the review. > Maintenance state expiry should be an epoch time and not jvm monotonic > -- > > Key: HDFS-11296 > URL: https://issues.apache.org/jira/browse/HDFS-11296 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: HDFS-11296.01.patch, HDFS-11296.02.patch, > HDFS-11296.03.patch, HDFS-11296-branch-2.01.patch, > HDFS-11296-branch-2.02.patch, HDFS-11296-branch-2.03.patch > > > Currently it is possible to configure an expiry time in milliseconds for a > DataNode in maintenance state. As per the design, the expiry attribute is an > absolute time, beyond which NameNode starts to stop the ongoing maintenance > operation for that DataNode. Internally in the code, this expiry time is read > and checked against {{Time.monotonicNow()}} making the expiry based on more > of JVM's runtime, which is very difficult to configure for any external user. > The goal is to make the expiry time an absolute epoch time, so that its easy > to configure for external users. 
> {noformat} > { > "hostName": , > "port": , > "adminState": "IN_MAINTENANCE", > "maintenanceExpireTimeInMS": > } > {noformat} > DatanodeInfo.java > {noformat} > public static boolean maintenanceNotExpired(long maintenanceExpireTimeInMS) > { > return Time.monotonicNow() < maintenanceExpireTimeInMS; > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
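The expiry problem described in HDFS-11296 comes down to which clock the configured timestamp is compared against. The sketch below contrasts the two checks; it uses plain JDK clocks (`System.nanoTime()` for a monotonic reading, `System.currentTimeMillis()` for epoch time) rather than Hadoop's Time utility class, and the method names are illustrative.

```java
// Contrast of the quoted monotonic-clock check with the epoch-based check
// the JIRA proposes. A monotonic clock has an arbitrary, JVM-relative
// origin, so a wall-clock expiry timestamp cannot be meaningfully compared
// against it; an epoch-millis comparison can.
public class MaintenanceExpiry {
    // Problematic shape: compares an epoch timestamp to a JVM-relative
    // clock (analogous to the quoted Time.monotonicNow() usage).
    static boolean notExpiredMonotonic(long expireEpochMs) {
        return System.nanoTime() / 1_000_000L < expireEpochMs;
    }

    // Proposed shape: epoch millis compared to epoch millis.
    static boolean notExpiredEpoch(long expireEpochMs) {
        return System.currentTimeMillis() < expireEpochMs;
    }

    public static void main(String[] args) {
        long hourMs = 3_600_000L;
        // An expiry one hour in the future has not expired yet...
        System.out.println(
            notExpiredEpoch(System.currentTimeMillis() + hourMs)); // prints true
        // ...and an expiry one hour in the past is correctly rejected.
        System.out.println(
            notExpiredEpoch(System.currentTimeMillis() - hourMs)); // prints false
    }
}
```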
[jira] [Commented] (HDFS-11296) Maintenance state expiry should be an epoch time and not jvm monotonic
[ https://issues.apache.org/jira/browse/HDFS-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822666#comment-15822666 ] Ming Ma commented on HDFS-11296: Thanks Manoj for the fix. Nit: Maybe use Time.now() instead? > Maintenance state expiry should be an epoch time and not jvm monotonic > -- > > Key: HDFS-11296 > URL: https://issues.apache.org/jira/browse/HDFS-11296 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11296.01.patch, HDFS-11296-branch-2.01.patch > > > Currently it is possible to configure an expiry time in milliseconds for a > DataNode in maintenance state. As per the design, the expiry attribute is an > absolute time, beyond which NameNode starts to stop the ongoing maintenance > operation for that DataNode. Internally in the code, this expiry time is read > and checked against {{Time.monotonicNow()}} making the expiry based on more > of JVM's runtime, which is very difficult to configure for any external user. > The goal is to make the expiry time an absolute epoch time, so that its easy > to configure for external users. > {noformat} > { > "hostName": , > "port": , > "adminState": "IN_MAINTENANCE", > "maintenanceExpireTimeInMS": > } > {noformat} > DatanodeInfo.java > {noformat} > public static boolean maintenanceNotExpired(long maintenanceExpireTimeInMS) > { > return Time.monotonicNow() < maintenanceExpireTimeInMS; > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817119#comment-15817119 ] Ming Ma edited comment on HDFS-9391 at 1/11/17 4:23 AM: Thanks [~manojg] for the contribution. Thanks [~dilaver] and [~eddyxu] for the review. I have committed the patch to trunk and branch-2. was (Author: mingma): Thanks [~manojg] for the contribution. Thanks [~eddyxu] for the review. I have committed the patch to trunk and branch-2. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, > HDFS-9391-branch-2-MaintenanceMode-WebUI.pdf, HDFS-9391-branch-2.01.patch, > HDFS-9391-branch-2.02.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, > HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9391: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 2.9.0 Status: Resolved (was: Patch Available) Thanks [~manojg] for the contribution. Thanks [~eddyxu] for the review. I have committed the patch to trunk and branch-2. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, > HDFS-9391-branch-2-MaintenanceMode-WebUI.pdf, HDFS-9391-branch-2.01.patch, > HDFS-9391-branch-2.02.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, > HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816464#comment-15816464 ] Ming Ma commented on HDFS-9391: --- Thanks [~manojg]. It seems there is a typo in branch-2 patch {{getLeavingServiceStatus().set}} which passes the wrong variable, which caused TestDecommissioningStatus to fail. In addition, it will be useful to verify UI for the branch-2 patch. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, > HDFS-9391-branch-2.01.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, > HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813481#comment-15813481 ] Ming Ma commented on HDFS-9391: --- +1. Manoj, given the patch doesn't apply directly to branch-2, can you please provide another patch? Thanks. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance > webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812642#comment-15812642 ] Ming Ma commented on HDFS-9391: --- Thanks Manoj. I just found something related to our discussion. For any decommissioning node, given getDecommissionOnlyReplicas is the same as getOutOfServiceOnlyReplicas, can we just use getOutOfServiceOnlyReplicas value for JSON decommissionOnlyReplicas property? Same for any entering maintenance node. In other words, we might not need to add the extra decommissionOnlyReplicas and maintenanceOnlyReplicas to LeavingServiceStatus. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, HDFS-9391.03.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804934#comment-15804934 ] Ming Ma commented on HDFS-9391: --- Thanks. Sounds good. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803589#comment-15803589 ] Ming Ma commented on HDFS-9391: --- Then for that specific case when {{DecommissionManager#Monitor#processBlocksInternal}} is processing the decommissioning node, NumberReplicas#decommissionedAndDecommissioning() > 0 and NumberReplicas#maintenanceReplicas() > 0 are satisfied. Thus both decommissionOnlyReplicas and maintenanceOnlyReplicas will be incremented. The same applies to the other two entering maintenance nodes. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803538#comment-15803538 ] Ming Ma edited comment on HDFS-9391 at 1/6/17 4:41 AM: --- A given replica is only in one admin state, normal, decommission or maintenance. But {{NumberReplicas}} represents the state of all replicas. Thus for the case "One replica is decommissioning and two replicas of the same block are entering maintenance", {{NumberReplicas#decommissionedAndDecommissioning == 1}}, {{NumberReplicas#maintenanceReplicas() == 2}}. No? was (Author: mingma): A given replica is only in one state, either decommission or maintenance. But {{NumberReplicas}} represents the state of all replicas. Thus for the case "One replica is decommissioning and two replicas of the same block are entering maintenance", {{NumberReplicas#decommissionedAndDecommissioning == 1}}, {{NumberReplicas#maintenanceReplicas() == 2}}. No? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803538#comment-15803538 ] Ming Ma commented on HDFS-9391: --- A given replica is only in one state, either decommission or maintenance. But {{NumberReplicas}} represents the state of all replicas. Thus for the case "One replica is decommissioning and two replicas of the same block are entering maintenance", {{NumberReplicas#decommissionedAndDecommissioning == 1}}, {{NumberReplicas#maintenanceReplicas() == 2}}. No? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
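The per-block counting described above can be illustrated with a toy aggregator. This is not the real NumberReplicas class; it only demonstrates the point that the counts summarize the admin states of all replicas of a block, so one decommissioning replica and two entering-maintenance replicas yield counts of 1 and 2 respectively, whichever node the monitor happens to be processing.

```java
// Toy model of NumberReplicas-style aggregation for one block: each replica
// has exactly one admin state, and the block-level counts tally all of them.
// State strings and method names here are illustrative stand-ins.
public class ReplicaCounts {
    /** Returns {decommissioning, enteringMaintenance} for one block. */
    static int[] countStates(String[] replicaStates) {
        int decommissioning = 0, enteringMaintenance = 0;
        for (String s : replicaStates) {
            if (s.equals("DECOMMISSIONING")) decommissioning++;
            if (s.equals("ENTERING_MAINTENANCE")) enteringMaintenance++;
        }
        return new int[] { decommissioning, enteringMaintenance };
    }

    public static void main(String[] args) {
        // One replica decommissioning, two entering maintenance.
        int[] c = countStates(new String[] {
            "DECOMMISSIONING", "ENTERING_MAINTENANCE", "ENTERING_MAINTENANCE"
        });
        System.out.println(c[0] + " " + c[1]); // prints "1 2"
    }
}
```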
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800575#comment-15800575 ] Ming Ma commented on HDFS-9391: --- Sure let us keep what you have in patch 02. Just to make sure, can you confirm the followings? * For the case of "one replica is decommissioning and two replicas of the same block are entering maintenance", the code will still increment maintenanceOnlyReplicas when processing the decommissioning node, because NumberReplicas includes all replicas stats. Thus decommissionOnlyReplicas == maintenanceOnlyReplicas == outOfServiceReplicas. * For the case of "all replicas are decommissioning", then EnteringMaintenance page will have nothing to show to begin with given no nodes are entering maintenance. > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799905#comment-15799905 ] Ming Ma commented on HDFS-9391: --- Good point. Actually it seems maintenanceOnlyReplicas is the same as outOfServiceOnlyReplicas in such case. For example, say one replica is decommissioning and two are entering maintenance, both maintenanceOnlyReplicas and outOfServiceOnlyReplicas are incremented. In other words, maintenanceOnlyReplicas isn't strictly "all 3 replicas are maintenance". Maybe this new definition is more desirable. What do you think? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795843#comment-15795843 ] Ming Ma commented on HDFS-9391: --- Thanks Manoj. Yep let us keep the existing property as Eddy mentioned. * In {{getMaintenanceOnlyReplicas}} the check of {{if (!isDecommissionInProgress() && !isEnteringMaintenance())}} only needs to check for maintenance part. * It seems you will need to add {{In Maintenance dead}} to match the addition of {{nodes[i].state = "down-maintenance";}}. * For the {{EnteringMaintenanceNodes}} page, it uses {{maintenanceOnlyReplicas}} to describe {{Blocks with no live replicas}}. Should we use {{OutOfServiceOnlyReplicas}}? > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > HDFS-9391.02.patch, Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756345#comment-15756345 ] Ming Ma commented on HDFS-9391: --- Thanks [~manojg]. Some minor questions: * {{.put("inMaintenance", node.isInMaintenance())}} might not be necessary given it also outputs {{.put("adminState", node.getAdminState().toString())}}. * Should {{liveDecommissioningReplicas}} be {{OnlyDecommissioningReplicas}} which is the old behavior before maintenance? There are two differences, one is "Only", another one is "live". > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-9391) Update webUI/JMX to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9391: -- Comment: was deleted (was: Thanks [~manojg]. Some minor questions: * {{.put("inMaintenance", node.isInMaintenance())}} might not be necessary given it also outputs {{.put("adminState", node.getAdminState().toString())}}. * Should {{liveDecommissioningReplicas}} be {{OnlyDecommissioningReplicas}} which is the old behavior before maintenance? There are two differences, one is "Only", another one is "live".) > Update webUI/JMX to display maintenance state info > -- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, > Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9391) Update webUI/JMX/fsck to display maintenance state info
[ https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9391: -- Attachment: Maintenance webUI.png Thanks [~manojg]! bq. Shouldn't the DecomNodes include only replicas for DECOMMISSION_INPROGRESS nodes? Good point. The question is what value the "decommissionOnlyReplicas" property should have in the context of maintenance mode. A specific example: if a block has 3 replicas with one node entering maintenance and the other two being decommissioned, should it be included in "decommissionOnlyReplicas"? Given we normally use the property as a risk indicator, e.g. what if all decommissioning or entering-maintenance nodes fail, it seems ok to include both. Sure, there are backward-compatibility semantics here; you can argue it is ok given the behavior is the same without maintenance. If we really want to separately account for all-3-replicas-being-decommissioned, we can keep the strict semantics and add a new property "outOfServiceOnlyReplicas" to account for both types. To enable that, we will need to track each type separately in LeavingServiceStatus. bq. should we also have FSNameSystem#getMaintenanceNodes? Yes, something like NameNodeMXBean#getEnteringMaintenanceNodes will be useful. bq. w.r.t showing Maintenance nodes details ? getDeadNodes only returns the decommissioned case. You can add ".put("adminState", node.getAdminState().toString())" to the JSON to cover maintenance. You can also add counters to FSNamesystemMBean such as getNumMaintenanceLiveDataNodes, similar to getNumDecomLiveDataNodes and getNumDecomDeadDataNodes. bq. "In Maintenance" Live/Dead nodes count also need to be shown along with Decommission nodes ? That is right. The attached screenshot could be useful. After someone clicks "Entering Maintenance Nodes", it should redirect to another page about its progress, similar to "Decommissioning Nodes". bq. Is there a plan to expose the concept of 'OutOfService'? 
Based on how we use them, decommissioned nodes are tracked separately from maintenance nodes, other than the first point you brought up. > Update webUI/JMX/fsck to display maintenance state info > --- > > Key: HDFS-9391 > URL: https://issues.apache.org/jira/browse/HDFS-9391 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Manoj Govindassamy > Attachments: Maintenance webUI.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
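The suggestion above — adding "adminState" to the per-node JSON so the UI can tell a dead node that was in maintenance from a decommissioned one — can be sketched roughly as follows. The class name and all field names other than "adminState" are assumptions for illustration, not the actual getDeadNodes output:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the dead-node JSON map shape discussed above;
// including "adminState" alongside the legacy "decommissioned" flag lets
// the webUI distinguish maintenance from decommission for dead nodes.
class DeadNodeJson {
    static Map<String, Object> describe(String host, String adminState) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("name", host);
        node.put("decommissioned", "DECOMMISSIONED".equals(adminState));
        node.put("adminState", adminState); // e.g. "IN_MAINTENANCE"
        return node;
    }
}
```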
[jira] [Updated] (HDFS-10206) Datanodes not sorted properly by distance when the reader isn't a datanode
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-10206: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) +1. Thanks [~nandakumar131] for the contribution. I have committed the patch to trunk and branch-2. > Datanodes not sorted properly by distance when the reader isn't a datanode > -- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Fix For: 2.9.0 > > Attachments: HDFS-10206-branch-2.8.003.patch, HDFS-10206.000.patch, > HDFS-10206.001.patch, HDFS-10206.002.patch, HDFS-10206.003.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. 
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10206) Datanodes not sorted properly by distance when the reader isn't a datanode
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-10206: --- Summary: Datanodes not sorted properly by distance when the reader isn't a datanode (was: datanodes not sorted properly by distance if the reader isn't a datanode) > Datanodes not sorted properly by distance when the reader isn't a datanode > -- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206-branch-2.8.003.patch, HDFS-10206.000.patch, > HDFS-10206.001.patch, HDFS-10206.002.patch, HDFS-10206.003.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. 
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10206) datanodes not sorted properly by distance if the reader isn't a datanode
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-10206: --- Summary: datanodes not sorted properly by distance if the reader isn't a datanode (was: getBlockLocations might not sort datanodes properly by distance) > datanodes not sorted properly by distance if the reader isn't a datanode > > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206-branch-2.8.003.patch, HDFS-10206.000.patch, > HDFS-10206.001.patch, HDFS-10206.002.patch, HDFS-10206.003.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. 
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727676#comment-15727676 ] Ming Ma commented on HDFS-10206: Thanks [~nandakumar131]. The patch looks good. Given the patch doesn't apply directly to branch-2, can you provide another patch for branch-2? You can use the naming convention for the branch-2 patch based on the "Naming your patch" section in https://wiki.apache.org/hadoop/HowToContribute so that Jenkins can run the precommit job. > getBlockLocations might not sort datanodes properly by distance > --- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, > HDFS-10206.002.patch, HDFS-10206.003.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. 
> {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. > Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-10206: --- Status: Patch Available (was: Open) > getBlockLocations might not sort datanodes properly by distance > --- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, > HDFS-10206.002.patch, HDFS-10206.003.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. > Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. 
So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15718536#comment-15718536 ] Ming Ma commented on HDFS-10206: OK, maybe that isn't a precise way to refer to it. The "network path" comes from the NodeBase#getPath method. Anyway, the point is the new method should return 0 in the case of two identical nodes. > getBlockLocations might not sort datanodes properly by distance > --- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, > HDFS-10206.002.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. 
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717543#comment-15717543 ] Ming Ma commented on HDFS-10206: To clarify, "two nodes of the same network path" referred to two identical nodes, just like how getWeight could return 0 in such case. > getBlockLocations might not sort datanodes properly by distance > --- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, > HDFS-10206.002.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. Regardless of where we do the sorting, we still need > fix the issues outline here. 
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717370#comment-15717370 ] Ming Ma commented on HDFS-10206: Thanks [~nandakumar131]! The patches look good overall. To make the method more general, it seems better to have getWeightUsingNetworkLocation return 0 when two nodes have the same network path. [~daryn] [~kihwal], any concerns about the added 0.1ms latency? Note this only happens in the non-datanode reader scenario and it doesn't hold the FSNamesystem lock. > getBlockLocations might not sort datanodes properly by distance > --- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, > HDFS-10206.002.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. > {noformat} > NetworkTopology.java > protected int getWeight(Node reader, Node node) { > // 0 is local, 1 is same rack, 2 is off rack > // Start off by initializing to off rack > int weight = 2; > if (reader != null) { > if (reader.equals(node)) { > weight = 0; > } else if (isOnSameRack(reader, node)) { > weight = 1; > } > } > return weight; > } > {noformat} > HDFS-10203 has suggested moving the sorting from namenode to DFSClient to > address another issue. 
Regardless of where we do the sorting, we still need > fix the issues outline here. > Note that BlockPlacementPolicyDefault shares the same NetworkTopology object > used by DatanodeManager and requires Nodes stored in the topology to be > {{DatanodeDescriptor}} for block placement. So we need to make sure we don't > pollute the NetworkTopology if we plan to fix it on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
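The approach discussed in this thread — deriving a weight purely from the nodes' network locations (e.g. "/dc1/rack1"), so that a non-datanode reader never added to the topology tree can still be compared, with identical paths yielding 0 — could look roughly like the sketch below. The class and method names here are illustrative assumptions, not the actual patch:

```java
// Hypothetical sketch of a location-string-based weight: count the hops
// from each path up to their deepest common ancestor. Identical paths
// (the same node) yield 0; same rack yields 2; a different rack in a
// different "datacenter" level yields 4 — consistent with the 0/2/4
// weights mentioned later in this thread.
class LocationWeight {
    static int weight(String readerPath, String nodePath) {
        if (readerPath.equals(nodePath)) {
            return 0; // identical network path: treat as the same node
        }
        String[] a = readerPath.split("/");
        String[] b = nodePath.split("/");
        int common = 0;
        while (common < Math.min(a.length, b.length)
                && a[common].equals(b[common])) {
            common++;
        }
        // Hops up from each node to the deepest common ancestor.
        return (a.length - common) + (b.length - common);
    }
}
```

Because this walks path strings rather than topology-tree nodes, it works even when the reader was never registered in the NetworkTopology — the scenario this jira is about.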
[jira] [Commented] (HDFS-11096) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707537#comment-15707537 ] Ming Ma commented on HDFS-11096: Thanks [~andrew.wang] [~ka...@cloudera.com] for bringing up this important topic. bq. this means we can't remove in 3.0 unless it was deprecated in 2.2. To have comprehensive coverage, can we use a static analysis tool to cross-reference branch-2 and trunk's source code, or can JACC handle that? bq. This has been a problem even within just the 2.x line, so there's a real need for better cross-version integration testing Indeed. Such investment will pay off in the long term; e.g., whatever cross-version integration test system we come up with can be used not only for 2.x -> 3.x, but for 2.x -> 2.y as well. * Regarding how to automate binary and wire compatibility verification, we can do something within the Hadoop project first without integration with upper layers. In addition, maybe there is a way to test it on a dev machine, for example package 2.x jars into a container to run a client or datanode, then test it with 3.x containers. Or maybe some sort of setup using Jenkins + Docker containers. This could have caught some of the 2.x incompatibility issues. * For prioritization, we can also evaluate the impact and whether there is any workaround. For example, we don't have to verify 2.7 -> 3.0 if we know 2.7 -> 2.8 and 2.8 -> 3.0 work. * Rolling upgrade is important for high-SLA data ingestion or HBase scenarios. Without that, we have to fail over the entire cluster during upgrade. > Support rolling upgrade between 2.x and 3.x > --- > > Key: HDFS-11096 > URL: https://issues.apache.org/jira/browse/HDFS-11096 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rolling upgrades >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Priority: Blocker > > trunk has a minimum software version of 3.0.0-alpha1. This means we can't > rolling upgrade between branch-2 and trunk. 
> This is a showstopper for large deployments. Unless there are very compelling > reasons to break compatibility, let's restore the ability to rolling upgrade > to 3.x releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707535#comment-15707535 ] Ming Ma commented on HDFS-10206: bq. Can you point out the variables which are to be made more generic? nonDataNodeReader. However, it turns out NetworkTopology has several existing references to "datanode", so it is good to have and up to you whether you want to fix it. bq. With 000.patch the weight is calculated using network location for off rack datanodes which impacts the micro-benchmark results. Got it. Thanks for the clarification. So 001.patch shouldn't show a difference. Do you mind confirming? bq. Weight calculation after this patch Can you confirm the weights with 0002.patch? It seems to return 0, 2, 4, whereas the old behavior is 0, 1, 2. > getBlockLocations might not sort datanodes properly by distance > --- > > Key: HDFS-10206 > URL: https://issues.apache.org/jira/browse/HDFS-10206 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Nandakumar > Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, > HDFS-10206.002.patch > > > If the DFSClient machine is not a datanode, but it shares its rack with some > datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} > might not put the local-rack datanodes at the beginning of the sorted list. > That is because the function didn't call {{networktopology.add(client);}} to > properly set the node's parent node; something required by > {{networktopology.sortByDistance}} to compute distance between two nodes in > the same topology tree. > Another issue with {{networktopology.sortByDistance}} is it only > distinguishes local rack from remote rack, but it doesn't support general > distance calculation to tell how remote the rack is. 
> {noformat}
> NetworkTopology.java
> protected int getWeight(Node reader, Node node) {
>   // 0 is local, 1 is same rack, 2 is off rack
>   // Start off by initializing to off rack
>   int weight = 2;
>   if (reader != null) {
>     if (reader.equals(node)) {
>       weight = 0;
>     } else if (isOnSameRack(reader, node)) {
>       weight = 1;
>     }
>   }
>   return weight;
> }
> {noformat}
> HDFS-10203 has suggested moving the sorting from the namenode to the DFSClient to address another issue. Regardless of where we do the sorting, we still need to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object used by DatanodeManager and requires Nodes stored in the topology to be {{DatanodeDescriptor}} for block placement. So we need to make sure we don't pollute the NetworkTopology if we plan to fix it on the server side.
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702984#comment-15702984 ] Ming Ma commented on HDFS-10206:

* NetworkTopology can be used by HDFS, YARN and MAPREDUCE. It is better to make the variable names more general.

bq. Out of three replica, one will be in off rack datanode which is causing the difference

But the reader should pick the closest one, either "Same Node" or "DataNode in same rack". Perhaps you can clarify the setup.

bq. Weight calculation after this patch

So the weight value definition has changed. That should be fine, given it isn't a public interface. Still, NetworkTopologyWithNodeGroup has its own getWeight definition based on the old values. Either we update that or keep the old weight values.
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692234#comment-15692234 ] Ming Ma commented on HDFS-10206:

Thanks [~nanda619] for the micro benchmark and the new patch.
* Any idea why 000.patch makes a difference for the "Same Node" and "DataNode in same rack" cases?
* In the context of the overall data transfer duration, the 0.1 ms overhead looks acceptable, especially given that DatanodeManager#sortLocatedBlocks doesn't take FSNamesystem's lock.
* It seems getWeightUsingNetworkLocation and normalizeNetworkLocationPath can be static.
* The getWeight function calls getDistance, which returns the distance between two nodes, not the weight defined as the distance between a node and an ancestor. Maybe we can define a new function like getDistanceToClosestCommonAncestor, which could also take care of the isOnSameRack case.
* About ReadWriteLock.readLock: it might be ok, given that under a normal workload there won't be many writes to NetworkTopology.
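The getDistanceToClosestCommonAncestor helper suggested in the comment above could be sketched as follows. The method name comes from the comment; the string-based implementation over slash-separated network paths (e.g. "/dc1/rack1/host1") is an illustrative assumption, not the committed NetworkTopology code.

```java
// Hedged sketch: weight as the number of levels from the reader down from
// the deepest ancestor it shares with the target node. This also covers the
// same-node (0) and same-rack (1) cases without special-casing them.
public final class TopologySketch {

  /** 0 = same node, 1 = same rack, +1 for each additional topology level. */
  static int distanceToClosestCommonAncestor(String readerPath, String nodePath) {
    String[] reader = readerPath.split("/");
    String[] node = nodePath.split("/");
    int level = 0;
    int limit = Math.min(reader.length, node.length);
    while (level < limit && reader[level].equals(node[level])) {
      level++; // walk down while the two paths agree
    }
    // Levels remaining below the shared ancestor on the reader's side.
    return reader.length - level;
  }
}
```

For a two-level topology this reproduces the old 0, 1, 2, 3 progression the comments describe, because only the reader's side of the tree is counted.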
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15679692#comment-15679692 ] Ming Ma commented on HDFS-10206:

bq. Any comments on using NetworkTopology.contains(node) to check and use NetworkTopology.getDistance(node1, node2) to get the distance in case if the reader is an off rack datanode?

Here is another option. DatanodeManager#sortLocatedBlock already knows whether the reader is a datanode. So we can have a new NetworkTopology#sortByDistance that supports check-by-reference.
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674317#comment-15674317 ] Ming Ma commented on HDFS-10206:

bq. that is why getDistanceUsingNetworkLocation is called only when the conditions reader.equals(node) and isOnSameRack(reader, node) are not satisfied.

There are two scenarios in which this new function will be called. One is the reader being a datanode in a remote rack of a large cluster; for that case NetworkTopology already has the reader in its tree, and it will be faster to compare parent references. The other is the reader being a non-datanode; the new function will be useful there. Do you have any micro benchmark?

bq. With this patch it will be 0 for local, 1 for same rack and after that the value is incremented by 1 for each level.

From the code below, it seems each level will increase the weight by 2.
{noformat}
weight = (path1Token.length - currentLevel) + (path2Token.length - currentLevel);
{noformat}
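The weight formula quoted in the comment above can be exercised with a small sketch. The tokenization of network locations is an illustrative assumption; only the final expression comes from the patch. Because both sides of the topology tree are counted, each extra level of separation adds 2, which is why the values come out as 0, 2, 4 rather than the old 0, 1, 2.

```java
// Hedged sketch of the patch's weight expression applied to two
// slash-separated network locations such as "/dc1/rack1".
public final class PatchWeightSketch {

  static int weight(String loc1, String loc2) {
    String[] path1Token = loc1.split("/");
    String[] path2Token = loc2.split("/");
    int currentLevel = 0;
    int limit = Math.min(path1Token.length, path2Token.length);
    while (currentLevel < limit
        && path1Token[currentLevel].equals(path2Token[currentLevel])) {
      currentLevel++; // deepest level at which the two locations agree
    }
    // Expression quoted from the patch: both sides are summed, so the
    // result grows by 2 per level of separation.
    return (path1Token.length - currentLevel) + (path2Token.length - currentLevel);
  }
}
```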
[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673058#comment-15673058 ] Ming Ma commented on HDFS-10206:

Thanks [~nandakumar131]!
* When neither {{reader.equals(node)}} nor {{isOnSameRack(reader, node)}} is satisfied, this patch will cause extra string parsing. I wonder if there is any major performance impact. If that isn't an issue, can getDistanceUsingNetworkLocation handle all scenarios, including {{reader.equals(node)}} and {{isOnSameRack(reader, node)}}?
* It probably doesn't matter much, but {{getWeight}} used to return 0, 1, 2, 3, etc. as the network layer increases. With the patch it changes to 0, 1, 2, 4, etc.
[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637336#comment-15637336 ] Ming Ma commented on HDFS-10702:

Thanks [~zhz] for the ping. Thanks [~clouderajiayi] [~mackrorysd] for the great work. Yes, it might be useful to leverage inotify, or at least to evaluate it. For this SbNN polling approach, I am interested in knowing more about how applications plan to use it, specifically when they will decide to call getSyncInfo. In a multi-tenant environment, an application might care about specific files/directories, not necessarily whether the namespace has changed at a global level.

Here are some comments specific to the patch.
* The standby namenode has its own checkpoint lock to reduce the checkpoint's impact on block reports. Thus there could be an assumption that the checkpointer is the only reader of the namespace on the standby. You might want to confirm whether there is any implication.
* In the case of multiple standbys, one is the checkpointer, so you could consider allowing clients to connect to the standbys that are not doing checkpoints.
* If the server-side config "dfs.ha.allow.stale.reads" is set to false and the client side enables stale read, it seems the client will still keep trying. I wonder whether the client side should consider the server-side config as well.
* Federation configuration support might need some more work. It could depend on how you want to enable it on the client side. The current patch is based on a run-time config per client instance. You could also allow defining a client-side config like "dfs.client..ha.allow.stale.reads".
* After NN failover, does StaleReadProxyProvider#standbyProxies get refreshed? If not, a long-running client could keep using the old standby.
* The RPC layer is more general than HDFS, so it would be better if allowStandbyRead could be refactored out.
> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
> Currently, clients must always talk to the active NameNode when performing any metadata operation, which means the active NameNode could be a bottleneck for scalability. One way to solve this problem is to send read-only operations to the Standby NameNode. The disadvantage is that a read might be stale.
> Here, I'm thinking of adding a Client API to enable/disable stale reads from the Standby, which gives the client the power to set the staleness restriction.
[jira] [Updated] (HDFS-9390) Block management for maintenance states
[ https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9390: -- Resolution: Fixed Fix Version/s: 3.0.0-alpha2 2.9.0 Status: Resolved (was: Patch Available) Thanks [~eddyxu] again. I have committed it to trunk and branch-2.

> Block management for maintenance states
> ---
>
> Key: HDFS-9390
> URL: https://issues.apache.org/jira/browse/HDFS-9390
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-9390-2.patch, HDFS-9390-3.patch, HDFS-9390-4.patch, HDFS-9390-5.patch, HDFS-9390-branch-2.002.patch, HDFS-9390-branch-2.patch, HDFS-9390.patch
>
> When a node is transitioned to, stays in, or is transitioned out of the maintenance state, we need to make sure the blocks on that node are properly handled.
> * When nodes are put into maintenance, they first go to ENTERING_MAINTENANCE; we make sure their blocks are minimally replicated before the nodes are transitioned to IN_MAINTENANCE.
> * Do not replicate blocks when nodes are in maintenance states. Maintenance replicas will remain in BlockMaps and are thus still considered valid from the block replication point of view. In other words, putting a node into "maintenance" mode won't trigger the BlockManager to replicate its blocks.
> * Do not invalidate replicas on nodes under maintenance. After any file's replication factor is reduced, the NN needs to invalidate some replicas. It should exclude nodes under maintenance in that handling.
> * Do not put IN_MAINTENANCE replicas in LocatedBlock for read operations.
> * Do not allocate any new blocks on nodes under maintenance.
> * Have the Balancer exclude nodes under maintenance.
> * Exclude nodes under maintenance from the DN cache.
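The first bullet's transition rule can be sketched minimally as follows. All names are illustrative assumptions, not the HDFS BlockManager implementation: the idea is only that a node waits in ENTERING_MAINTENANCE until each of its blocks has enough live replicas on nodes outside any maintenance state.

```java
import java.util.List;

// Hedged sketch of the ENTERING_MAINTENANCE -> IN_MAINTENANCE precondition
// described above. AdminState and minimallyReplicated are hypothetical names.
public final class MaintenanceSketch {

  enum AdminState { IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE }

  /**
   * True once a block has at least minReplicas copies held by nodes that are
   * not entering or in maintenance; only then may the entering node proceed.
   */
  static boolean minimallyReplicated(List<AdminState> replicaHolders, int minReplicas) {
    int live = 0;
    for (AdminState s : replicaHolders) {
      if (s == AdminState.IN_SERVICE) {
        live++; // maintenance replicas stay in BlockMaps but don't count here
      }
    }
    return live >= minReplicas;
  }
}
```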
[jira] [Updated] (HDFS-9390) Block management for maintenance states
[ https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9390: -- Attachment: HDFS-9390-branch-2.002.patch Reload with the proper patch name for Jenkins to run.
[jira] [Updated] (HDFS-9390) Block management for maintenance states
[ https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9390: -- Attachment: (was: HDFS-9390-2-branch-2.patch)