[jira] [Resolved] (HDFS-7877) Support maintenance state for datanodes

2017-09-20 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma resolved HDFS-7877.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   3.0.0-beta1
   2.9.0

All sub-tasks have been resolved. Thanks [~ctrezzo] [~eddyxu] [~manojg] [~elek] 
[~linyiqun] and others for the contributions and discussion.

> Support maintenance state for datanodes
> ---
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
> Supportmaintenancestatefordatanodes-2.pdf, 
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature 
> is mostly independent of upgrade domain feature, it is better to track it 
> under a separate jira. The design and draft patch will be available soon.






[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-20 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11035:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   3.0.0-beta1
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~ctrezzo]. I have committed the patch to trunk, branch-3.0 and branch-2.

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: HDFS-11035-2.patch, HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.
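
For illustration, a minimal sketch of the kind of example the new docs could 
include, assuming the combined JSON hosts file from HDFS-9005; the exact 
property values and steps should be confirmed in the docs themselves:

{noformat}
<!-- hdfs-site.xml: point the NN at a combined JSON hosts file -->
<property>
  <name>dfs.namenode.hosts.provider.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager</value>
</property>
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/dfs.hosts.json</value>
</property>
{noformat}

A datanode could then be put into maintenance by setting {{"adminState": 
"IN_MAINTENANCE"}} on its entry in the JSON file and running {{hdfs dfsadmin 
-refreshNodes}}.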






[jira] [Updated] (HDFS-12473) Change hosts JSON file format

2017-09-20 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   2.8.3
   2.8.2
   3.0.0-beta1
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~manojg] and [~zhz]. I have committed the patch to trunk, branch-3.0, 
branch-2, branch-2.8 and branch-2.8.2. Besides the branch-2 diff mentioned 
above, the patch for branch-2.8/branch-2.8.2 is slightly different in the unit 
tests, as the maintenance state only exists in branch-2 and above.

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2, 2.8.3, 3.1.0
>
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, 
> HDFS-12473-4.patch, HDFS-12473-5.patch, HDFS-12473-6.patch, 
> HDFS-12473-branch-2.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change hosts JSON file format

2017-09-19 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Attachment: HDFS-12473-branch-2.patch

Compared to the patch for trunk, branch-2's version is slightly different 
because it depends on a different version of Jackson. Specifically, trunk can use 
{{com.fasterxml.jackson.databind.ObjectMapper}} while branch-2 should use 
{{org.codehaus.jackson.map.ObjectMapper}}. {{CombinedHostsFileReader.java}} 
also needs to be modified slightly, because an empty file produces a different 
exception.
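
As an illustration, a minimal sketch of the package difference, assuming the 
hosts file is deserialized into the {{DatanodeAdminProperties}} bean; this is 
not the actual reader code:

{noformat}
// trunk: fasterxml Jackson
com.fasterxml.jackson.databind.ObjectMapper mapper =
    new com.fasterxml.jackson.databind.ObjectMapper();
DatanodeAdminProperties[] hosts = mapper.readValue(
    new java.io.File("dfs.hosts.json"), DatanodeAdminProperties[].class);

// branch-2: codehaus Jackson; the call shape is the same, only the package differs
org.codehaus.jackson.map.ObjectMapper oldMapper =
    new org.codehaus.jackson.map.ObjectMapper();
DatanodeAdminProperties[] oldHosts = oldMapper.readValue(
    new java.io.File("dfs.hosts.json"), DatanodeAdminProperties[].class);
{noformat}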

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, 
> HDFS-12473-4.patch, HDFS-12473-5.patch, HDFS-12473-6.patch, 
> HDFS-12473-branch-2.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change hosts JSON file format

2017-09-19 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Attachment: HDFS-12473-6.patch

Thanks [~manojg]. Here is the new patch to address your comments, except for 
the exception handling, where I prefer that the code swallow as few exceptions 
as possible.

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, 
> HDFS-12473-4.patch, HDFS-12473-5.patch, HDFS-12473-6.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Commented] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-18 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170944#comment-16170944
 ] 

Ming Ma commented on HDFS-11035:


Thanks [~ctrezzo]. I will commit it to trunk, branch-3.0 and branch-2 by EOD 
tomorrow, to leave time in case [~jojochuang] [~manojg] [~eddyxu] have any 
additional comments.

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>Assignee: Ming Ma
> Attachments: HDFS-11035-2.patch, HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-18 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11035:
---
Attachment: HDFS-11035-2.patch

The new patch adds the new docs to site.xml and fixes a couple of nits. Thanks 
[~ctrezzo] for the review.

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>Assignee: Ming Ma
> Attachments: HDFS-11035-2.patch, HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Commented] (HDFS-12473) Change hosts JSON file format

2017-09-18 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170746#comment-16170746
 ] 

Ming Ma commented on HDFS-12473:


Ah, got it. So the assumption that "backward compatibility isn't an issue as long 
as the feature hasn't been officially released" isn't always true. While it is 
generally better to keep the code clean and avoid unnecessary handling, for 
this specific issue it seems ok to include backward compatibility for an 
unreleased feature, given that it doesn't complicate the code much. Can you 
check if 4.patch is ready for commit?

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, 
> HDFS-12473-4.patch, HDFS-12473-5.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change hosts JSON file format

2017-09-18 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Attachment: HDFS-12473-5.patch

From discussion with [~zhz], 2.8.2 hasn't been released yet. Thus we don't 
need to deal with the backward compatibility issue of the old JSON format being 
used by HDFS-7541, assuming we can get the patch into 2.8.2 and branch-3.0. 
[~manojg], here is the latest patch.

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, 
> HDFS-12473-4.patch, HDFS-12473-5.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Comment Edited] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-17 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169365#comment-16169365
 ] 

Ming Ma edited comment on HDFS-11035 at 9/17/17 5:47 PM:
-

Here is the draft patch including one doc for upgrade domain and another one 
for datanode administration in general (decommission and maintenance). cc 
[~jojochuang] [~manojg] [~eddyxu] [~ctrezzo]  


was (Author: mingma):
Here is the draft patch including one doc for upgrade domain and another one 
for datanode administration in general (decommission and maintenance). cc 
[~jojochuang] [~manojg] [~eddyxu] 

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>Assignee: Ming Ma
> Attachments: HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-17 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11035:
---
Attachment: HDFS-11035.patch

Here is the draft patch including one doc for upgrade domain and another one 
for datanode administration in general (decommission and maintenance). cc 
[~jojochuang] [~manojg] [~eddyxu] 

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
> Attachments: HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Assigned] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-17 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma reassigned HDFS-11035:
--

Assignee: Ming Ma

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>Assignee: Ming Ma
> Attachments: HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Updated] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-09-17 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11035:
---
Status: Patch Available  (was: Open)

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>Assignee: Ming Ma
> Attachments: HDFS-11035.patch
>
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Updated] (HDFS-12473) Change hosts JSON file format

2017-09-16 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Attachment: HDFS-12473-4.patch

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473-2.patch, HDFS-12473-3.patch, 
> HDFS-12473-4.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change hosts JSON file format

2017-09-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Attachment: HDFS-12473-2.patch

Thanks Manoj. Here is the updated patch to address your comments.

bq. What happens when the hosts file has improper json format? 
I was hoping we could get it in before the 3.0 beta release and thus avoid 
worrying about compatibility issues. But it looks like the upgrade domain 
feature has been backported to 2.8.2. Unfortunately, that means we have to 
support the old format (a sketch of what that could look like follows).
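
For illustration, a minimal sketch of tolerating both formats, assuming Jackson 
and the {{DatanodeAdminProperties}} bean; the actual reader logic may differ:

{noformat}
ObjectMapper mapper = new ObjectMapper();
String content = new String(
    Files.readAllBytes(Paths.get("dfs.hosts.json")), StandardCharsets.UTF_8);
DatanodeAdminProperties[] hosts;
try {
  // New format: a single top-level JSON array.
  hosts = mapper.readValue(content, DatanodeAdminProperties[].class);
} catch (IOException e) {
  // Old format: one bare JSON object per line, with no enclosing array.
  List<DatanodeAdminProperties> list = new ArrayList<>();
  for (String line : content.split("\n")) {
    if (!line.trim().isEmpty()) {
      list.add(mapper.readValue(line, DatanodeAdminProperties.class));
    }
  }
  hosts = list.toArray(new DatanodeAdminProperties[0]);
}
{noformat}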

bq. #readFile can now return null object
The updated patch will return an empty array instead.

bq. If MAPPER is no more used, can be removed.
It was removed; maybe you were looking at the existing file.

bq. CombinedHostsFileReader.readFile() can return null if the input hosts file 
has no entries.
This case is covered by the test case testEmptyCombinedHostsFileReader.

> Change hosts JSON file format
> -
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473-2.patch, HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Comment Edited] (HDFS-12473) Change host JSON file format

2017-09-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168521#comment-16168521
 ] 

Ming Ma edited comment on HDFS-12473 at 9/15/17 9:09 PM:
-

Here is the draft patch. cc [~eddyxu] and [~manojg].


was (Author: mingma):
Here is the draft patch.

> Change host JSON file format
> 
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change host JSON file format

2017-09-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-7877

> Change host JSON file format
> 
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change host JSON file format

2017-09-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Assignee: Ming Ma
  Status: Patch Available  (was: Open)

> Change host JSON file format
> 
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Updated] (HDFS-12473) Change host JSON file format

2017-09-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12473:
---
Attachment: HDFS-12473.patch

Here is the draft patch.

> Change host JSON file format
> 
>
> Key: HDFS-12473
> URL: https://issues.apache.org/jira/browse/HDFS-12473
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
> Attachments: HDFS-12473.patch
>
>
> The existing host JSON file format doesn't have a top-level token.
> {noformat}
>   {"hostName": "host1"}
>   {"hostName": "host2", "upgradeDomain": "ud0"}
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"}
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"}
>   {"hostName": "host5", "port": 8090}
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> {noformat}
> Instead, to conform with the JSON standard it should be like
> {noformat}
> [
>   {"hostName": "host1"},
>   {"hostName": "host2", "upgradeDomain": "ud0"},
>   {"hostName": "host3", "adminState": "DECOMMISSIONED"},
>   {"hostName": "host4", "upgradeDomain": "ud2", "adminState": 
> "DECOMMISSIONED"},
>   {"hostName": "host5", "port": 8090},
>   {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
>   {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
> "maintenanceExpireTimeInMS": "112233"}
> ]
> {noformat}






[jira] [Created] (HDFS-12473) Change host JSON file format

2017-09-15 Thread Ming Ma (JIRA)
Ming Ma created HDFS-12473:
--

 Summary: Change host JSON file format
 Key: HDFS-12473
 URL: https://issues.apache.org/jira/browse/HDFS-12473
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma


The existing host JSON file format doesn't have a top-level token.

{noformat}
  {"hostName": "host1"}
  {"hostName": "host2", "upgradeDomain": "ud0"}
  {"hostName": "host3", "adminState": "DECOMMISSIONED"}
  {"hostName": "host4", "upgradeDomain": "ud2", "adminState": "DECOMMISSIONED"}
  {"hostName": "host5", "port": 8090}
  {"hostName": "host6", "adminState": "IN_MAINTENANCE"}
  {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
"maintenanceExpireTimeInMS": "112233"}
{noformat}


Instead, to conform with the JSON standard it should be like

{noformat}
[
  {"hostName": "host1"},
  {"hostName": "host2", "upgradeDomain": "ud0"},
  {"hostName": "host3", "adminState": "DECOMMISSIONED"},
  {"hostName": "host4", "upgradeDomain": "ud2", "adminState": "DECOMMISSIONED"},
  {"hostName": "host5", "port": 8090},
  {"hostName": "host6", "adminState": "IN_MAINTENANCE"},
  {"hostName": "host7", "adminState": "IN_MAINTENANCE", 
"maintenanceExpireTimeInMS": "112233"}
]
{noformat}







[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2017-09-12 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164158#comment-16164158
 ] 

Ming Ma commented on HDFS-10702:


[~csun], the long checkpoint duration could cause the following issue:

* The checkpointer, holding {{cpLock}}, takes a long time to finish for a large 
namespace.
* The edit log tailer, blocked on {{cpLock}}, can't update the namespace, so the 
namespace becomes stale.
* An application deletes a file. The StandbyNN receives an incremental block 
report indicating the blocks have been removed and updates its block map.
* The StandbyNN's stale namespace still has the file, but without block locations.

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jiayi Zhou
>Assignee: Sean Mackrory
>Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, HDFS-10702.007.patch, HDFS-10702.008.patch, 
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means the active NameNode could be a bottleneck 
> for scalability. One way to solve this problem is to send read-only operations 
> to the Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale reads from 
> the Standby, which gives the client the power to set the staleness restriction.






[jira] [Commented] (HDFS-12285) Better handling of namenode ip address change

2017-08-10 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122746#comment-16122746
 ] 

Ming Ma commented on HDFS-12285:


Thanks [~shahrs87]. Yeah, it is indeed related, although the exception and the 
scenario look different from the other jiras. Even if it is the same issue, let 
us keep this jira around for validation when we resolve it.

> Better handling of namenode ip address change
> -
>
> Key: HDFS-12285
> URL: https://issues.apache.org/jira/browse/HDFS-12285
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>
> RPC client layer provides functionality to detect ip address change:
> {noformat}
> Client.java
> private synchronized boolean updateAddress() throws IOException {
>   // Do a fresh lookup with the old host name.
>   InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost(
>server.getHostName(), server.getPort());
> ..
> }
> {noformat}
> To use this feature, we need to enable retries via 
> {{dfs.client.retry.policy.enabled}}. Otherwise the {{TryOnceThenFail}} 
> RetryPolicy will be used, which causes {{handleConnectionFailure}} to throw 
> a {{ConnectException}} without retrying with the new ip address.
> {noformat}
> private void handleConnectionFailure(int curRetries, IOException ioe
> ) throws IOException {
>   closeConnection();
>   final RetryAction action;
>   try {
> action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
>   } catch(Exception e) {
> throw e instanceof IOException? (IOException)e: new IOException(e);
>   }
>   ..
>   }
> {noformat}
> However, relying on such configuration isn't ideal. What happens is DFSClient 
> still holds onto the cached old ip address created by {{namenode = 
> proxyInfo.getProxy();}}. Thus, when a new rpc connection is created, it starts 
> with the old ip and then retries with the new ip. It would be nice if 
> DFSClient could update the namenode proxy automatically upon ip address change.






[jira] [Updated] (HDFS-12285) Better handling of namenode ip address change

2017-08-09 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-12285:
---
Description: 
RPC client layer provides functionality to detect ip address change:

{noformat}
Client.java
private synchronized boolean updateAddress() throws IOException {
  // Do a fresh lookup with the old host name.
  InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost(
   server.getHostName(), server.getPort());
..
}
{noformat}

To use this feature, we need to enable retries via 
{{dfs.client.retry.policy.enabled}}. Otherwise the {{TryOnceThenFail}} RetryPolicy 
will be used, which causes {{handleConnectionFailure}} to throw a 
{{ConnectException}} without retrying with the new ip address.

{noformat}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  ..
  }
{noformat}


However, relying on such configuration isn't ideal. What happens is that DFSClient 
still holds onto the cached old ip address created by {{namenode = 
proxyInfo.getProxy();}}. Thus, when a new rpc connection is created, it starts 
with the old ip and then retries with the new ip. It would be nice if DFSClient 
could update the namenode proxy automatically upon ip address change.

  was:
RPC client layer provides functionality to detect ip address change:

{noformat}
Client.java
private synchronized boolean updateAddress() throws IOException {
  // Do a fresh lookup with the old host name.
  InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost(
   server.getHostName(), server.getPort());
..
}
{noformat}

To use this feature, we need to enable retries via 
{{dfs.client.retry.policy.enabled}}. Otherwise the {{TryOnceThenFail}} RetryPolicy 
will be used, which causes {{handleConnectionFailure}} to throw a 
{{ConnectException}} without retrying with the new ip address.

{noformat}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  ..
  }
{noformat}


However, using such configuration isn't ideal. What happens is DFSClient still 
has the cached old ip address created by {{namenode = 
proxyInfo.getProxy();}}. Then when a new rpc connection is created, it starts 
with the old ip followed by retry with the new ip. It will be nice if DFSClient 
can refresh namenode proxy automatically.


> Better handling of namenode ip address change
> -
>
> Key: HDFS-12285
> URL: https://issues.apache.org/jira/browse/HDFS-12285
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>
> RPC client layer provides functionality to detect ip address change:
> {noformat}
> Client.java
> private synchronized boolean updateAddress() throws IOException {
>   // Do a fresh lookup with the old host name.
>   InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost(
>server.getHostName(), server.getPort());
> ..
> }
> {noformat}
> To use this feature, we need to enable retries via 
> {{dfs.client.retry.policy.enabled}}. Otherwise the {{TryOnceThenFail}} 
> RetryPolicy will be used, which causes {{handleConnectionFailure}} to throw 
> a {{ConnectException}} without retrying with the new ip address.
> {noformat}
> private void handleConnectionFailure(int curRetries, IOException ioe
> ) throws IOException {
>   closeConnection();
>   final RetryAction action;
>   try {
> action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
>   } catch(Exception e) {
> throw e instanceof IOException? (IOException)e: new IOException(e);
>   }
>   ..
>   }
> {noformat}
> However, relying on such configuration isn't ideal. What happens is DFSClient 
> still holds onto the cached old ip address created by {{namenode = 
> proxyInfo.getProxy();}}. Thus, when a new rpc connection is created, it starts 
> with the old ip and then retries with the new ip. It would be nice if 
> DFSClient could update the namenode proxy automatically upon ip address change.




[jira] [Created] (HDFS-12285) Better handling of namenode ip address change

2017-08-09 Thread Ming Ma (JIRA)
Ming Ma created HDFS-12285:
--

 Summary: Better handling of namenode ip address change
 Key: HDFS-12285
 URL: https://issues.apache.org/jira/browse/HDFS-12285
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma


RPC client layer provides functionality to detect ip address change:

{noformat}
Client.java
private synchronized boolean updateAddress() throws IOException {
  // Do a fresh lookup with the old host name.
  InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost(
   server.getHostName(), server.getPort());
..
}
{noformat}

To use this feature, we need to enable retries via 
{{dfs.client.retry.policy.enabled}}. Otherwise the {{TryOnceThenFail}} RetryPolicy 
will be used, which causes {{handleConnectionFailure}} to throw a 
{{ConnectException}} without retrying with the new ip address.

{noformat}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  ..
  }
{noformat}


However, using such configuration isn't ideal. What happens is DFSClient still 
has the cached old ip address created by {{namenode = proxyInfo.getProxy();}}. 
Then when a new rpc connection is created, it starts with the old ip followed 
by a retry with the new ip. It would be nice if DFSClient could refresh the 
namenode proxy automatically.
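
For illustration, a minimal sketch of enabling the retry policy mentioned above 
from client code; the property key is the one named in this description, while 
the surrounding code is illustrative:

{noformat}
Configuration conf = new HdfsConfiguration();
// Without this, TryOnceThenFail is used and handleConnectionFailure gives up
// before updateAddress() can retry with the freshly resolved ip.
conf.setBoolean("dfs.client.retry.policy.enabled", true);
FileSystem fs = FileSystem.get(conf);
{noformat}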






[jira] [Commented] (HDFS-11035) Better documentation for maintenance mode and upgrade domain

2017-08-07 Thread Ming Ma (JIRA)

[ 
https://issues-test.apache.org/jira/browse/HDFS-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090130#comment-16090130
 ] 

Ming Ma commented on HDFS-11035:


Given these features are related to existing concepts such as decommission and 
block placement, we can include description of these features in relevant 
sections of existing *.md files.

> Better documentation for maintenance mode and upgrade domain
> ---
>
> Key: HDFS-11035
> URL: https://issues-test.apache.org/jira/browse/HDFS-11035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, documentation
>Affects Versions: 2.9.0
>Reporter: Wei-Chiu Chuang
>
> HDFS-7541 added upgrade domains and HDFS-7877 added maintenance mode. Existing 
> documentation about these two features is scarce, and the implementation has 
> evolved from the original design doc. Looking at the code and Javadoc, I still 
> don't quite get how I can put datanodes into maintenance mode or set up an 
> upgrade domain.
> Filing this jira to propose that we write an up-to-date description of these 
> two features.






[jira] [Comment Edited] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes

2017-08-07 Thread Ming Ma (JIRA)

[ 
https://issues-test.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077367#comment-16077367
 ] 

Ming Ma edited comment on HDFS-9388 at 8/8/17 1:06 AM:
---

 Thanks [~manojg].


was (Author: mingma):
 1. Thanks [~manojg].

> Refactor decommission related code to support maintenance state for datanodes
> -
>
> Key: HDFS-9388
> URL: https://issues-test.apache.org/jira/browse/HDFS-9388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9388.01.patch, HDFS-9388.02.patch
>
>
> Lots of code can be shared between the existing decommission functionality 
> and to-be-added maintenance state support for datanodes. To make it easier to 
> add maintenance state support, let us first modify the existing code to make 
> it more general.






[jira] [Commented] (HDFS-6939) Support path-based filtering of inotify events

2017-08-07 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117482#comment-16117482
 ] 

Ming Ma commented on HDFS-6939:
---

Yeah, we can include this feature if it provides value. A couple of questions:

* Each getEditsFromTxid RPC call ends up sending the filter over the wire, so a 
filter with lots of paths has a perf impact. Do we need to support a large 
number of paths per call?
* In the future there could be other types of filters, e.g. a) based on the 
FsEditLogOp type; b) supporting different logical operators such as OR, AND, etc. 
To make it extensible, perhaps we can define an interface with the signature 
shouldNotify(FsEditLogOp) and provide the path-based PathBasedInotifyFilter for 
now (see the sketch below). Then InotifyFSEditLogOpTranslator becomes simpler by 
checking shouldNotify upfront; if we need to add path-and-editop-based filtering, 
we can just add a PathAndOpBasedInotifyFilter without changing 
InotifyFSEditLogOpTranslator.
* DFSClient's existing getInotifyEventStream methods are only used by 
DistributedFileSystem. So you don't need to keep these old methods on 
DFSClient; instead, have DistributedFileSystem's old getInotifyEventStream 
methods call DFSClient's new methods. Also, maybe we should consider deprecating 
DistributedFileSystem's old getInotifyEventStream methods.
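
A minimal sketch of the proposed interface; apart from PathBasedInotifyFilter 
and shouldNotify(FsEditLogOp) named above, the identifiers are hypothetical, and 
extracting the path is left abstract since op subclasses expose paths differently:

{noformat}
public interface InotifyEventFilter {
  // Return true if the given edit log op should produce an inotify event.
  boolean shouldNotify(FSEditLogOp op);
}

public abstract class PathBasedInotifyFilter implements InotifyEventFilter {
  private final java.util.Set<String> paths;

  protected PathBasedInotifyFilter(java.util.Set<String> paths) {
    this.paths = paths;
  }

  // Hypothetical hook: pull the affected path out of the concrete op type.
  protected abstract String extractPath(FSEditLogOp op);

  @Override
  public boolean shouldNotify(FSEditLogOp op) {
    String path = extractPath(op);
    return path != null && paths.contains(path);
  }
}
{noformat}

InotifyFSEditLogOpTranslator would then simply check {{filter.shouldNotify(op)}} 
before translating each op.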

> Support path-based filtering of inotify events
> --
>
> Key: HDFS-6939
> URL: https://issues.apache.org/jira/browse/HDFS-6939
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, qjm
>Reporter: James Thomas
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-6939-001.patch
>
>
> Users should be able to specify that they only want events involving 
> particular paths.






[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes

2017-07-06 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077367#comment-16077367
 ] 

Ming Ma commented on HDFS-9388:
---

 1. Thanks [~manojg].

> Refactor decommission related code to support maintenance state for datanodes
> -
>
> Key: HDFS-9388
> URL: https://issues.apache.org/jira/browse/HDFS-9388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9388.01.patch, HDFS-9388.02.patch
>
>
> Lots of code can be shared between the existing decommission functionality 
> and to-be-added maintenance state support for datanodes. To make it easier to 
> add maintenance state support, let us first modify the existing code to make 
> it more general.






[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes

2017-06-30 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070937#comment-16070937
 ] 

Ming Ma commented on HDFS-9388:
---

Thanks [~manojg]. Looks good overall. A couple of nits:

* The DFS_NAMENODE_DECOMMISSION_* configuration keys only mention decommission in 
hdfs-default.xml. It would be better to use a general term like admin, or to 
mention maintenance as well.
* Comments in functions like handleInsufficientlyStored and 
processBlocksInternal refer to decommission only; it would be useful to update 
the comments.
* The checkstyle and whitespace issues might not be related to this change. 
Still, it would be nice to fix them if it isn't too much effort.

> Refactor decommission related code to support maintenance state for datanodes
> -
>
> Key: HDFS-9388
> URL: https://issues.apache.org/jira/browse/HDFS-9388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9388.01.patch
>
>
> Lots of code can be shared between the existing decommission functionality 
> and to-be-added maintenance state support for datanodes. To make it easier to 
> add maintenance state support, let us first modify the existing code to make 
> it more general.






[jira] [Commented] (HDFS-11446) TestMaintenanceState#testWithNNAndDNRestart fails intermittently

2017-05-26 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027081#comment-16027081
 ] 

Ming Ma commented on HDFS-11446:


+1.

> TestMaintenanceState#testWithNNAndDNRestart fails intermittently
> 
>
> Key: HDFS-11446
> URL: https://issues.apache.org/jira/browse/HDFS-11446
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11446.001.patch, HDFS-11446.002.patch, 
> HDFS-11446.003.patch, HDFS-11446.004.patch, HDFS-11446-branch-2.002.patch, 
> HDFS-11446-branch-2.patch
>
>
> The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The 
> stack info( 
> https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/ ):
> {code}
> java.lang.AssertionError: expected null, but was: for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: 
> expected 3, got 2 
> ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending
>  block # 1 ,under replicated # 0 ,>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731)
> {code}
> The failure seems to be due to a pending block that has not been replicated. We 
> can bump the retry count since sometimes the cluster is busy. Also, we can use 
> {{GenericTestUtils#waitFor}} to simplify the current comparison logic (see the 
> sketch below).
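
A minimal sketch of the suggested {{GenericTestUtils#waitFor}} pattern; the 
predicate body and the timeout constants are illustrative only:

{noformat}
// Poll every 100 ms; fail the test if the condition isn't met within 60 s.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // getReplicaCount is a hypothetical helper that re-checks the block's
    // current live replica count on each poll.
    return getReplicaCount(block) == expectedReplicas;
  }
}, 100, 60000);
{noformat}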






[jira] [Commented] (HDFS-11790) Decommissioning of a DataNode after MaintenanceState takes a very long time to complete

2017-05-23 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022321#comment-16022321
 ] 

Ming Ma commented on HDFS-11790:


Thanks [~manojg] for reporting this. Hmm, the existing code should take care of 
this. I wonder if it is due to some corner case where the following functions 
don't skip maintenance nodes properly (a sketch of the expected guard follows 
the list):

* BlockManager#createLocatedBlock should skip IN_MAINTENANCE nodes.
* BlockManager#chooseSourceDatanodes should skip MAINTENANCE_NOT_FOR_READ nodes, 
the state set for IN_MAINTENANCE nodes.
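
For illustration, a minimal sketch of the kind of guard both call sites need; 
the loop and variable names are illustrative, not the actual BlockManager code:

{noformat}
for (DatanodeDescriptor node : candidates) {
  // An IN_MAINTENANCE node may be shut down, so it must never be returned
  // as a read location or picked as a re-replication source.
  if (node.isInMaintenance()) {
    continue;
  }
  sources.add(node);
}
{noformat}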

> Decommissioning of a DataNode after MaintenanceState takes a very long time 
> to complete
> ---
>
> Key: HDFS-11790
> URL: https://issues.apache.org/jira/browse/HDFS-11790
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11790-test.01.patch
>
>
> *Problem:*
> When a DataNode is requested for Decommissioning after it successfully 
> transitioned to MaintenanceState (HDFS-7877), the decommissioning state 
> transition is stuck for a long time, even for a very small number of blocks in 
> the cluster.
> *Details:*
> * A DataNode DN1 wa requested for MaintenanceState and it successfully 
> transitioned from ENTERING_MAINTENANCE state IN_MAINTENANCE state as there 
> are sufficient replication for all its blocks.
> * As DN1 was in maintenance state now, the DataNode process was stopped on 
> DN1. Later the same DN1 was requested for Decommissioning. 
> * As part of Decommissioning, all the blocks residing in DN1 were requested 
> for re-replicated to other DataNodes, so that DN1 could transition from 
> ENTERING_DECOMMISSION to DECOMMISSIONED. 
> * But, re-replication for few blocks was stuck for a long time. Eventually it 
> got completed.
> * Digging the code and logs, found that the IN_MAINTENANCE DN1 was chosen as 
> a source datanode for re-replication of few of the blocks. Since DataNode 
> process on DN1 was already stopped, the re-replication was stuck for a long 
> time.
> * Eventually PendingReplicationMonitor timed out, and those re-replication 
> were re-scheduled for those timed out blocks. Again, during the 
> re-replication also, the IN_MAINT DN1 was chose as a source datanode for few 
> of the blocks leading to timeout again. This iteration continued for few 
> times until all blocks get re-replicated.
> * By design, IN_MAINT datandoes should not be chosen for any read or write 
> operations.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7541) Upgrade Domains in HDFS

2017-05-02 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993607#comment-15993607
 ] 

Ming Ma commented on HDFS-7541:
---

Sounds good. HDFS-9005, HDFS-9016 and HDFS-9922 have been committed to 2.8.2.

> Upgrade Domains in HDFS
> ---
>
> Key: HDFS-7541
> URL: https://issues.apache.org/jira/browse/HDFS-7541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0-alpha1, 2.8.2
>
> Attachments: HDFS-7541-2.patch, HDFS-7541.patch, 
> SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf, 
> UpgradeDomains_Design_v3.pdf
>
>
> Current HDFS DN rolling upgrade step requires sequential DN restart to 
> minimize the impact on data availability and read/write operations. The side 
> effect is longer upgrade duration for large clusters. This might be 
> acceptable for DN JVM quick restart to update hadoop code/configuration. 
> However, for OS upgrade that requires machine reboot, the overall upgrade 
> duration will be too long if we continue to do sequential DN rolling restart.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7541) Upgrade Domains in HDFS

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7541:
--
Fix Version/s: (was: 2.9.0)
   2.8.2

> Upgrade Domains in HDFS
> ---
>
> Key: HDFS-7541
> URL: https://issues.apache.org/jira/browse/HDFS-7541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0-alpha1, 2.8.2
>
> Attachments: HDFS-7541-2.patch, HDFS-7541.patch, 
> SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf, 
> UpgradeDomains_Design_v3.pdf
>
>
> Current HDFS DN rolling upgrade step requires sequential DN restart to 
> minimize the impact on data availability and read/write operations. The side 
> effect is longer upgrade duration for large clusters. This might be 
> acceptable for DN JVM quick restart to update hadoop code/configuration. 
> However, for OS upgrade that requires machine reboot, the overall upgrade 
> duration will be too long if we continue to do sequential DN rolling restart.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9922:
--
   Resolution: Fixed
Fix Version/s: (was: 2.9.0)
   2.8.2
   Status: Resolved  (was: Patch Available)

Backported to branch-2.8 per the discussion in the umbrella jira. The failed 
tests pass locally.

> Upgrade Domain placement policy status marks a good block in violation when 
> there are decommissioned nodes
> --
>
> Key: HDFS-9922
> URL: https://issues.apache.org/jira/browse/HDFS-9922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.8.2, 3.0.0-alpha1
>
> Attachments: HDFS-9922.branch-2.8.001.patch, 
> HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, HDFS-9922-trunk-v3.patch, 
> HDFS-9922-trunk-v4.patch
>
>
> When there are replicas of a block on a decommissioned node, 
> BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns 
> false when it should return true. This is because numberOfReplicas is the 
> number of in-service replicas for the block and upgradeDomains.size() is the 
> number of upgrade domains across all replicas of the block. Specifically, we 
> hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and 
> upgradeDomains.size() is greater than numberOfReplicas.
> {code}
> private boolean isUpgradeDomainPolicySatisfied() {
>   if (numberOfReplicas <= upgradeDomainFactor) {
>     return (numberOfReplicas == upgradeDomains.size());
>   } else {
>     return upgradeDomains.size() >= upgradeDomainFactor;
>   }
> }
> {code}
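
To make the mismatch concrete, here is a self-contained illustration with 
hypothetical numbers (this is not the real BlockPlacementStatusWithUpgradeDomain 
class). With replication factor 3, upgradeDomainFactor 3, and one extra 
decommissioned replica, numberOfReplicas counts 3 in-service replicas while 
upgradeDomains spans all 4 replicas, so the equality check fails for a 
perfectly placed block:

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UpgradeDomainCheckSketch {
  // Same shape as the check quoted above, with the inputs made explicit.
  static boolean isSatisfied(int numberOfReplicas, Set<String> upgradeDomains,
                             int upgradeDomainFactor) {
    if (numberOfReplicas <= upgradeDomainFactor) {
      return numberOfReplicas == upgradeDomains.size(); // 3 != 4 -> "violation"
    }
    return upgradeDomains.size() >= upgradeDomainFactor;
  }

  public static void main(String[] args) {
    // Domains across ALL replicas, including the decommissioned one.
    Set<String> domainsOfAllReplicas =
        new HashSet<>(Arrays.asList("ud1", "ud2", "ud3", "ud4"));
    System.out.println(isSatisfied(3, domainsOfAllReplicas, 3)); // false
  }
}
{code}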



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9922:
--
Attachment: HDFS-9922.branch-2.8.001.patch

> Upgrade Domain placement policy status marks a good block in violation when 
> there are decommissioned nodes
> --
>
> Key: HDFS-9922
> URL: https://issues.apache.org/jira/browse/HDFS-9922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9922.branch-2.8.001.patch, 
> HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, HDFS-9922-trunk-v3.patch, 
> HDFS-9922-trunk-v4.patch
>
>
> When there are replicas of a block on a decommissioned node, 
> BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns 
> false when it should return true. This is because numberOfReplicas is the 
> number of in-service replicas for the block and upgradeDomains.size() is the 
> number of upgrade domains across all replicas of the block. Specifically, we 
> hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and 
> upgradeDomains.size() is greater than numberOfReplicas.
> {code}
> private boolean isUpgradeDomainPolicySatisfied() {
>   if (numberOfReplicas <= upgradeDomainFactor) {
>     return (numberOfReplicas == upgradeDomains.size());
>   } else {
>     return upgradeDomains.size() >= upgradeDomainFactor;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9922:
--
Status: Patch Available  (was: Reopened)

> Upgrade Domain placement policy status marks a good block in violation when 
> there are decommissioned nodes
> --
>
> Key: HDFS-9922
> URL: https://issues.apache.org/jira/browse/HDFS-9922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9922.branch-2.8.001.patch, 
> HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, HDFS-9922-trunk-v3.patch, 
> HDFS-9922-trunk-v4.patch
>
>
> When there are replicas of a block on a decommissioned node, 
> BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns 
> false when it should return true. This is because numberOfReplicas is the 
> number of in-service replicas for the block and upgradeDomains.size() is the 
> number of upgrade domains across all replicas of the block. Specifically, we 
> hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and 
> upgradeDomains.size() is greater than numberOfReplicas.
> {code}
> private boolean isUpgradeDomainPolicySatisfied() {
>   if (numberOfReplicas <= upgradeDomainFactor) {
>     return (numberOfReplicas == upgradeDomains.size());
>   } else {
>     return upgradeDomains.size() >= upgradeDomainFactor;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma reopened HDFS-9922:
---

> Upgrade Domain placement policy status marks a good block in violation when 
> there are decommissioned nodes
> --
>
> Key: HDFS-9922
> URL: https://issues.apache.org/jira/browse/HDFS-9922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, 
> HDFS-9922-trunk-v3.patch, HDFS-9922-trunk-v4.patch
>
>
> When there are replicas of a block on a decommissioned node, 
> BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns 
> false when it should return true. This is because numberOfReplicas is the 
> number of in-service replicas for the block and upgradeDomains.size() is the 
> number of upgrade domains across all replicas of the block. Specifically, we 
> hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and 
> upgradeDomains.size() is greater than numberOfReplicas.
> {code}
> private boolean isUpgradeDomainPolicySatisfied() {
>   if (numberOfReplicas <= upgradeDomainFactor) {
>     return (numberOfReplicas == upgradeDomains.size());
>   } else {
>     return upgradeDomains.size() >= upgradeDomainFactor;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9016:
--
   Resolution: Fixed
Fix Version/s: (was: 2.9.0)
   2.8.2
   Status: Resolved  (was: Patch Available)

Backported to branch-2.8 per the discussion in the umbrella jira.

> Display upgrade domain information in fsck
> --
>
> Key: HDFS-9016
> URL: https://issues.apache.org/jira/browse/HDFS-9016
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.8.2, 3.0.0-alpha1
>
> Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, 
> HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, 
> HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9016:
--
Status: Patch Available  (was: Reopened)

> Display upgrade domain information in fsck
> --
>
> Key: HDFS-9016
> URL: https://issues.apache.org/jira/browse/HDFS-9016
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, 
> HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, 
> HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9016:
--
Attachment: HDFS-9016.branch-2.8.001.patch

> Display upgrade domain information in fsck
> --
>
> Key: HDFS-9016
> URL: https://issues.apache.org/jira/browse/HDFS-9016
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, 
> HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, 
> HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-9016) Display upgrade domain information in fsck

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma reopened HDFS-9016:
---

> Display upgrade domain information in fsck
> --
>
> Key: HDFS-9016
> URL: https://issues.apache.org/jira/browse/HDFS-9016
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, 
> HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, HDFS-9016-branch-2.patch, 
> HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9005) Provide configuration support for upgrade domain

2017-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9005:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 2.9.0)
   2.8.2
   Status: Resolved  (was: Patch Available)

Backported to branch-2.8.

> Provide configuration support for upgrade domain
> 
>
> Key: HDFS-9005
> URL: https://issues.apache.org/jira/browse/HDFS-9005
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.8.2, 3.0.0-alpha1
>
> Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, 
> HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch
>
>
> As part of the upgrade domain feature, we need to provide a mechanism to 
> specify the upgrade domain for each datanode. One way to accomplish that is to 
> allow admins to specify an upgrade domain script that takes a DN IP or 
> hostname as input and returns the upgrade domain. Then the namenode will use 
> it at run time to set {{DatanodeInfo}}'s upgrade domain string. The 
> configuration can be something like:
> {noformat}
> <property>
>   <name>dfs.namenode.upgrade.domain.script.file.name</name>
>   <value>/etc/hadoop/conf/upgrade-domain.sh</value>
> </property>
> {noformat}
> just like topology script, 
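
For context, a hypothetical sketch of how a namenode-side resolver could invoke 
such a script and read the upgrade domain from its output. This is an 
illustration only, not the actual HDFS implementation; the class and method 
names are made up:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class UpgradeDomainScriptRunner {
  private final String scriptPath; // e.g. /etc/hadoop/conf/upgrade-domain.sh

  public UpgradeDomainScriptRunner(String scriptPath) {
    this.scriptPath = scriptPath;
  }

  // Run the configured script with a DN IP or hostname and treat the first
  // line of its stdout as that node's upgrade domain.
  public String resolve(String datanodeHost)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder(scriptPath, datanodeHost).start();
    try (BufferedReader r =
             new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String domain = r.readLine();
      p.waitFor();
      return domain;
    }
  }
}
{code}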



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9005) Provide configuration support for upgrade domain

2017-04-30 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9005:
--
Attachment: HDFS-9005.branch-2.8.001.patch

Per the discussion in the umbrella jira, we want the feature to be in 2.8. 
Here is the patch for branch-2.8; it required some manual effort. All HDFS 
tests passed locally.

> Provide configuration support for upgrade domain
> 
>
> Key: HDFS-9005
> URL: https://issues.apache.org/jira/browse/HDFS-9005
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, 
> HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch
>
>
> As part of the upgrade domain feature, we need to provide a mechanism to 
> specify the upgrade domain for each datanode. One way to accomplish that is to 
> allow admins to specify an upgrade domain script that takes a DN IP or 
> hostname as input and returns the upgrade domain. Then the namenode will use 
> it at run time to set {{DatanodeInfo}}'s upgrade domain string. The 
> configuration can be something like:
> {noformat}
> <property>
>   <name>dfs.namenode.upgrade.domain.script.file.name</name>
>   <value>/etc/hadoop/conf/upgrade-domain.sh</value>
> </property>
> {noformat}
> just like topology script, 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7541) Upgrade Domains in HDFS

2017-04-30 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990559#comment-15990559
 ] 

Ming Ma commented on HDFS-7541:
---

Sure, I can backport HDFS-9005, HDFS-9016 and HDFS-9922 to 2.8. Which 2.8 
release do we want, 2.8.1 or 2.8.2? Pushing the feature to 2.7 requires much 
more work though. Regarding the production quality, yes, it has been pretty 
reliable. The only feature we don't use in our production is HDFS-9005. We used 
a script-based configuration approach while the feature was developed and 
tested and haven't spent time changing the configuration mechanism. 

> Upgrade Domains in HDFS
> ---
>
> Key: HDFS-7541
> URL: https://issues.apache.org/jira/browse/HDFS-7541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Kihwal Lee
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-7541-2.patch, HDFS-7541.patch, 
> SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf, 
> UpgradeDomains_Design_v3.pdf
>
>
> Current HDFS DN rolling upgrade step requires sequential DN restart to 
> minimize the impact on data availability and read/write operations. The side 
> effect is longer upgrade duration for large clusters. This might be 
> acceptable for DN JVM quick restart to update hadoop code/configuration. 
> However, for OS upgrade that requires machine reboot, the overall upgrade 
> duration will be too long if we continue to do sequential DN rolling restart.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9922) Upgrade Domain placement policy status marks a good block in violation when there are decommissioned nodes

2017-04-21 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979473#comment-15979473
 ] 

Ming Ma commented on HDFS-9922:
---

Currently the upgrade domain feature isn't considered available in 2.8 because 
these changes are missing there. If we want the feature to be in 2.8, the major 
backport item is HDFS-9005.

> Upgrade Domain placement policy status marks a good block in violation when 
> there are decommissioned nodes
> --
>
> Key: HDFS-9922
> URL: https://issues.apache.org/jira/browse/HDFS-9922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9922-trunk-v1.patch, HDFS-9922-trunk-v2.patch, 
> HDFS-9922-trunk-v3.patch, HDFS-9922-trunk-v4.patch
>
>
> When there are replicas of a block on a decommissioned node, 
> BlockPlacementStatusWithUpgradeDomain#isUpgradeDomainPolicySatisfied returns 
> false when it should return true. This is because numberOfReplicas is the 
> number of in-service replicas for the block and upgradeDomains.size() is the 
> number of upgrade domains across all replicas of the block. Specifically, we 
> hit this scenario when numberOfReplicas is equal to upgradeDomainFactor and 
> upgradeDomains.size() is greater than numberOfReplicas.
> {code}
> private boolean isUpgradeDomainPolicySatisfied() {
>   if (numberOfReplicas <= upgradeDomainFactor) {
>     return (numberOfReplicas == upgradeDomains.size());
>   } else {
>     return upgradeDomains.size() >= upgradeDomainFactor;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11446) TestMaintenanceState#testWithNNAndDNRestart fails intermittently

2017-03-04 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895823#comment-15895823
 ] 

Ming Ma commented on HDFS-11446:


Thanks. Got the following compile error for branch-2, which uses Java 7:

{noformat}
local variable ... is accessed from within inner class; needs to be declared 
final
{noformat}
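
For reference, a minimal illustration of the Java 7 rule behind that error 
(the variable names are made up). Java 7 requires locals captured by an 
anonymous inner class to be declared final, while Java 8 accepts effectively 
final locals, which is why trunk compiled but branch-2 did not:

{code}
public class FinalCaptureSketch {
  public static void main(String[] args) {
    // Without 'final', javac 7 rejects the capture below with the quoted error.
    final int expectedReplicas = 3;
    Runnable check = new Runnable() {
      @Override
      public void run() {
        System.out.println("expected = " + expectedReplicas);
      }
    };
    check.run();
  }
}
{code}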

> TestMaintenanceState#testWithNNAndDNRestart fails intermittently
> 
>
> Key: HDFS-11446
> URL: https://issues.apache.org/jira/browse/HDFS-11446
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11446.001.patch, HDFS-11446.002.patch
>
>
> The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The 
> stack info 
> (https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/):
> {code}
> java.lang.AssertionError: expected null, but was: for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: 
> expected 3, got 2 
> ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending
>  block # 1 ,under replicated # 0 ,>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731)
> {code}
> The failure seems to be due to the pending block not having been replicated. 
> We can bump the retry count since the cluster can sometimes be busy. We can 
> also use {{GenericTestUtils#waitFor}} to simplify the current comparison logic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes

2017-03-03 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894786#comment-15894786
 ] 

Ming Ma commented on HDFS-9388:
---

Or DatanodeAdminManager? DatanodeMaintenance seems good. DatanodeService and 
DatanodeProvision are too general.

> Refactor decommission related code to support maintenance state for datanodes
> -
>
> Key: HDFS-9388
> URL: https://issues.apache.org/jira/browse/HDFS-9388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
>
> Lots of code can be shared between the existing decommission functionality 
> and to-be-added maintenance state support for datanodes. To make it easier to 
> add maintenance state support, let us first modify the existing code to make 
> it more general.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11446) TestMaintenanceState#testWithNNAndDNRestart fails intermittently

2017-03-03 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894772#comment-15894772
 ] 

Ming Ma commented on HDFS-11446:


Thanks [~linyiqun]. The patch looks good. Do you get a compile issue when 
applying it to branch-2?

> TestMaintenanceState#testWithNNAndDNRestart fails intermittently
> 
>
> Key: HDFS-11446
> URL: https://issues.apache.org/jira/browse/HDFS-11446
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11446.001.patch
>
>
> The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The 
> stack info 
> (https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/):
> {code}
> java.lang.AssertionError: expected null, but was: for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: 
> expected 3, got 2 
> ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending
>  block # 1 ,under replicated # 0 ,>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731)
> {code}
> The failure seems to be due to the pending block not having been replicated. 
> We can bump the retry count since the cluster can sometimes be busy. We can 
> also use {{GenericTestUtils#waitFor}} to simplify the current comparison logic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11412) Maintenance minimum replication config value allowable range should be [0, DefaultReplication]

2017-03-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11412:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Committed to branch-2. Thanks [~manojg] for the contribution.

> Maintenance minimum replication config value allowable range should be [0, 
> DefaultReplication]
> --
>
> Key: HDFS-11412
> URL: https://issues.apache.org/jira/browse/HDFS-11412
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11412.01.patch, HDFS-11412.02.patch, 
> HDFS-11412-branch-2.01.patch
>
>
> Currently the allowed value range for Maintenance Min Replication 
> {{dfs.namenode.maintenance.replication.min}} is 0 to 
> {{dfs.namenode.replication.min}} (default=1). Users who do not want to affect 
> the performance of the cluster would wish to have a Maintenance Min 
> Replication greater than 1, say 2. In the current design, it is possible to 
> have this Maintenance Min Replication configuration, but only after changing 
> the NameNode-level Block Min Replication to 2, which could increase the 
> overall latency for client writes.
> Technically speaking, we should be allowing Maintenance Min Replication to be 
> in the range 0 to dfs.replication.max.  
> * There is always the config value of 0 for users not wanting any 
> availability/performance guarantees during maintenance. 
> * And, performance-centric workloads can still get maintenance done without 
> major disruptions by having a bigger Maintenance Min Replication. Setting the 
> upper limit to dfs.replication.max could be overkill as it could trigger 
> re-replication, which Maintenance State is trying to avoid. So, we could allow 
> {{dfs.namenode.maintenance.replication.min}} in the range {{0 to 
> dfs.replication}}
> {noformat}
> if (minMaintenanceR < 0) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " < 0");
> }
> if (minMaintenanceR > minR) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " > "
>       + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
>       + " = " + minR);
> }
> {noformat}
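
As a hedged sketch of the relaxed validation the summary describes (allowed 
range [0, default replication]), with assumed variable names rather than the 
committed patch:

{code}
import java.io.IOException;

public class MaintenanceMinReplicationCheck {
  // minMaintenanceR: dfs.namenode.maintenance.replication.min
  // defaultR:        dfs.replication (the default replication factor)
  static void validate(int minMaintenanceR, int defaultR) throws IOException {
    if (minMaintenanceR < 0) {
      throw new IOException("Unexpected configuration parameters: "
          + "dfs.namenode.maintenance.replication.min = " + minMaintenanceR
          + " < 0");
    }
    if (minMaintenanceR > defaultR) {
      throw new IOException("Unexpected configuration parameters: "
          + "dfs.namenode.maintenance.replication.min = " + minMaintenanceR
          + " > dfs.replication = " + defaultR);
    }
  }
}
{code}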



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11412) Maintenance minimum replication config value allowable range should be [0, DefaultReplication]

2017-03-01 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891612#comment-15891612
 ] 

Ming Ma commented on HDFS-11412:


+1. Committed to trunk. [~manojg], could you please provide another patch for 
branch-2 as it doesn't apply? Thanks.

> Maintenance minimum replication config value allowable range should be [0, 
> DefaultReplication]
> --
>
> Key: HDFS-11412
> URL: https://issues.apache.org/jira/browse/HDFS-11412
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11412.01.patch, HDFS-11412.02.patch
>
>
> Currently the allowed value range for Maintenance Min Replication 
> {{dfs.namenode.maintenance.replication.min}} is 0 to 
> {{dfs.namenode.replication.min}} (default=1). Users who do not want to affect 
> the performance of the cluster would wish to have a Maintenance Min 
> Replication greater than 1, say 2. In the current design, it is possible to 
> have this Maintenance Min Replication configuration, but only after changing 
> the NameNode-level Block Min Replication to 2, which could increase the 
> overall latency for client writes.
> Technically speaking, we should be allowing Maintenance Min Replication to be 
> in the range 0 to dfs.replication.max.  
> * There is always the config value of 0 for users not wanting any 
> availability/performance guarantees during maintenance. 
> * And, performance-centric workloads can still get maintenance done without 
> major disruptions by having a bigger Maintenance Min Replication. Setting the 
> upper limit to dfs.replication.max could be overkill as it could trigger 
> re-replication, which Maintenance State is trying to avoid. So, we could allow 
> {{dfs.namenode.maintenance.replication.min}} in the range {{0 to 
> dfs.replication}}
> {noformat}
> if (minMaintenanceR < 0) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " < 0");
> }
> if (minMaintenanceR > minR) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " > "
>       + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
>       + " = " + minR);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11412) Maintenance minimum replication config value allowable range should be [0, DefaultReplication]

2017-03-01 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11412:
---
Summary: Maintenance minimum replication config value allowable range 
should be [0, DefaultReplication]  (was: Maintenance minimum replication config 
value allowable range should be {0 - DefaultReplication})

> Maintenance minimum replication config value allowable range should be [0, 
> DefaultReplication]
> --
>
> Key: HDFS-11412
> URL: https://issues.apache.org/jira/browse/HDFS-11412
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11412.01.patch, HDFS-11412.02.patch
>
>
> Currently the allowed value range for Maintenance Min Replication 
> {{dfs.namenode.maintenance.replication.min}} is 0 to 
> {{dfs.namenode.replication.min}} (default=1). Users who do not want to affect 
> the performance of the cluster would wish to have a Maintenance Min 
> Replication greater than 1, say 2. In the current design, it is possible to 
> have this Maintenance Min Replication configuration, but only after changing 
> the NameNode-level Block Min Replication to 2, which could increase the 
> overall latency for client writes.
> Technically speaking, we should be allowing Maintenance Min Replication to be 
> in the range 0 to dfs.replication.max.  
> * There is always the config value of 0 for users not wanting any 
> availability/performance guarantees during maintenance. 
> * And, performance-centric workloads can still get maintenance done without 
> major disruptions by having a bigger Maintenance Min Replication. Setting the 
> upper limit to dfs.replication.max could be overkill as it could trigger 
> re-replication, which Maintenance State is trying to avoid. So, we could allow 
> {{dfs.namenode.maintenance.replication.min}} in the range {{0 to 
> dfs.replication}}
> {noformat}
> if (minMaintenanceR < 0) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " < 0");
> }
> if (minMaintenanceR > minR) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " > "
>       + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
>       + " = " + minR);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11412) Maintenance minimum replication config value allowable range should be {0 - DefaultReplication}

2017-03-01 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890705#comment-15890705
 ] 

Ming Ma commented on HDFS-11412:


bq. this particular range can have adverse effects as it can force replicate to 
larger number of blocks (to honor minReplicationToBeInMaintenance) even for the 
files that aren't created with higher replication factor.
It can choose to only force the replication up to the replication factor of 
those files. So for most files, which have the default replication factor, the 
lesser of {default replication factor, minReplicationToBeInMaintenance} will be 
used as the min replication value during maintenance, so the impact should be 
similar to setting minReplicationToBeInMaintenance to the default replication 
factor. This is also similar to how the following case would be handled: set 
minReplicationToBeInMaintenance to the default replication factor; for files 
with a replication factor of 2, 2 will be used as the min replication value.

 bq. May be we need to return the max or min value based on how the block 
replication is set compared to the default replication
Maybe we can modify getMinReplicationToBeInMaintenance to return the lesser of 
{file replication factor, minReplicationToBeInMaintenance}, as sketched below.
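
A minimal sketch of that suggestion, with a hypothetical signature rather than 
the real BlockManager method:

{code}
public class MaintenanceMinSketch {
  // Cap the maintenance minimum at the file's own replication factor so that
  // files created with a small replication factor are not forced above it.
  static short effectiveMinMaintenanceReplication(short fileReplication,
                                                  short minMaintenanceR) {
    return (short) Math.min(fileReplication, minMaintenanceR);
  }

  public static void main(String[] args) {
    // file replication 2, minReplicationToBeInMaintenance 3 -> use 2
    System.out.println(effectiveMinMaintenanceReplication((short) 2, (short) 3));
  }
}
{code}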

> Maintenance minimum replication config value allowable range should be {0 - 
> DefaultReplication}
> ---
>
> Key: HDFS-11412
> URL: https://issues.apache.org/jira/browse/HDFS-11412
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11412.01.patch
>
>
> Currently the allowed value range for Maintenance Min Replication 
> {{dfs.namenode.maintenance.replication.min}} is 0 to 
> {{dfs.namenode.replication.min}} (default=1). Users who do not want to affect 
> the performance of the cluster would wish to have a Maintenance Min 
> Replication greater than 1, say 2. In the current design, it is possible to 
> have this Maintenance Min Replication configuration, but only after changing 
> the NameNode-level Block Min Replication to 2, which could increase the 
> overall latency for client writes.
> Technically speaking, we should be allowing Maintenance Min Replication to be 
> in the range 0 to dfs.replication.max.  
> * There is always the config value of 0 for users not wanting any 
> availability/performance guarantees during maintenance. 
> * And, performance-centric workloads can still get maintenance done without 
> major disruptions by having a bigger Maintenance Min Replication. Setting the 
> upper limit to dfs.replication.max could be overkill as it could trigger 
> re-replication, which Maintenance State is trying to avoid. So, we could allow 
> {{dfs.namenode.maintenance.replication.min}} in the range {{0 to 
> dfs.replication}}
> {noformat}
> if (minMaintenanceR < 0) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " < 0");
> }
> if (minMaintenanceR > minR) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " > "
>       + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
>       + " = " + minR);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11411) Avoid OutOfMemoryError in TestMaintenanceState test runs

2017-02-22 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11411:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

+1. Thanks [~manojg] for the contribution. Committed to trunk and branch-2.

> Avoid OutOfMemoryError in TestMaintenanceState test runs
> 
>
> Key: HDFS-11411
> URL: https://issues.apache.org/jira/browse/HDFS-11411
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11411.01.patch, HDFS-11411.02.patch
>
>
> TestMaintenanceState test runs are seeing OutOfMemoryError issues quite 
> frequently now. Need to fix the tests that are consuming lots of memory/threads. 
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hdfs.TestMaintenanceState
> Tests run: 21, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 219.479 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.Te
> testTransitionFromDecommissioned(org.apache.hadoop.hdfs.TestMaintenanceState) 
>  Time elapsed: 0.64 sec  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
> testTakeDeadNodeOutOfMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState) 
>  Time elapsed: 0.031 sec  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
> testWithNNAndDNRestart(org.apache.hadoop.hdfs.TestMaintenanceState)  Time 
> elapsed: 0.03 sec  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
> testMultipleNodesMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState)  
> Time elapsed: 60.127 sec  <<< ERROR!
> java.io.IOException: Problem starting http server
> Results :
> Tests in error: 
>   
> TestMaintenanceState.testTransitionFromDecommissioned:225->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s
>   
> TestMaintenanceState.testTakeDeadNodeOutOfMaintenance:636->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s
>   
> TestMaintenanceState.testWithNNAndDNRestart:692->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.startCluste
>   
> TestMaintenanceState.testMultipleNodesMaintenance:532->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.start
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9388) Refactor decommission related code to support maintenance state for datanodes

2017-02-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869184#comment-15869184
 ] 

Ming Ma commented on HDFS-9388:
---

[~manojg], most of the work has been done by other jiras. Some specific items 
left include the rename of DecommissionManager and whether comments about 
decommission should be updated. Please feel free to assign it to yourself. 
Thank you!

> Refactor decommission related code to support maintenance state for datanodes
> -
>
> Key: HDFS-9388
> URL: https://issues.apache.org/jira/browse/HDFS-9388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>
> Lots of code can be shared between the existing decommission functionality 
> and to-be-added maintenance state support for datanodes. To make it easier to 
> add maintenance state support, let us first modify the existing code to make 
> it more general.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11411) Avoid OutOfMemoryError in TestMaintenanceState test runs

2017-02-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869170#comment-15869170
 ] 

Ming Ma commented on HDFS-11411:


Looks good. Nits (a sketch of the suggested ordering follows the list):

* For the {{testExpectedReplication}} case, should we move the setup() call 
into the function that calls startCluster?
* Maybe at the end of the function, call teardown first, then setup for the 
next iteration. Otherwise, setup will be called twice (once from the test case 
setup and once from the added explicit call) for the first iteration of the 
test case. 
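
One way to realize that ordering, as hypothetical JUnit-style scaffolding 
(setup()/teardown() stand in for AdminStatesBaseTest's cluster lifecycle):

{code}
public class RestartLoopSketch {
  void setup() { /* start the mini-cluster for one iteration */ }
  void teardown() { /* shut the mini-cluster down, release threads */ }

  // The test-case-level setup has already run once before this method, so
  // recycle the cluster only between iterations; that way setup() is not
  // invoked twice for the first pass.
  void runIterations(int n) throws Exception {
    for (int i = 0; i < n; i++) {
      if (i > 0) {
        teardown();
        setup();
      }
      // ... exercise maintenance-state transitions here ...
    }
  }
}
{code}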

> Avoid OutOfMemoryError in TestMaintenanceState test runs
> 
>
> Key: HDFS-11411
> URL: https://issues.apache.org/jira/browse/HDFS-11411
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11411.01.patch
>
>
> TestMaintenanceState test runs are seeing OutOfMemoryError issues quite 
> frequently now. Need to fix the tests that are consuming lots of memory/threads. 
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hdfs.TestMaintenanceState
> Tests run: 21, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 219.479 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.Te
> testTransitionFromDecommissioned(org.apache.hadoop.hdfs.TestMaintenanceState) 
>  Time elapsed: 0.64 sec  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
> testTakeDeadNodeOutOfMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState) 
>  Time elapsed: 0.031 sec  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
> testWithNNAndDNRestart(org.apache.hadoop.hdfs.TestMaintenanceState)  Time 
> elapsed: 0.03 sec  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
> testMultipleNodesMaintenance(org.apache.hadoop.hdfs.TestMaintenanceState)  
> Time elapsed: 60.127 sec  <<< ERROR!
> java.io.IOException: Problem starting http server
> Results :
> Tests in error: 
>   
> TestMaintenanceState.testTransitionFromDecommissioned:225->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s
>   
> TestMaintenanceState.testTakeDeadNodeOutOfMaintenance:636->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.s
>   
> TestMaintenanceState.testWithNNAndDNRestart:692->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.startCluste
>   
> TestMaintenanceState.testMultipleNodesMaintenance:532->AdminStatesBaseTest.startCluster:413->AdminStatesBaseTest.start
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11265) Extend visualization for Maintenance Mode under Datanode tab in the NameNode UI

2017-02-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11265:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~elektrobank] for the contribution and [~manojg] for the review. 
Committed to trunk and branch-2.

> Extend visualization for Maintenance Mode under Datanode tab in the NameNode 
> UI
> ---
>
> Key: HDFS-11265
> URL: https://issues.apache.org/jira/browse/HDFS-11265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Elek, Marton
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: ex.png, HDFS-11265.001.patch, icons.png, x.png
>
>
> With HDFS-9391, DataNodes in Maintenance Mode states are shown on the DataNode 
> page in the NameNode UI, but they are lacking icon visualization like the ones 
> shown for other node states. Need to extend the icon visualization to cover 
> Maintenance Mode.
> {code}
> 

[jira] [Commented] (HDFS-11265) Extend visualization for Maintenance Mode under Datanode tab in the NameNode UI

2017-02-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869107#comment-15869107
 ] 

Ming Ma commented on HDFS-11265:


Strictly speaking, live decommissioned nodes can serve read requests as the 
least preferred replicas. But even with that, the existing patch LGTM. +1.

> Extend visualization for Maintenance Mode under Datanode tab in the NameNode 
> UI
> ---
>
> Key: HDFS-11265
> URL: https://issues.apache.org/jira/browse/HDFS-11265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Elek, Marton
> Attachments: ex.png, HDFS-11265.001.patch, icons.png, x.png
>
>
> With HDFS-9391, DataNodes in Maintenance Mode states are shown on the DataNode 
> page in the NameNode UI, but they are lacking icon visualization like the ones 
> shown for other node states. Need to extend the icon visualization to cover 
> Maintenance Mode.
> {code}
> 

[jira] [Commented] (HDFS-11412) Maintenance minimum replication config value allowable range should be {0 - DefaultReplication}

2017-02-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868977#comment-15868977
 ] 

Ming Ma commented on HDFS-11412:


Thanks [~manojg].

* Regarding whether to use the default replication factor or the max 
replication factor, do you care about the following use case? default == 3, 
max == 30. Block A has a large replication factor of 30 and would like to keep 
at least 20 live replicas around during maintenance. Then put 20 nodes with 
replicas of Block A into maintenance at the same time. To make sure there are 
at least 20 live replicas after maintenance, the system needs to honor 
minReplicationToBeInMaintenance == 20. 
* Impact on the {{getExpectedLiveRedundancyNum}} calculation (sketched below). 
Set minReplicationToBeInMaintenance to 3. Block B's replication factor is 2. 
Put one of its replicas into maintenance. Inside the function 
{{getExpectedLiveRedundancyNum}}, {{Math.max(expectedRedundancy - 
numberReplicas.maintenanceReplicas(), getMinReplicationToBeInMaintenance())}} 
== 3. Ideally the function should return 2.
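
The arithmetic in that corner case, as a runnable sketch. The helper names are 
hypothetical, and the "capped" variant applies the min(file replication factor, 
minReplicationToBeInMaintenance) idea discussed above:

{code}
public class ExpectedLiveRedundancySketch {
  // Shape of the current calculation quoted above.
  static int uncapped(int expectedRedundancy, int maintenanceReplicas,
                      int minMaintenance) {
    return Math.max(expectedRedundancy - maintenanceReplicas, minMaintenance);
  }

  // Cap the maintenance minimum at the block's own replication factor first.
  static int capped(int expectedRedundancy, int maintenanceReplicas,
                    int minMaintenance) {
    return Math.max(expectedRedundancy - maintenanceReplicas,
        Math.min(expectedRedundancy, minMaintenance));
  }

  public static void main(String[] args) {
    // Block B: replication factor 2, one replica in maintenance, min = 3.
    System.out.println(uncapped(2, 1, 3)); // 3 -- ideally 2
    System.out.println(capped(2, 1, 3));   // 2
  }
}
{code}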

> Maintenance minimum replication config value allowable range should be {0 - 
> DefaultReplication}
> ---
>
> Key: HDFS-11412
> URL: https://issues.apache.org/jira/browse/HDFS-11412
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11412.01.patch
>
>
> Currently the allowed value range for Maintenance Min Replication 
> {{dfs.namenode.maintenance.replication.min}} is 0 to 
> {{dfs.namenode.replication.min}} (default=1). Users who do not want to affect 
> the performance of the cluster would wish to have a Maintenance Min 
> Replication greater than 1, say 2. In the current design, it is possible to 
> have this Maintenance Min Replication configuration, but only after changing 
> the NameNode-level Block Min Replication to 2, which could increase the 
> overall latency for client writes.
> Technically speaking, we should be allowing Maintenance Min Replication to be 
> in the range 0 to dfs.replication.max.  
> * There is always the config value of 0 for users not wanting any 
> availability/performance guarantees during maintenance. 
> * And, performance-centric workloads can still get maintenance done without 
> major disruptions by having a bigger Maintenance Min Replication. Setting the 
> upper limit to dfs.replication.max could be overkill as it could trigger 
> re-replication, which Maintenance State is trying to avoid. So, we could allow 
> {{dfs.namenode.maintenance.replication.min}} in the range {{0 to 
> dfs.replication}}
> {noformat}
> if (minMaintenanceR < 0) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " < 0");
> }
> if (minMaintenanceR > minR) {
>   throw new IOException("Unexpected configuration parameters: "
>       + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>       + " = " + minMaintenanceR + " > "
>       + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
>       + " = " + minR);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes

2017-02-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868948#comment-15868948
 ] 

Ming Ma commented on HDFS-7877:
---

ok. Will follow up the discussion in HDFS-11412.

> Support maintenance state for datanodes
> ---
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
> Supportmaintenancestatefordatanodes-2.pdf, 
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature 
> is mostly independent of upgrade domain feature, it is better to track it 
> under a separate jira. The design and draft patch will be available soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-7877) Support maintenance state for datanodes

2017-02-11 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862549#comment-15862549
 ] 

Ming Ma edited comment on HDFS-7877 at 2/11/17 10:23 PM:
-

Thanks [~manojg] and [~dilaver] for the good point. What you suggested makes 
sense. The reason we haven't had this requirement so far is probably that when 
we put nodes into maintenance, we often do it one upgrade domain at a time, 
thus no two replicas will be put into maintenance at the same time.

To confirm, given we still allow applications to create blocks with a smaller 
replication factor than {{dfs.namenode.maintenance.replication.min}}, the 
transition policy from {{ENTERING_MAINTENANCE}} to {{IN_MAINTENANCE}} will 
become: # of live replicas >= 
min({{dfs.namenode.maintenance.replication.min}}, replication factor).


was (Author: mingma):
Thanks [~manojg]. Good point. What you suggested makes sense. The reason we 
don't have this requirement in our production is probably because we only put 
nodes in one upgrade domain into maintenance at a time; after one batch is 
done, move to the next upgrade domain. Thus no two replicas will be put to 
maintenance at the same time.

To confirm, given we will still allow applications to create blocks with 
smaller replication factor than {{dfs.namenode.maintenance.replication.min}}, 
the transition policy from {{ENTERING_MAINTENANCE}} to {{IN_MAINTENANCE}} 
becomes the # of live replicas >= 
min({{dfs.namenode.maintenance.replication.min}}, replication factor).

> Support maintenance state for datanodes
> ---
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
> Supportmaintenancestatefordatanodes-2.pdf, 
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature 
> is mostly independent of upgrade domain feature, it is better to track it 
> under a separate jira. The design and draft patch will be available soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes

2017-02-11 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862549#comment-15862549
 ] 

Ming Ma commented on HDFS-7877:
---

Thanks [~manojg]. Good point. What you suggested makes sense. The reason we 
don't have this requirement in our production is probably because we only put 
nodes in one upgrade domain into maintenance at a time; after one batch is 
done, we move to the next upgrade domain. Thus no two replicas are put into 
maintenance at the same time.

To confirm, given we will still allow applications to create blocks with a 
smaller replication factor than {{dfs.namenode.maintenance.replication.min}}, 
the transition policy from {{ENTERING_MAINTENANCE}} to {{IN_MAINTENANCE}} 
becomes: # of live replicas >= 
min({{dfs.namenode.maintenance.replication.min}}, replication factor).

> Support maintenance state for datanodes
> ---
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
> Supportmaintenancestatefordatanodes-2.pdf, 
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature 
> is mostly independent of upgrade domain feature, it is better to track it 
> under a separate jira. The design and draft patch will be available soon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11378) Verify multiple DataNodes can be decommissioned/maintenance at the same time

2017-01-27 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11378:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

+1. Thanks [~manojg] for the contribution. I have committed it to trunk and 
branch-2.

> Verify multiple DataNodes can be decommissioned/maintenance at the same time
> 
>
> Key: HDFS-11378
> URL: https://issues.apache.org/jira/browse/HDFS-11378
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11378.01.patch
>
>
> DecommissionManager is capable of transitioning multiple DataNodes to 
> Decommission/Maintenance states. Current tests under TestDecommission and 
> TestMaintenanceState only request for one DataNode for 
> Decommission/Maintenance. Better if we can simulate real world cases whereby 
> multiple DataNodes can be taken out of service and verify the resulting block 
> replication factor for the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11378) Verify multiple DataNodes can be decommissioned/maintenance at the same time

2017-01-27 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843195#comment-15843195
 ] 

Ming Ma commented on HDFS-11378:


The patch LGTM. Thanks [~manojg] for the useful test cases! We might want to 
add test cases that put some nodes into decommission and other nodes into 
maintenance at the same time. But that can be done in a separate jira unless it 
is your intention to do it here.

> Verify multiple DataNodes can be decommissioned/maintenance at the same time
> 
>
> Key: HDFS-11378
> URL: https://issues.apache.org/jira/browse/HDFS-11378
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11378.01.patch
>
>
> DecommissionManager is capable of transitioning multiple DataNodes to 
> Decommission/Maintenance states. Current tests under TestDecommission and 
> TestMaintenanceState only request for one DataNode for 
> Decommission/Maintenance. Better if we can simulate real world cases whereby 
> multiple DataNodes can be taken out of service and verify the resulting block 
> replication factor for the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11296) Maintenance state expiry should be an epoch time and not jvm monotonic

2017-01-19 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-11296:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

+1. The failed tests aren't related. Committed to trunk and branch-2. Thanks 
[~manojg] for the contribution. Thanks [~eddyxu] for the review.

> Maintenance state expiry should be an epoch time and not jvm monotonic
> --
>
> Key: HDFS-11296
> URL: https://issues.apache.org/jira/browse/HDFS-11296
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11296.01.patch, HDFS-11296.02.patch, 
> HDFS-11296.03.patch, HDFS-11296-branch-2.01.patch, 
> HDFS-11296-branch-2.02.patch, HDFS-11296-branch-2.03.patch
>
>
> Currently it is possible to configure an expiry time in milliseconds for a 
> DataNode in maintenance state. As per the design, the expiry attribute is an 
> absolute time, beyond which NameNode starts to stop the ongoing maintenance 
> operation for that DataNode. Internally in the code, this expiry time is read 
> and checked against {{Time.monotonicNow()}}, making the expiry based more on 
> the JVM's runtime, which is very difficult for any external user to configure. 
> The goal is to make the expiry time an absolute epoch time, so that it's easy 
> for external users to configure.
> {noformat}
> {
> "hostName": ,
> "port": ,
> "adminState": "IN_MAINTENANCE",
> "maintenanceExpireTimeInMS": 
> }
> {noformat}
> DatanodeInfo.java
> {noformat}
>   public static boolean maintenanceNotExpired(long maintenanceExpireTimeInMS) 
> {
> return Time.monotonicNow() < maintenanceExpireTimeInMS;
>   }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11296) Maintenance state expiry should be an epoch time and not jvm monotonic

2017-01-13 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822666#comment-15822666
 ] 

Ming Ma commented on HDFS-11296:


Thanks Manoj for the fix. Nit: Maybe use Time.now() instead?
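
For reference, a minimal sketch of what that nit would look like against the 
method quoted in the description below (assuming 
{{org.apache.hadoop.util.Time}}):

{noformat}
// Time.monotonicNow() is relative to an arbitrary origin (useful for
// measuring elapsed time), so comparing it against a wall-clock expiry is
// meaningless. Time.now() returns epoch milliseconds, matching the
// user-supplied maintenanceExpireTimeInMS.
public static boolean maintenanceNotExpired(long maintenanceExpireTimeInMS) {
  return Time.now() < maintenanceExpireTimeInMS;
}
{noformat}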

> Maintenance state expiry should be an epoch time and not jvm monotonic
> --
>
> Key: HDFS-11296
> URL: https://issues.apache.org/jira/browse/HDFS-11296
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-11296.01.patch, HDFS-11296-branch-2.01.patch
>
>
> Currently it is possible to configure an expiry time in milliseconds for a 
> DataNode in maintenance state. As per the design, the expiry attribute is an 
> absolute time, beyond which NameNode starts to stop the ongoing maintenance 
> operation for that DataNode. Internally in the code, this expiry time is read 
> and checked against {{Time.monotonicNow()}}, making the expiry based more on 
> the JVM's runtime, which is very difficult for any external user to configure. 
> The goal is to make the expiry time an absolute epoch time, so that it's easy 
> for external users to configure.
> {noformat}
> {
> "hostName": ,
> "port": ,
> "adminState": "IN_MAINTENANCE",
> "maintenanceExpireTimeInMS": 
> }
> {noformat}
> DatanodeInfo.java
> {noformat}
>   public static boolean maintenanceNotExpired(long maintenanceExpireTimeInMS) 
> {
> return Time.monotonicNow() < maintenanceExpireTimeInMS;
>   }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-10 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817119#comment-15817119
 ] 

Ming Ma edited comment on HDFS-9391 at 1/11/17 4:23 AM:


Thanks [~manojg] for the contribution. Thanks [~dilaver] and [~eddyxu] for the 
review. I have committed the patch to trunk and branch-2.


was (Author: mingma):
Thanks [~manojg] for the contribution. Thanks [~eddyxu] for the review. I have 
committed the patch to trunk and branch-2.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, 
> HDFS-9391-branch-2-MaintenanceMode-WebUI.pdf, HDFS-9391-branch-2.01.patch, 
> HDFS-9391-branch-2.02.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, 
> HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-10 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9391:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~manojg] for the contribution. Thanks [~eddyxu] for the review. I have 
committed the patch to trunk and branch-2.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, 
> HDFS-9391-branch-2-MaintenanceMode-WebUI.pdf, HDFS-9391-branch-2.01.patch, 
> HDFS-9391-branch-2.02.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, 
> HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-10 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816464#comment-15816464
 ] 

Ming Ma commented on HDFS-9391:
---

Thanks [~manojg]. It seems there is a typo in the branch-2 patch: 
{{getLeavingServiceStatus().set}} passes the wrong variable, which caused 
TestDecommissioningStatus to fail. In addition, it would be useful to verify 
the UI for the branch-2 patch.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, 
> HDFS-9391-branch-2.01.patch, HDFS-9391.01.patch, HDFS-9391.02.patch, 
> HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-09 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813481#comment-15813481
 ] 

Ming Ma commented on HDFS-9391:
---

+1. Manoj, given the patch doesn't apply directly to branch-2, can you please 
provide another patch? Thanks.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, HDFS-9391.03.patch, HDFS-9391.04.patch, Maintenance 
> webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-09 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812642#comment-15812642
 ] 

Ming Ma commented on HDFS-9391:
---

Thanks Manoj. I just found something related to our discussion. For any 
decommissioning node, given getDecommissionOnlyReplicas is the same as 
getOutOfServiceOnlyReplicas, can we just use the getOutOfServiceOnlyReplicas 
value for the JSON decommissionOnlyReplicas property? The same applies to any 
entering-maintenance node. In other words, we might not need to add the extra 
decommissionOnlyReplicas and maintenanceOnlyReplicas to LeavingServiceStatus.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, HDFS-9391.03.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-06 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804934#comment-15804934
 ] 

Ming Ma commented on HDFS-9391:
---

Thanks. Sounds good.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-05 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803589#comment-15803589
 ] 

Ming Ma commented on HDFS-9391:
---

Then for that specific case, when 
{{DecommissionManager#Monitor#processBlocksInternal}} is processing the 
decommissioning node, both NumberReplicas#decommissionedAndDecommissioning() > 0 
and NumberReplicas#maintenanceReplicas() > 0 are satisfied. Thus both 
decommissionOnlyReplicas and maintenanceOnlyReplicas will be incremented. The 
same applies to the other two entering-maintenance nodes.
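
A hedged sketch of that counting behavior (the counter names come from this 
thread; the increment methods are hypothetical):

{noformat}
// The monitor inspects the aggregate NumberReplicas of each block on the
// node being processed, not just that node's own replica, so a single block
// can bump both counters.
void updateLeavingServiceStatus(NumberReplicas num,
    LeavingServiceStatus status) {
  if (num.decommissionedAndDecommissioning() > 0) {
    status.incrementDecommissionOnlyReplicas(); // hypothetical method
  }
  if (num.maintenanceReplicas() > 0) {
    status.incrementMaintenanceOnlyReplicas();  // hypothetical method
  }
}
{noformat}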

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-05 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803538#comment-15803538
 ] 

Ming Ma edited comment on HDFS-9391 at 1/6/17 4:41 AM:
---

A given replica is only in one admin state: normal, decommission, or 
maintenance. But {{NumberReplicas}} represents the state of all replicas. Thus 
for the case "One replica is decommissioning and two replicas of the same block 
are entering maintenance", {{NumberReplicas#decommissionedAndDecommissioning == 
1}} and {{NumberReplicas#maintenanceReplicas() == 2}}. No?


was (Author: mingma):
A given replica is only in one state: either decommission or maintenance. But 
{{NumberReplicas}} represents the state of all replicas. Thus for the case "One 
replica is decommissioning and two replicas of the same block are entering 
maintenance", {{NumberReplicas#decommissionedAndDecommissioning == 1}} and 
{{NumberReplicas#maintenanceReplicas() == 2}}. No?

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-05 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803538#comment-15803538
 ] 

Ming Ma commented on HDFS-9391:
---

A given replica is only in one state: either decommission or maintenance. But 
{{NumberReplicas}} represents the state of all replicas. Thus for the case "One 
replica is decommissioning and two replicas of the same block are entering 
maintenance", {{NumberReplicas#decommissionedAndDecommissioning == 1}} and 
{{NumberReplicas#maintenanceReplicas() == 2}}. No?

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-04 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800575#comment-15800575
 ] 

Ming Ma commented on HDFS-9391:
---

Sure, let us keep what you have in patch 02. Just to make sure, can you 
confirm the following?

* For the case of "one replica is decommissioning and two replicas of the same 
block are entering maintenance", the code will still increment 
maintenanceOnlyReplicas when processing the decommissioning node, because 
NumberReplicas includes stats for all replicas. Thus decommissionOnlyReplicas == 
maintenanceOnlyReplicas == outOfServiceReplicas.
* For the case of "all replicas are decommissioning", the EnteringMaintenance 
page will have nothing to show to begin with, given no nodes are entering 
maintenance.

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-04 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799905#comment-15799905
 ] 

Ming Ma commented on HDFS-9391:
---

Good point. Actually, it seems maintenanceOnlyReplicas is the same as 
outOfServiceOnlyReplicas in such a case. For example, say one replica is 
decommissioning and two are entering maintenance; both maintenanceOnlyReplicas 
and outOfServiceOnlyReplicas are incremented. In other words, 
maintenanceOnlyReplicas isn't strictly "all 3 replicas are in maintenance". 
Maybe this new definition is more desirable. What do you think?

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2017-01-03 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795843#comment-15795843
 ] 

Ming Ma commented on HDFS-9391:
---

Thanks Manoj. Yep, let us keep the existing property, as Eddy mentioned.

* In {{getMaintenanceOnlyReplicas}}, the check {{if 
(!isDecommissionInProgress() && !isEnteringMaintenance())}} only needs to check 
the maintenance part.
* It seems you will need to add {{In Maintenance  dead}} to match the 
addition of {{nodes[i].state = "down-maintenance";}}.
* For the {{EnteringMaintenanceNodes}} page, it uses 
{{maintenanceOnlyReplicas}} to describe {{Blocks with no live replicas}}. 
Should we use {{OutOfServiceOnlyReplicas}}?

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> HDFS-9391.02.patch, Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2016-12-16 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756345#comment-15756345
 ] 

Ming Ma commented on HDFS-9391:
---

Thanks [~manojg]. Some minor questions:

* {{.put("inMaintenance", node.isInMaintenance())}} might not be necessary 
given it also outputs {{.put("adminState", node.getAdminState().toString())}}.
* Should {{liveDecommissioningReplicas}} be {{OnlyDecommissioningReplicas}}, 
which was the old behavior before maintenance? There are two differences: one 
is "Only", the other is "live".

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9391) Update webUI/JMX to display maintenance state info

2016-12-16 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756346#comment-15756346
 ] 

Ming Ma commented on HDFS-9391:
---

Thanks [~manojg]. Some minor questions:

* {{.put("inMaintenance", node.isInMaintenance())}} might not be necessary 
given it also outputs {{.put("adminState", node.getAdminState().toString())}}.
* Should {{liveDecommissioningReplicas}} be {{OnlyDecommissioningReplicas}}, 
which was the old behavior before maintenance? There are two differences: one 
is "Only", the other is "live".

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-9391) Update webUI/JMX to display maintenance state info

2016-12-16 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9391:
--
Comment: was deleted

(was: Thanks [~manojg]. Some minor questions:

* {{.put("inMaintenance", node.isInMaintenance())}} might not be necessary 
given it also outputs {{.put("adminState", node.getAdminState().toString())}}.
* Should {{liveDecommissioningReplicas}} be {{OnlyDecommissioningReplicas}}, 
which was the old behavior before maintenance? There are two differences: one 
is "Only", the other is "live".)

> Update webUI/JMX to display maintenance state info
> --
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: HDFS-9391-MaintenanceMode-WebUI.pdf, HDFS-9391.01.patch, 
> Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9391) Update webUI/JMX/fsck to display maintenance state info

2016-12-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9391:
--
Attachment: Maintenance webUI.png

Thanks [~manojg]!

bq. Shouldn't the DecomNodes include only replicas for DECOMMISSION_INPROGRESS 
nodes?
Good point. The question is what value the "decommissionOnlyReplicas" property 
should have in the context of maintenance mode. A specific example: if a block 
has 3 replicas, with one node entering maintenance and the other two being 
decommissioned, should it be included in "decommissionOnlyReplicas"? Given we 
normally use the property as a risk indicator (e.g. what happens if all 
decommissioning or entering-maintenance nodes fail), it seems OK to include 
both. Sure, there are backward-compatibility concerns here; you can argue it is 
OK given the behavior is the same without maintenance. If we really want to 
separately account for all-3-replicas-being-decommissioned, we can keep the 
strict semantics and add a new property "outOfServiceOnlyReplicas" to account 
for both types. To enable that, we will need to track each type separately in 
LeavingServiceStatus.

bq. should we also have FSNameSystem#getMaintenanceNodes?
Yes, something like NameNodeMXBean#getEnteringMaintenanceNodes would be useful.

bq. w.r.t showing Maintenance nodes details ?
getDeadNodes only returns the decommissioned case. You can add 
".put("adminState", node.getAdminState().toString())" to the JSON to cover 
maintenance. You can also add counters to FSNamesystemMBean, such as 
getNumMaintenanceLiveDataNodes, similar to getNumDecomLiveDataNodes and 
getNumDecomDeadDataNodes.
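
A rough sketch of that getDeadNodes() suggestion; only the adminState line 
comes from this comment, and the builder shape and the other keys are 
assumptions for illustration:

{noformat}
// Per-node JSON entry in the dead-nodes JMX output. Adding adminState lets
// callers tell a dead maintenance node apart from a dead decommissioned one.
Map<String, Object> innerInfo = ImmutableMap.<String, Object>builder()
    .put("lastContact", getLastContact(node))            // assumed existing key
    .put("decommissioned", node.isDecommissioned())      // assumed existing key
    .put("adminState", node.getAdminState().toString())  // proposed addition
    .build();
{noformat}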

bq. "In Maintenance" Live/Dead nodes count also need to be shown along with 
Decommission nodes ?
That is right. The attached screenshot could be useful. After someone clicks 
"Entering Maintenance Nodes", it should redirect to another page showing the 
progress, similar to the "Decommissioning Nodes" page.

bq. Is there a plan to expose the concept of 'OutOfService'?
Based on how we use them, decommissioned nodes are tracked separately from 
maintenance nodes, other than the first point you brought up.

> Update webUI/JMX/fsck to display maintenance state info
> ---
>
> Key: HDFS-9391
> URL: https://issues.apache.org/jira/browse/HDFS-9391
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Manoj Govindassamy
> Attachments: Maintenance webUI.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10206) Datanodes not sorted properly by distance when the reader isn't a datanode

2016-12-07 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-10206:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

+1. Thanks [~nandakumar131] for the contribution. I have committed the patch to 
trunk and branch-2.

> Datanodes not sorted properly by distance when the reader isn't a datanode
> --
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Fix For: 2.9.0
>
> Attachments: HDFS-10206-branch-2.8.003.patch, HDFS-10206.000.patch, 
> HDFS-10206.001.patch, HDFS-10206.002.patch, HDFS-10206.003.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10206) Datanodes not sorted properly by distance when the reader isn't a datanode

2016-12-07 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-10206:
---
Summary: Datanodes not sorted properly by distance when the reader isn't a 
datanode  (was: datanodes not sorted properly by distance if the reader isn't a 
datanode)

> Datanodes not sorted properly by distance when the reader isn't a datanode
> --
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206-branch-2.8.003.patch, HDFS-10206.000.patch, 
> HDFS-10206.001.patch, HDFS-10206.002.patch, HDFS-10206.003.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10206) datanodes not sorted properly by distance if the reader isn't a datanode

2016-12-07 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-10206:
---
Summary: datanodes not sorted properly by distance if the reader isn't a 
datanode  (was: getBlockLocations might not sort datanodes properly by distance)

> datanodes not sorted properly by distance if the reader isn't a datanode
> 
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206-branch-2.8.003.patch, HDFS-10206.000.patch, 
> HDFS-10206.001.patch, HDFS-10206.002.patch, HDFS-10206.003.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-12-06 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727676#comment-15727676
 ] 

Ming Ma commented on HDFS-10206:


Thanks [~nandakumar131]. The patch looks good. Given the patch doesn't apply 
directly to branch-2, can you provide another patch for branch-2? You can use 
the naming convention for the branch-2 patch based on the "Naming your patch" 
section in https://wiki.apache.org/hadoop/HowToContribute so that Jenkins can 
run the precommit job.

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch, HDFS-10206.003.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-12-06 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-10206:
---
Status: Patch Available  (was: Open)

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch, HDFS-10206.003.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-12-03 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15718536#comment-15718536
 ] 

Ming Ma commented on HDFS-10206:


OK. Maybe that isn't a precise way to refer to it. The "network path" comes 
from the NodeBase#getPath method. Anyway, the point is that the new method 
should return 0 in the case of two identical nodes.

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-12-02 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717543#comment-15717543
 ] 

Ming Ma commented on HDFS-10206:


To clarify, "two nodes of the same network path" referred to two identical 
nodes, just like how getWeight can return 0 in such a case.

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-12-02 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717370#comment-15717370
 ] 

Ming Ma commented on HDFS-10206:


Thanks [~nandakumar131]! The patches look good overall. To make the method more 
general, it seems better to have getWeightUsingNetworkLocation return 0 when two 
nodes have the same network path. [~daryn] [~kihwal], any concerns about the 
added 0.1ms latency? Note this only happens for the non-datanode reader scenario 
and it doesn't hold the FSNamesystem lock.
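
A sketch of the generalized weight under that suggestion (the method name is 
from this thread; the body is illustrative, derived only from slash-separated 
network paths such as /dc1/rack1/host1):

{noformat}
// Weight as topology distance computed from network locations alone, so a
// reader that was never added to the topology tree can still be compared.
int getWeightUsingNetworkLocation(Node reader, Node node) {
  if (reader == null) {
    return 2; // mirror getWeight(): default to off rack
  }
  String readerPath = NodeBase.getPath(reader);
  String nodePath = NodeBase.getPath(node);
  if (readerPath.equals(nodePath)) {
    return 0; // identical network path => same node
  }
  String[] r = readerPath.split(NodeBase.PATH_SEPARATOR_STR);
  String[] n = nodePath.split(NodeBase.PATH_SEPARATOR_STR);
  int common = 0;
  while (common < Math.min(r.length, n.length)
      && r[common].equals(n[common])) {
    common++;
  }
  // Distance = hops from each node up to their deepest common ancestor.
  return (r.length - common) + (n.length - common);
}
{noformat}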

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11096) Support rolling upgrade between 2.x and 3.x

2016-11-29 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707537#comment-15707537
 ] 

Ming Ma commented on HDFS-11096:


Thanks [~andrew.wang] [~ka...@cloudera.com] for bringing up this important 
topic.

bq. this means we can't remove in 3.0 unless it was deprecated in 2.2.
To have comprehensive coverage, can we use a static analysis tool to 
cross-reference the branch-2 and trunk source code, or can JACC handle that?

bq. This has been a problem even within just the 2.x line, so there's a real 
need for better cross-version integration testing
Indeed. Such investment will pay off in the long term; e.g., whatever 
cross-version integration test system we come up with can be used not only for 
2.x -> 3.x, but for 2.x -> 2.y as well.

* Regarding how to automate binary and wire compatibility verification, we can 
do something within the Hadoop project first, without integration with upper 
layers. In addition, maybe there is a way to test it on a dev machine, for 
example packaging 2.x jars into a container to run the client or datanode, then 
testing it against 3.x containers. Or maybe some sort of setup using Jenkins + 
Docker containers. This could have caught some of the 2.x incompatibility 
issues.

* For prioritization, we can also evaluate the impact and whether there is a 
workaround. For example, we don't have to verify 2.7 -> 3.0 if we know 2.7 -> 
2.8 and 2.8 -> 3.0 work.

* Rolling upgrade is important for high-SLA data ingestion and for HBase-style 
scenarios. Without it, we have to fail over the entire cluster during an 
upgrade.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: HDFS-11096
> URL: https://issues.apache.org/jira/browse/HDFS-11096
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha1
>Reporter: Andrew Wang
>Priority: Blocker
>
> trunk has a minimum software version of 3.0.0-alpha1. This means we can't 
> rolling upgrade between branch-2 and trunk.
> This is a showstopper for large deployments. Unless there are very compelling 
> reasons to break compatibility, let's restore the ability to rolling upgrade 
> to 3.x releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-11-29 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707535#comment-15707535
 ] 

Ming Ma commented on HDFS-10206:


bq. Can you point out the variables which are to be made more generic?
nonDataNodeReader. However, it turns out NetworkTopology has several existing 
references to "datanode", so the rename is good to have but up to you whether 
to fix it here.

bq. With 000.patch the weight is calculated using network location for off rack 
datanodes which impacts the micro-benchmark results.
Got it. Thanks for the clarification. So 001.patch shouldn't show any 
difference. Do you mind confirming?

bq. Weight calculation after this patch
Can you confirm the weights with 0002.patch? It seems to return 0, 2, 4, 
whereas the old behavior is 0, 1, 2.
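
For reference, the arithmetic behind those numbers if the weight is the 
topology distance (hops up to the deepest common ancestor and back down), 
assuming a three-level /dc/rack/host tree:

{noformat}
// same node: /d1/r1/h1 vs /d1/r1/h1 -> 0 hops                  -> weight 0
// same rack: /d1/r1/h1 vs /d1/r1/h2 -> 1 hop up + 1 hop down   -> weight 2
// off rack:  /d1/r1/h1 vs /d1/r2/h2 -> 2 hops up + 2 hops down -> weight 4
{noformat}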

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some 
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}} 
> might not put the local-rack datanodes at the beginning of the sorted list. 
> That is because the function didn't call {{networktopology.add(client);}} to 
> properly set the node's parent node; something required by 
> {{networktopology.sortByDistance}} to compute distance between two nodes in 
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only 
> distinguishes local rack from remote rack, but it doesn't support general 
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
> // 0 is local, 1 is same rack, 2 is off rack
> // Start off by initializing to off rack
> int weight = 2;
> if (reader != null) {
>   if (reader.equals(node)) {
> weight = 0;
>   } else if (isOnSameRack(reader, node)) {
> weight = 1;
>   }
> }
> return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to 
> address another issue. Regardless of where we do the sorting, we still need 
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object 
> used by DatanodeManager and requires Nodes stored in the topology to be 
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't 
> pollute the NetworkTopology if we plan to fix it on the server side.
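To make the failure mode above concrete, a sketch under assumed paths (the 
host names are illustrative):

{noformat}
// A DFSClient host that is not a datanode was never add()-ed to the
// topology, so its parent pointer is never resolved. isOnSameRack()
// compares parent references and therefore returns false, and every
// datanode -- even one sharing /d1/r1 with the client -- falls through
// to the off-rack weight of 2.
NetworkTopology topology = new NetworkTopology();
Node dn = new NodeBase("/d1/r1/dn1");
topology.add(dn);
Node reader = new NodeBase("/d1/r1/client-host"); // add() never called
// getWeight(reader, dn) evaluates to 2 here, although both nodes
// share rack /d1/r1.
{noformat}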






[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-11-28 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702984#comment-15702984
 ] 

Ming Ma commented on HDFS-10206:


* NetworkTopology can be used by HDFS, YARN and MAPREDUCE. It is better to make 
variable names more general.

bq. Out of three replica, one will be in off rack datanode which is causing the 
difference
But the reader should pick the closest one, either "Same Node" or "DataNode in 
same rack". Perhaps you can clarify the setup.

bq. Weight calculation after this patch
So the weight value definition has changed. That should be fine given it isn't 
a public interface. Still, NetworkTopologyWithNodeGroup has its own getWeight 
implementation based on the old definition; we should either update it or keep 
the old weight values.
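For context, the node-group variant keeps the old fixed-scale style, roughly 
as follows (a sketch from memory, not the exact source):

{noformat}
protected int getWeight(Node reader, Node node) {
  // 0 is local, 1 is same node group, 2 is same rack, 3 is off rack
  int weight = 3;
  if (reader != null) {
    if (reader.equals(node)) {
      weight = 0;
    } else if (isOnSameNodeGroup(reader, node)) {
      weight = 1;
    } else if (isOnSameRack(reader, node)) {
      weight = 2;
    }
  }
  return weight;
}
{noformat}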



> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch, 
> HDFS-10206.002.patch
>






[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-11-23 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692234#comment-15692234
 ] 

Ming Ma commented on HDFS-10206:


Thanks [~nanda619] for the micro benchmark and the new patch.

* Any idea why 000.patch makes a difference for the "Same Node" and "DataNode 
in same rack" cases?
* In the context of the overall data transfer duration, the overhead of 0.1ms 
looks acceptable, especially given DatanodeManager#sortLocatedBlocks doesn't 
take FSNamesystem's lock.
* It seems getWeightUsingNetworkLocation and normalizeNetworkLocationPath can 
be static.
* The getWeight function calls getDistance, which returns the distance between 
two nodes, not the weight defined as the distance from a node to the closest 
common ancestor. Maybe we can define a new function like 
getDistanceToClosestCommonAncestor, which can also take care of the 
isOnSameRack case (see the sketch below).
* About ReadWriteLock.readLock, it might be OK given that under a normal 
workload there won't be many writes to NetworkTopology.
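A minimal sketch of what such a function could look like, assuming string 
network locations of the form /d1/r1/n1; illustrative only, not a proposed 
implementation:

{noformat}
// Hops from the node up to its closest common ancestor with the reader:
// 0 for the same node, 1 for same rack, 2 for off rack, growing by 1
// per level -- i.e. it preserves the old weight scale while handling
// the isOnSameRack case as a degenerate one.
static int getDistanceToClosestCommonAncestor(String readerPath,
    String nodePath) {
  String[] r = readerPath.split("/");
  String[] n = nodePath.split("/");
  int level = 0;
  while (level < r.length && level < n.length
      && r[level].equals(n[level])) {
    level++;
  }
  return n.length - level;
}
{noformat}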


> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch, HDFS-10206.001.patch
>






[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-11-19 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15679692#comment-15679692
 ] 

Ming Ma commented on HDFS-10206:


bq. Any comments on using NetworkTopology.contains(node) to check and use 
NetworkTopology.getDistance(node1, node2) to get the distance in case if the 
reader is an off rack datanode?
Here is another option: DatanodeManager#sortLocatedBlocks already knows 
whether the reader is a datanode, so we can have a new 
NetworkTopology#sortByDistance that supports check-by-reference (sketched 
below).
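A rough sketch of that option with an assumed signature; the flag name mirrors 
the nonDataNodeReader variable discussed elsewhere in this thread, and the 
real method would also need the weight-bucketed shuffle the existing 
sortByDistance does:

{noformat}
// Assumed shape, not the committed API: the caller states whether the
// reader is a datanode, so the topology never needs a containment check.
public void sortByDistance(Node reader, Node[] nodes, int activeLen,
    boolean nonDataNodeReader) {
  java.util.Arrays.sort(nodes, 0, activeLen,
      java.util.Comparator.comparingInt(n -> nonDataNodeReader
          ? getWeightUsingNetworkLocation(reader, n) // parse location strings
          : getWeight(reader, n)));                  // compare references
}
{noformat}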

> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch
>






[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-11-17 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674317#comment-15674317
 ] 

Ming Ma commented on HDFS-10206:


bq. that is why getDistanceUsingNetworkLocation is called only when the 
conditions reader.equals(node) and isOnSameRack(reader, node) are not satisfied.
There are two scenarios where this new function will be called. One is the 
reader being a datanode in a remote rack in a large cluster; in that case 
NetworkTopology already has the reader in its tree, so comparing parent 
references would be faster. The other is the reader being a non-datanode, 
where the new function is useful. Do you have any micro benchmark?
bq. With this patch it will be 0 for local, 1 for same rack and after that the 
value is incremented by 1 for each level.
From the code below, it seems each level increases the weight by 2.
{noformat}
  weight = (path1Token.length - currentLevel) +
  (path2Token.length - currentLevel);
{noformat}
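For example, with reader /d1/r1/n1 and node /d1/r2/n2 the token arrays both 
have length 4 and the closest common level is 2, so weight = 
(4 - 2) + (4 - 2) = 4, while a same-rack node gives (4 - 3) + (4 - 3) = 2; 
each additional level adds 1 to both terms, hence the increment of 2.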



> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch
>






[jira] [Commented] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance

2016-11-16 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673058#comment-15673058
 ] 

Ming Ma commented on HDFS-10206:


Thanks [~nandakumar131]!

* When the conditions {{reader.equals(node)}} and {{isOnSameRack(reader, 
node)}} aren't satisfied, this patch will cause extra string parsing. I wonder 
if there is any major performance impact. If that isn't an issue, can 
getDistanceUsingNetworkLocation handle all scenarios, including 
{{reader.equals(node)}} and {{isOnSameRack(reader, node)}}?
* It probably doesn't matter much, but {{getWeight}} used to return 0, 1, 2, 
3, etc. as the network distance increases. With the patch it changes to 0, 1, 
2, 4, etc.


> getBlockLocations might not sort datanodes properly by distance
> ---
>
> Key: HDFS-10206
> URL: https://issues.apache.org/jira/browse/HDFS-10206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Nandakumar
> Attachments: HDFS-10206.000.patch
>






[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2016-11-04 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637336#comment-15637336
 ] 

Ming Ma commented on HDFS-10702:


Thanks [~zhz] for the ping. Thanks [~clouderajiayi] [~mackrorysd] for the great 
work.

Yes, it might be useful to leverage inotify, or at least evaluate it. For this 
SbNN polling approach, I am interested in knowing more about how applications 
plan to use it, specifically when they would decide to call getSyncInfo. In a 
multi-tenant environment, an application might care about specific 
files/directories, not necessarily whether the namespace has changed at a 
global level.

Here are some comments specific to the patch.

* The standby namenode has its own checkpoint lock to reduce checkpointing's 
impact on block reports. Thus there could be an assumption that the 
checkpointer is the only reader of the namespace on the standby. You might 
want to confirm whether there are any implications.
* In the case of multiple standbys, one of them is the checkpointer, so you 
could consider allowing clients to connect only to the standbys that are not 
doing the checkpoint.
* If the server-side config "dfs.ha.allow.stale.reads" is set to false and the 
client side enables stale reads, it seems the client will still keep trying. I 
wonder if the client side should consider the server-side config as well.
* Federation configuration support might need some more work. It depends on 
how you want to enable it on the client side. The current patch is based on a 
runtime config per client instance (see the usage sketch below). You could 
also allow a client-side config like "dfs.client..ha.allow.stale.reads".
* After NN failover, does StaleReadProxyProvider#standbyProxies get refreshed? 
If not, a long-running client could keep using the old standby.
* The RPC layer is more general than HDFS, so it would be better if 
allowStandbyRead could be refactored out.
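To make the per-client-instance point concrete, a purely hypothetical usage 
sketch; the client-side key name is a placeholder, not the patch's actual API:

{noformat}
// Hypothetical sketch only -- "dfs.client.allow.stale.reads" is a
// placeholder key, and the server must also permit stale reads via
// dfs.ha.allow.stale.reads for it to take effect.
Configuration conf = new HdfsConfiguration();
conf.setBoolean("dfs.client.allow.stale.reads", true); // placeholder
FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
// Metadata reads below may now be served by a standby NameNode and can
// therefore be stale relative to the active's namespace.
FileStatus status = fs.getFileStatus(new Path("/data/events"));
System.out.println(status.getModificationTime());
{noformat}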


> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jiayi Zhou
>Assignee: Jiayi Zhou
>Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.






[jira] [Updated] (HDFS-9390) Block management for maintenance states

2016-10-17 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9390:
--
   Resolution: Fixed
Fix Version/s: 3.0.0-alpha2
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~eddyxu] again. I have committed it to trunk and branch-2.

> Block management for maintenance states
> ---
>
> Key: HDFS-9390
> URL: https://issues.apache.org/jira/browse/HDFS-9390
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-9390-2.patch, HDFS-9390-3.patch, HDFS-9390-4.patch, 
> HDFS-9390-5.patch, HDFS-9390-branch-2.002.patch, HDFS-9390-branch-2.patch, 
> HDFS-9390.patch
>
>
> When a node transitions to, stays in, or transitions out of a maintenance 
> state, we need to make sure the blocks on that node are properly handled.
> * When nodes are put into maintenance, they first go to ENTERING_MAINTENANCE, 
> and blocks are minimally replicated before the nodes transition to 
> IN_MAINTENANCE.
> * Do not replicate blocks when nodes are in maintenance states. Maintenance 
> replicas remain in BlockMaps and thus are still considered valid from the 
> block replication point of view. In other words, putting a node into 
> “maintenance” mode won’t trigger BlockManager to replicate its blocks.
> * Do not invalidate replicas on nodes under maintenance. When a file's 
> replication factor is reduced, the NN needs to invalidate some replicas; it 
> should exclude nodes under maintenance in that handling.
> * Do not put IN_MAINTENANCE replicas in LocatedBlock for read operation.
> * Do not allocate any new block on nodes under maintenance.
> * Have Balancer exclude nodes under maintenance.
> * Exclude nodes under maintenance for DN cache.
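As an illustration of the "exclude nodes under maintenance" rules above (a 
sketch, not the committed BlockManager code; chooseForInvalidation is a 
hypothetical helper):

{noformat}
// Sketch: skip maintenance replicas when picking replicas to
// invalidate after a replication-factor decrease.
for (DatanodeStorageInfo storage : candidateStorages) {
  DatanodeDescriptor node = storage.getDatanodeDescriptor();
  if (node.isMaintenance()) {
    // ENTERING_MAINTENANCE or IN_MAINTENANCE: the replica stays in the
    // BlockMaps and must not be invalidated.
    continue;
  }
  chooseForInvalidation(storage); // hypothetical helper
}
{noformat}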






[jira] [Updated] (HDFS-9390) Block management for maintenance states

2016-10-16 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9390:
--
Attachment: HDFS-9390-branch-2.002.patch

Re-uploaded with the proper patch name for Jenkins to run.

> Block management for maintenance states
> ---
>
> Key: HDFS-9390
> URL: https://issues.apache.org/jira/browse/HDFS-9390
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9390-2.patch, HDFS-9390-3.patch, HDFS-9390-4.patch, 
> HDFS-9390-5.patch, HDFS-9390-branch-2.002.patch, HDFS-9390-branch-2.patch, 
> HDFS-9390.patch
>






[jira] [Updated] (HDFS-9390) Block management for maintenance states

2016-10-16 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9390:
--
Attachment: (was: HDFS-9390-2-branch-2.patch)

> Block management for maintenance states
> ---
>
> Key: HDFS-9390
> URL: https://issues.apache.org/jira/browse/HDFS-9390
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9390-2.patch, HDFS-9390-3.patch, HDFS-9390-4.patch, 
> HDFS-9390-5.patch, HDFS-9390-branch-2.patch, HDFS-9390.patch
>





