[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2020-05-31 Thread maobaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120730#comment-17120730
 ] 

maobaolong commented on HDFS-9411:
--

+1. Feature is pretty good. Is there any update on this? 

> HDFS NodeLabel support
> --
>
> Key: HDFS-9411
> URL: https://issues.apache.org/jira/browse/HDFS-9411
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
> Attachments: HDFS Node Labels-21-08-2017.pdf, 
> HDFSNodeLabels-15-09-2016.pdf, HDFSNodeLabels-20-06-2016.pdf, 
> HDFS_ZoneLabels-16112015.pdf
>
>
> HDFS currently stores data blocks on datanodes chosen by the 
> block placement policy. These datanodes are picked at random within the 
> scope (local-rack/different-rack/nodegroup) of the network topology. 
> In a multi-tenant scenario (a tenant can be a user or a service), blocks of 
> any tenant can land on any datanode.
> Depending on the applications each tenant runs, a datanode may become busy 
> and slow down another tenant's applications. It would be better if 
> admins had a provision to logically divide the cluster among tenants.
> NodeLabels give users more options for specifying constraints so that 
> nodes meeting specific requirements are selected.
> High-level design doc to follow soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2018-10-26 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664914#comment-16664914
 ] 

Fei Hui commented on HDFS-9411:
---

[~vinayrpet] The feature looks pretty good. Is there any update on this?




[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2017-03-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924558#comment-15924558
 ] 

Vinayakumar B commented on HDFS-9411:
-

bq. Guys .. Whether this is still active? We talked recently with one 
HDFS/HBase user and they also in need of a similar mechanism
Yes [~anoop.hbase], I will resume work on this soon.

As [~rakeshr] mentioned, favored nodes are only a hint for the initial write. 
NodeLabels should help solve your case.




[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2017-03-09 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902964#comment-15902964
 ] 

Rakesh R commented on HDFS-9411:


bq. But the block re replication wont consider/honor this favored nodes ?
[~anoop.hbase], with favored nodes, the block is pinned on those datanodes so 
that the mover/balancer tools do not relocate it. However, since the favored 
nodes information is not persisted in HDFS (NameNode), these blocks will be 
re-replicated to new datanodes during replication. I hope the [javadoc of the 
DistributedFileSystem#create function with favored 
nodes|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L390]
 gives you a clearer picture of the current behavior.
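
The difference between the first write and later re-replication can be sketched with a toy model. This is not Hadoop code: the class and method names are hypothetical, and target choice is made deterministic (first match in list order) purely for readability, whereas real placement is randomized and topology-aware.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names are Hadoop APIs.
class FavoredNodesSketch {
    // First write: favored nodes are tried first, then any other live node.
    static List<String> initialPlacement(List<String> favored, List<String> live, int replication) {
        List<String> targets = new ArrayList<>();
        for (String dn : favored) {
            if (targets.size() < replication && live.contains(dn) && !targets.contains(dn)) {
                targets.add(dn);
            }
        }
        for (String dn : live) {
            if (targets.size() < replication && !targets.contains(dn)) {
                targets.add(dn);
            }
        }
        return targets;
    }

    // Re-replication: only the surviving replicas and the live nodes are known.
    // The favored list is not an input because it was never persisted.
    static String reReplicationTarget(List<String> replicas, List<String> live) {
        for (String dn : live) {
            if (!replicas.contains(dn)) {
                return dn;
            }
        }
        return null; // no spare datanode available
    }
}
```

Because reReplicationTarget never receives the favored list, a lost replica can end up on a datanode the writer did not ask for, which is the gap described above.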





[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2017-03-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902571#comment-15902571
 ] 

Anoop Sam John commented on HDFS-9411:
--

Guys, is this still active? We recently talked with an HDFS/HBase user who 
also needs a similar mechanism. We were thinking of using the favored nodes 
feature, but block re-replication won't consider/honor the favored nodes (?)
cc [~ram_krish], [~rakeshr]




[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2016-07-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15361119#comment-15361119
 ] 

Vinayakumar B commented on HDFS-9411:
-

Thanks for taking a look, [~drankye].

bq. This sounds like storage policy? How about rename?
A rename of a file/directory will also carry the original label expression. 
This could be done by storing the inherited label expression on the 
file/directory being renamed.
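
A minimal sketch of that behavior, under stated assumptions: the class below is hypothetical and not part of HDFS, it handles files only, and it snapshots the inherited expression on the file at create time (the thread only says the expression could be stored at rename time, so this is one possible reading).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical namespace model; paths are plain strings such as "/tenantA/f1".
class LabelInheritanceSketch {
    private final Map<String, String> dirExpr = new HashMap<>();  // directory -> label expression
    private final Map<String, String> fileExpr = new HashMap<>(); // file -> expression stored on the file

    void setDirExpression(String dir, String expr) {
        dirExpr.put(dir, expr);
    }

    // Walk up the ancestors and return the nearest expression, or null if none is set.
    private String resolve(String path) {
        String p = path;
        while (!p.equals("/")) {
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "/" : p.substring(0, slash);
            String expr = dirExpr.get(p);
            if (expr != null) {
                return expr;
            }
        }
        return null;
    }

    // New files inherit the expression in effect at creation time.
    void createFile(String path) {
        fileExpr.put(path, resolve(path));
    }

    // Rename moves the file but keeps the expression already stored on it.
    void rename(String src, String dst) {
        fileExpr.put(dst, fileExpr.remove(src));
    }

    String expressionOf(String file) {
        return fileExpr.get(file);
    }
}
```

With this model, changing the expression on a directory affects only files created afterwards, and a file renamed into a directory with a different expression keeps its original one.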

bq. Is there any means to specify a label or label expression is STRICT or not 
(OPTIONAL)?
As I mentioned, STRICT is the only mode for the initial development; it is not 
optional. Other modes could be supported later.
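
STRICT semantics can be illustrated as follows. The expression syntax is not given in this thread, so the sketch assumes the simplest form, a set of labels that a node must all carry; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical STRICT filter: a node that does not satisfy the expression is never chosen.
class StrictSelectionSketch {
    static List<String> chooseTargets(Map<String, Set<String>> nodeLabels,
                                      Set<String> required, int replication) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : nodeLabels.entrySet()) {
            if (e.getValue().containsAll(required)) {
                eligible.add(e.getKey());
            }
        }
        if (eligible.isEmpty()) {
            // STRICT: no fallback to unlabeled nodes, the write fails instead.
            throw new IllegalStateException("no datanode satisfies " + required);
        }
        int count = Math.min(replication, eligible.size());
        return new ArrayList<>(eligible.subList(0, count));
    }
}
```

A non-strict (optional) mode would replace the exception with a fallback to the remaining nodes; that is the part deferred to later development.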

bq. A minor, I thought you may mean, "So to remove a label, admin can ..."
Thanks for the find, will fix in next rev.

bq. This sounds good. Such label spec would be good to be in common side so 
HDFS and YARN can share it consistently.
I thought about it initially. Moving both into common may need somewhat more 
changes, as YARN node labels are already part of releases. For now, I am 
looking to keep the user-facing API/commands and behavior in sync with YARN.

bq. I'm not sure how it's done in YARN, maybe a property file in datanode 
letting admin list the labels there? Some labels like arch, OS can be 
automatically detected or discovered while datanode starting. I'm thinking 
about how to make labels easy to configure and use.
AFAIK, YARN also uses admin commands to assign labels to nodes, and the RM then 
stores them in its node-label store, which is persisted. In HDFS, unlike YARN, 
nothing related to nodes is persisted in the NN; everything is built 
dynamically. Also, unlike NodeManagers, datanodes hold persisted user data, so 
it is better to allow labels to be specified only via admin commands.

bq. From HDFS perspective this sounds pretty good, and my overall suggestion 
would be, define and make the basic node label support in common side, in order 
to: 1) generic node label isn't essentially specific to HDFS, though some 
labels are. 2) shared by both HDFS and YARN in future, so admin may save some 
work, for example, using some common means admin can just specify all the 
labels for a node in a time, for both YARN and HDFS. 3) consistent in logic and 
behavior. Roughly, a job for a tenant should be scheduled to the datanodes 
where the input data reside for locality. 4) broad discussion to involve YARN 
guys. I understand it's not easy to split, but would be good to think about it. 
Thanks.
Thank you. I agree it would be good to be generic and make it common.
I think that even for current features, from an admin's point of view, few 
things are shared between HDFS and YARN. For example, the underlying disks 
might be the same for both HDFS and YARN, yet they have to be configured in 
separate configurations.
Moreover, I feel refactoring the already released YARN node labels would be 
risky.
Maybe this combining and refactoring can be taken up later?




[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2016-06-23 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347292#comment-15347292
 ] 

Kai Zheng commented on HDFS-9411:
-

Thanks [~vinayrpet] so much for addressing my questions! A few more after 
reading the new revision.

1. >>Change in Label expression set on directories will be inherited for new 
files. Does not reflect on already existing files.
This sounds like storage policy? How about rename?

2. >>Current scope for label expression should be STRICT. i.e. If node doesn’t 
satisfy the expression, will not be chosen. If no node satisfies write should 
fail
Is there any means to specify a label or label expression is STRICT or not 
(OPTIONAL)?

3. >>Labels can be removed only when there are no nodes associated with it. So 
to remove a node, admin can reset/change labels on nodes first, then can remove 
the labels from NameNode. 
A minor, I thought you may mean, "So to remove a label, admin can ..."

4. >>Label for each node should start with an alpha-numeric character...
This sounds good. Such a label spec would be good to have on the common side so 
HDFS and YARN can share it consistently.

5. >>NodeLabel->DataNode mapping will be done by DfsAdmin.
I'm not sure how it's done in YARN, maybe a property file in datanode letting 
admin list the labels there? Some labels like arch, OS can be automatically 
detected or discovered while datanode starting. I'm thinking about how to make 
labels easy to configure and use.
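
Points 3 and 4 above can be sketched as a small registry. Everything here is hypothetical: the class is not an HDFS API, and only the "starts with an alphanumeric character" part of the naming rule is checked, because the rest of the rule is elided in the quoted text. Character.isLetterOrDigit also accepts non-ASCII letters, which may be broader than the design intends.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry of cluster labels and node-to-label assignments.
class LabelRegistrySketch {
    private final Set<String> labels = new HashSet<>();
    private final Map<String, Set<String>> nodeToLabels = new HashMap<>();

    // Only the first-character rule is modeled here.
    static boolean startsAlphanumeric(String label) {
        return label != null && !label.isEmpty() && Character.isLetterOrDigit(label.charAt(0));
    }

    void addLabel(String label) {
        if (!startsAlphanumeric(label)) {
            throw new IllegalArgumentException("bad label: " + label);
        }
        labels.add(label);
    }

    // Replaces the labels on a node; every label must already be registered.
    void setNodeLabels(String node, Set<String> newLabels) {
        if (!labels.containsAll(newLabels)) {
            throw new IllegalArgumentException("unknown label in " + newLabels);
        }
        nodeToLabels.put(node, new HashSet<>(newLabels));
    }

    // A label can be removed only when no node is associated with it.
    boolean removeLabel(String label) {
        for (Set<String> assigned : nodeToLabels.values()) {
            if (assigned.contains(label)) {
                return false; // still in use; reset the node's labels first
            }
        }
        return labels.remove(label);
    }
}
```

The removal order matches the quoted text: reset or change the labels on the nodes first, then remove the label itself.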

From an HDFS perspective this sounds pretty good. My overall suggestion would 
be to define the basic node label support on the common side, so that it is: 
1) generic, since node labels aren't inherently specific to HDFS, though some 
labels are; 2) shared by both HDFS and YARN in future, which may save the admin 
some work, for example by specifying all the labels for a node at one time for 
both YARN and HDFS; 3) consistent in logic and behavior, since roughly a job for 
a tenant should be scheduled to the datanodes where its input data resides, for 
locality; 4) open to a broad discussion that involves the YARN guys. I 
understand it's not easy to split, but it would be good to think about it. Thanks.




[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2016-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343776#comment-15343776
 ] 

Vinayakumar B commented on HDFS-9411:
-

Hi [~drankye], answering your earlier comments.
bq. 1. Would it be good to support generic node label instead of ZoneLabel? I 
thought it may be useful for some considerations like cluster provisioning and 
management, security, repl/EC task scheduling and etc. in addition to block 
placement. The label could help specify some node attributes about network, 
CPU, storage, usage, and some other application domains
Yes, the new design is a generic node label support, which considers EC tasks 
as well.

bq. 2. Given generic node label is used, maybe we can leverage file/directory 
attributes to implement the requirement? Like we create/manage zones of files 
expressed in file attributes and place blocks based on flexible node label 
combinations.
Yes, the design leverages xattrs to support label expressions on paths.

bq. 3. So in the design, Zone or ZoneLabel will be the first factor to block 
placement, and will dominate storage policies, right?
Yes, the NodeLabel expression will be another factor in selecting a node, 
applied before the storage within that node is selected based on storage policy.
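
The ordering described here, labels first and storage policy second, might look like the sketch below. It is a simplification with hypothetical names: the label expression is reduced to a single required label, and the storage policy is reduced to an ordered list of preferred storage types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical two-stage placement: filter nodes by label, then pick a storage per node.
class TwoStagePlacementSketch {
    static class Node {
        final String name;
        final Set<String> labels;
        final Set<String> storageTypes;

        Node(String name, Set<String> labels, Set<String> storageTypes) {
            this.name = name;
            this.labels = labels;
            this.storageTypes = storageTypes;
        }
    }

    static List<String> place(List<Node> nodes, String requiredLabel, List<String> policyOrder) {
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            // Stage 1: the node itself must carry the required label.
            if (!n.labels.contains(requiredLabel)) {
                continue;
            }
            // Stage 2: within the node, take the first storage type the policy prefers.
            for (String type : policyOrder) {
                if (n.storageTypes.contains(type)) {
                    result.add(n.name + ":" + type);
                    break;
                }
            }
        }
        return result;
    }
}
```

A node with the preferred storage type but without the label is never reached by stage 2, which is what makes the label the dominant factor.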

bq. 4. How this might relate to federation and block pool?
IMO, this doesn't have any specific relation to federation. A datanode's label 
applies to all the NameNodes it serves, so the label should be created in all 
NameNodes before the DN is labelled.
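
That precondition amounts to a simple check before labelling a datanode. The sketch below is hypothetical (not an HDFS API) and assumes each NameNode's known labels are available as a set keyed by a nameservice name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-check for a federated cluster.
class FederationLabelCheckSketch {
    // Returns the NameNodes that do not know the label yet.
    // An empty result means the datanode can be labelled safely.
    static List<String> missingIn(Map<String, Set<String>> labelsByNameNode, String label) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : labelsByNameNode.entrySet()) {
            if (!e.getValue().contains(label)) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }
}
```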

I hope this answers your earlier questions. Waiting for more on the new doc.




[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2016-06-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341433#comment-15341433
 ] 

Kai Zheng commented on HDFS-9411:
-

Hi Vinay,

Some time ago I gave some comments on the first version of the design doc, and 
I am not sure whether they still need to be addressed. It would be great if you 
could give your answers so I can understand the whole picture. I would be happy 
to find some time to read your new revision. Thanks a lot.
