[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-07-13 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086747#comment-16086747
 ] 

Zhankun Tang commented on YARN-6720:


[~Naganarasimha], sorry for the late reply.

Yeah. So far, I can only think of several new attributes for GPU/FPGA resource 
handler to use. Maybe it's fine that we defined some constant for GPU/FPGA 
first and improve it if this hard-code is not flexible?

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-07-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077585#comment-16077585
 ] 

Naganarasimha G R commented on YARN-6720:
-

Thanks [~tangzhankun]
bq.  a constraint label update after container finish to indicate docker image 
has been localized is helpful to improve the scheduling. 
This was one of the improvements which i had in my mind, to automatically add 
labels to the nodes for the localized Container images. We will develop it once 
YARN-3409 is in. This is similar to the docker swarm functionality.

bq.  For instance, GPU handler for all different vendor might need to set a 
constraint "GPU_DOCKER_IMAGE_LOCALIZED:True/False" to a node? FPGA handler for 
all vendor might need set "FPGA_IP_NAME:ipname"? If so, is it a burden for end 
users to search and use these scheduling preference?
IIUC you are setting labels for "GPU_DOCKER_IMAGE_LOCALIZED:True/False" and/or 
"FPGA_IP_NAME:ipname", so not many constraints (named newly as attribute ) 
right ? Can you elaborate more to understand the use case ?

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-07-06 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077524#comment-16077524
 ] 

Zhankun Tang commented on YARN-6720:


[~wangda], I think this is depend on YARN-3409's constraint label APIs.
Agree that for GPU, a constraint label update after container finish to 
indicate docker image has been localized is helpful to improve the scheduling. 
Our idea of updating FPGA IP constraint label is same to this.

One thing uncertain in my mind is that how can we make these constraint labels 
easy to use? Do we need to define plenty of constant key strings? For instance, 
GPU handler for all different vendor might need to set a constraint 
"GPU_DOCKER_IMAGE_LOCALIZED:True/False" to a node?  FPGA handler for all vendor 
might need set "FPGA_IP_NAME:ipname"?  If so, is it a burden for end users to 
search and use these scheduling preference? 

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-07-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075443#comment-16075443
 ] 

Wangda Tan commented on YARN-6720:
--

[~tangzhankun]/[~zyluo], 

bq.  YARN-3409 Wouldn't be a blocker since this JIRA is a improvement of 
YARN-6507.
I'm not sure how to support device meta info in global (RM) scheduler without 
YARN-3409, I couldn't find the answer from attached design doc. Could you 
explain what is the solution in your mind?

Anyway I'm in favor of using a general approach which can be utilized by other 
features instead of customize RM scheduler to support FPGA requirements. GPU 
support is more sensitive to GPU type instead of firmware, but I can see docker 
support can be improved a lot if we can schedule containers to a node which 
already has localized docker image.

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-06-30 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069618#comment-16069618
 ] 

Zhankun Tang commented on YARN-6720:


[~wangda]. Maybe it's my fault. Although the reconfigure FPGA device procedure 
is fast, the downloading may takes a reasonable time which should be avoid. 
That's the key problem this JIRA wants to solve. 

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-06-28 Thread Zhongyue Nah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066397#comment-16066397
 ] 

Zhongyue Nah commented on YARN-6720:


[~leftnoteasy] YARN-3409 Wouldn't be a blocker since this JIRA is a improvement 
of YARN-6507. We intend to move device metadata to global space so that the 
scheduler can make more efficient decisions in terms of IP reuse. I assume 
GPGPUs have similar issues and we wish to find a common solution across all 
resource types before we get this through.

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration

2017-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065170#comment-16065170
 ] 

Wangda Tan commented on YARN-6720:
--

Thanks [~tangzhankun] and [~YuQiang Ye] for this proposal.

In general this approach looks good to me. Since this proposal depends on 
YARN-3409, which needs more time to get done. I'm not sure if this is a 
blocker/critical item for this feature, IIRC, offline you mentioned that 
download FPGA firmware and reconfigure FPGA devices only take few seconds, 
which means it is generally fine without this improvement. 

> Support updating FPGA related constraint node label after FPGA device 
> re-configuration
> --
>
> Key: YARN-6720
> URL: https://issues.apache.org/jira/browse/YARN-6720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
> Attachments: 
> Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf
>
>
> In order to provide a global optimal scheduling for mutable FPGA resource, it 
> seems an easy and direct way to utilize constraint node labels(YARN-3409) 
> instead of extending the global scheduler(YARN-3926) to match both resource 
> count and attributes.
> The rough idea is that the AM sets the constraint node label expression to 
> request containers on the nodes whose FPGA devices has the matching IP, and 
> then NM resource handler update the node constraint label if there's FPGA 
> device re-configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org