[ https://issues.apache.org/jira/browse/YARN-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988319#comment-15988319 ]

Devaraj K commented on YARN-5983:
---------------------------------

Thanks [~tangzhankun] and [~zyluo] for the design doc and hard work, and
[~leftnoteasy] for the discussion.

1.
{code:xml}
The scheduler only considers non-exclusive resources. Exclusive resources may
have extra attributes that need to be matched when scheduling, not just a number
to add or subtract. For instance, in our PoC, an FPGA slot on a node may already
have an IP flashed, so the scheduler should try to match this IP attribute to
reuse it.
{code}

If you are passing all the attributes of the FPGA resources to the RM scheduler,
why do you need NM-side resource management at all? Can you describe, in
abstract terms, which attributes are passed to the RM and which details are
maintained by the NM-side resource management?

2.
{code:xml}
Device resources need additional preparation and isolation before container
launch. For instance, an FPGA device may need to download an IP file from a repo
and then flash it to an allocated FPGA slot.
{code}
Does this need to be done for each container, or can it be done once during
cluster installation?

3. Can FPGA slots be shared by multiple containers? How do we prevent a
container (one without an FPGA allocation) or an application from using FPGA
resources that are not allocated to it?

4. Are there any changes to ContainerExecutor? How does the application code
running in the container learn which FPGA resource has been allocated to it, so
that it can access and use the FPGA?

5. What configuration does a user need to set for an application to use FPGA
resources?


> [Umbrella] Support for FPGA as a Resource in YARN
> -------------------------------------------------
>
>                 Key: YARN-5983
>                 URL: https://issues.apache.org/jira/browse/YARN-5983
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>         Attachments: YARN-5983-Support-FPGA-resource-on-NM-side_v1.pdf
>
>
> As more and more big data workloads run on YARN, CPUs will eventually stop
> scaling and heterogeneous systems will become more important. ML/DL has been
> a rising star in recent years, and applications in these areas have to
> utilize GPUs or FPGAs to boost performance. Hardware vendors such as Intel
> are also investing in such hardware. FPGAs will most likely become as common
> in data centers as CPUs in the near future.
> As a resource management and scheduling system, YARN should evolve to
> support this. This JIRA proposes making FPGA a first-class citizen.
> The changes roughly include:
> 1. FPGA resource detection and heartbeat
> 2. Scheduler changes
> 3. FPGA-related preparation and isolation before launching a container
> We know that YARN-3926 is extending the current resource model, but we can
> still keep FPGA-related discussion here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
