[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201275#comment-16201275
 ] 

Wangda Tan commented on YARN-4511:
----------------------------------

Thanks Haibo, and apologies for my late responses; I was busy with other tasks.

Regarding the {{allocationInThisHeartbeat}} discussion: the related JIRA is 
YARN-5139. In short, it splits scheduler allocation into two separate phases:
Phase #1: the scheduler looks at the existing scheduler state (queue/node/app, 
etc.) and makes an allocation proposal (which node to allocate a container on). 
This can be done in multiple threads.
Phase #2: another thread (currently a single thread) looks at the allocation 
proposals and tries to accept or reject them.
Under YARN-5139, we cannot assume that an allocation proposal will be 
accepted. I'm not sure how this impacts your approach.
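To make the concern concrete, here is a minimal Java sketch of the propose/commit split, assuming a single resource (memory) and a single node. {{TwoPhaseAllocationSketch}}, {{Proposal}}, {{Node}}, {{propose}} and {{commit}} are hypothetical names, not YARN-5139 APIs. Two proposers that read the same snapshot can both propose a container that fits, but only the first one survives the commit check:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a two-phase (propose/commit) allocation. */
class TwoPhaseAllocationSketch {

  /** A phase #1 proposal: allocate memoryMb on nodeId. */
  static final class Proposal {
    final String nodeId;
    final int memoryMb;

    Proposal(String nodeId, int memoryMb) {
      this.nodeId = nodeId;
      this.memoryMb = memoryMb;
    }
  }

  /** Simplified node state, only mutated by the committer thread. */
  static final class Node {
    final String id;
    int availableMb;

    Node(String id, int availableMb) {
      this.id = id;
      this.availableMb = availableMb;
    }
  }

  /**
   * Phase #1: build a proposal from a (possibly stale) snapshot of the node.
   * Several threads may call this concurrently. Returns null if the ask does
   * not fit the snapshot.
   */
  static Proposal propose(String nodeId, int snapshotAvailableMb, int askMb) {
    return askMb <= snapshotAvailableMb ? new Proposal(nodeId, askMb) : null;
  }

  /**
   * Phase #2: re-check every proposal against the current node state and
   * return the accepted ones. Node state changes only on acceptance.
   */
  static List<Proposal> commit(Node node, List<Proposal> proposals) {
    List<Proposal> accepted = new ArrayList<>();
    for (Proposal p : proposals) {
      if (p == null || !p.nodeId.equals(node.id)) {
        continue; // nothing to commit on this node
      }
      if (p.memoryMb <= node.availableMb) {
        node.availableMb -= p.memoryMb;
        accepted.add(p);
      }
      // else: rejected, although it looked valid in phase #1
    }
    return accepted;
  }
}
```

If per-heartbeat bookkeeping such as {{allocationInThisHeartbeat}} were updated in phase #1, it would also count the rejected proposal; updating it only when {{commit}} accepts a proposal would avoid that, at least in this simplified model.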

To your proposal:
bq. we'd do allocation of guaranteed containers first followed by opportunistic 
containers. We need to consider the just-allocated-yet-to-launch guaranteed 
containers to project how much resource we have left to allocate opportunistic 
containers.
I'm still not quite sure how this works: just-allocated-yet-to-launch 
guaranteed containers could have been allocated in different heartbeats, 
correct? An AM may acquire a guaranteed container and wait several minutes 
before launching it, so I'm not sure that recording the total allocated within 
a single node update event is enough.
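One way to read the concern: headroom for opportunistic containers has to account for guaranteed containers allocated in any earlier heartbeat that have not launched yet. Below is a minimal Java sketch under that assumption, with hypothetical names ({{NodeHeadroomSketch}}, {{guaranteedAllocated}}, {{guaranteedLaunched}}), memory as the only resource, and running opportunistic containers and expired allocations deliberately left out:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-node bookkeeping: opportunistic headroom reserves room for
 * guaranteed containers allocated in ANY earlier heartbeat that have not
 * launched yet, not only those allocated in the current node update.
 */
class NodeHeadroomSketch {
  private final int capacityMb;
  /** Guaranteed containers allocated but not yet reported as launched. */
  private final Map<String, Integer> pendingGuaranteedMb = new HashMap<>();
  /** Guaranteed containers reported as running on the node. */
  private final Map<String, Integer> launchedGuaranteedMb = new HashMap<>();

  NodeHeadroomSketch(int capacityMb) {
    this.capacityMb = capacityMb;
  }

  /** Called when the scheduler commits a guaranteed allocation. */
  void guaranteedAllocated(String containerId, int mb) {
    pendingGuaranteedMb.put(containerId, mb);
  }

  /** Called when a later heartbeat (possibly minutes later) reports the launch. */
  void guaranteedLaunched(String containerId) {
    Integer mb = pendingGuaranteedMb.remove(containerId);
    if (mb != null) {
      launchedGuaranteedMb.put(containerId, mb);
    }
  }

  /** Memory left for opportunistic containers on this node. */
  int opportunisticHeadroomMb() {
    int used = sum(pendingGuaranteedMb) + sum(launchedGuaranteedMb);
    return Math.max(0, capacityMb - used);
  }

  private static int sum(Map<String, Integer> m) {
    int total = 0;
    for (int v : m.values()) {
      total += v;
    }
    return total;
  }
}
```

With an 8192 MB node, c1 (2048 MB) allocated in heartbeat 1 and c2 (2048 MB) in heartbeat 2, a counter reset on each node update would see only c2 during heartbeat 2 and report 6144 MB of headroom, while the cross-heartbeat bookkeeping reports 4096 MB.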

bq. I only try to preserve the containerLaunched flag. Can you be more specific 
about what you're referring to in the patch?
I'm talking about the method below in SchedulerNode (it seems to have been 
renamed in the latest patch):
{code}
  /**
   * Inform the node that a container has launched.
   * @param containerId ID of the launched container
   */
  public synchronized void containerStarted(ContainerId containerId) {
    ContainerInfo info = launchedContainers.get(containerId);
    if (info != null) {
      info.launchedOnNode = true;
    }
  }
{code}
I'm not sure why we need a separate launchedOnNode flag when we already have 
the launchedContainers map.
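A minimal Java sketch of the two representations, with hypothetical names and container IDs as plain strings. It assumes the flag only carries information if the map also holds containers that are allocated but not yet started; if entries were added only on start, membership alone would answer "is it launched?":

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; names do not come from the YARN code base. */
class LaunchStateSketch {

  /** Variant A: one map for every allocated container, plus a launched flag. */
  static final class FlagVariant {
    static final class Info {
      boolean launchedOnNode;
    }

    private final Map<String, Info> containers = new HashMap<>();

    void allocate(String id) {
      containers.put(id, new Info());
    }

    void start(String id) {
      Info info = containers.get(id);
      if (info != null) {
        info.launchedOnNode = true;
      }
    }

    boolean isLaunched(String id) {
      Info info = containers.get(id);
      return info != null && info.launchedOnNode;
    }

    /** Number of containers the node knows about (allocated or launched). */
    int tracked() {
      return containers.size();
    }
  }

  /** Variant B: an entry is added only on start, so membership is the state. */
  static final class MembershipVariant {
    private final Set<String> launched = new HashSet<>();

    void allocate(String id) {
      // nothing recorded: allocated-but-not-started containers are invisible here
    }

    void start(String id) {
      launched.add(id);
    }

    boolean isLaunched(String id) {
      return launched.contains(id);
    }

    int tracked() {
      return launched.size();
    }
  }
}
```

Both variants give the same {{isLaunched}} answers; they differ only in whether the node remembers allocated-but-not-started containers, which seems to be the real question behind the extra flag.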

bq. There is a jira open to consolidate with Resource Profiles (YARN-6690). Is 
that a good place to do the work to accommodate other resources?
I'm fine with moving this to a separate JIRA, but we need to do it before the 
release; otherwise it is going to be very hard to modify the defined protos in a 
future release.

I'm not sure if I'm asking too much, but could you include a summary of the 
workflow of this patch and how the schedulers will use these changes? There are 
lots of changes (especially inside SchedulerNode), but I cannot see the full 
picture of how the schedulers will use them. A workflow summary would help the 
review a lot.


> Common scheduler changes supporting scheduler-specific implementations
> ----------------------------------------------------------------------
>
>                 Key: YARN-4511
>                 URL: https://issues.apache.org/jira/browse/YARN-4511
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Haibo Chen
>         Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
