[ 
https://issues.apache.org/jira/browse/YARN-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284659#comment-16284659
 ] 

Arun Suresh commented on YARN-7612:
-----------------------------------

Thanks for the thoughtful review [~leftnoteasy] !!

1.
bq. Algorithm interface is not the true global scheduler: the BatchedRequests 
is requesting for one app. Ideally we should be able to assign containers based 
on requests from multiple apps, correct?
Yup - currently the BatchedRequests consists of requests only from a single 
app. I was thinking we do that as a first cut. Batching across apps might not 
be straightforward, since any single app can decide how many individual 
requests constitute a batch (currently a batch of requests is all the 
scheduling requests in a single allocate call). But to batch across apps, we 
would need some other sort of co-ordination via a higher level API - by higher 
level I mean an API that can be invoked by maybe a cluster operator for 
multiple apps together etc. Or we would need to buffer the requests from one app 
in the algorithm for some 'x' seconds and then wait for the allocate call from 
the second app. I am not in favor of timed buffering, since the batching won't 
be deterministic.
The way we plan to handle the inter-app case is to use the TagsManager to 
record the tags allocated by one app - and when the algorithm is processing a 
batch from the second app, it can consult the TagsManager and see what the 
earlier allocations were and then make a decision. This might not provide 
optimal results, but might not be too bad either (and is less complicated).
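
To illustrate the idea, here is a minimal sketch. All names in it 
(SimpleTagsManager, placeAntiAffine, the "hbase" tag) are hypothetical and are 
not the classes in the patch. It also simplifies by recording the tag at 
placement time rather than at actual allocation:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagsLookupSketch {

    // Hypothetical stand-in for a tags manager: node -> tag -> container count.
    static class SimpleTagsManager {
        private final Map<String, Map<String, Integer>> tagsByNode = new HashMap<>();

        void addTag(String node, String tag) {
            tagsByNode.computeIfAbsent(node, n -> new HashMap<>())
                      .merge(tag, 1, Integer::sum);
        }

        int count(String node, String tag) {
            return tagsByNode.getOrDefault(node, Collections.emptyMap())
                             .getOrDefault(tag, 0);
        }
    }

    // Returns the first node that does not already hold the tag
    // (a simple anti-affinity check), or null if every node holds it.
    static String placeAntiAffine(List<String> nodes, String tag,
                                  SimpleTagsManager tags) {
        for (String node : nodes) {
            if (tags.count(node, tag) == 0) {
                tags.addTag(node, tag); // simplification: record immediately
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleTagsManager tags = new SimpleTagsManager();
        List<String> nodes = Arrays.asList("n1", "n2");
        // Batch from the first app.
        String first = placeAntiAffine(nodes, "hbase", tags);
        // Batch from the second app, processed later: it sees the earlier tag.
        String second = placeAntiAffine(nodes, "hbase", tags);
        System.out.println(first + " " + second);
    }
}
```

The second batch is placed with knowledge of the first, but the two batches 
are never optimized together, which is why the result may not be optimal.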

2.
bq. In addition to that, it's better not to pass 
tagsManager/constraintsManager/node selector to Algorithm. Instead, we should 
have a separate init method in Algorithm API to store these util classes.
Agreed - I can make that change.

3.
bq. The interaction between Algorithm & PlacementProcessor is too tightly 
coupled: Processor called algorithm once and get a response, in the response it 
includes attempt id, etc. 
Ah.. I think you misunderstood the 'attempt'. Let me try to clarify: The 
'attempt' is the placementAttempt. It records the number of times that 
specific request was placed but was subsequently rejected by the 
scheduler. If the scheduler rejects a placed request (i.e. a Proposal), we 
increment the placement attempt and re-dispatch it to the Placement Algorithm - 
to maybe try placing on another Node. If the number of retries exceeds a max, we 
reject the request and send it back to the AM.
There are 2 types of rejections possible:
# The Placement Algorithm rejects the request - due to some conflicting 
constraints (e.g. anti-affinity across nodes for 5 containers when there are 
only 4 nodes). These will NOT be retried.
# The Scheduler rejects a placed request (request exceeds user limit / queue 
capacity / node size etc.). These will be retried, since the rejection might 
just be transient.
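
The retry flow described above could be sketched roughly as follows. This is 
only an illustration under assumed names (Algorithm, Scheduler, Outcome, 
maxRetries are all hypothetical), not the code in the patch:

```java
public class PlacementRetrySketch {

    enum Outcome { ALLOCATED, REJECTED_BY_ALGORITHM, REJECTED_AFTER_RETRIES }

    // Hypothetical: returns a node name, or null if constraints cannot be met.
    interface Algorithm {
        String place(String request, int placementAttempt);
    }

    // Hypothetical: returns true if the scheduler accepts the proposal.
    interface Scheduler {
        boolean tryCommit(String request, String node);
    }

    static Outcome process(String request, Algorithm algorithm,
                           Scheduler scheduler, int maxRetries) {
        int placementAttempt = 0;
        while (true) {
            String node = algorithm.place(request, placementAttempt);
            if (node == null) {
                // Rejection type 1: conflicting constraints, never retried.
                return Outcome.REJECTED_BY_ALGORITHM;
            }
            if (scheduler.tryCommit(request, node)) {
                return Outcome.ALLOCATED;
            }
            // Rejection type 2: possibly transient, so bump the attempt
            // counter and re-dispatch to the algorithm.
            placementAttempt++;
            if (placementAttempt > maxRetries) {
                return Outcome.REJECTED_AFTER_RETRIES; // sent back to the AM
            }
        }
    }

    public static void main(String[] args) {
        // The algorithm proposes a different node on each attempt; the
        // scheduler in this toy example only accepts node "n1".
        Outcome outcome = process("r1",
            (request, attempt) -> "n" + attempt,
            (request, node) -> node.equals("n1"),
            2);
        System.out.println(outcome);
    }
}
```

Note that the attempt counter only advances on scheduler rejections, so it 
says nothing about how many times the Processor called the algorithm overall.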

4.
{quote}
1. The algorithm holds reference to scheduler states (including scheduler 
itself, AllocationTagsManager, ConstraintsManager, etc.)
2. The algorithm makes decision truly based on scheduler state (no additional 
inputs).
{quote}
I guess the above 2 will be satisfied if I have an init method for the 
Algorithm that takes the NodeSelector, TagsManager and ConstraintsManager - and 
the actual 'place' method will be passed just the BatchedRequests. Right? If 
so, I agree, I can make the change.
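
Something along these lines, as a rough sketch only. The types below are 
hypothetical placeholders that borrow the names used in this discussion, the 
signatures are assumptions, and the round-robin implementation is just there 
to make the example runnable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AlgorithmInitSketch {

    // Hypothetical placeholder types, not the real YARN classes.
    interface NodeSelector { List<String> selectNodes(); }
    static class TagsManager { }
    static class ConstraintsManager { }
    static class BatchedRequests {
        final List<String> requests;
        BatchedRequests(List<String> requests) {
            this.requests = new ArrayList<>(requests);
        }
    }

    interface PlacementAlgorithm {
        // Scheduler state is handed over once, up front.
        void init(NodeSelector nodeSelector, TagsManager tagsManager,
                  ConstraintsManager constraintsManager);

        // Each call only receives the batch; returns request -> node.
        Map<String, String> place(BatchedRequests batch);
    }

    // Toy implementation: ignores tags and constraints, assigns round-robin.
    static class RoundRobinAlgorithm implements PlacementAlgorithm {
        private NodeSelector nodeSelector;
        private TagsManager tagsManager;               // unused in this toy
        private ConstraintsManager constraintsManager; // unused in this toy

        @Override
        public void init(NodeSelector nodeSelector, TagsManager tagsManager,
                         ConstraintsManager constraintsManager) {
            this.nodeSelector = nodeSelector;
            this.tagsManager = tagsManager;
            this.constraintsManager = constraintsManager;
        }

        @Override
        public Map<String, String> place(BatchedRequests batch) {
            Map<String, String> placement = new LinkedHashMap<>();
            List<String> nodes = nodeSelector.selectNodes();
            if (nodes.isEmpty()) {
                return placement; // nothing can be placed
            }
            int i = 0;
            for (String request : batch.requests) {
                placement.put(request, nodes.get(i % nodes.size()));
                i++;
            }
            return placement;
        }
    }

    public static void main(String[] args) {
        PlacementAlgorithm algorithm = new RoundRobinAlgorithm();
        algorithm.init(() -> Arrays.asList("n1", "n2"),
                       new TagsManager(), new ConstraintsManager());
        System.out.println(
            algorithm.place(new BatchedRequests(Arrays.asList("r1", "r2", "r3"))));
    }
}
```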

5.
{quote}
3. The result of Algorithm can be pulled by Processor or alternatively, notify 
Processor. I would prefer the latter one, which Algorithm can have a 
PlacementResultNotifier reference (passed in init method). And if any placement 
or reject decision is made, Algorithm will invoke the PlacementResultNotifier
{quote}
Sure - I can give this a shot.
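
A first guess at what the push-style variant might look like. The 
PlacementResultNotifier name comes from the suggestion above, but its methods 
(placed / rejected) and everything else here are assumptions for illustration. 
The toy algorithm places one request per node and rejects the overflow:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NotifierSketch {

    // Hypothetical callback interface; method names are assumptions.
    interface PlacementResultNotifier {
        void placed(String request, String node);
        void rejected(String request);
    }

    // Toy algorithm: each request gets its own node; extras are rejected.
    static class OnePerNodeAlgorithm {
        private List<String> nodes;
        private PlacementResultNotifier notifier;

        void init(List<String> nodes, PlacementResultNotifier notifier) {
            this.nodes = nodes;
            this.notifier = notifier;
        }

        // Returns nothing: results are pushed to the notifier instead.
        void place(List<String> batch) {
            for (int i = 0; i < batch.size(); i++) {
                if (i < nodes.size()) {
                    notifier.placed(batch.get(i), nodes.get(i));
                } else {
                    notifier.rejected(batch.get(i));
                }
            }
        }
    }

    // Stand-in for the Processor side: just records what it is told.
    static class RecordingProcessor implements PlacementResultNotifier {
        final List<String> placements = new ArrayList<>();
        final List<String> rejections = new ArrayList<>();

        @Override
        public void placed(String request, String node) {
            placements.add(request + "@" + node);
        }

        @Override
        public void rejected(String request) {
            rejections.add(request);
        }
    }

    public static void main(String[] args) {
        RecordingProcessor processor = new RecordingProcessor();
        OnePerNodeAlgorithm algorithm = new OnePerNodeAlgorithm();
        algorithm.init(Arrays.asList("n1", "n2"), processor);
        algorithm.place(Arrays.asList("r1", "r2", "r3"));
        System.out.println(processor.placements + " " + processor.rejections);
    }
}
```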






> Add Placement Processor and planner framework
> ---------------------------------------------
>
>                 Key: YARN-7612
>                 URL: https://issues.apache.org/jira/browse/YARN-7612
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-7612-YARN-6592.001.patch, 
> YARN-7612-YARN-6592.002.patch, YARN-7612-v2.wip.patch, YARN-7612.wip.patch
>
>
> This introduces a Placement Processor and a Planning algorithm framework to 
> handle placement constraints and scheduling requests from an app and places 
> them on nodes.
> The actual planning algorithm(s) will be handled in YARN-7613.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
