[ https://issues.apache.org/jira/browse/YARN-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754526#comment-16754526 ]
Weiwei Yang commented on YARN-9209:
-----------------------------------
Hi [~leftnoteasy]
I agree with your reasoning about not supporting the ANY partition, but I think
we are facing a different issue here. The user's app runs fine before applying
{code:java}
"placement_policy": {
  "constraints": [
    {
      "type": "ANTI_AFFINITY",
      "scope": "NODE",
      "target_tags": [
        "sleeper"
      ]
    }
  ]
}
{code}
This placement constraint does not contain a node-partition segment; however,
applying it changes the target partition to which the request is submitted.
After digging a bit more into the code, I found that for resource requests,
{{RMServerUtils.normalizeAndValidateRequests}} normalizes the node-partition
info according to the queue settings when it is not set on the request. Are we
missing this for scheduling requests? Should we add something similar to
{{SchedulerUtils#normalizeNodeLabelExpressionInRequest}} in
{{SingleConstraintAppPlacementAllocator}}?
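To make the idea concrete, here is a rough sketch of such a normalization step
(not a patch; the class and method names are made up for illustration, and only
{{QueueInfo#getDefaultNodeLabelExpression}} and {{RMNodeLabelsManager.NO_LABEL}}
are existing APIs):
{code:java}
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager;

public final class NodePartitionNormalizer {
  // Sketch: choose the partition a SchedulingRequest should target when
  // SingleConstraintAppPlacementAllocator parses its placement constraint.
  static String normalizeTargetNodePartition(String constraintPartition,
      QueueInfo queueInfo) {
    // A partition explicitly set in the placement constraint wins.
    if (constraintPartition != null && !constraintPartition.isEmpty()) {
      return constraintPartition;
    }
    // Otherwise fall back to the queue's default-node-label-expression,
    // mirroring SchedulerUtils#normalizeNodeLabelExpressionInRequest.
    if (queueInfo != null
        && queueInfo.getDefaultNodeLabelExpression() != null) {
      return queueInfo.getDefaultNodeLabelExpression();
    }
    // Finally fall back to the default partition ("").
    return RMNodeLabelsManager.NO_LABEL;
  }
}
{code}
That way, a constraint without a node_partition would inherit the queue's
configured default partition instead of silently targeting "".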
> When nodePartition is not set in Placement Constraints, containers are
> allocated only in default partition
> ----------------------------------------------------------------------------------------------------------
>
> Key: YARN-9209
> URL: https://issues.apache.org/jira/browse/YARN-9209
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, scheduler
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Attachments: YARN-9209.001.patch
>
>
> When an application sets a placement constraint without specifying a
> nodePartition, the default partition is always chosen as the constraint when
> allocating containers. This can be a problem when an application is
> submitted to a queue that doesn't have enough capacity available on the
> default partition.
> This is a common scenario when node labels are configured for a particular
> queue. The below sample sleeper service cannot get even a single container
> allocated when it is submitted to a "labeled_queue", even though enough
> capacity is available on the label/partition configured for the queue. Only
> the AM container runs.
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "queue": "labeled_queue",
>   "components": [
>     {
>       "name": "sleeper",
>       "number_of_containers": 2,
>       "launch_command": "sleep 90000",
>       "resource": {
>         "cpus": 1,
>         "memory": "4096"
>       },
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "sleeper"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> It runs fine if I specify the node_partition explicitly in the constraints
> like below.
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "queue": "labeled_queue",
>   "components": [
>     {
>       "name": "sleeper",
>       "number_of_containers": 2,
>       "launch_command": "sleep 90000",
>       "resource": {
>         "cpus": 1,
>         "memory": "4096"
>       },
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "sleeper"
>             ],
>             "node_partitions": [
>               "label"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> The problem seems to be that only the default partition "" is considered
> when the node_partition constraint is not specified, as seen in the RM log
> below.
> {code:java}
> 2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator
> (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367))
> - Successfully added SchedulingRequest to
> app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper].
> nodePartition=
> {code}
> However, I think it makes more sense to consider "*", or the
> {{default-node-label-expression}} of the queue if configured, when no
> node_partition is specified in the placement constraint. Not specifying any
> node_partition should ideally mean the placement constraint is not
> restricted to any particular node_partition, yet we are currently enforcing
> the default partition instead.