[
https://issues.apache.org/jira/browse/YARN-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869342#comment-16869342
]
Weiwei Yang commented on YARN-9209:
-----------------------------------
Agree. +1 for this patch, let's get this issue fixed first.
For the documentation, feel free to create a new issue to track.
Thanks
> When nodePartition is not set in Placement Constraints, containers are
> allocated only in default partition
> ----------------------------------------------------------------------------------------------------------
>
> Key: YARN-9209
> URL: https://issues.apache.org/jira/browse/YARN-9209
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, scheduler
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Attachments: YARN-9209.001.patch, YARN-9209.002.patch,
> YARN-9209.003.patch
>
>
> When an application sets a placement constraint without specifying a
> nodePartition, the default partition is always chosen as the constraint when
> allocating containers. This is a problem when an application is
> submitted to a queue which doesn't have enough capacity available on the
> default partition.
> This is a common scenario when node labels are configured for a particular
> queue. The below sample sleeper service cannot get even a single container
> allocated when it is submitted to a "labeled_queue", even though enough
> capacity is available on the label/partition configured for the queue. Only
> the AM container runs.
> {code:java}
> {
> "name": "sleeper-service",
> "version": "1.0.0",
> "queue": "labeled_queue",
> "components": [
> {
> "name": "sleeper",
> "number_of_containers": 2,
> "launch_command": "sleep 90000",
> "resource": {
> "cpus": 1,
> "memory": "4096"
> },
> "placement_policy": {
> "constraints": [
> {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
> "sleeper"
> ]
> }
> ]
> }
> }
> ]
> }
> {code}
> It runs fine if I specify node_partitions explicitly in the constraints,
> as below.
> {code:java}
> {
> "name": "sleeper-service",
> "version": "1.0.0",
> "queue": "labeled_queue",
> "components": [
> {
> "name": "sleeper",
> "number_of_containers": 2,
> "launch_command": "sleep 90000",
> "resource": {
> "cpus": 1,
> "memory": "4096"
> },
> "placement_policy": {
> "constraints": [
> {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
> "sleeper"
> ],
> "node_partitions": [
> "label"
> ]
> }
> ]
> }
> }
> ]
> }
> {code}
> The problem seems to be that only the default partition "" is considered
> when the node_partition constraint is not specified, as seen in the RM log below.
> {code:java}
> 2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator
> (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367))
> - Successfully added SchedulingRequest to
> app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper].
> nodePartition=
> {code}
> However, I think it makes more sense to consider "*" or the
> {{default-node-label-expression}} of the queue, if configured, when no
> node_partition is specified in the placement constraint, since not specifying
> any node_partition should ideally mean that the constraint is not tied to
> any particular node_partition. Currently, we enforce the default partition
> instead.
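The fallback order proposed in the description can be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached patches: the class PartitionFallback, the method resolve, and the constant ANY_PARTITION are hypothetical names, and the sketch assumes that a null partition means "not specified" while an empty string means the default partition was requested explicitly.

```java
// Hypothetical helper illustrating the fallback order suggested in the
// issue description. It is NOT taken from the YARN-9209 patch.
final class PartitionFallback {

    // Assumption: "*" stands for "any partition", as in the description.
    static final String ANY_PARTITION = "*";

    private PartitionFallback() {
    }

    /**
     * Picks the partition to use for a scheduling request.
     *
     * @param requestedPartition partition named in the placement constraint,
     *                           or null if the constraint does not name one
     * @param queueDefaultLabelExpression the queue's
     *                           default-node-label-expression, or null/empty
     *                           if none is configured
     */
    static String resolve(String requestedPartition,
                          String queueDefaultLabelExpression) {
        // 1. An explicit partition in the constraint always wins,
        //    including "" (the default partition).
        if (requestedPartition != null) {
            return requestedPartition;
        }
        // 2. Otherwise fall back to the queue's configured default label.
        if (queueDefaultLabelExpression != null
                && !queueDefaultLabelExpression.isEmpty()) {
            return queueDefaultLabelExpression;
        }
        // 3. Nothing configured anywhere: do not pin to any partition.
        return ANY_PARTITION;
    }
}
```

With this ordering, the first sleeper spec above (no node_partitions, queue default label "label") would resolve to "label" instead of the default partition "", while the second spec would keep its explicit "label".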