[ https://issues.apache.org/jira/browse/YARN-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869351#comment-16869351 ]
Hudson commented on YARN-9209:
------------------------------
FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16802 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/16802/])
YARN-9209. When nodePartition is not set in Placement Constraints, (wwei: rev
83dcb9d87ec75f2be0acb8972f5f0faefe6ffbcd)
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/TestSingleConstraintAppPlacementAllocator.java
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/SingleConstraintAppPlacementAllocator.java
> When nodePartition is not set in Placement Constraints, containers are
> allocated only in default partition
> ----------------------------------------------------------------------------------------------------------
>
> Key: YARN-9209
> URL: https://issues.apache.org/jira/browse/YARN-9209
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, scheduler
> Affects Versions: 3.1.0
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
> Attachments: YARN-9209.001.patch, YARN-9209.002.patch,
> YARN-9209.003.patch
>
>
> When an application sets a placement constraint without specifying a
> nodePartition, the default partition is always chosen as the constraint when
> allocating containers. This is a problem when an application is
> submitted to a queue which doesn't have enough capacity available on the
> default partition.
> This is a common scenario when node labels are configured for a particular
> queue. The below sample sleeper service cannot get even a single container
> allocated when it is submitted to a "labeled_queue", even though enough
> capacity is available on the label/partition configured for the queue. Only
> the AM container runs.
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "queue": "labeled_queue",
>   "components": [
>     {
>       "name": "sleeper",
>       "number_of_containers": 2,
>       "launch_command": "sleep 90000",
>       "resource": {
>         "cpus": 1,
>         "memory": "4096"
>       },
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "sleeper"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> It runs fine if I specify node_partitions explicitly in the constraint,
> as below.
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "queue": "labeled_queue",
>   "components": [
>     {
>       "name": "sleeper",
>       "number_of_containers": 2,
>       "launch_command": "sleep 90000",
>       "resource": {
>         "cpus": 1,
>         "memory": "4096"
>       },
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "sleeper"
>             ],
>             "node_partitions": [
>               "label"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> The problem seems to be that only the default partition "" is considered
> when no node_partition constraint is specified, as seen in the RM log below.
> {code:java}
> 2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator
> (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367))
> - Successfully added SchedulingRequest to
> app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper].
> nodePartition=
> {code}
> However, I think it makes more sense to consider "*" or the
> {{default-node-label-expression}} of the queue, if configured, when no
> node_partition is specified in the placement constraint, since not specifying
> any node_partition should ideally mean that we don't restrict placement to
> any particular node_partition. Currently we enforce the default partition
> instead.
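
The fallback proposed in the description above can be illustrated with a
small standalone sketch. This is not the code from the attached patches; the
class and method names (PartitionFallbackSketch, resolveCurrent,
resolveProposed) are hypothetical, and the sketch assumes a constraint names
at most one partition. It only contrasts the reported behaviour (an unset
partition becomes the default partition "") with the suggested one (fall back
to the queue's default-node-label-expression if configured, otherwise "*").

```java
/**
 * Hypothetical sketch of the partition fallback discussed in YARN-9209.
 * It does not use any Hadoop classes and is not the committed fix.
 */
final class PartitionFallbackSketch {
    /** The default (unlabeled) partition, written "" in the report. */
    static final String DEFAULT_PARTITION = "";
    /** Wildcard meaning "any partition", written "*" in the report. */
    static final String ANY_PARTITION = "*";

    private PartitionFallbackSketch() {
    }

    /** Reported behaviour: an unset partition is treated as the default partition. */
    static String resolveCurrent(String constraintPartition) {
        return constraintPartition == null ? DEFAULT_PARTITION : constraintPartition;
    }

    /**
     * Suggested behaviour: an explicit partition wins; otherwise use the
     * queue's default-node-label-expression when one is configured
     * (non-null and non-empty here, which is an assumption); otherwise "*".
     */
    static String resolveProposed(String constraintPartition,
                                  String queueDefaultLabelExpression) {
        if (constraintPartition != null) {
            return constraintPartition;
        }
        if (queueDefaultLabelExpression != null
                && !queueDefaultLabelExpression.isEmpty()) {
            return queueDefaultLabelExpression;
        }
        return ANY_PARTITION;
    }

    public static void main(String[] args) {
        // Constraint without node_partitions, queue configured with "label".
        System.out.println("current:  '" + resolveCurrent(null) + "'");
        System.out.println("proposed: '" + resolveProposed(null, "label") + "'");
        // Same constraint, queue without a default label expression.
        System.out.println("proposed: '" + resolveProposed(null, null) + "'");
    }
}
```

With the first sample spec above, the reported behaviour corresponds to
resolveCurrent(null), which yields the default partition, while the
suggestion corresponds to resolveProposed(null, "label") for a queue whose
default label expression is "label".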
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)