Tarun Parimi created YARN-9209:
----------------------------------
Summary: When nodePartition is not set in Placement Constraints,
containers are allocated only in default partition
Key: YARN-9209
URL: https://issues.apache.org/jira/browse/YARN-9209
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler, scheduler
Affects Versions: 3.1.0
Reporter: Tarun Parimi
When an application sets a placement constraint without specifying a
nodePartition, the default partition is always chosen as the constraint when
allocating containers. This is a problem when the application is submitted
to a queue which doesn't have enough capacity available on the default
partition.
This is a common scenario when node labels are configured for a particular
queue. The below sample sleeper service cannot get even a single container
allocated when it is submitted to a "labeled_queue", even though enough
capacity is available on the label/partition configured for the queue. Only the
AM container runs.
{code:java}
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": { "cpus": 1, "memory": "4096" },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [ "sleeper" ]
          }
        ]
      }
    }
  ]
}
{code}
It runs fine if I specify the node_partition explicitly in the constraints like
below.
{code:java}
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": { "cpus": 1, "memory": "4096" },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [ "sleeper" ],
            "node_partition": [ "label" ]
          }
        ]
      }
    }
  ]
}
{code}
The problem seems to be that only the default partition "" is considered
when the node_partition constraint is not specified, as seen in the RM log below.
{code:java}
2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367)) - Successfully added SchedulingRequest to app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper]. nodePartition=
{code}
However, I think it makes more sense to consider "*" when no node_partition is
specified in the placement constraint, since not specifying any node_partition
should ideally mean that placement is not restricted to any particular
node_partition. Instead, we currently enforce the default partition.
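To make the difference concrete, here is a minimal sketch of the two fallback behaviors. It is not the actual scheduler code: the class PartitionResolution and its methods are hypothetical names made up for illustration, and it assumes "" stands for the default partition (as the empty nodePartition= in the RM log suggests) and "*" stands for "any partition" (as proposed above).

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical illustration only; not the YARN implementation. */
public class PartitionResolution {
    // Assumption: "" denotes the default (unlabeled) partition.
    public static final String DEFAULT_PARTITION = "";
    // Assumption: "*" denotes "any partition", as proposed in this report.
    public static final String ANY_PARTITION = "*";

    /** Observed behavior: a missing node_partition falls back to the default partition. */
    public static String resolveObserved(Set<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return DEFAULT_PARTITION;
        }
        return requested.iterator().next();
    }

    /** Proposed behavior: a missing node_partition means no partition restriction. */
    public static String resolveProposed(Set<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return ANY_PARTITION;
        }
        return requested.iterator().next();
    }

    /** A node is a candidate if the target is "any" or equals the node's partition. */
    public static boolean nodeMatches(String targetPartition, String nodePartition) {
        return ANY_PARTITION.equals(targetPartition)
            || targetPartition.equals(nodePartition);
    }

    public static void main(String[] args) {
        Set<String> unset = Collections.<String>emptySet();
        String nodeLabel = "label"; // partition of the nodes backing labeled_queue

        // Observed: target is "", so nodes in partition "label" are never candidates.
        System.out.println(nodeMatches(resolveObserved(unset), nodeLabel));
        // Proposed: target is "*", so nodes in partition "label" are candidates.
        System.out.println(nodeMatches(resolveProposed(unset), nodeLabel));
    }
}
```

In this sketch an explicit node_partition behaves the same under both variants, so the workaround of setting "node_partition": [ "label" ] would keep working; only the unset case changes.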