[ 
https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907508#comment-16907508
 ] 

Prabhu Joseph commented on YARN-9290:
-------------------------------------

Thanks [~snemeth] for checking this.

*Overall design:*

When an application makes an invalid scheduling request, the allocation is 
retried up to the configured number of attempts 
({{yarn.resourcemanager.placement-constraints.retry-attempts}}). Once the 
attempts are exhausted, the invalid SchedulingRequests are set in the 
AllocateResponse so that the ApplicationMaster is aware of them.
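
To illustrate the retry-then-reject idea, here is a minimal sketch. It is not 
the code from the patch: the class PlacementRetrySketch, its Request stand-in 
and the method names tryAllocate and pullRejectedRequests are hypothetical, and 
validation is reduced to checking the namespace prefix against the valid values 
listed in the exception message on this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of retry-then-reject; not the code from the patch. */
final class PlacementRetrySketch {

  /** Namespace prefixes listed as valid in the exception message on this issue. */
  static final List<String> VALID_PREFIXES =
      Arrays.asList("all", "not-self", "app-id", "app-tag", "self");

  /** Stand-in for a SchedulingRequest; only the namespace prefix is modelled. */
  static final class Request {
    final String namespacePrefix;
    int placementAttempt;  // failed placement attempts so far
    boolean rejected;      // set once, so a request is reported only once

    Request(String namespacePrefix) {
      this.namespacePrefix = namespacePrefix;
    }
  }

  private final int maxRetryAttempts;
  private final List<Request> rejectedRequests = new ArrayList<>();

  PlacementRetrySketch(int maxRetryAttempts) {
    this.maxRetryAttempts = maxRetryAttempts;
  }

  /** Returns true if the request is valid; otherwise counts one failed attempt. */
  boolean tryAllocate(Request request) {
    if (VALID_PREFIXES.contains(request.namespacePrefix)) {
      return true;
    }
    request.placementAttempt++;
    // Reject only after the attempts exceed the configured limit.
    if (request.placementAttempt > maxRetryAttempts && !request.rejected) {
      request.rejected = true;
      rejectedRequests.add(request);
    }
    return false;
  }

  /** Hands the rejected requests to the caller and clears the pending list. */
  List<Request> pullRejectedRequests() {
    List<Request> result = new ArrayList<>(rejectedRequests);
    rejectedRequests.clear();
    return result;
  }
}
```

With a limit of 2 in this sketch, a request with the prefix notselfi is 
reported as rejected on its third failed attempt and is not reported again 
afterwards.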

 

*Changes:*

{{AppPlacementAllocator.java}} - Tracks the number of retries made so far in a 
new instance variable, placementAttempt.

{{SingleConstraintAppPlacementAllocator.java}} - Increments the retry attempt 
when the scheduling request is invalid.

{{AppSchedulingInfo.java}} - Rejects the SchedulingRequest if the retry 
attempts have exceeded the configured value.

{{Allocation.java}} - Maintains the list of rejected SchedulingRequests. The 
constructors now accept this list.

{{FiCaSchedulerApp.java}} - Creates Allocation object with the list of rejected 
scheduling requests. It fetches them from 
AppSchedulingInfo#getSchedulingRequest.

{{DefaultAMSProcessor.java}} - Sets the rejected scheduling requests in the 
AllocateResponse, which is passed to the AM.

{{FairScheduler.java}} - Changed the Allocation constructor call.
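
As a rough picture of how the rejected list travels from the scheduler to the 
AM, the sketch below uses simplified stand-ins. None of these types are the 
real YARN classes: the nested Allocation and AllocateResponse here only carry 
lists of strings, and toResponse is a hypothetical helper standing in for the 
step done in the AMS processor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-ins; none of these types are the real YARN classes. */
final class RejectedFlowSketch {

  /** Scheduler-side result: granted containers plus rejected requests. */
  static final class Allocation {
    final List<String> containers;
    final List<String> rejectedRequests;

    // The constructor takes the rejected list alongside the containers.
    Allocation(List<String> containers, List<String> rejectedRequests) {
      this.containers = Collections.unmodifiableList(new ArrayList<>(containers));
      this.rejectedRequests =
          Collections.unmodifiableList(new ArrayList<>(rejectedRequests));
    }
  }

  /** AM-facing response in this sketch. */
  static final class AllocateResponse {
    private List<String> allocated = Collections.emptyList();
    private List<String> rejected = Collections.emptyList();

    void setAllocated(List<String> allocated) { this.allocated = allocated; }
    void setRejected(List<String> rejected) { this.rejected = rejected; }
    List<String> getAllocated() { return allocated; }
    List<String> getRejected() { return rejected; }
  }

  /** Copies both lists from the scheduler result into the response. */
  static AllocateResponse toResponse(Allocation allocation) {
    AllocateResponse response = new AllocateResponse();
    response.setAllocated(allocation.containers);
    response.setRejected(allocation.rejectedRequests);
    return response;
  }
}
```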

 

*Test Cases:*

{{TestSchedulingRequestContainerAllocation.java}} - The actual test case, which 
verifies that an invalid scheduling request is rejected.

All the test classes below now provide getYarnConfiguration in the mock 
RMContext object, because AppSchedulingInfo now gets the YarnConfiguration 
from RMContext to read the configured 
{{yarn.resourcemanager.placement-constraints.retry-attempts}}.

{{TestAppSchedulingInfo.java}}
{{TestSchedulerApplicationAttempt.java}}
{{TestLeafQueue.java}}
{{TestUtils.java}}
{{TestFSAppAttempt.java}}
{{TestMaxRunningAppsEnforcer.java}}
{{TestQueueManager.java}}
{{TestFifoScheduler.java}}

> Invalid SchedulingRequest not rejected in Scheduler 
> PlacementConstraintsHandler 
> --------------------------------------------------------------------------------
>
>                 Key: YARN-9290
>                 URL: https://issues.apache.org/jira/browse/YARN-9290
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9290-001.patch, YARN-9290-002.patch, 
> YARN-9290-003.patch, YARN-9290-004.patch, YARN-9290-005.patch, 
> YARN-9290-006.patch
>
>
> A SchedulingRequest with an invalid namespace is not rejected by the Scheduler 
> PlacementConstraintsHandler. The RM keeps retrying allocateOnNode and logs the 
> exception each time. The placement-processor handler does reject such a 
> request.
> {code}
> 2019-02-08 16:51:27,548 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator:
>  Failed to query node cardinality:
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.InvalidAllocationTagsQueryException:
>  Invalid namespace prefix: notselfi, valid values are: 
> all,not-self,app-id,app-tag,self
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.fromString(TargetApplicationsNamespace.java:277)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.parse(TargetApplicationsNamespace.java:234)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.AllocationTags.createAllocationTags(AllocationTags.java:93)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraintExpression(PlacementConstraintsUtil.java:78)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraint(PlacementConstraintsUtil.java:240)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:321)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyAndConstraint(PlacementConstraintsUtil.java:272)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:324)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:365)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.checkCardinalityAndPending(SingleConstraintAppPlacementAllocator.java:355)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.precheckNode(SingleConstraintAppPlacementAllocator.java:395)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.precheckNode(AppSchedulingInfo.java:779)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.preCheckForNodeCandidateSet(RegularContainerAllocator.java:145)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:837)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:890)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:977)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1173)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:795)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1630)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1624)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1727)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1476)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1312)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1785)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>       at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>       at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
