[
https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089325#comment-15089325
]
Karam Singh commented on YARN-4565:
-----------------------------------
Came across this issue while experimenting with fairness within a queue in
CapacityScheduler.
Encountered a situation where, with FairOrderingPolicy and SizeBasedWeight
enabled on a queue in CapacityScheduler, running GridMix V3 results in all of
the queue's resources being consumed by AMs.
The settings are as follows:
Cluster total memory capacity=864GB, global AMResourcePercent=0.1, global
MaxApplications=10000, minAllocationMb=2048, AM memory=2048,
mapMemory=reduceMemory=2048
Queue Settings:
Capacity=10
MaxCapacity=80
UserLimitFactor=8
UserLimitPercent=100
FairOrderingPolicy with SizeBasedWeight=True
According to these settings, at most 35 AMs can run simultaneously and a total
of 345 containers can run in the queue.
This was verified by running GridMix V3 (which submits 760 applications)
with FairOrderingPolicy only (without SizeBasedWeight).
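As a rough check, the sketch below reproduces the 35 and 345 figures from the
settings listed earlier. It assumes both limits are measured against the
queue's MaxCapacity (80% of the cluster) and that the AM count is rounded up.
These assumptions happen to match the reported numbers; they are not a
statement of how CapacityScheduler computes the limits internally. The function
and variable names are illustrative only.

```python
import math

MB_PER_GB = 1024


def queue_limits(cluster_gb, max_capacity_pct, am_resource_percent, container_mb):
    """Estimate the container and AM limits for a queue (illustrative only).

    Assumption: limits are taken against the queue's maximum capacity,
    and every container (AM, map, reduce) has the same size.
    """
    cluster_mb = cluster_gb * MB_PER_GB
    queue_max_mb = cluster_mb * max_capacity_pct / 100.0
    am_budget_mb = queue_max_mb * am_resource_percent

    # Whole containers that fit into the queue at its maximum capacity.
    max_containers = math.floor(queue_max_mb / container_mb)
    # Assumption: the AM count is rounded up, which matches the reported 35.
    max_ams = math.ceil(am_budget_mb / container_mb)
    return max_containers, max_ams


# Values from the comment: 864GB cluster, MaxCapacity=80,
# AMResourcePercent=0.1, 2048MB containers.
max_containers, max_ams = queue_limits(864, 80, 0.1, 2048)
print(max_containers, max_ams)  # 345 35
```

With SizeBasedWeight=true the observed number of running AMs (345) equals the
container limit rather than the AM limit, which is what suggests the AM limit
is not being applied.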
However, when the same test was run with FairOrderingPolicy and
SizeBasedWeight=true, 345 AMs (applications) were running, and since all queue
resources were used by AMs, no more containers could run, causing all
applications to get stuck.
Looks like sizeBasedWeight somehow changes/overrides AMResourcePercent.
> When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler,
> Sometimes lead to situation where all queue resources consumed by AMs only
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4565
> URL: https://issues.apache.org/jira/browse/YARN-4565
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 2.8.0, 2.7.1
> Reporter: Karam Singh
>
> When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler,
> it sometimes leads to a situation where all queue resources are consumed by
> AMs only. From the user's perspective, it appears that all applications in
> the queue are stuck, because the whole queue capacity is consumed by AMs.