[jira] [Comment Edited] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851660#comment-16851660 ]

Juanjuan Tian edited comment on YARN-7494 at 5/30/19 9:02 AM:
---

Thanks Weiwei for your reply.

There seems to be another issue in RegularContainerAllocator#allocate. Referring to the code below, the loop iterates through all candidate nodes, but reservedContainer does not change along with the iterated node. Under a multi-node policy, reservedContainer and the iterated node can therefore become inconsistent, which may produce an incorrect ContainerAllocation (even though that ContainerAllocation is eventually abandoned, it still wastes an allocation opportunity). [~cheersyang], what is your thought on this situation?

{code:java}
while (iter.hasNext()) {
  FiCaSchedulerNode node = iter.next();
  if (reservedContainer == null) {
    result = preCheckForNodeCandidateSet(clusterResource, node,
        schedulingMode, resourceLimits, schedulerKey);
    if (null != result) {
      continue;
    }
  } else {
    // pre-check when allocating reserved container
    if (application.getOutstandingAsksCount(schedulerKey) == 0) {
      // Release
      result = new ContainerAllocation(reservedContainer, null,
          AllocationState.QUEUE_SKIPPED);
      continue;
    }
  }
  result = tryAllocateOnNode(clusterResource, node, schedulingMode,
      resourceLimits, schedulerKey, reservedContainer);
  if (AllocationState.ALLOCATED == result.getAllocationState()
      || AllocationState.RESERVED == result.getAllocationState()) {
    result = doAllocation(result, node, schedulerKey, reservedContainer);
    break;
  }
}
{code}


> Add muti-node lookup mechanism and pluggable nodes sorting policies to
> optimize placement decision
> --
>
>          Key: YARN-7494
>          URL: https://issues.apache.org/jira/browse/YARN-7494
>      Project: Hadoop YARN
>   Issue Type: Sub-task
>   Components: capacity scheduler
>     Reporter: Sunil Govindan
>     Assignee: Sunil Govindan
>     Priority: Major
>      Fix For: 3.2.0
>
>  Attachments: YARN-7494.001.patch, YARN-7494.002.patch,
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch,
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch,
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch,
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch,
> YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch,
> YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch,
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of single node, for effectiveness we can consider a multi node lookup
> based on partition to start with.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
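To make the inconsistency concrete, here is a small standalone sketch (toy classes, not actual YARN code; `ReservedMismatchSketch`, `Reserved`, and `consistent` are hypothetical stand-ins) showing that pairing one fixed reservedContainer with every node from a multi-node iterator mismatches on all but the reservation's own node:

```java
import java.util.Arrays;
import java.util.List;

public class ReservedMismatchSketch {
    // Hypothetical stand-in for a container reservation pinned to one node.
    static class Reserved {
        final String nodeName;
        Reserved(String nodeName) { this.nodeName = nodeName; }
    }

    // A (node, reservedContainer) pair is only meaningful when the
    // reservation actually lives on that node.
    static boolean consistent(Reserved reserved, String node) {
        return reserved.nodeName.equals(node);
    }

    public static void main(String[] args) {
        List<String> candidateNodes = Arrays.asList("h1", "h2", "h3");
        Reserved reservedContainer = new Reserved("h1"); // reservation exists on h1 only

        for (String node : candidateNodes) {
            // The loop reuses the same reservedContainer for every node, so
            // for h2 and h3 the pair is inconsistent.
            System.out.println(node + " consistent="
                + consistent(reservedContainer, node));
        }
    }
}
```

Under this toy model only the h1 iteration is consistent; attempting `tryAllocateOnNode` with the h1 reservation against h2 or h3 is the mismatch described above.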
[jira] [Comment Edited] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845536#comment-16845536 ]

tianjuan edited comment on YARN-7494 at 5/23/19 2:22 AM:
-

It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to starve forever. For example, suppose the cluster has 10 nodes (h1, h2, ... h9, h10), each with 8G of memory, and two queues A and B, each configured with 50% capacity. First, 10 jobs (each requesting 6G of resource) are submitted to queue A, so each of the 10 nodes gets one container allocated. Afterwards, another job, JobB, requesting 3G of resource, is submitted to queue B, and one 3G container is reserved on node h1. With ResourceUsageMultiNodeLookupPolicy, the node order will always be h1, h2, ... h9, h10, so a container is always re-reserved on node h1; no other reservation happens, and no preemption happens either, so JobB hangs forever. [~sunilg], what is your thought on this situation?
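The scenario can be checked with a short standalone simulation (a toy model, not the actual ResourceUsageMultiNodeLookupPolicy; `StarvationSketch`, `Node`, and `schedule` are hypothetical names): all ten nodes tie at 6G used, the usage sort keeps h1 first, and the 3G ask never fits in h1's 2G of headroom, so it is re-reserved indefinitely:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StarvationSketch {
    static class Node {
        final String name;
        final int capacityGb = 8;
        int usedGb = 6; // one 6G container from queue A on every node
        Node(String name) { this.name = name; }
        int freeGb() { return capacityGb - usedGb; }
    }

    // Returns true if an ask of askGb is ever allocated within maxRounds
    // scheduling rounds, always trying the least-used node first.
    static boolean schedule(List<Node> nodes, int askGb, int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            // Usage-based order: all nodes tie at 6G, so h1 stays first.
            nodes.sort(Comparator.comparingInt(n -> n.usedGb));
            Node first = nodes.get(0);
            if (first.freeGb() >= askGb) {
                return true;
            }
            // Otherwise the container is merely re-reserved on the same node;
            // with no preemption, usage never changes and the loop repeats.
        }
        return false;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            nodes.add(new Node("h" + i));
        }
        System.out.println("JobB allocated=" + schedule(nodes, 3, 100));
    }
}
```

In this toy model the 3G ask is never placed no matter how many rounds run, which mirrors the hang described above; a 2G ask would be placed on the first round.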