[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000910#comment-14000910 ] Kam Kasravi commented on YARN-2027: ---

I tried this on a 5-node cluster of AWS EC2 instances, with relaxLocality set to false, Hadoop 2.2, and the Capacity Scheduler. It did not work. I tried setting the rack to null, and I tried allocating 3 container requests as well as just 1.

3 container requests:
{code}
containers += new ContainerRequest(capability, broker.nodes, null, pri, false)
containers += new ContainerRequest(capability, null, Seq[String]("/default-rack").toArray[String], pri, false)
containers += new ContainerRequest(capability, null, null, pri, false)
{code}

1 container request:
{code}
containers += new ContainerRequest(capability, broker.nodes, null, pri, false)
{code}

where broker.nodes is an array containing one FQDN host. It did not work. I can try Hong Zhiguo's distributed-shell patch on the same AWS cluster and report findings. I'll run Hadoop 2.4 instead of Hadoop 2.2.

> YARN ignores host-specific resource requests
>
> Key: YARN-2027
> URL: https://issues.apache.org/jira/browse/YARN-2027
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, scheduler
> Affects Versions: 2.4.0
> Environment: RHEL 6.1, YARN 2.4
> Reporter: Chris Riccomini
>
> YARN appears to be ignoring host-level ContainerRequests.
> I am creating a container request with code that pretty closely mirrors the DistributedShell code:
> {code}
> protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) {
>   info("Requesting %d container(s) with %dmb of memory" format (containers, memMb))
>   val capability = Records.newRecord(classOf[Resource])
>   val priority = Records.newRecord(classOf[Priority])
>   priority.setPriority(0)
>   capability.setMemory(memMb)
>   capability.setVirtualCores(cpuCores)
>   // Specifying a host in the String[] host parameter here seems to do
>   // nothing. Setting relaxLocality to false also doesn't help.
>   (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority)))
> }
> {code}
> When I run this code with a specific host in the ContainerRequest, YARN does not honor the request. Instead, it puts the container on an arbitrary host. This appears to be true for both the FifoScheduler and the CapacityScheduler.
> Currently, we are running the CapacityScheduler with the following settings:
> {noformat}
> yarn.scheduler.capacity.maximum-applications = 1
>   Maximum number of applications that can be pending and running.
> yarn.scheduler.capacity.maximum-am-resource-percent = 0.1
>   Maximum percent of resources in the cluster which can be used to run
>   application masters, i.e. controls the number of concurrently running
>   applications.
> yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
>   The ResourceCalculator implementation to be used to compare Resources in
>   the scheduler. The default, DefaultResourceCalculator, only uses Memory,
>   while DominantResourceCalculator uses the dominant resource to compare
>   multi-dimensional resources such as Memory, CPU, etc.
> yarn.scheduler.capacity.root.queues = default
>   The queues at this level (root is the root queue).
> yarn.scheduler.capacity.root.default.capacity = 100
>   Samza queue target capacity.
> yarn.scheduler.capacity.root.default.user-limit-factor = 1
>   Default queue user limit, a percentage from 0.0 to 1.0.
> yarn.scheduler.capacity.root.default.maximum-capacity = 100
>   The maximum capacity of the default queue.
> yarn.scheduler.capacity.root.default.state = RUNNING
>   The state of the default queue. State can be one of RUNNING or STOPPED.
> yarn.scheduler.capacity.root.default.acl_submit_applications = *
>   The ACL of who can submit jobs to the default queue.
> yarn.scheduler.capacity.root.default.acl_administer_queue = *
>   The ACL of who can administer jobs on the default queue.
> yarn.scheduler.capacity.node-locality-delay = 40
>   Number of missed scheduling opportunities after which the
>   CapacityScheduler attempts to schedule rack-local containers. Typically
>   this should be set to the number of nodes in the cluster. By default it
>   is set to approximately the number of nodes in one rack, which is 40.
> {noformat}
> Digging into the code a bit (props to [~jghoman] for finding this), we have a theory as to why this is happening. It looks like RMContainerRequestor.addContainerReq adds three resource requests per container request: data-local, rack-local, and any:
> {code}
> protected void addContainerReq(ContainerRequest req) {
>   // Create resource requests
>   for (String host : req.hosts) {
>     // Data-local
>     if (!isNodeBlacklisted(host)) {
>       addResourceRequest(req.priority, host, req.capability);
>     }
>   }
>   // Nothing Rack-local for now
>   for (String rack : req.racks) {
>     addResourceRequest(req.priority, rack, req.capability);
>   }
>   ...
> }
> {code}
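The addContainerReq code quoted in the description can be sketched as a small model. This is an illustrative simplification, not the real YARN API: `expand_container_request` is a hypothetical name, and it only captures the fan-out of one ContainerRequest into node-level, rack-local, and ANY resource names, most specific first.

```python
def expand_container_request(hosts, racks, blacklist=()):
    """Model of RMContainerRequestor.addContainerReq: return the resource
    names one ContainerRequest fans out into, most specific first."""
    asks = []
    for host in hosts or []:
        if host not in blacklist:   # Data-local request per non-blacklisted host
            asks.append(host)
    for rack in racks or []:        # Rack-local request per rack
        asks.append(rack)
    asks.append("*")                # The ANY (off-switch) request always goes out
    return asks
```

This mirrors why a host-only request still produces an ANY request: unless relaxLocality constrains it, the scheduler is free to satisfy the ANY ask on any node.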
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999584#comment-13999584 ] Hong Zhiguo commented on YARN-2027: ---

I did this in YARN-1974 to specify the nodes on which the containers should be allocated (for the fair and capacity schedulers), and it works both in unit tests and in our real cluster.
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1343#comment-1343 ] Chris Riccomini commented on YARN-2027: ---

K, feel free to close. I'm fairly sure that I tried a host with a null rack during testing and it didn't work, but it might have been on the FIFO scheduler. Either way, we've figured out a workaround to our problem, and [~zhiguohong] has verified the functionality on a real cluster, so I'm OK with closing this ticket out.
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998912#comment-13998912 ] Bikas Saha commented on YARN-2027: ---

Yes. If strict node locality is needed, then the rack should not be specified. If the rack is specified, then it will allow relaxing locality up to the rack but no further.
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997750#comment-13997750 ] Sandy Ryza commented on YARN-2027: ---

Including a rack in your request will allow containers to go anywhere on the rack, even when relaxLocality is set to false.

From the AMRMClient.ContainerRequest doc: "If locality relaxation is disabled, then only within the same request, a node and its rack may be specified together. This allows for a specific rack with a preference for a specific node within that rack."

So try passing in the rack list as null instead of List("/default-rack").toArray[String].
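The placement rule Sandy and Bikas describe can be sketched as a tiny model. This is my reading of the documented semantics, not YARN source: with relaxLocality=false, a listed rack still admits every node on that rack, while passing racks as null restricts placement to the listed hosts only. `allowed_nodes` and the cluster mapping are hypothetical names for illustration.

```python
def allowed_nodes(hosts, racks, relax_locality, cluster):
    """Sketch of where a container may land.

    cluster: dict mapping node name -> rack name.
    """
    if relax_locality:
        # Locality may relax from node to rack to ANY: every node is fair game.
        return set(cluster)
    # Hard constraint: only the listed hosts...
    allowed = {n for n in hosts or [] if n in cluster}
    if racks:
        # ...plus, if racks are given, every node on those racks.
        allowed |= {n for n, r in cluster.items() if r in racks}
    return allowed
```

Under this model, the request in the comment above (host plus "/default-rack", relaxLocality=false) admits every node on the default rack, which matches the behavior reported in the ticket; dropping the rack list is what pins the request to the host.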
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997708#comment-13997708 ] Chris Riccomini commented on YARN-2027: ---

relaxLocality was set to false.

{noformat}
(0 until containers).foreach(idx =>
  amClient.addContainerRequest(
    new ContainerRequest(capability, getHosts, List("/default-rack").toArray[String], priority, false)))
{noformat}

The last false in that parameter list is relaxLocality.
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992900#comment-13992900 ] Chris Riccomini commented on YARN-2027: ---

Dug into this a bit more. I'm not entirely convinced that the TreeSet stuff is actually an issue anymore. RMContainerRequestor.makeRemoteRequest calls:

{code}
allocateResponse = scheduler.allocate(allocateRequest);
{code}

If you drill down through the capacity scheduler, into SchedulerApplicationAttempt and AppSchedulingInfo, you'll eventually see that AppSchedulingInfo.updateResourceRequests simply adds the items in "ask" into a map keyed by priority. The order in which these asks come in seems to always be with ANY first (see above), so updatePendingResources will always be true, but this doesn't seem harmful. Anyway, any ideas why YARN is ignoring host requests?
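The bookkeeping described above can be sketched as a minimal model. This is an assumed simplification of AppSchedulingInfo.updateResourceRequests, not the actual implementation: incoming asks are folded into a per-priority map keyed by resource name, and a later ask for the same (priority, name) pair simply replaces the earlier one, so arrival order (ANY first or not) does not change the final table.

```python
def update_resource_requests(table, asks):
    """Fold asks into a request table.

    table: {priority: {resource_name: num_containers}}
    asks:  iterable of (priority, resource_name, num_containers) tuples,
           typically with the ANY ("*") entry first.
    """
    for priority, name, num in asks:
        # Last write wins for a given (priority, resource_name) pair.
        table.setdefault(priority, {})[name] = num
    return table
```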
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995262#comment-13995262 ] Bikas Saha commented on YARN-2027: -- Was the relaxLocality flag set to false in order to make a hard constraint for the node? Or is the jira stating that even soft locality constraints (where YARN is allowed to relax the locality from node to rack to *) is also not working? Soft locality would need delay scheduling to be enabled and that needs the configs that Sandy mentioned. > YARN ignores host-specific resource requests > > > Key: YARN-2027 > URL: https://issues.apache.org/jira/browse/YARN-2027 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.4.0 > Environment: RHEL 6.1 > YARN 2.4 >Reporter: Chris Riccomini > > YARN appears to be ignoring host-level ContainerRequests. > I am creating a container request with code that pretty closely mirrors the > DistributedShell code: > {code} > protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) > { > info("Requesting %d container(s) with %dmb of memory" format (containers, > memMb)) > val capability = Records.newRecord(classOf[Resource]) > val priority = Records.newRecord(classOf[Priority]) > priority.setPriority(0) > capability.setMemory(memMb) > capability.setVirtualCores(cpuCores) > // Specifying a host in the String[] host parameter here seems to do > nothing. Setting relaxLocality to false also doesn't help. > (0 until containers).foreach(idx => amClient.addContainerRequest(new > ContainerRequest(capability, null, null, priority))) > } > {code} > When I run this code with a specific host in the ContainerRequest, YARN does > not honor the request. Instead, it puts the container on an arbitrary host. > This appears to be true for both the FifoScheduler and the CapacityScheduler. 
> Currently, we are running the CapacityScheduler with the following settings:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.maximum-applications</name>
>   <value>1</value>
>   <description>Maximum number of applications that can be pending and running.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>   <value>0.1</value>
>   <description>Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent running applications.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.resource-calculator</name>
>   <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
>   <description>The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>default</value>
>   <description>The queues at the this level (root is the root queue).</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.capacity</name>
>   <value>100</value>
>   <description>Samza queue target capacity.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
>   <value>1</value>
>   <description>Default queue user limit a percentage from 0.0 to 1.0.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
>   <value>100</value>
>   <description>The maximum capacity of the default queue.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.state</name>
>   <value>RUNNING</value>
>   <description>The state of the default queue. State can be one of RUNNING or STOPPED.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
>   <value>*</value>
>   <description>The ACL of who can submit jobs to the default queue.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
>   <value>*</value>
>   <description>The ACL of who can administer jobs on the default queue.</description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.node-locality-delay</name>
>   <value>40</value>
>   <description>Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically this should be set to number of nodes in the cluster. By default is setting approximately number of nodes in one rack which is 40.</description>
> </property>
> {noformat}
> Digging into the code a bit (props to [~jghoman] for finding this), we have a theory as to why this is happening. It looks like RMContainerRequestor.addContainerReq adds three resource requests per container request: data-local, rack-local, and any:
> {code}
> protected void addContainerReq(ContainerRequest req) {
>   // Create resource requests
>   for (String host : req.hosts) {
>     // Data-local
>     if (!isNodeBlacklisted(host)) {
>       addResourceRequest(req.priority, host, req.capability);
>     }
>   }
>   // Nothing Rack-local for now
>   for (String rack : req.racks) {
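The three-way expansion in the quoted addContainerReq code can be modeled in a few lines. The sketch below uses made-up names (not YARN's actual classes) purely to illustrate why a host-level request is also recorded at rack and ANY level:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model (hypothetical names, not YARN's classes) of the expansion
// performed by addContainerReq: one container request becomes node-level,
// rack-level, and ANY-level entries.
public class RequestExpansion {
    // Returns the locality levels at which a single container request is recorded.
    static List<String> expand(List<String> hosts, List<String> racks) {
        List<String> levels = new ArrayList<>();
        for (String host : hosts) {
            levels.add("node:" + host);      // data-local
        }
        for (String rack : racks) {
            levels.add("rack:" + rack);      // rack-local
        }
        levels.add("*");                     // off-switch / ANY
        return levels;
    }

    public static void main(String[] args) {
        // A request for one specific host is still visible at rack and ANY level,
        // which is why a scheduler that only consults the ANY entry can place the
        // container on an arbitrary node.
        System.out.println(expand(List.of("eat1-app857"), List.of("/default-rack")));
    }
}
```

Because the ANY-level entry exists for every request, a scheduler that satisfies allocations from the ANY bucket first will appear to ignore the host list entirely, matching the behavior reported above.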
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994652#comment-13994652 ] Sandy Ryza commented on YARN-2027:
----------------------------------

YARN doesn't guarantee any node-locality unless you set relaxLocality=false in your ContainerRequest. The FIFO scheduler does not even make an attempt at node-locality. For the Capacity Scheduler, you need to set yarn.scheduler.capacity.node-locality-delay, which I believe specifies a number of scheduling opportunities to pass on before accepting a non-local container. Apparently it's not included in the Capacity Scheduler doc - http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html. The Fair Scheduler equivalent is documented here, but it works a little bit differently - http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
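Sandy's description of node-locality-delay amounts to a small per-request state machine. This is a simplified sketch of the assumed behavior, not the CapacityScheduler's actual implementation:

```java
// Simplified sketch (assumed behavior) of delay scheduling: on a heartbeat from
// a node that is not in the request's host list, the scheduler passes ("skips")
// until the count of missed opportunities reaches the configured
// yarn.scheduler.capacity.node-locality-delay, then accepts a rack-local placement.
public class DelaySchedulingSketch {
    static String decide(boolean nodeLocal, int missedOpportunities, int localityDelay) {
        if (nodeLocal) {
            return "NODE_LOCAL";   // requested host heartbeated: place immediately
        }
        if (missedOpportunities >= localityDelay) {
            return "RACK_LOCAL";   // waited long enough: relax to rack level
        }
        return "SKIP";             // keep waiting for the requested host
    }

    public static void main(String[] args) {
        int delay = 40; // the node-locality-delay value from the config quoted above
        System.out.println(decide(false, 10, delay));
        System.out.println(decide(false, 40, delay));
        System.out.println(decide(true, 0, delay));
    }
}
```

With the delay left at its default of 0, the very first non-local heartbeat is accepted, which would explain containers landing on arbitrary hosts even when hosts are specified.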
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993620#comment-13993620 ] Chris Riccomini commented on YARN-2027:
---------------------------------------

Thanks for pointing this out. Bummer! The LocalityScheduler idea crossed my mind last night, as well. It still seems to me that the correct solution is to patch the RM (or AMRMClient) so that it works properly.
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993156#comment-13993156 ] Chris Riccomini commented on YARN-2027:
---------------------------------------

So, running this request with memMb=3584, cpuCores=1, containers=32:
{code}
protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) {
  info("Requesting %d container(s) with %dmb of memory" format (containers, memMb))
  val capability = Records.newRecord(classOf[Resource])
  val priority = Records.newRecord(classOf[Priority])
  priority.setPriority(0)
  capability.setMemory(memMb)
  capability.setVirtualCores(cpuCores)

  def getHosts = {
    val hosts = getNextRoundRobinHosts
    System.err.println(hosts.toList)
    hosts
  }

  (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, getHosts, List("/default-rack").toArray[String], priority, false)))
}
{code}
Prints this in the AM logs:
{noformat}
List(eat1-app857, eat1-app873, eat1-app880)
List(eat1-app854, eat1-app864, eat1-app872)
List(eat1-app852, eat1-app873, eat1-app880)
List(eat1-app854, eat1-app880, eat1-app867)
List(eat1-app875, eat1-app852, eat1-app873)
List(eat1-app875, eat1-app852, eat1-app872)
List(eat1-app873, eat1-app859, eat1-app880)
List(eat1-app854, eat1-app873, eat1-app864)
List(eat1-app852, eat1-app874, eat1-app875)
List(eat1-app864, eat1-app859, eat1-app880)
List(eat1-app874, eat1-app872, eat1-app875)
List(eat1-app874, eat1-app873, eat1-app864)
List(eat1-app873, eat1-app859, eat1-app858)
List(eat1-app874, eat1-app873, eat1-app854)
List(eat1-app867, eat1-app880, eat1-app872)
List(eat1-app859, eat1-app875, eat1-app880)
List(eat1-app875, eat1-app872, eat1-app864)
List(eat1-app875, eat1-app867, eat1-app852)
List(eat1-app857, eat1-app852, eat1-app867)
List(eat1-app872, eat1-app854, eat1-app858)
List(eat1-app852, eat1-app872, eat1-app858)
List(eat1-app880, eat1-app873, eat1-app857)
List(eat1-app859, eat1-app871, eat1-app874)
List(eat1-app880, eat1-app874, eat1-app865)
List(eat1-app867, eat1-app873, eat1-app875)
List(eat1-app857, eat1-app858, eat1-app852)
List(eat1-app857, eat1-app867, eat1-app873)
List(eat1-app857, eat1-app871, eat1-app854)
List(eat1-app874, eat1-app865, eat1-app873)
List(eat1-app852, eat1-app880, eat1-app858)
List(eat1-app875, eat1-app873, eat1-app871)
List(eat1-app854, eat1-app880, eat1-app865)
{noformat}
With DEBUG logging in the RM logs (with no other job on the grid), I see:
{noformat}
21:18:02,958 DEBUG AppSchedulingInfo:135 - update: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 32, Location: *, Relax Locality: false}
21:18:02,958 DEBUG ActiveUsersManager:68 - User my-job-name added to activeUsers, currently: 1
21:18:02,959 DEBUG CapacityScheduler:704 - allocate: post-update
21:18:02,959 DEBUG SchedulerApplicationAttempt:328 - showRequests: application=application_1399581102453_0003 headRoom= currentConsumption=1024
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 9, Location: eat1-app875, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 6, Location: eat1-app857, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 11, Location: eat1-app880, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 7, Location: eat1-app854, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 32, Location: /default-rack, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 5, Location: eat1-app858, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 32, Location: *, Relax Locality: false}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 7, Location: eat1-app874, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttempt:332 - showRequests: application=application_1399581102453_0003 request={Priority: 0, Capability: , # Containers: 7, Location: eat1-app872, Relax Locality: true}
21:18:02,959 DEBUG SchedulerApplicationAttemp
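The per-host container counts in the DEBUG log above are simply the number of requests that named each host, while the rack-level and ANY-level entries carry the total request count. A sketch of that bookkeeping (illustrative names, not YARN's AppSchedulingInfo):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the bookkeeping visible in the log above: each ContainerRequest
// naming three hosts increments a per-host outstanding count, while the
// rack-level ("/default-rack") and ANY-level ("*") counts equal the total
// number of requests. Names here are illustrative, not YARN's actual classes.
public class RequestBookkeeping {
    static Map<String, Integer> aggregate(List<List<String>> hostLists) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> hosts : hostLists) {
            for (String host : hosts) {
                counts.merge(host, 1, Integer::sum);   // per-host outstanding count
            }
        }
        counts.put("/default-rack", hostLists.size()); // rack-level total
        counts.put("*", hostLists.size());             // ANY-level total
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> requests = List.of(
            List.of("eat1-app857", "eat1-app873", "eat1-app880"),
            List.of("eat1-app854", "eat1-app864", "eat1-app872"),
            List.of("eat1-app852", "eat1-app873", "eat1-app880"));
        // eat1-app873 and eat1-app880 are each named twice; rack and ANY are 3.
        System.out.println(aggregate(requests));
    }
}
```

Note how this matches the log: 32 requests each naming 3 round-robin hosts yield per-host counts like 9, 6, and 11 that sum to 96, alongside `# Containers: 32` at both the rack and `*` levels.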