[jira] [Updated] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4053: --- Attachment: YARN-4053-YARN-2928.01.patch Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-4053-YARN-2928.01.patch Currently, the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (an ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
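To make the trade-off concrete, here is a minimal sketch (not taken from YARN-4053-YARN-2928.01.patch) contrasting a string-style encoding, which is roughly what a generic object mapper produces, with a fixed-width numeric encoding of a metric value; only the standard HBase Bytes utility from hbase-common is assumed:
{code}
// Illustrative only -- not code from the attached patch.
import org.apache.hadoop.hbase.util.Bytes;

public class MetricEncodingSketch {
  public static void main(String[] args) {
    long metricValue = 1024L;

    // String-style encoding: variable length and has to be parsed back into a
    // number before it can be compared or aggregated server-side.
    byte[] asString = Bytes.toBytes(Long.toString(metricValue));

    // Fixed-width numeric encoding: always 8 bytes and can be read back and
    // summed (e.g. by a coprocessor) without any string parsing.
    byte[] asLong = Bytes.toBytes(metricValue);

    System.out.println("string encoding length: " + asString.length); // 4
    System.out.println("long encoding length:   " + asLong.length);   // 8
    System.out.println("round trip:             " + Bytes.toLong(asLong));
  }
}
{code}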
[jira] [Updated] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3224: -- Attachment: 0001-YARN-3224.patch Attaching a work-in-progress patch. As this patch directly depends on YARN-3212, I will rebase it once that gets committed. Also, the current preemption framework does not have support to inform the AM about the timeout of a to-be-preempted container. Once YARN-3784 gets in, we can leverage it here. Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Sunil G Attachments: 0001-YARN-3224.patch We should leverage YARN preemption framework to notify AM that some containers will be preempted after a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4039) New AM instances waste resource by waiting only for resource availability when all available resources are already used
[ https://issues.apache.org/jira/browse/YARN-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sadayuki Furuhashi updated YARN-4039: - Assignee: Tsuyoshi Ozawa (was: Sadayuki Furuhashi) New AM instances waste resource by waiting only for resource availability when all available resources are already used --- Key: YARN-4039 URL: https://issues.apache.org/jira/browse/YARN-4039 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0 Reporter: Sadayuki Furuhashi Assignee: Tsuyoshi Ozawa Attachments: YARN-4039.1.patch, YARN-4039.2.patch Problem: In FairScheduler, maxRunningApps does not work well when we cannot predict the size of the applications in a queue: a small maxRunningApps cannot use all resources when many small applications are submitted, while a large maxRunningApps wastes resources when large applications run. Background: We're using FairScheduler. In the following scenario, AM instances waste resources significantly: * A queue has X MB of capacity. * An application requests 32 containers, where each container requires (X / 32) MB of memory. ** In this case, a single application occupies the entire resource of the queue. * Many such applications are submitted (say, 10 applications). * The ideal behavior is that applications run one by one to maximize throughput. * However, all applications run simultaneously. As a result, AM instances occupy resources and prevent other tasks from starting. In the worst case, most resources are occupied by waiting AMs and applications progress very slowly. A solution is setting maxRunningApps to 1 or 2. However, that does not work well if the following workload exists in the same queue: * An application requests 2 containers, where each container requires (X / 32) MB of memory. * Many such applications are submitted (say, 10 applications). * The ideal behavior is that all applications run simultaneously to maximize concurrency and throughput. * However, the number of running applications is limited by maxRunningApps. In the worst case, most resources are idle. This problem happens especially with Hive because we can't estimate the size of a MapReduce application. Solution: An AM does not have to start while there are waiting resource requests, because the new AM could not be granted resources even if it started. Patch: I attached a patch that implements this behavior, but the implementation has these trade-offs: * When an AM is registered with FairScheduler, its demand is 0 because even the AM attempt has not been created yet, so starting this AM does not change the resource demand of the queue. Consequently, if many AMs are submitted to a queue at the same time, all AMs will be RUNNING, which is exactly what we want to prevent. * When an AM starts, its demand is only the AM attempt itself; it requests more resources only later. Until it does, the demand of the queue is low, and starting more AMs during this window would start unnecessary AMs. * So, this patch does not start an AM immediately when it is registered. Instead, it starts AMs only every continuous-scheduling-sleep-ms. * Setting a large continuous-scheduling-sleep-ms prevents wasted AMs but increases latency. Therefore, this behavior is enabled only if the new option demand-blocks-am-enabled is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
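As a rough illustration of the gating condition described in the solution above (this is not the attached patch; the helper itself is hypothetical, although FSLeafQueue, getDemand() and getResourceUsage() are existing FairScheduler names), the decision of whether to admit another AM can be reduced to checking for unmet demand in the queue:
{code}
// Hypothetical sketch of "do not start an AM while there are waiting resource
// requests". Only the shape of the check is meant to be accurate.
private boolean shouldAdmitNextAM(FSLeafQueue queue) {
  Resource demand = queue.getDemand();         // what the queue's apps still want
  Resource usage = queue.getResourceUsage();   // what they currently hold
  // If running apps are still asking for more than they hold, a new AM could
  // not be granted containers anyway, so defer it to a later scheduling pass.
  boolean unmetDemand = demand.getMemory() > usage.getMemory()
      || demand.getVirtualCores() > usage.getVirtualCores();
  return !unmetDemand;
}
{code}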
[jira] [Updated] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3224: -- Release Note: (was: We should leverage YARN preemption framework to notify AM that some containers will be preempted after a timeout.) Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3224: -- Description: We should leverage YARN preemption framework to notify AM that some containers will be preempted after a timeout. Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Sunil G We should leverage YARN preemption framework to notify AM that some containers will be preempted after a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698602#comment-14698602 ] Varun Saxena commented on YARN-4053: This patch demonstrates the approach mentioned above and works for both integral and floating-point values. For floating-point values, though, the restriction on the client side is that it should always send values in decimal format; otherwise, when I add metric filters, matching will fail. I guess it's a fair enough restriction to place. In the patch, we can indicate that numerical values have to be stored per column/column prefix. We can, however, extend this logic to all values and indicate whether values to be stored are ASCII encoded as well, so that different kinds of values can be stored differently in the same column. But there is no use case for this as of now, so I haven't done so. I will remove the part about floating-point numbers from the patch if we don't want it now. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-4053-YARN-2928.01.patch Currently, the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (an ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698608#comment-14698608 ] Hadoop QA commented on YARN-4053: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 3s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 38m 58s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750693/YARN-4053-YARN-2928.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / f40c735 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8853/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8853/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8853/console | This message was automatically generated. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-4053-YARN-2928.01.patch Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Shechter updated YARN-3997: --- Description: When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed. The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container. was: When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting, for example, 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed. The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container. An Application requesting multiple core containers can't preempt running application made of single core containers --- Key: YARN-3997 URL: https://issues.apache.org/jira/browse/YARN-3997 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines Reporter: Dan Shechter Assignee: Karthik Kambatla Priority: Critical When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly.
It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed. The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
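A minimal sketch of the check the scheduler appears to be missing, purely for illustration and not a proposed fix: isPreemptable() is a hypothetical helper, while SchedulerNode and RMContainer are the standard resourcemanager types. The point is that summing preemptable single-core containers on one node can satisfy a multi-core request:
{code}
// Illustration only: instead of looking for one preemptable container that
// covers the whole request, sum preemptable containers on the same node.
boolean canFreeEnoughOnNode(SchedulerNode node, Resource requested) {
  int freeableVcores = 0;
  for (RMContainer container : node.getRunningContainers()) {
    if (isPreemptable(container)) {   // hypothetical helper
      freeableVcores += container.getAllocatedResource().getVirtualCores();
      if (freeableVcores >= requested.getVirtualCores()) {
        return true;  // e.g. four 1-core containers can cover a 4-core ask
      }
    }
  }
  return false;
}
{code}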
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698663#comment-14698663 ] Karthik Kambatla commented on YARN-3534: +1 Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698648#comment-14698648 ] Chen Avnery commented on YARN-3997: --- This has happened to me too and I would love for it to have a fix! An Application requesting multiple core containers can't preempt running application made of single core containers --- Key: YARN-3997 URL: https://issues.apache.org/jira/browse/YARN-3997 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines Reporter: Dan Shechter Assignee: Karthik Kambatla Priority: Critical When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed. The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698674#comment-14698674 ] Hudson commented on YARN-3534: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #286 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/286/]) YARN-3534. Collect memory/cpu usage on the node. (Inigo Goiri via kasha) (kasha: rev def12933b38efd5e47c5144b729c1a1496f09229) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeResourceMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeResourceMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeResourceMonitorImpl.java Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698709#comment-14698709 ] Ilan Assayag commented on YARN-3997: Got exactly the same issue too. Very painful... An Application requesting multiple core containers can't preempt running application made of single core containers --- Key: YARN-3997 URL: https://issues.apache.org/jira/browse/YARN-3997 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines Reporter: Dan Shechter Assignee: Karthik Kambatla Priority: Critical When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed. The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2467) Add SpanReceiverHost to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698730#comment-14698730 ] Masatake Iwasaki commented on YARN-2467: The test failures do not seem to be related to the patch. TestContainerAllocation succeeded in my local environment. Add SpanReceiverHost to ResourceManager --- Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-2467.001.patch, YARN-2467.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2467) Add SpanReceiverHost to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-2467: --- Description: A per-process SpanReceiverHost should be initialized in the ResourceManager, in the same way as the NameNode and DataNode do, in order to support tracing. Add SpanReceiverHost to ResourceManager --- Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-2467.001.patch, YARN-2467.002.patch A per-process SpanReceiverHost should be initialized in the ResourceManager, in the same way as the NameNode and DataNode do, in order to support tracing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698667#comment-14698667 ] Hudson commented on YARN-3534: -- FAILURE: Integrated in Hadoop-trunk-Commit #8311 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8311/]) YARN-3534. Collect memory/cpu usage on the node. (Inigo Goiri via kasha) (kasha: rev def12933b38efd5e47c5144b729c1a1496f09229) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeResourceMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeResourceMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeResourceMonitor.java Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698734#comment-14698734 ] Uri Miron commented on YARN-3997: - I am getting the same issue. An Application requesting multiple core containers can't preempt running application made of single core containers --- Key: YARN-3997 URL: https://issues.apache.org/jira/browse/YARN-3997 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines Reporter: Dan Shechter Assignee: Karthik Kambatla Priority: Critical When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed. The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running PySpark with 1-core executors (containers) while trying to launch the H2O.ai framework, which INSISTS on having at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2467) Add SpanReceiverHost to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698733#comment-14698733 ] Masatake Iwasaki commented on YARN-2467: The TraceAdmin added to AdminService is for online tracing configuration updates via the {{hadoop trace}} command, as supported by the NameNode and DataNode. The {{hadoop trace}} command requires users to specify the host:port string of the target IPC server in order to support use cases in which tracing is enabled only on a specific slave server. Though I added TraceAdminProtocol to AdminService in the same way as HAServiceProtocol, the NodeManager does not have an IPC server for administration. I think it is OK to remove the TraceAdmin feature from the ResourceManager as a starting point. Add SpanReceiverHost to ResourceManager --- Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-2467.001.patch, YARN-2467.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4049) Add SpanReceiverHost to NodeManager
[ https://issues.apache.org/jira/browse/YARN-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-4049: --- Attachment: YARN-4049.001.patch I'm attaching a WIP patch. The 001 patch cannot be applied to trunk because it depends on YARN-2467. If the patch attached to YARN-2467 is updated before it is committed, I will fix this one accordingly. Add SpanReceiverHost to NodeManager --- Key: YARN-4049 URL: https://issues.apache.org/jira/browse/YARN-4049 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-4049.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698806#comment-14698806 ] Inigo Goiri commented on YARN-3534: --- Thank you [~kasha] for taking care of the review and the commits. I'll be moving to propagating this info to the scheduler. Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698927#comment-14698927 ] Hong Zhiguo commented on YARN-4024: --- That's a good reason to have this cache. [~leftnoteasy], in earlier comments, you said {code} 1) If a host_a, has IP=IP1, IP1 is on whitelist. If we change the IP of host_a to IP2, IP2 is in blacklist. We won't do the re-resolve since the cached IP1 is on whitelist. 2) If a host_a, has IP=IP1, IP1 is on blacklist. We may need to do re-resolve every time when the node doing heartbeat since it may change to its IP to a one not on the blacklist. {code} I think that's too complicated. The cache lookup is part of resolving (name to address), and the IP whitelist/blacklist check is just the following stage. I think a cache with configurable expiration is enough; we'd better keep the two stages orthogonal rather than mixing them up. BTW, I think it's not good to have Name in NodeId but Address in the whitelist/blacklist: different layers of abstraction are mixed up. We wouldn't have this issue if either Name or Address were used for both NodeId and the whitelist/blacklist. A better way is to have Name in whitelist/blacklist, instead of Address. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
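A minimal sketch of the "cache with configurable expiration" idea, assuming Guava's LoadingCache (which Hadoop already bundles); the class below is illustrative only and is not the actual patch:
{code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

// Hostname -> IP cache so the RM does not hit DNS on every NM heartbeat.
public class CachedHostResolver {
  private final LoadingCache<String, String> hostToIp;

  public CachedHostResolver(long expiryMs) {
    this.hostToIp = CacheBuilder.newBuilder()
        .expireAfterWrite(expiryMs, TimeUnit.MILLISECONDS)
        .build(new CacheLoader<String, String>() {
          @Override
          public String load(String host) throws Exception {
            // Only this call hits DNS; heartbeats within the expiry window
            // reuse the cached address.
            return InetAddress.getByName(host).getHostAddress();
          }
        });
  }

  public String resolve(String host) throws Exception {
    return hostToIp.get(host);
  }
}
{code}
Whether a cached address is on the whitelist or blacklist would then be checked as a separate, second stage, which keeps the two concerns orthogonal as argued above.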
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698929#comment-14698929 ] Hong Zhiguo commented on YARN-4024: --- Please ignore the last sentence ("A better way is to have Name in whitelist/blacklist, instead of Address."), or could someone help to delete it. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
Srikanth Kandula created YARN-4056: -- Summary: Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {queues, applications, priorities}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {queues, applications and priorities}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
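A rough Java-style sketch of what bundling amounts to; this is not taken from the design doc attached later on this issue, and all helper names (sortedQueues, getSortedApplications, tryAssign, hasAvailableResources) are hypothetical stand-ins for the scheduler's real internals:
{code}
// Illustrative pseudocode only: keep allocating on the same node-heartbeat
// pass until nothing more fits, instead of returning after one assignment.
void assignContainersBundled(SchedulerNode node) {
  boolean assignedSomething = true;
  while (assignedSomething && hasAvailableResources(node)) {
    assignedSomething = false;
    for (Queue queue : sortedQueues()) {
      for (Application app : queue.getSortedApplications()) {
        for (Priority priority : app.getPriorities()) {
          if (tryAssign(node, app, priority)) {
            assignedSomething = true;   // bundled one more container
          }
        }
      }
    }
  }
}
{code}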
[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3901: - Attachment: YARN-3901-YARN-2928.WIP.patch Uploading a work-in-progress patch. This patch is not yet rebased against the new branch; I would like to finish the patch and then rebase it. - It has some new classes that deal with the flow run table. - It adds cell-level tags to the cells being stored in the flow run table. - It has a coprocessor class that currently handles the put (prePut) and scan (preScannerOpen and postScannerOpen) operations. - It has a new AggregationScanner class that is invoked from the coprocessor, so any scans that hit this table effectively go through the AggregationScanner class methods. - The start time for a flow is defined as the lowest amongst the start times of all applications in that flow run. Similarly, the end time for a flow is defined as the largest amongst the end times of all applications in that flow run. These are stored per flow run upon application creation and application finish events. The coprocessor prePut intercepts these and puts in only the right values. - For metrics, all metrics are stored as they come in. When a metric for a flow run is to be read back, a special scanner is used. This scanner reads all cells for that metric that belong to all applications. Only the latest cell per application is picked and added up to form a metric value for that flow run. The application states (running or finished) are also stored in the cell tag for metrics. - TODO: - Working next to add the get (very similar to scan), flush, and compact operations. - The applications that have finished can be compacted and a per-flow-run metric cell can be created. - The finished application cells can then be removed. Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C Attachments: YARN-3901-YARN-2928.WIP.patch As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this JIRA to track creation and population of data in the flow run table. Some points that are being considered: - Stores per-flow-run information aggregated across applications and flow version. The RM's collector writes to it on app creation and app completion. - The per-app collector writes to it for metric updates at a slower frequency than the metric updates to the application table. - Primary key: cluster ! user ! flow ! flow run id - Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries. - The running_apps column will be incremented on app creation, and decremented on app completion. - For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values. - - Upon flush and compactions, the min value between all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded. - Ditto for the max_end_time, but then the max will be kept. - Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction. - The m! values are aggregated (summed) upon read.
Only when applications are completed (indicated by tag type 2) can the values be collapsed. - The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don't want to re-aggregate those upon replay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
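To make the min_start_time collapse concrete, here is a simplified sketch of the reduction step (this is not the WIP patch; only the standard HBase Cell, CellUtil and Bytes types are assumed). Each application writes its own tagged cell, and the read or compaction path collapses those cells to their minimum:
{code}
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.util.Bytes;

public final class FlowRunMinStartTimeSketch {
  // Collapse the per-application cells of the min_start_time column to the
  // minimum value; on compaction this value would be rewritten as a single
  // cell with an empty tag and the per-application cells discarded.
  static long collapseToMin(List<Cell> cellsForMinStartTimeColumn) {
    long min = Long.MAX_VALUE;
    for (Cell cell : cellsForMinStartTimeColumn) {
      // Each cell value is the start time written by one application.
      min = Math.min(min, Bytes.toLong(CellUtil.cloneValue(cell)));
    }
    return min;
  }
}
{code}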
[jira] [Commented] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
[ https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698925#comment-14698925 ] Srikanth Kandula commented on YARN-4056: Will look. Possibly. However, this architecture allows any bundling policy. We will push through a couple of different bundling policies. I suspect the packer + dependencies + bounded-unfairness bundle will be novel. Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula Assignee: Robert Grandl Attachments: bundling.docx More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {{queues, applications and priorities}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
[ https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Kandula updated YARN-4056: --- Description: More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {queues, applications and priorities}. was: More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {queues, applications, priorities}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {queues, applications and priorities}. Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula Attachments: bundling.docx More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {queues, applications and priorities}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
[ https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Kandula updated YARN-4056: --- Attachment: bundling.docx Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula Attachments: bundling.docx More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {queues, applications, priorities}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {queues, applications and priorities}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
[ https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Kandula updated YARN-4056: --- Description: More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {{queues, applications and priorities}}. was: More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {queues, applications and priorities}. Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula Attachments: bundling.docx More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {{queues, applications and priorities}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
[ https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl reassigned YARN-4056: --- Assignee: Robert Grandl Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula Assignee: Robert Grandl Attachments: bundling.docx More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {{queues, applications and priorities}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refreshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698908#comment-14698908 ] Rohith Sharma K S commented on YARN-3893: - Sorry for coming in very late. This issue has become stale; we need to move forward! Regarding the patch, # Instead of setting a boolean flag for reinitActiveServices in AdminService and the other changes, moving {{createAndInitActiveServices();}} from transitionedToStandby to just before starting activeServices would solve such issues. On an exception while transitioning to active, handle it by adding a stopActiveServices method invoked in ResourceManager#transitioningToActive() only. # With the above approach, we can probably remove refreshAll() from AdminService#transitionToActive. Any thoughts? Both RM in active state when Admin#transitionToActive failure from refreshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this: # The capacity scheduler XML is wrongly configured during the switch # Refresh ACL failure due to configuration # Refresh user group failure due to configuration Both RMs will continuously try to become active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both web UIs show active # Status is shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}
[ https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698915#comment-14698915 ] Karthik Kambatla commented on YARN-4056: Is this similar to {{assignMultiple}} in {{FairScheduler}}? Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities} Key: YARN-4056 URL: https://issues.apache.org/jira/browse/YARN-4056 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, resourcemanager, scheduler Reporter: Srikanth Kandula Assignee: Robert Grandl Attachments: bundling.docx More than one container is allocated on many NM heartbeats. Yet, the current scheduler allocates exactly one container per iteration over {{queues, applications, priorities}}. When there are many queues, applications, or priorities allocating only one container per iteration can needlessly increase the duration of the NM heartbeat. In this JIRA, we propose bundling. That is, allow arbitrarily many containers to be allocated in a single iteration over {{queues, applications and priorities}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v1.patch Fixing broken unit tests. Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v1.patch Whitespace fix. Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: (was: YARN-3980-v1.patch) Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698955#comment-14698955 ] Hadoop QA commented on YARN-3980: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | javac | 3m 35s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750738/YARN-3980-v0.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 13604bd | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8855/console | This message was automatically generated. Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699003#comment-14699003 ] Hadoop QA commented on YARN-3980: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 45s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:red}-1{color} | javac | 8m 20s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750745/YARN-3980-v1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 13604bd | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8856/console | This message was automatically generated. Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: (was: YARN-3980-v1.patch) Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v1.patch Fixed SLS. Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698830#comment-14698830 ] Dan Shechter commented on YARN-3997: Hi, I was trying to find the existing unit tests for the Fair-Scheduler preemption... All I could find was this: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java Are there more tests hiding somewhere else? An Application requesting multiple core containers can't preempt running application made of single core containers --- Key: YARN-3997 URL: https://issues.apache.org/jira/browse/YARN-3997 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines Reporter: Dan Shechter Assignee: Arun Suresh Priority: Critical When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed... The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running pyspark with 1-core executors (containers) while trying to launch the h2o.ai framework, which INSISTS on having at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
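The scenario above boils down to a gap between per-container matching and per-node aggregation. The following self-contained sketch (illustrative only, not FairScheduler code) shows why a check for a single sufficiently large container fails even though the node as a whole could free enough cores:
{code:java}
// Illustrative sketch of the behaviour described in YARN-3997 (not
// FairScheduler code): with only 1-core containers running, looking for a
// single container big enough for a 4-core request never matches, whereas
// summing preemptable cores on one node shows the request could be served.
import java.util.List;

public class PreemptionGapExample {
  public static void main(String[] args) {
    List<Integer> runningContainerCores = List.of(1, 1, 1, 1, 1, 1, 1, 1);
    int requestedCores = 4;

    boolean anySingleContainerFits = runningContainerCores.stream()
        .anyMatch(c -> c >= requestedCores);               // false
    int totalPreemptableCores = runningContainerCores.stream()
        .mapToInt(Integer::intValue).sum();                 // 8

    System.out.println("single container satisfies request: "
        + anySingleContainerFits);
    System.out.println("aggregate cores on the node: " + totalPreemptableCores
        + " (enough if 4 one-core containers were preempted together)");
  }
}
{code}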
[jira] [Updated] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-4055: -- Attachment: YARN-4055-v0.patch First version for sending the node resource utilization in the heartbeat. Report node resource utilization in heartbeat - Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-4055-v0.patch Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
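The patch description above outlines a simple data flow: the NodeManager-side status updater reads the latest utilization from the NodeResourceMonitor and attaches it to the heartbeat it sends to the RM. A minimal sketch of that flow follows; the class names and shapes are simplified stand-ins, not the actual YARN classes.
{code:java}
// Self-contained sketch of the heartbeat data flow (simplified stand-ins,
// not the actual YARN classes): the status updater reads the latest
// utilization from the node resource monitor and attaches it to the status
// object sent to the RM on the next heartbeat.
final class NodeUtilizationSample {
  final int physicalMemoryMB;   // pmem currently used on the node
  final int virtualMemoryMB;    // vmem currently used on the node
  final float cpuUsage;         // CPU currently used, in vcores

  NodeUtilizationSample(int pmem, int vmem, float cpu) {
    this.physicalMemoryMB = pmem;
    this.virtualMemoryMB = vmem;
    this.cpuUsage = cpu;
  }
}

interface NodeResourceMonitorSketch {
  NodeUtilizationSample getUtilization();
}

final class HeartbeatStatusSketch {
  NodeUtilizationSample nodeUtilization; // new field carried by the heartbeat
}

final class StatusUpdaterSketch {
  HeartbeatStatusSketch buildStatus(NodeResourceMonitorSketch monitor) {
    HeartbeatStatusSketch status = new HeartbeatStatusSketch();
    status.nodeUtilization = monitor.getUtilization();
    return status;
  }
}
{code}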
[jira] [Commented] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698863#comment-14698863 ] Karthik Kambatla commented on YARN-4055: +1 Report node resource utilization in heartbeat - Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-4055-v0.patch, YARN-4055-v1.patch Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4055) Report node resource utilization in heartbeat
Inigo Goiri created YARN-4055: - Summary: Report node resource utilization in heartbeat Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-4055: -- Attachment: YARN-4055-v1.patch Changing type for node resource monitor in Node Manager. Report node resource utilization in heartbeat - Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-4055-v0.patch, YARN-4055-v1.patch Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-3997: -- Assignee: Arun Suresh (was: Karthik Kambatla) Was discussing this with [~asuresh] offline, and he wanted to take this up. An Application requesting multiple core containers can't preempt running application made of single core containers --- Key: YARN-3997 URL: https://issues.apache.org/jira/browse/YARN-3997 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines Reporter: Dan Shechter Assignee: Arun Suresh Priority: Critical When our cluster is configured with preemption and is fully loaded with an application consuming 1-core containers, it will not kill off these containers when a new application kicks in requesting containers with a size larger than 1, for example 4-core containers. When the second application attempts to use 1-core containers as well, preemption proceeds as planned and everything works properly. It is my assumption that the fair scheduler, while recognizing it needs to kill off some containers to make room for the new application, fails to find a SINGLE container satisfying the request for a 4-core container (since all existing containers are 1-core containers), and isn't smart enough to realize it needs to kill off 4 single-core containers (in this case) on a single node for the new application to be able to proceed... The exhibited effect is that the new application hangs indefinitely and never gets the resources it requires. This can easily be replicated with any YARN application. Our go-to scenario in this case is running pyspark with 1-core executors (containers) while trying to launch the h2o.ai framework, which INSISTS on having at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698849#comment-14698849 ] Karthik Kambatla commented on YARN-4055: Thanks for filing and working on this, Inigo. Patch looks mostly good, but for one minor comment: 1. Looks like NodeManager#createNodeResourceMonitor could just return NodeResourceMonitor instead of NodeResourceMonitorImpl Report node resource utilization in heartbeat - Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-4055-v0.patch Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
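A minimal sketch of the review suggestion, with stub types and a hypothetical method body (the real factory method lives in NodeManager and its constructor call may differ): exposing the NodeResourceMonitor interface as the declared return type keeps callers independent of the concrete implementation.
{code:java}
// Self-contained illustration of the suggestion (stub types, hypothetical
// body; not the actual patch): the factory method returns the interface
// rather than the implementation class.
interface NodeResourceMonitor { }

class NodeResourceMonitorImpl implements NodeResourceMonitor { }

class NodeManagerSketch {
  // before (per the review comment, the v0 patch declared the impl type):
  //   protected NodeResourceMonitorImpl createNodeResourceMonitor() { ... }
  // suggested:
  protected NodeResourceMonitor createNodeResourceMonitor() {
    return new NodeResourceMonitorImpl();
  }
}
{code}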
[jira] [Commented] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698860#comment-14698860 ] Hadoop QA commented on YARN-4055: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 36s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 11s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 23s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 9s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 2s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 6m 12s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 56m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750721/YARN-4055-v1.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / def1293 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8854/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8854/artifact/patchprocess/whitespace.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8854/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8854/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8854/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8854/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8854/console | This message was automatically generated. 
Report node resource utilization in heartbeat - Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-4055-v0.patch, YARN-4055-v1.patch Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698866#comment-14698866 ] Hudson commented on YARN-4055: -- FAILURE: Integrated in Hadoop-trunk-Commit #8312 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8312/]) YARN-4055. Report node resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 13604bd5f119fc81b9942190dfa366afad61bc92) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * hadoop-yarn-project/CHANGES.txt Report node resource utilization in heartbeat - Key: YARN-4055 URL: https://issues.apache.org/jira/browse/YARN-4055 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Inigo Goiri Assignee: Inigo Goiri Fix For: 2.8.0 Attachments: YARN-4055-v0.patch, YARN-4055-v1.patch Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v0.patch First version missing unit test based on MiniYARNCluster (WIP). Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: YARN-3980-v0.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
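Since the attachment comment above mentions a MiniYARNCluster-based unit test as work in progress, here is a rough sketch of what such a test scaffold might look like. The scheduler-side assertion is omitted because the accessor for the reported utilization depends on the final patch, and the sleep is only a placeholder for a proper wait condition.
{code:java}
// Rough scaffold for a MiniYARNCluster-based check (assumptions: one
// NodeManager is enough, and a fixed sleep stands in for waiting on
// heartbeats; the actual verification step is left out).
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterUtilizationSketch {
  public static void main(String[] args) throws Exception {
    MiniYARNCluster cluster =
        new MiniYARNCluster("utilization-plumbing", 1, 1, 1);
    cluster.init(new YarnConfiguration());
    cluster.start();
    try {
      // Heartbeats from the single NodeManager should eventually carry the
      // node resource utilization up to the ResourceManager; the scheduler-
      // side check would go here once the accessor is settled.
      Thread.sleep(5000);
    } finally {
      cluster.stop();
    }
  }
}
{code}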
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3980: --- Assignee: Inigo Goiri (was: Karthik Kambatla) Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Inigo Goiri Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4057) If ContainersMonitor is not enabled, only print related log info one time
Jun Gong created YARN-4057: -- Summary: If ContainersMonitor is not enabled, only print related log info one time Key: YARN-4057 URL: https://issues.apache.org/jira/browse/YARN-4057 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Priority: Minor ContainersMonitorImpl will check whether it is enabled when handling every event, and it will print the following message again and again if it is not enabled: {quote} 2015-08-17 13:20:13,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory is needed. Not running the monitor-thread {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
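One straightforward way to get the behaviour the issue asks for is to guard the log statement with a flag so it fires at most once. The sketch below is illustrative only; the class, field, and method names are hypothetical and this is not the actual YARN-4057 patch.
{code:java}
// Illustrative sketch of a one-time log guard (hypothetical names, uses
// System.out instead of the NM logger to stay self-contained).
import java.util.concurrent.atomic.AtomicBoolean;

public class OneTimeLogExample {
  private static final AtomicBoolean WARNED = new AtomicBoolean(false);

  void handleEvent() {
    if (!isMonitoringEnabled()) {
      // compareAndSet guarantees the message is emitted at most once
      if (WARNED.compareAndSet(false, true)) {
        System.out.println("Neither virtual-memory nor physical-memory "
            + "monitoring is needed. Not running the monitor-thread");
      }
      return;
    }
    // ... normal event handling ...
  }

  private boolean isMonitoringEnabled() {
    return false; // stand-in for the real configuration check
  }
}
{code}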
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699042#comment-14699042 ] Karthik Kambatla commented on YARN-3980: Barely skimmed through the patch. In ResourceTrackerService, when creating the NodeStatusEvent, should we just include remoteNodeStatus instead of each of its members? Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Inigo Goiri Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
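The suggestion reads naturally as a constructor change: carry the whole remote NodeStatus in the event rather than one constructor argument per member. A minimal sketch with stand-in types (not the real RM event classes) follows.
{code:java}
// Minimal sketch of the idea (stand-in types, not the real RM classes):
// pass the whole remote node status into the event instead of copying out
// each of its members in ResourceTrackerService.
final class RemoteNodeStatusSketch {
  Object nodeId;              // stand-ins for the fields a heartbeat carries
  Object containersStatuses;
  Object nodeUtilization;
}

final class NodeStatusEventSketch {
  private final RemoteNodeStatusSketch remoteNodeStatus;

  NodeStatusEventSketch(RemoteNodeStatusSketch remoteNodeStatus) {
    // a single field replaces one constructor argument per member
    this.remoteNodeStatus = remoteNodeStatus;
  }

  RemoteNodeStatusSketch getNodeStatus() {
    return remoteNodeStatus;
  }
}
{code}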
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699046#comment-14699046 ] Inigo Goiri commented on YARN-3980: --- It would change the previous code a lot, but I think it would be cleaner. I can do a proposal with that. Plumb resource-utilization info in node heartbeat through to the scheduler -- Key: YARN-3980 URL: https://issues.apache.org/jira/browse/YARN-3980 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Inigo Goiri Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)