[jira] [Updated] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
[ https://issues.apache.org/jira/browse/YARN-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated YARN-3096: -- Attachment: screenshot-1.png RM Configured Min User % still showing as 100% in Scheduler --- Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon Attachments: screenshot-1.png After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291942#comment-14291942 ] Hudson commented on YARN-3024: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2036 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2036/]) YARN-3024. LocalizerRunner should give DIE action when all resources are (xgong: rev 0d6bd62102f94c55d59f7a0a86a684e99d746127) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch, YARN-3024.04.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of the localization process. The problem is that {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}; therefore we should check the return value and give a DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
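To make the fix described above concrete, here is a minimal, self-contained sketch of the heartbeat logic it implies; the class, field, and method names are illustrative stand-ins, not the actual ResourceLocalizationService code.
{code}
import java.util.ArrayDeque;
import java.util.Queue;

public class LocalizerHeartbeatSketch {
  enum Action { LIVE, DIE }

  // Hypothetical stand-in for the pending-resources queue.
  private final Queue<String> pending = new ArrayDeque<>();

  // Returns the next resource to localize, or null when none remain.
  private String findNextResource() {
    return pending.poll();
  }

  // Before the fix the heartbeat always answered LIVE as long as pending was
  // non-empty before the call; after the fix a null result means DIE.
  public Action onHeartbeat() {
    String next = findNextResource();
    if (next == null) {
      return Action.DIE;   // everything is localized, tell the localizer to exit
    }
    // ... hand `next` to the localizer and keep it alive ...
    return Action.LIVE;
  }

  public static void main(String[] args) {
    LocalizerHeartbeatSketch s = new LocalizerHeartbeatSketch();
    s.pending.add("hdfs:///apps/app_1/job.jar");
    System.out.println(s.onHeartbeat()); // LIVE
    System.out.println(s.onHeartbeat()); // DIE
  }
}
{code}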
[jira] [Updated] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
[ https://issues.apache.org/jira/browse/YARN-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated YARN-3096: -- Attachment: (was: screenshot-1.png) RM Configured Min User % still showing as 100% in Scheduler --- Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
[ https://issues.apache.org/jira/browse/YARN-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-3096. -- Resolution: Invalid The user interface still shows 100% because the wrong property is being set, so the queue is not using a 25% user limit. This can also be verified by examining the RM logs after it starts up or the queues are refreshed. In queue properties, the queue name needs to be the full path including root, so the property should be yarn.scheduler.capacity.root.default.minimum-user-limit-percent. See the [CapacityScheduler documentation|http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html] for more details. RM Configured Min User % still showing as 100% in Scheduler --- Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon Attachments: screenshot-1.png After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
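For reference, a corrected /etc/hadoop/conf/capacity-scheduler.xml entry based on the resolution above would use the full queue path including root; this is only a sketch, with the 25% value taken from the original report:
{code}
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.default.minimum-user-limit-percent</name>
    <value>25</value>
  </property>
  ...
</configuration>
{code}
After editing the file, the change can be applied with {code}yarn rmadmin -refreshQueues{code} as in the original report, and the new limit should then appear in the Scheduler page.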
[jira] [Updated] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2897: Description: CrossOriginFilter does not log as much to make debugging easier (was: CrossOriginFilter does not log as mcch to make debugging easier) CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291915#comment-14291915 ] Hudson commented on YARN-3024: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #86 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/86/]) YARN-3024. LocalizerRunner should give DIE action when all resources are (xgong: rev 0d6bd62102f94c55d59f7a0a86a684e99d746127) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch, YARN-3024.04.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of the localization process. The problem is that {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}; therefore we should check the return value and give a DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error
[ https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291960#comment-14291960 ] Jason Lowe commented on YARN-3088: -- +1 lgtm. Committing this. LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error - Key: YARN-3088 URL: https://issues.apache.org/jira/browse/YARN-3088 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Eric Payne Attachments: YARN-3088.v1.txt If the native executor returns an error trying to delete a path as a particular user when dir==null, then the code can NPE trying to build a log message for the error. It blindly dereferences dir in the log message despite the code just above explicitly handling the cases when dir could be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
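A hypothetical illustration of the null-safe logging pattern the description calls for; this is not the actual LinuxContainerExecutor code, only the shape of the fix.
{code}
public class DeleteAsUserSketch {
  public static void logDeletionFailure(String user, java.io.File dir) {
    // Before: the message blindly dereferenced dir and threw NPE when dir was null.
    // After: fall back to a placeholder when dir is null, mirroring the null
    // handling that already exists just above in the real method.
    String target = (dir == null) ? "<all user directories>" : dir.getAbsolutePath();
    System.err.println("deleteAsUser for " + user + " returned an error for path " + target);
  }

  public static void main(String[] args) {
    logDeletionFailure("alice", null);                       // no NPE
    logDeletionFailure("alice", new java.io.File("/tmp/x")); // normal case
  }
}
{code}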
[jira] [Commented] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
[ https://issues.apache.org/jira/browse/YARN-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291979#comment-14291979 ] Hari Sekhon commented on YARN-3096: --- That's weird; that was set by Ambari, so maybe this is an Ambari issue, since I literally just changed the number. I should have spotted this, but it's been a few months since I was tuning Capacity Scheduler queues... doh. I will raise this as an issue with Ambari's default in this case. RM Configured Min User % still showing as 100% in Scheduler --- Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon Attachments: screenshot-1.png After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
[ https://issues.apache.org/jira/browse/YARN-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291990#comment-14291990 ] Hari Sekhon commented on YARN-3096: --- Raised AMBARI-9331 RM Configured Min User % still showing as 100% in Scheduler --- Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon Attachments: screenshot-1.png After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
[ https://issues.apache.org/jira/browse/YARN-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated YARN-3096: -- Attachment: screenshot-1.png RM Configured Min User % still showing as 100% in Scheduler --- Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon Attachments: screenshot-1.png After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3096) RM Configured Min User % still showing as 100% in Scheduler
Hari Sekhon created YARN-3096: - Summary: RM Configured Min User % still showing as 100% in Scheduler Key: YARN-3096 URL: https://issues.apache.org/jira/browse/YARN-3096 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2 with RM HA Kerberos managed by Ambari 1.7 Reporter: Hari Sekhon Attachments: screenshot-1.png After setting the Capacity Scheduler minimum-user-limit-percent to 25, it still shows as 100% in the RM web UI under Scheduler, even after doing a queue refresh and even after fully restarting both HA Resource Managers. /etc/hadoop/conf/capacity-scheduler.xml: {code}<configuration> <property> <name>yarn.scheduler.capacity.default.minimum-user-limit-percent</name> <value>25</value> </property> ...{code} This cluster is Kerberized and managed by Ambari. I've refreshed via both Ambari and manually on the command line via {code}yarn rmadmin -refreshQueues{code}, but it still shows the same. Screenshot attached to show it still shows 100% instead of the expected 25%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error
[ https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291978#comment-14291978 ] Hudson commented on YARN-3088: -- FAILURE: Integrated in Hadoop-trunk-Commit #6931 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6931/]) YARN-3088. LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error. Contributed by Eric Payne (jlowe: rev 902c6ea7e4d3b49e49d9ce51ae9d12694ecfcf89) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-yarn-project/CHANGES.txt LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error - Key: YARN-3088 URL: https://issues.apache.org/jira/browse/YARN-3088 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Eric Payne Fix For: 2.7.0 Attachments: YARN-3088.v1.txt If the native executor returns an error trying to delete a path as a particular user when dir==null, then the code can NPE trying to build a log message for the error. It blindly dereferences dir in the log message despite the code just above explicitly handling the cases when dir could be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3085) Application summary should include the application type
[ https://issues.apache.org/jira/browse/YARN-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292044#comment-14292044 ] Jason Lowe commented on YARN-3085: -- Thanks for the patch, Rohith! For backwards compatibility, I think we need to add the application type at the end of the line. That way people who are using awk or other tools to cut certain columns won't be surprised when this field is added. Application summary should include the application type --- Key: YARN-3085 URL: https://issues.apache.org/jira/browse/YARN-3085 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3085.patch Adding the application type to the RM application summary log makes it easier to audit the number of applications from various app frameworks that are running on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
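A small sketch of the backwards-compatibility point made above: if the new field is appended as the last column, pipelines that select existing columns by position keep working. The field names and separator here are assumed for illustration, not the real RM application-summary format.
{code}
public class AppSummarySketch {
  public static void main(String[] args) {
    String appId = "application_1422000000000_0001";
    String user = "alice";
    String queue = "default";
    String state = "FINISHED";
    String appType = "MAPREDUCE";
    // Old format: appId,user,queue,state
    // New format: the same columns in the same order, with appType appended last,
    // so awk '{print $3}'-style extraction of existing columns is unaffected.
    String summary = String.join(",", appId, user, queue, state, appType);
    System.out.println(summary);
  }
}
{code}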
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292258#comment-14292258 ] Chris Douglas commented on YARN-2718: - I share Allen's skepticism. Adding this to the CLC is an invasive change. If the purpose is debugging, wouldn't a composite CE that does the demux be sufficient? Are there other use cases this supports? Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292257#comment-14292257 ] Wangda Tan commented on YARN-3028: -- [~rohithsharma], could you take a look at the latest patch? If you feel it's good, I will commit it after Jenkins gets back. Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
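A hedged sketch, not the actual RMAdminCLI parser, showing how the proposed node:port=label1,label2 syntax can be split unambiguously: '=' separates the node from its labels, while ',' only separates labels.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReplaceLabelsArgSketch {
  static void parse(String arg) {
    // Split on the first '=' only, so the node id may itself contain ':'.
    String[] nodeAndLabels = arg.split("=", 2);
    String node = nodeAndLabels[0];
    List<String> labels;
    if (nodeAndLabels.length > 1 && !nodeAndLabels[1].isEmpty()) {
      labels = Arrays.asList(nodeAndLabels[1].split(","));
    } else {
      labels = Collections.emptyList(); // "node:port=" means remove all labels
    }
    System.out.println(node + " -> " + labels);
  }

  public static void main(String[] args) {
    parse("node1:8041=label1,label2"); // node1:8041 -> [label1, label2]
    parse("node2:8041=");              // node2:8041 -> []
  }
}
{code}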
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291881#comment-14291881 ] Hudson commented on YARN-3024: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #82 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/82/]) YARN-3024. LocalizerRunner should give DIE action when all resources are (xgong: rev 0d6bd62102f94c55d59f7a0a86a684e99d746127) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch, YARN-3024.04.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of the localization process. The problem is that {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}; therefore we should check the return value and give a DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3092) Create common ResourceUsage class to track labeled resource usages in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3092: - Summary: Create common ResourceUsage class to track labeled resource usages in Capacity Scheduler (was: Create common resource usage class to track labeled resource/capacity in Capacity Scheduler) Create common ResourceUsage class to track labeled resource usages in Capacity Scheduler Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, we need to track resource usage *by labels*, including: - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits of having such a common class are: - Reuse of lots of code in different places (Queue/App/User), for better maintainability and readability. - Fine-grained locking (e.g. accessing used resource in a queue doesn't need to lock the queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
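The kind of per-label tracker the description proposes could look roughly like the following sketch; the names are illustrative and this is not the actual YARN-3092 ResourceUsage API, which tracks full Resource objects for used/pending/reserved/AM resources.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ResourceUsageSketch {
  // Memory (MB) tracked per node label; a real implementation would hold Resource objects.
  private final Map<String, Long> usedByLabel = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Readers (e.g. the web UI) only take the read lock, so looking at a queue's
  // used resource does not need the queue-wide lock: the fine-grained locking
  // benefit listed above.
  public long getUsed(String label) {
    lock.readLock().lock();
    try {
      return usedByLabel.getOrDefault(label, 0L);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void incUsed(String label, long memoryMb) {
    lock.writeLock().lock();
    try {
      usedByLabel.merge(label, memoryMb, Long::sum);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}
The same class can then be shared by queues, applications, and users, which is the code-reuse benefit mentioned in the description.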
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292235#comment-14292235 ] Ray Chiang commented on YARN-2868: -- I would like to make this metrics discussion a bit more clear for my own sanity. The current situation: A1) ClusterMetrics, prior to YARN-2802, only had NM metrics. AM metrics were added in YARN-2802, partly because storing in each node isn't useful for debugging. Review from Vinod pushed the metric from the RM (since it really isn't RM related) to ClusterMetrics. A2) QueueMetrics (and derived classes) currently has metrics for App counts and MB/VCore/Container statistics. This JIRA is the first of many, to start placing the metrics to get some sort of YARN profiling in place, at least at some basic level. B1) If it's put into ClusterMetrics, it is as Anubhav mentioned, a good global metric/warning system, but won't necessarily help with debugging other than at the cluster level. B2) If it's put into the QueueMetrics, then there is the additional ability to be able to debug queue vs. network/cluster issues with respect to container allocation. My feedback on the discussion so far: C1) I do believe container allocation has a chance of being queue dependent. Now, whether it's only useful for FairScheduler vs. other schedulers could be debated (which is why it was originally in FSQueueMetrics). C2) QueueMetrics has the advantage of being able to have a customer take a metrics snapshot and use it for debugging application delays (at least for this first metric so far). My goal for the near-future is to continue adding to this area in order to get a clear snapshot of any RM related application runtime metrics for each queue. Any thoughts? PS: I appreciate all the great feedback so far. It's definitely giving me places to look at the code and get a better overall understanding. Thanks. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3028: - Attachment: 0003-YARN-3028.patch After YARN-2800, the test will fail for this patch. Rebased the patch a little bit and added a small fix in CommonsNodeLabelsManager to make disable-node-labels-manager-test work correctly. Kicking Jenkins. Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292277#comment-14292277 ] Wangda Tan commented on YARN-1743: -- [~zjffdu], could you add a license to these files and re-generate the diagram? We can then start a thread in yarn-dev to discuss how to move it forward. Thanks, Wangda Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Labels: documentation Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pairs and the events with (source, destination) pairs. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292327#comment-14292327 ] Hadoop QA commented on YARN-3028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694596/0003-YARN-3028.patch against trunk revision 7574df1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6419//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6419//console This message is automatically generated. Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch, 0003-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292354#comment-14292354 ] Jonathan Eagles commented on YARN-2897: --- [~mitdesai], can you add the header that is being rejected to the log file? CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
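A hypothetical illustration (not the actual CrossOriginFilter code) of the kind of extra logging being requested: record which requested header caused the CORS check to fail, so debugging no longer requires guesswork. The method shape and the use of commons-logging are assumptions for the sketch.
{code}
import java.util.List;
import java.util.Set;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class CorsRejectLoggingSketch {
  private static final Log LOG = LogFactory.getLog(CorsRejectLoggingSketch.class);

  boolean areHeadersAllowed(List<String> requestedHeaders, Set<String> allowedHeaders) {
    for (String header : requestedHeaders) {
      if (!allowedHeaders.contains(header.toLowerCase())) {
        // Log the offending header instead of silently rejecting the request.
        LOG.info("Header not allowed, rejecting cross-origin request: " + header);
        return false;
      }
    }
    return true;
  }
}
{code}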
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292275#comment-14292275 ] Max commented on YARN-2718: --- I also think that it is not a good approach. Most likely it will not work for a whole industry - life technology. In this industry there is an attempt to create standards, and under these standards tools (binaries) will most likely be distributed only in Docker containers. So binaries will not be available for execution outside containers. Basically, I don't see a simple solution for how to debug without Docker containers. Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292307#comment-14292307 ] Hadoop QA commented on YARN-2897: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683387/YARN-2897.patch against trunk revision 7574df1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6418//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6418//console This message is automatically generated. CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
Wangda Tan created YARN-3098: Summary: Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Similar to YARN-3092, after YARN-796, queues (ParentQueue and LeafQueue) now need to track capacities-by-label (e.g. absolute-capacity, maximum-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities, both for maintainability/readability and for fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292145#comment-14292145 ] Ted Yu commented on YARN-3025: -- The persistence of blacklisted nodes doesn't have to be 1-to-1 with each heartbeat from AM. RM can decide a proper interval. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu We have the following method which updates blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
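A minimal, self-contained sketch (with assumed class and field names, not the real AMRMClient internals) of the proposed getter: the client already tracks additions and removals, so recovering the blacklist for a restarted AM amounts to exposing a snapshot of that set.
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistTrackerSketch {
  private final Set<String> blacklist = new HashSet<>();

  // Mirrors the existing update method quoted in the description.
  public synchronized void updateBlacklist(List<String> blacklistAdditions,
                                           List<String> blacklistRemovals) {
    if (blacklistAdditions != null) { blacklist.addAll(blacklistAdditions); }
    if (blacklistRemovals != null) { blacklist.removeAll(blacklistRemovals); }
  }

  // The new API proposed above: a consistent snapshot the failed-over AM can read back.
  public synchronized List<String> getBlacklistedNodes() {
    return new ArrayList<>(blacklist);
  }
}
{code}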
[jira] [Commented] (YARN-2808) yarn client tool can not list app_attempt's container info correctly
[ https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292141#comment-14292141 ] Hadoop QA commented on YARN-2808: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694541/YARN-2808.20150126-1.patch against trunk revision 7574df1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.cli.TestRMAdminCLI Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6417//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6417//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6417//console This message is automatically generated. yarn client tool can not list app_attempt's container info correctly Key: YARN-2808 URL: https://issues.apache.org/jira/browse/YARN-2808 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Gordon Wang Assignee: Naganarasimha G R Attachments: YARN-2808.20150126-1.patch When the timeline server is enabled, the yarn client cannot list the container info for an application attempt correctly. Steps to reproduce: # enable the yarn timeline server # submit an MR job # after the job is finished, use the yarn client to list the container info of the app attempt. Then, since the RM has cached the application's attempt info, the output shows {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :0 Container-Id Start Time Finish Time State Host LOG-URL {noformat} But if the RM is restarted, the client can fetch the container info from the timeline server correctly. {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :4 Container-Id Start Time Finish Time State Host LOG-URL container_1415168250217_0001_01_01 1415168318376 1415168349896 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_01/container_1415168250217_0001_01_01/hadoop container_1415168250217_0001_01_02 1415168326399 1415168334858 COMPLETE localhost.localdomain:47024
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292159#comment-14292159 ] Wangda Tan commented on YARN-3075: -- bq. Right now, when we getLabelsToNodes we simply query labelCollections. If we change like above, we will have to query nodeCollections as well to find out what all nodes are associated with the host stored. Good point! Thanks for the reminder. I think it will be inefficient if we need to query nodeCollections when getting the nodes associated with a label. I prefer to keep it as you suggested: store all related node-ids (instead of host-only in some cases) in NodeLabel. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
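A hedged sketch of the design point above: keep the label-to-node mapping directly on each label entry so that getLabelsToNodes() never has to walk the per-host node collection. The names are illustrative, not the real NodeLabel or node-labels manager classes.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LabelToNodesSketch {
  // label name -> full node-ids (host:port), not just hosts
  private final Map<String, Set<String>> nodesByLabel = new HashMap<>();

  public void addLabelToNode(String nodeId, String label) {
    nodesByLabel.computeIfAbsent(label, k -> new HashSet<>()).add(nodeId);
  }

  // One lookup per label; no second pass over a nodeCollections map is needed.
  public Set<String> getNodesForLabel(String label) {
    return nodesByLabel.getOrDefault(label, new HashSet<>());
  }
}
{code}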
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292160#comment-14292160 ] Wangda Tan commented on YARN-3092: -- Failed tests are unrelated to this change. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, we need to track resource usage *by labels*, including: - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits of having such a common class are: - Reuse of lots of code in different places (Queue/App/User), for better maintainability and readability. - Fine-grained locking (e.g. accessing used resource in a queue doesn't need to lock the queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3097) Logging of resource recovery on NM restart has redundancies
Jason Lowe created YARN-3097: Summary: Logging of resource recovery on NM restart has redundancies Key: YARN-3097 URL: https://issues.apache.org/jira/browse/YARN-3097 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Priority: Minor ResourceLocalizationService logs that it is recovering a resource with the remote and local paths, but then very shortly afterwards the LocalizedResource emits an INIT->LOCALIZED transition that also logs the same remote and local paths. The recovery message should be a debug message, since it's not conveying any useful information that isn't already covered by the resource state transition log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
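A sketch of the change the description suggests, assuming the commons-logging API used elsewhere in the NodeManager at the time: demote the recovery message to debug and guard it, since the INIT->LOCALIZED state transition already logs the same paths at INFO.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RecoveryLoggingSketch {
  private static final Log LOG = LogFactory.getLog(RecoveryLoggingSketch.class);

  void recoverResource(String remotePath, String localPath) {
    // Previously logged at INFO; now only emitted when debug logging is on.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Recovering localized resource " + remotePath + " at " + localPath);
    }
    // ... re-register the resource; the state transition will still log at INFO ...
  }
}
{code}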
[jira] [Assigned] (YARN-3097) Logging of resource recovery on NM restart has redundancies
[ https://issues.apache.org/jira/browse/YARN-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-3097: Assignee: Eric Payne Logging of resource recovery on NM restart has redundancies --- Key: YARN-3097 URL: https://issues.apache.org/jira/browse/YARN-3097 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Minor ResourceLocalizationService logs that it is recovering a resource with the remote and local paths, but then very shortly afterwards the LocalizedResource emits an INIT->LOCALIZED transition that also logs the same remote and local paths. The recovery message should be a debug message, since it's not conveying any useful information that isn't already covered by the resource state transition log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292202#comment-14292202 ] Wangda Tan commented on YARN-2932: -- Thanks for updating, [~eepayne]. New patch screenshot looks good to me, will commit it tomorrow if no objections. Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging --- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: Screenshot.Queue.Preemption.Disabled.jpg, YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt, YARN-2932.v6.txt, YARN-2932.v7.txt, YARN-2932.v8.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track resources-by-label.
Wangda Tan created YARN-3099: Summary: Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track resources-by-label. Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2897: Attachment: YARN-2897.patch Updating the patch CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3022) Expose Container resource information from NodeManager for monitoring
[ https://issues.apache.org/jira/browse/YARN-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3022: Attachment: YARN-3022.003.patch Addressed feedback Expose Container resource information from NodeManager for monitoring - Key: YARN-3022 URL: https://issues.apache.org/jira/browse/YARN-3022 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3022.001.patch, YARN-3022.002.patch, YARN-3022.003.patch Along with exposing the resource consumption of each container (such as in YARN-2141), it's worth exposing the actual resource limit associated with them to get better insight into YARN allocation and consumption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3075: --- Attachment: YARN-3075.002.patch NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3075: --- Attachment: (was: YARN-3075.002.patch) NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3075: --- Attachment: YARN-3075.002.patch NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292436#comment-14292436 ] Hadoop QA commented on YARN-2897: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694611/YARN-2897.patch against trunk revision 21d5599. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6420//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6420//console This message is automatically generated. CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3099: - Description: After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track resources-by-label. -- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2897: Attachment: YARN-2897.patch Thanks for taking a look [~jeagles]. Attaching the modified patch CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292802#comment-14292802 ] Hadoop QA commented on YARN-3098: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694659/YARN-3098.4.patch against trunk revision 6f9fe76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6427//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6427//console This message is automatically generated. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch, YARN-3098.2.patch, YARN-3098.3.patch, YARN-3098.4.patch Similar to YARN-3092, after YARN-796, queues (ParentQueue and LeafQueue) now need to track capacities-by-label (e.g. absolute-capacity, maximum-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities, both for maintainability/readability and for fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3030: -- Attachment: YARN-3030.004.patch Posted patch v.4 to address Zhijie's suggestions. By adding the per-node aggregator web service directly to the node manager web app, we're able to get around YARN-3087, but it required adding a dependency from the node manager to the timeline service (yuck). Also, it was a bit involved because we need to make sure we still retain the aux service behavior (to be able to leverage the aux service lifecycle) and make sure the same app-level service manager is shared between the aux service and the web service. Verified by adding it as an aux-service of the node manager. set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3030.001.patch, YARN-3030.002.patch, YARN-3030.003.patch, YARN-3030.004.patch Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292848#comment-14292848 ] Bikas Saha commented on YARN-3025: -- Yes. But that would mean that the RM cannot provide the latest updates. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu We have the following method which updates blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
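To illustrate the accessor being proposed, here is a minimal, self-contained sketch of an AM-side tracker that pairs the existing updateBlacklist signature with a getBlacklistedNodes getter; the class name and the idea of returning a defensive copy are illustrative assumptions, not the actual AMRMClient code:
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistTracker {
  // Current set of blacklisted node names as known by this AM.
  private final Set<String> blacklist = new HashSet<String>();

  public synchronized void updateBlacklist(List<String> blacklistAdditions,
      List<String> blacklistRemovals) {
    if (blacklistAdditions != null) {
      blacklist.addAll(blacklistAdditions);
    }
    if (blacklistRemovals != null) {
      blacklist.removeAll(blacklistRemovals);
    }
  }

  // Proposed accessor: return a copy so callers cannot mutate internal state.
  public synchronized List<String> getBlacklistedNodes() {
    return new ArrayList<String>(blacklist);
  }
}
{code}
Returning a copy keeps the getter safe to call from a recovering AM without exposing the client's internal bookkeeping.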
[jira] [Created] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
Anubhav Dhoot created YARN-3101: --- Summary: FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
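As a rough illustration of the bug description above, a corrected check would include the reservation being validated in the queue's usage and compare in the right direction. This is a hedged sketch only (the real FairScheduler method works on FSQueue and the scheduler's resource calculator); names are illustrative:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class MaxShareCheckSketch {
  // Returns true if reserving 'reservation' on top of the current usage
  // still fits within the queue's configured max share.
  public static boolean fitsInMaxShare(Resource usage, Resource reservation,
      Resource maxShare) {
    Resource usageIfReserved = Resources.add(usage, reservation);
    return Resources.fitsIn(usageIfReserved, maxShare);
  }
}
{code}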
[jira] [Assigned] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-3101: --- Assignee: Anubhav Dhoot FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293050#comment-14293050 ] Chun Chen commented on YARN-2718: - Yes, my mistake. Thanks for pointing it out. Attached a fixed patch on YARN-1983. Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292798#comment-14292798 ] Jian He commented on YARN-2868: --- From what I understand, both YARN-2802 and this jira are trying to capture the time interval among these states. {code} SCHEDULED, ALLOCATED, LAUNCHED, RUNNING {code} YARN-2802 addresses the last three states, and this jira is trying to capture the time interval between the first two. If possible, I think we should make both implementations consistent. We may even consider a generic solution to capture the time interval between state-transitions. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and the first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
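Since the suggestion above is about timing the interval between container states generically, here is a small hedged sketch of what such a helper could look like; the enum and class are illustrative and do not correspond to existing YARN classes:
{code}
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class ContainerStateTimer {
  public enum State { SCHEDULED, ALLOCATED, LAUNCHED, RUNNING }

  private final Map<State, Long> enteredAt = new EnumMap<State, Long>(State.class);

  // Record the moment a container enters a state.
  public synchronized void markEntered(State state) {
    enteredAt.put(state, System.nanoTime());
  }

  // Elapsed milliseconds between entering 'from' and entering 'to',
  // or -1 if either transition has not been recorded yet.
  public synchronized long elapsedMillis(State from, State to) {
    Long start = enteredAt.get(from);
    Long end = enteredAt.get(to);
    if (start == null || end == null) {
      return -1;
    }
    return TimeUnit.NANOSECONDS.toMillis(end - start);
  }
}
{code}
A scheduler metric such as the FairScheduler latency discussed in this issue could then be fed from elapsedMillis(State.SCHEDULED, State.ALLOCATED).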
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292892#comment-14292892 ] Jian He commented on YARN-3011: --- bq. I feel using ConverterUtils#getPathFromYarnURL to print the full URL will be more debuggable. [~varun_saxena], sorry, I didn't realize that ConverterUtils.getPathFromYarnURL again throws exception. For simplicity, I think your first patch is good enough. would you like to revert to the first approach? NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
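For context on the first-patch approach mentioned above, the crash happens because an empty resource path reaches the Path constructor on the dispatcher thread. A hedged sketch of the kind of guard involved (the actual fix lives in LocalResourcesTrackerImpl/ResourceLocalizationService; this helper is illustrative only):
{code}
import org.apache.hadoop.fs.Path;

public class LocalizationPathGuard {
  // Returns a Path for the remote resource, or null if the string is empty
  // or malformed, so the caller can fail just this resource instead of
  // letting IllegalArgumentException kill the AsyncDispatcher thread.
  public static Path toPathOrNull(String rawPath) {
    if (rawPath == null || rawPath.trim().isEmpty()) {
      return null;
    }
    try {
      return new Path(rawPath);
    } catch (IllegalArgumentException e) {
      return null;
    }
  }
}
{code}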
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292809#comment-14292809 ] Leitao Guo commented on YARN-2718: -- I think this is good for our hadoop cluster, since we have a few applications which have to run in docker containers, but most of the apps need LCE. So, we need a CompositeContainerExecutor and let apps configure which container executor they need. Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-1983: Attachment: YARN-1983.patch Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.patch Different container types (default, LXC, docker, VM box, etc.) have different semantics on isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN and specified by the application at runtime, which largely enhances YARN's flexibility to meet heterogeneous apps' requirements on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292804#comment-14292804 ] Jian He commented on YARN-3100: --- Allen, thanks for your comments. This jira also includes providing necessary hooks inside YARN to support the pluggable interface. So far, I'm unsure how much hdfs and YARN differ in the ACL management. But If needed, we can definitely promote the interface to be common. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
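To make the proposal above concrete, here is a rough sketch of the kind of pluggable interface being discussed; the actual YarnAuthorizationProvider in the patch may differ in names and signatures, so treat this as illustrative only:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public interface AuthorizationProviderSketch {
  // Called once at startup; implementations (default ACLs, Ranger, Sentry)
  // can load their own policy sources here.
  void init(Configuration conf) throws IOException;

  // e.g. accessType = "SUBMIT_APPLICATIONS", entity = "queue:root.default"
  boolean checkPermission(String accessType, String entity,
      UserGroupInformation user);

  // Used by a default implementation to store ACLs coming from configuration.
  void setPermission(String entity, String acl) throws IOException;
}
{code}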
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292803#comment-14292803 ] Hadoop QA commented on YARN-1743: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694668/YARN-1743-3.patch against trunk revision 6f9fe76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:red}-1 javac{color}. The applied patch generated 1188 javac compiler warnings (more than the trunk's current 1187 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6429//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6429//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6429//console This message is automatically generated. Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Labels: documentation Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, YARN-1743-3.patch, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292909#comment-14292909 ] Allen Wittenauer commented on YARN-2718: IMO, this jira should get closed in favor of YARN-1983. Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293090#comment-14293090 ] Varun Saxena commented on YARN-3075: [~leftnoteasy], regarding the other review comments. bq. 2) getLabelsToNodes: When there's no NodeLabel associated with label, it's better print warn message. bq. 4) NodeLabel.getNodeIdInfo may not precise enough, rename to getAssociatedNode(Id)s? (or other name if you have) Ok. bq. 5) Also NodeLabel.getNodeIdInfo: I think we can assume NodeId will not changed (nobody calls NodeId.setHost/Port), so copy reference should be enough, agree? If yes, just return new HashSet<NodeId>(nodeIds). I had kept it like this because the same NodeId will be shared across threads after the call to CommonNodeLabelsManager#getLabelsToNodes completes. My major concern was that there should be no repetition of an issue like YARN-2978. But QueueInfo there had an underlying list. And in the current code, there should be no call to setHost/Port, so we can change it like above. bq. You can use getLabelsByNode to get labels from Host-Node hierarchy. I am not sure why I added both host and nm labels in oldLabels, but on the face of it, the pre-existing function can be used. Will change the code. 3) is related to this comment, so getHostLabels can be removed as well. bq. 8) Add nodeId to Node to avoid loop like: Sorry, I didn't quite get what you mean by this. bq. 7) When a node/host has no label, it belongs to a special NodeLabel with key = CommonNodeLabelsManager.NO_LABEL. This is necessary because node without label can be considered as a partition as well. We need support it here (even if getLabelsToNode not return it now.). Support it in what sense? NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
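Regarding review comment 5) above, the agreed direction is to copy only the set while sharing the NodeId references, on the assumption that NodeId is effectively immutable. A minimal hedged sketch (class and method names are illustrative, not the patch's actual code):
{code}
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;

public class LabelNodesSketch {
  private final Set<NodeId> nodeIds = new HashSet<NodeId>();

  // Copy the set so the manager's internal collection is never exposed,
  // but share the NodeId objects themselves (nobody calls setHost/setPort).
  public synchronized Set<NodeId> getAssociatedNodeIds() {
    return new HashSet<NodeId>(nodeIds);
  }
}
{code}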
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292973#comment-14292973 ] Tsuyoshi OZAWA commented on YARN-3025: -- {quote} Lets say 1000 AMs pinging every 1 sec. {quote} I expected that we synchronize the state only when the RM detects a difference of blacklists before and after the heartbeat. I thought the probability of marking nodes as blacklisted is not so high. What do you think? {quote} Yes. But that would mean that the RM cannot provide the latest updates. {quote} I think it can be acceptable in many cases if the blacklisted nodes are updated within a minute or a few minutes, e.g. for admins checking cluster information. In this case, we should also document it explicitly so the trade-off of the sync interval is known. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu We have the following method which updates blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292800#comment-14292800 ] Hadoop QA commented on YARN-3011: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694666/YARN-3011.003.patch against trunk revision 6f9fe76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6430//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6430//console This message is automatically generated. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch NM dies because of IllegalArgumentException when localize resource. 
2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292821#comment-14292821 ] Hadoop QA commented on YARN-3099: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694661/YARN-3099.2.patch against trunk revision 6f9fe76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6428//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6428//console This message is automatically generated. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch, YARN-3099.2.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292910#comment-14292910 ] Leitao Guo commented on YARN-2718: -- [~chenchun], in the following code, I think you should return directly when 'containerExecutor == null': {code} @Override public void setContainerExecutor(String containerExecutor) { maybeInitBuilder(); if (containerExecutor == null) { builder.clearContainerExecutor(); } builder.setContainerExecutor(containerExecutor); } {code} Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
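For clarity, this is the setter quoted in the comment above, reformatted with the suggested early return so that a null argument only clears the field and never reaches builder.setContainerExecutor(null) (which would throw NullPointerException in a generated protobuf builder). It assumes the surrounding PBImpl class from the patch:
{code}
@Override
public void setContainerExecutor(String containerExecutor) {
  maybeInitBuilder();
  if (containerExecutor == null) {
    builder.clearContainerExecutor();
    return;
  }
  builder.setContainerExecutor(containerExecutor);
}
{code}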
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292937#comment-14292937 ] Chun Chen commented on YARN-2992: - [~kasha] [~rohithsharma] [~jianhe], we are constantly facing the following error. RM log: {code} 2015-01-27 00:13:19,379 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.196.128.13/10.196.128.13:2181. Will not attempt to authenticate using SASL (unknown error) 2015-01-27 00:13:19,383 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.196.128.13/10.196.128.13:2181, initiating session 2015-01-27 00:13:19,404 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.196.128.13/10.196.128.13:2181, sessionid = 0x24ab193421e4812, negotiated timeout = 1 2015-01-27 00:13:19,417 WARN org.apache.zookeeper.ClientCnxn: Session 0x24ab193421e4812 for server 10.196.128.13/10.196.128.13:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 2015-01-27 00:13:19,517 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:895) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:892) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1031) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1050) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:898) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$600(ZKRMStateStore.java:82) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:1003) 2015-01-27 00:13:19,518 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no.
934 {code} ZK log {code} 2015-01-27 00:13:19,300 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.240.92.100:46464 2015-01-27 00:13:19,302 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x24ab193421e4812 at /10.240.92.100:46464 2015-01-27 00:13:19,302 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x24ab193421e4812 2015-01-27 00:13:19,303 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x24ab193421e4812 with negotiated timeout 1 for client /10.240.92.100:46464 2015-01-27 00:13:19,303 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@892] - got auth packet /10.240.92.100:46464 2015-01-27 00:13:19,303 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@926] - auth success /10.240.92.100:46464 2015-01-27 00:13:19,320 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x24ab193421e4812 due to java.io.IOException: Len error 1425415 2015-01-27 00:13:19,321 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.240.92.100:46464 which had sessionid 0x24ab193421e4812 2015-01-27 00:13:23,093 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.240.92.100:46477 2015-01-27 00:13:23,159 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x24ab193421e4812 at /10.240.92.100:46477 2015-01-27 00:13:23,159 [myid:1] - INFO
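The "Len error 1425415" on the ZooKeeper side suggests the RM is writing a packet of roughly 1.4 MB, which exceeds ZooKeeper's default jute.maxbuffer limit (about 1 MB), so the server drops the connection and the client keeps retrying. Assuming raising the limit is an acceptable workaround (shrinking the stored RM state is the cleaner fix), the property would have to be set consistently on both the ZooKeeper servers and the ResourceManager JVM; the values below are illustrative only:
{code}
# ZooKeeper server side (e.g. conf/java.env), illustrative 4 MB limit
JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=4194304"

# ResourceManager side (yarn-env.sh), must match the server-side setting
YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Djute.maxbuffer=4194304"
{code}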
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3101: Attachment: YARN-3101.001.patch Fixes the issue. Verified the test passes. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101.001.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293015#comment-14293015 ] Allen Wittenauer commented on YARN-3100: It sounds like this JIRA should get split in half then: a generic interface sitting in common that other components can use and the YARN-specific one. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293015#comment-14293015 ] Allen Wittenauer edited comment on YARN-3100 at 1/27/15 5:57 AM: - It sounds like this JIRA should get split in half then: a generic interface sitting in common that other components can use and the YARN-specific bits. was (Author: aw): It sounds like this JIRA should get split in half then: a generic interface sitting in common that other components can use and the YARN-specific one. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292981#comment-14292981 ] Ted Yu commented on YARN-3025: -- Tsuyoshi's comment makes sense. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu We have the following method which updates blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293108#comment-14293108 ] zhihai xu commented on YARN-3079: - Thanks for [~rchiang]'s and [~leftnoteasy]'s review. bq. IMHO, I think they're exchangeable, update a node = remove then a Yes, it makes sense to keep the code clean, since the node will only disappear for a very short time. AbstractYarnScheduler#getMaximumResourceCapability will be called in this period very rarely. Even if it happens, the side effect will be very small. I uploaded a new patch, YARN-3079.003.patch, which addresses [~leftnoteasy]'s comment. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292674#comment-14292674 ] Jian He commented on YARN-3011: --- [~varun_saxena], I tried to commit, but patch seems not applying again. mind rebasing the patch ? thx. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292729#comment-14292729 ] Wangda Tan commented on YARN-3079: -- [~zxu], Thanks for the reply. bq. This is discussable. I prefer to keep the current signature beca Makes sense. bq. I think it is not completely equivalent . because when you call updateMaximumAllocation(oldNo IMHO, I think they're exchangeable, update a node = remove then add. Its state is discrete, so it is safe to make it disappear for a very short time. I think it's very important to keep the code clean. Beyond this, the patch looks good. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
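To illustrate the "update a node = remove then add" point above as it applies to maximumAllocation bookkeeping, a hedged sketch (method names mirror AbstractYarnScheduler, but the bodies and signatures here are illustrative only):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public abstract class NodeResourceUpdateSketch {
  // In the real scheduler this recomputes the cluster-wide maximumAllocation
  // from the remaining nodes when a node is removed, and folds the node's
  // capability back in when it is added.
  protected abstract void updateMaximumAllocation(Resource nodeCapability, boolean added);

  public void updateNodeResource(Resource oldCapability, Resource newCapability) {
    // Treat the resource change as remove-with-old plus add-with-new, so the
    // node "disappears" only for the brief window between the two calls.
    updateMaximumAllocation(oldCapability, false);
    updateMaximumAllocation(newCapability, true);
  }
}
{code}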
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: YARN-3011.003.patch NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch, YARN-3011.003.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292738#comment-14292738 ] Hadoop QA commented on YARN-3098: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694645/YARN-3098.2.patch against trunk revision 1f2b695. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6426//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6426//console This message is automatically generated. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch, YARN-3098.2.patch, YARN-3098.3.patch, YARN-3098.4.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292747#comment-14292747 ] Allen Wittenauer commented on YARN-3100: This sounds like something that should start out in common rather than in YARN, given that there are ACLs for HDFS as well. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292755#comment-14292755 ] Wangda Tan commented on YARN-3075: -- You're right, I mis-read your test :). Besides 6), I think the other review comments still apply; please let me know your thoughts. For 1) Is there any reason to add both host/node labels to oldLabels? Or should only adding getLabelsByNode(xx) be enough? NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292903#comment-14292903 ] Chun Chen commented on YARN-2718: - Well, the patch I've uploaded mainly focuses on making apps running on the same YARN cluster able to specify different container executors. YARN currently only supports using a single container executor to launch containers. As [~guoleitao] said, we want to run both MapReduce jobs and Docker containers on the same cluster. I think maybe it's better for me to upload the patch on YARN-1983. As for debugging Docker containers, we implemented a service registry feature to register the host IP and ports of the running containers in a highly-available key-value store (etcd) and make use of a web shell to debug. Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor --- Key: YARN-2718 URL: https://issues.apache.org/jira/browse/YARN-2718 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Attachments: YARN-2718.patch There should be a composite container that allows users to run their jobs in DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293010#comment-14293010 ] Anubhav Dhoot commented on YARN-3101: - [~sandyr] [~l201514] appreciate your review FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101.001.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293060#comment-14293060 ] Chun Chen commented on YARN-1983: - Attached a patch which creates a CompositeContainerExecutor to implement this. The patch allows apps to specify the container executor class in ContainerLaunchContext. Also changes ${yarn.nodemanager.container-executor.class} to allow specifying a comma-separated list of container executor classes, and adds a new configuration ${yarn.nodemanager.default.container-executor.class}, the default container executor used to launch containers that are submitted without specifying a container executor. Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.patch Different container types (default, LXC, docker, VM box, etc.) have different semantics on isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN and specified by the application at runtime, which largely enhances YARN's flexibility to meet heterogeneous apps' requirements on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
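For reference, an illustrative yarn-site.xml snippet matching the configuration described in the comment above (the class values are examples; the attached patch, not this sketch, is authoritative):
{code}
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor,org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.default.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>
{code}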
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293141#comment-14293141 ] Hadoop QA commented on YARN-3079: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694732/YARN-3079.003.patch against trunk revision 6f9fe76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-kms. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6431//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6431//console This message is automatically generated. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch, YARN-3079.003.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291717#comment-14291717 ] Hudson commented on YARN-3024: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #85 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/85/]) YARN-3024. LocalizerRunner should give DIE action when all resources are (xgong: rev 0d6bd62102f94c55d59f7a0a86a684e99d746127) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch, YARN-3024.04.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of localization process. The problem is {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}, therefore we should check the return value, and gives DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291727#comment-14291727 ] Hadoop QA commented on YARN-41: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694514/YARN-41-3.patch against trunk revision 7b82c4a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6416//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6416//console This message is automatically generated. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
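As a hedged illustration of the idea only (not the YARN-41 patch), the sketch below shows an RM-side tracker that removes a gracefully shut down node immediately instead of letting the liveness monitor expire it; every class and method name here is hypothetical.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RMNodeTrackerSketch {
  enum NodeState { RUNNING, SHUTDOWN, LOST }

  static class NodeEntry {
    NodeState state = NodeState.RUNNING;
    long lastHeartbeatMillis = System.currentTimeMillis(); // used by the expiry path
  }

  private final Map<String, NodeEntry> activeNodes = new ConcurrentHashMap<>();

  /** Called when an NM announces a graceful shutdown (e.g. via an unregister call). */
  void onGracefulShutdown(String nodeId) {
    NodeEntry entry = activeNodes.remove(nodeId);
    if (entry != null) {
      entry.state = NodeState.SHUTDOWN;
      // In the real RM this would also notify the scheduler so containers on
      // the node are released right away rather than after the expiry interval.
    }
  }

  /** Liveness-monitor path: only nodes that never said goodbye become LOST. */
  void onExpiry(String nodeId) {
    NodeEntry entry = activeNodes.remove(nodeId);
    if (entry != null) {
      entry.state = NodeState.LOST;
    }
  }
}
{code}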
[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized
[ https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291731#comment-14291731 ] Hudson commented on YARN-3024: -- FAILURE: Integrated in Hadoop-Yarn-trunk #819 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/819/]) YARN-3024. LocalizerRunner should give DIE action when all resources are (xgong: rev 0d6bd62102f94c55d59f7a0a86a684e99d746127) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt LocalizerRunner should give DIE action when all resources are localized --- Key: YARN-3024 URL: https://issues.apache.org/jira/browse/YARN-3024 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3024.01.patch, YARN-3024.02.patch, YARN-3024.03.patch, YARN-3024.04.patch We have observed that {{LocalizerRunner}} always gives a LIVE action at the end of localization process. The problem is {{findNextResource()}} can return null even when {{pending}} was not empty prior to the call. This method removes localized resources from {{pending}}, therefore we should check the return value, and gives DIE action when it returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2390) Investigating whether generic history service needs to support queue-acls
[ https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291746#comment-14291746 ] Sunil G commented on YARN-2390: --- Hi [~zjshen], are we still focusing on this JIRA? Kindly share your thoughts. If this is still valid, I would like to pursue it. Investigating whether generic history service needs to support queue-acls - Key: YARN-2390 URL: https://issues.apache.org/jira/browse/YARN-2390 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Sunil G According to YARN-1250, it's arguable whether queue-acls should be applied to the generic history service as well, because the queue admin may not need access to a completed application that has been removed from the queue. Creating this ticket to track the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3093) Support load command from admin [Helps to load big set of labels]
[ https://issues.apache.org/jira/browse/YARN-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291750#comment-14291750 ] Sunil G commented on YARN-3093: --- Thank you [~rohithsharma] and [~leftnoteasy]. I was thinking of keeping this config file for the add/remove/replaceLabel operations. Internally we could use the same existing logic. A conf file for replace could look like {noformat} client $ cat replace.conf replaceLabelsOnNode [ node1:port=label1,label2 node2:port=label1,label2] {noformat} Is this what you also had in mind? Similarly we can specify the operation as *replaceLabelsOnNode / addToClusterNodeLabels / removeFromClusterNodeLabels* followed by the input, separated by newlines or spaces, within square braces. Support load command from admin [Helps to load big set of labels] - Key: YARN-3093 URL: https://issues.apache.org/jira/browse/YARN-3093 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Proposing yarn rmadmin -load -nodelabels filename. nodelabels can be one such option here, and this can be generalized by adding other options later. The advantage of this command is easier configuration. Assume an admin needs to load labels onto 20+ nodes; the current command is a little difficult. If this config can be prepared in a file and then uploaded to the RM, the same can be achieved with the existing parsing and update logic. A simpler proposed config file is shown below. {noformat} rm1 $ cat node_label.conf add [ label1,label2,label3,label4,label11,label12,label13,label14,label21,label22,label23,label24 ] replace[ node1:port=label1,label2,label23,label24 node2:port=label4,label11,label12,label13,label14,label21 node3:port=label2,label3,label4,label11,label12,label13,label14 node4:port=label14,label21,label22,label23,label24 node5:port=label14,label21,label22,label23,label24 node6:port=label4,label11,label12,label13,label14,label21,label22,label23,label24 ] {noformat} A restriction on file size can be kept to avoid uploading very large files. Please share your opinion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
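To make the proposed file format concrete, here is a hedged parsing sketch, assuming an operation name is followed by whitespace-separated entries inside square brackets exactly as in the examples above; the class and method names are illustrative and not part of any attached patch.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeLabelConfParserSketch {

  /** Maps an operation (e.g. "replaceLabelsOnNode") to its raw entries. */
  static Map<String, List<String>> parse(String contents) {
    Map<String, List<String>> ops = new HashMap<>();
    // Match e.g. "replaceLabelsOnNode [ node1:port=label1,label2 ... ]"
    Matcher m = Pattern.compile("(\\w+)\\s*\\[([^\\]]*)\\]").matcher(contents);
    while (m.find()) {
      String op = m.group(1);
      // Entries are separated by whitespace or newlines, as proposed above.
      List<String> entries = new ArrayList<>(
          Arrays.asList(m.group(2).trim().split("\\s+")));
      entries.removeIf(String::isEmpty);
      ops.put(op, entries);
    }
    return ops;
  }

  public static void main(String[] args) {
    String conf = "replaceLabelsOnNode [ node1:port=label1,label2"
        + " node2:port=label1,label2 ]";
    // Prints {replaceLabelsOnNode=[node1:port=label1,label2, node2:port=label1,label2]}
    System.out.println(parse(conf));
  }
}
{code}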
[jira] [Updated] (YARN-2808) yarn client tool can not list app_attempt's container info correctly
[ https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2808: Attachment: YARN-2808.20150126-1.patch Hi [~zjshen], As discussed unifying the container info got from AHS and RM in YarnClient to solve this issue. Please review the attached patch. yarn client tool can not list app_attempt's container info correctly Key: YARN-2808 URL: https://issues.apache.org/jira/browse/YARN-2808 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Gordon Wang Assignee: Naganarasimha G R Attachments: YARN-2808.20150126-1.patch When enabling timeline server, yarn client can not list the container info for a application attempt correctly. Here is the reproduce step. # enabling yarn timeline server # submit a MR job # after the job is finished. use yarn client to list the container info of the app attempt. Then, since the RM has cached the application's attempt info, the output show {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :0 Container-Id Start Time Finish Time StateHost LOG-URL {noformat} But if the rm is restarted, client can fetch the container info from timeline server correctly. {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :4 Container-Id Start Time Finish Time StateHost LOG-URL container_1415168250217_0001_01_01 1415168318376 1415168349896COMPLETElocalhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_01/container_1415168250217_0001_01_01/hadoop container_1415168250217_0001_01_02 1415168326399 1415168334858COMPLETElocalhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_02/container_1415168250217_0001_01_02/hadoop container_1415168250217_0001_01_03 1415168326400 1415168335277COMPLETElocalhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_03/container_1415168250217_0001_01_03/hadoop container_1415168250217_0001_01_04 1415168335825 1415168343873COMPLETElocalhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_04/container_1415168250217_0001_01_04/hadoop {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
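As a hedged illustration of the unification idea (not the attached YARN-2808.20150126-1.patch), the sketch below merges the container reports the RM still caches with those the Application History Server returns, de-duplicating by container id so that neither an empty RM cache nor a missing AHS entry hides the other source's data; ContainerInfo and both input lists are simplified stand-ins for the real client calls.
{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContainerListingSketch {
  static class ContainerInfo {
    final String containerId;
    ContainerInfo(String id) { this.containerId = id; }
  }

  /** Merge the running-attempt view from the RM with the history view from the AHS. */
  static List<ContainerInfo> listContainers(List<ContainerInfo> fromRM,
                                            List<ContainerInfo> fromAHS) {
    Map<String, ContainerInfo> merged = new LinkedHashMap<>();
    for (ContainerInfo c : fromAHS) {
      merged.put(c.containerId, c);
    }
    for (ContainerInfo c : fromRM) {
      merged.put(c.containerId, c); // the RM view wins for containers it still tracks
    }
    return new ArrayList<>(merged.values());
  }
}
{code}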
[jira] [Commented] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5
[ https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292529#comment-14292529 ] Steve Loughran commented on YARN-2875: -- tim, add a patch in HADOOP-11317 to increment the SLF4J version and I'll apply it Bump SLF4J to 1.7.7 from 1.7.5 --- Key: YARN-2875 URL: https://issues.apache.org/jira/browse/YARN-2875 Project: Hadoop YARN Issue Type: Bug Reporter: Tim Robertson Priority: Minor hadoop-yarn-common [uses log4j directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167] and when trying to redirect that through an SLF4J bridge version 1.7.5 has issues, due to use of AppenderSkeleton which is missing in log4j-over-slf4j version 1.7.5. This is documented on the [1.7.6 release notes|http://www.slf4j.org/news.html] but 1.7.7 should be suitable. This is applicable to all the projects using Hadoop motherpom, but Yarn appears to be bringing Log4J in, rather than coding to the SLF4J API. The issue shows in the logs as follows in Yarn MR apps, which is painful to diagnose. {code} WARN [2014-11-18 09:58:06,390+0100] [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in callback postStart java.lang.reflect.InvocationTargetException: null at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71] at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) ~[job.jar:0.22-SNAPSHOT] at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) [job.jar:0.22-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) [job.jar:0.22-SNAPSHOT] Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71] at java.lang.ClassLoader.defineClass(ClassLoader.java:800) ~[na:1.7.0_71] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[na:1.7.0_71] at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) ~[na:1.7.0_71] at java.net.URLClassLoader.access$100(URLClassLoader.java:71) ~[na:1.7.0_71] at 
java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71] at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) ~[job.jar:0.22-SNAPSHOT] at
[jira] [Commented] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292548#comment-14292548 ] Hadoop QA commented on YARN-3098: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694617/YARN-3098.1.patch against trunk revision 21d5599. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6422//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6422//console This message is automatically generated. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292546#comment-14292546 ] Steve Loughran commented on YARN-2683: -- security: there's a special bit for automatic realm generation. If you add an entry like sasl:oozie@ then it will get the same realm as the user creating the entry registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-10530-005.patch, YARN-2683-001.patch, YARN-2683-002.patch, YARN-2683-003.patch Original Estimate: 1h Remaining Estimate: 1h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3099: - Summary: Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. (was: Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track resources-by-label.) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3099: - Attachment: YARN-3099.1.patch Attached ver.1 patch; since YARN-3092 is not yet committed, the YARN-3092 changes are included so that Jenkins can produce a result. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3099: - Description: After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. was:After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3098: - Attachment: YARN-3098.2.patch Thanks for review, [~jianhe]. Addressed all comments and attached new patch. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch, YARN-3098.2.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3022) Expose Container resource information from NodeManager for monitoring
[ https://issues.apache.org/jira/browse/YARN-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292604#comment-14292604 ] Hadoop QA commented on YARN-3022: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694633/YARN-3022.003.patch against trunk revision 1f2b695. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6424//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6424//console This message is automatically generated. Expose Container resource information from NodeManager for monitoring - Key: YARN-3022 URL: https://issues.apache.org/jira/browse/YARN-3022 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3022.001.patch, YARN-3022.002.patch, YARN-3022.003.patch Along with exposing resource consumption of each container such as (YARN-2141) its worth exposing the actual resource limit associated with them to get better insight into YARN allocation and consumption -- This message was sent by Atlassian JIRA (v6.3.4#6332)
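A minimal hedged sketch of the kind of data this would expose, assuming a per-container record that pairs the allocation limit with sampled usage; it is illustrative only and not the attached patch, and all names are hypothetical.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ContainerResourceViewSketch {
  static class ResourceSnapshot {
    final long memoryLimitMB;    // limit taken from the container's allocation
    final int vcoreLimit;
    volatile long memoryUsedMB;  // sampled by the containers monitor
    volatile float cpuUsedVcores;

    ResourceSnapshot(long memoryLimitMB, int vcoreLimit) {
      this.memoryLimitMB = memoryLimitMB;
      this.vcoreLimit = vcoreLimit;
    }
  }

  private final Map<String, ResourceSnapshot> byContainer = new ConcurrentHashMap<>();

  void containerStarted(String containerId, long memLimitMB, int vcores) {
    byContainer.put(containerId, new ResourceSnapshot(memLimitMB, vcores));
  }

  void sample(String containerId, long memUsedMB, float cpuVcores) {
    ResourceSnapshot s = byContainer.get(containerId);
    if (s != null) {
      s.memoryUsedMB = memUsedMB;
      s.cpuUsedVcores = cpuVcores;
    }
  }
}
{code}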
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292520#comment-14292520 ] Hadoop QA commented on YARN-3075: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694616/YARN-3075.002.patch against trunk revision 21d5599. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6421//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6421//console This message is automatically generated. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2897) CrossOriginFilter needs more log statements
[ https://issues.apache.org/jira/browse/YARN-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292532#comment-14292532 ] Hadoop QA commented on YARN-2897: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694632/YARN-2897.patch against trunk revision 1f2b695. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6423//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6423//console This message is automatically generated. CrossOriginFilter needs more log statements --- Key: YARN-2897 URL: https://issues.apache.org/jira/browse/YARN-2897 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2897.patch, YARN-2897.patch, YARN-2897.patch CrossOriginFilter does not log as much to make debugging easier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292550#comment-14292550 ] Wangda Tan commented on YARN-3098: -- Failed test is not related to this patch. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292566#comment-14292566 ] Jian He commented on YARN-3098: --- - maybe a bug ? it's internally using NL {code} public void setUsedCapacity(String label, float value) { try { writeLock.lock(); Capacities cap = capacitiesMap.get(NL); {code} - {{ public float getAbsoluteUsedCapacity() }}, the implementation can be {{getAbsoluteUsedCapacity(NL)}} - maybe have a generic getter/setter and try parametrizing the capacity type to avoid duplication. Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
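A hedged sketch of the suggestion above (a single getter/setter parametrized by capacity type, keyed by label, with read/write locking), not the committed YARN-3098 code; NL here mirrors the no-label partition from the quoted snippet, and the enum values are illustrative.
{code}
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class QueueCapacitiesSketch {
  private static final String NL = ""; // empty string stands for the no-label partition

  enum CapacityType { CAPACITY, MAX_CAPACITY, ABS_CAPACITY, ABS_MAX_CAPACITY, USED_CAPACITY }

  private final Map<String, EnumMap<CapacityType, Float>> byLabel = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  float get(String label, CapacityType type) {
    lock.readLock().lock();
    try {
      EnumMap<CapacityType, Float> caps = byLabel.get(label);
      return caps == null ? 0f : caps.getOrDefault(type, 0f);
    } finally {
      lock.readLock().unlock();
    }
  }

  void set(String label, CapacityType type, float value) {
    lock.writeLock().lock();
    try {
      byLabel.computeIfAbsent(label, l -> new EnumMap<>(CapacityType.class)).put(type, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Thin wrappers avoid per-type duplication; the buggy case quoted above
  // would simply delegate to set(label, USED_CAPACITY, value).
  float getUsedCapacity() { return get(NL, CapacityType.USED_CAPACITY); }
  void setUsedCapacity(String label, float value) { set(label, CapacityType.USED_CAPACITY, value); }
}
{code}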
[jira] [Commented] (YARN-3092) Create common ResourceUsage class to track labeled resource usages in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292642#comment-14292642 ] Hudson commented on YARN-3092: -- FAILURE: Integrated in Hadoop-trunk-Commit #6936 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6936/]) YARN-3092. Created a common ResourceUsage class to track labeled resource usages in Capacity Scheduler. Contributed by Wangda Tan (jianhe: rev 6f9fe76918bbc79109653edc6cde85df05148ba3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java Create common ResourceUsage class to track labeled resource usages in Capacity Scheduler Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3098: - Attachment: YARN-3098.3.patch Make code style consistent with YARN-3092 Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch, YARN-3098.2.patch, YARN-3098.3.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues
[ https://issues.apache.org/jira/browse/YARN-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3098: - Attachment: YARN-3098.4.patch Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues - Key: YARN-3098 URL: https://issues.apache.org/jira/browse/YARN-3098 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3098.1.patch, YARN-3098.2.patch, YARN-3098.3.patch, YARN-3098.4.patch Similar to YARN-3092, after YARN-796, now queues (ParentQueue and LeafQueue) need to track capacities-label (e.g. absolute-capacity, maximum-capacity, absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a class to encapsulate these capacities to make both better maintainability/readability and fine-grained locking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292685#comment-14292685 ] Hadoop QA commented on YARN-3099: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694643/YARN-3099.1.patch against trunk revision 1f2b695. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6425//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6425//console This message is automatically generated. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3099: - Attachment: YARN-3099.2.patch Since YARN-3092 has just been committed, rebased the patch against trunk (ver.2). Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3099.1.patch, YARN-3099.2.patch After YARN-3092, resource-by-label (include used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To make each individual patch smaller to get easier review, this patch is targeting to make used-resources-by-label in CS Queues are all tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292697#comment-14292697 ] Varun Saxena commented on YARN-3075: [~leftnoteasy], bq. This seems not correct to me. n1 appears on p1 and p2, but label on host should not include label on nodes of the host. Now I can understand why 1) exists. I suggest to make this behavior consistent with getNodeLabels(), or at least we can make getNodeLabels() consistent with getLabelsToNodes. Thoughts? This is consistent with getNodeLabels(). I have included the assertion below in these test cases to check precisely whether getLabelsToNodes() is consistent with getNodeLabels(). {{transposeNodeToLabels}} transposes the result from getNodeLabels() and matches it against getLabelsToNodes(). {code} assertLabelsToNodesEquals( labelsToNodes, transposeNodeToLabels(mgr.getNodeLabels())); {code} NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
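For illustration, a hedged sketch of what a transposeNodeToLabels-style helper does, with plain String node ids standing in for NodeId; this is not the actual test code.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LabelTransposeSketch {
  /** Flip a node -> labels mapping into a label -> nodes mapping. */
  static Map<String, Set<String>> transposeNodeToLabels(Map<String, Set<String>> nodeToLabels) {
    Map<String, Set<String>> labelsToNodes = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : nodeToLabels.entrySet()) {
      for (String label : e.getValue()) {
        labelsToNodes.computeIfAbsent(label, l -> new HashSet<>()).add(e.getKey());
      }
    }
    return labelsToNodes;
  }
}
{code}
Asserting that getLabelsToNodes() equals this transpose of getNodeLabels() is what keeps the two views consistent in the tests described above.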