[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2893: Attachment: YARN-2893.005.patch AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
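For context on the failure mode above, here is a minimal sketch (not the actual AMLauncher code) of how a launch-context token blob is parsed with Credentials#readTokenStorageStream; a truncated or corrupt blob runs off the end of the stream and surfaces as the EOFException named in this issue. The class and helper names below are hypothetical.
{code}
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

/** Hypothetical helper: parse the token blob carried in an AM launch context. */
public final class LaunchContextTokens {
  private LaunchContextTokens() {}

  static Credentials parseTokens(ByteBuffer tokenBlob) throws IOException {
    Credentials credentials = new Credentials();
    if (tokenBlob == null) {
      return credentials; // nothing to parse
    }
    DataInputByteBuffer in = new DataInputByteBuffer();
    in.reset(tokenBlob.duplicate()); // keep the caller's buffer position intact
    try {
      // Reads the serialized token storage; a truncated or corrupt blob
      // hits the end of the stream early and throws EOFException here.
      credentials.readTokenStorageStream(in);
    } catch (EOFException e) {
      throw new IOException("Corrupt token blob in AM launch context", e);
    }
    return credentials;
  }
}
{code}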
[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped
[ https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3561: - Fix Version/s: (was: 2.6.1) Non-AM Containers continue to run even after AM is stopped -- Key: YARN-3561 URL: https://issues.apache.org/jira/browse/YARN-3561 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, yarn Affects Versions: 2.6.0 Environment: debian 7 Reporter: Gour Saha Priority: Critical Non-AM containers continue to run even after application is stopped. This occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a Hadoop 2.6 deployment. Following are the NM logs from 2 different nodes: *host-07* - where Slider AM was running *host-03* - where Storm NIMBUS container was running. *Note:* The logs are partial, starting with the time when the relevant Slider AM and NIMBUS containers were allocated, till the time when the Slider AM was stopped. Also, the large number of Memory usage log lines were removed keeping only a few starts and ends of every segment. *NM log from host-07 where Slider AM container was running:* {noformat} 2015-04-29 00:39:24,614 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1428575950531_0020_02_01 2015-04-29 00:41:10,310 INFO ipc.Server (Server.java:saslProcess(1306)) - Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE) 2015-04-29 00:41:10,322 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for container_1428575950531_0021_01_01 by user yarn 2015-04-29 00:41:10,322 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new application reference for app application_1428575950531_0021 2015-04-29 00:41:10,323 INFO application.Application (ApplicationImpl.java:handle(464)) - Application application_1428575950531_0021 transitioned from NEW to INITING 2015-04-29 00:41:10,325 INFO nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=yarn IP=10.84.105.162 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1428575950531_0021 CONTAINERID=container_1428575950531_0021_01_01 2015-04-29 00:41:10,328 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users. 2015-04-29 00:41:10,328 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished. 
2015-04-29 00:41:10,351 INFO application.Application (ApplicationImpl.java:transition(304)) - Adding container_1428575950531_0021_01_01 to application application_1428575950531_0021 2015-04-29 00:41:10,352 INFO application.Application (ApplicationImpl.java:handle(464)) - Application application_1428575950531_0021 transitioned from INITING to RUNNING 2015-04-29 00:41:10,356 INFO container.Container (ContainerImpl.java:handle(999)) - Container container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING 2015-04-29 00:41:10,357 INFO containermanager.AuxServices (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId application_1428575950531_0021 2015-04-29 00:41:10,357 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar transitioned from INIT to DOWNLOADING 2015-04-29 00:41:10,357 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar transitioned from INIT to DOWNLOADING 2015-04-29 00:41:10,358 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar transitioned from INIT to DOWNLOADING 2015-04-29 00:41:10,358 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties transitioned from INIT to DOWNLOADING 2015-04-29 00:41:10,358 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource
[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped
[ https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521200#comment-14521200 ] Steve Loughran commented on YARN-3561: -- Vinod probably means the AM restart flag. When the Slider AM is stopped it can be done in two ways {code} slider stop $clustername {code} Sends an RPC call to the AM, which then unregisters and shuts down. I'll check to make sure we explicitly release containers. {code} slider stop $clustername --force {code} This asks YARN to kill the app; the AM doesn't get told about it. there's a third way {code} slider am-suicide $clustername {code} This is only for testing, causes the AM to call {{System.exit(-1)}}; YARN will restart it unless it has failed too many times already. Non-AM Containers continue to run even after AM is stopped -- Key: YARN-3561 URL: https://issues.apache.org/jira/browse/YARN-3561 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, yarn Affects Versions: 2.6.0 Environment: debian 7 Reporter: Gour Saha Priority: Critical Non-AM containers continue to run even after application is stopped. This occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a Hadoop 2.6 deployment. Following are the NM logs from 2 different nodes: *host-07* - where Slider AM was running *host-03* - where Storm NIMBUS container was running. *Note:* The logs are partial, starting with the time when the relevant Slider AM and NIMBUS containers were allocated, till the time when the Slider AM was stopped. Also, the large number of Memory usage log lines were removed keeping only a few starts and ends of every segment. *NM log from host-07 where Slider AM container was running:* {noformat} 2015-04-29 00:39:24,614 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1428575950531_0020_02_01 2015-04-29 00:41:10,310 INFO ipc.Server (Server.java:saslProcess(1306)) - Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE) 2015-04-29 00:41:10,322 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for container_1428575950531_0021_01_01 by user yarn 2015-04-29 00:41:10,322 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new application reference for app application_1428575950531_0021 2015-04-29 00:41:10,323 INFO application.Application (ApplicationImpl.java:handle(464)) - Application application_1428575950531_0021 transitioned from NEW to INITING 2015-04-29 00:41:10,325 INFO nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=yarn IP=10.84.105.162 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1428575950531_0021 CONTAINERID=container_1428575950531_0021_01_01 2015-04-29 00:41:10,328 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users. 2015-04-29 00:41:10,328 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished. 
2015-04-29 00:41:10,351 INFO application.Application (ApplicationImpl.java:transition(304)) - Adding container_1428575950531_0021_01_01 to application application_1428575950531_0021 2015-04-29 00:41:10,352 INFO application.Application (ApplicationImpl.java:handle(464)) - Application application_1428575950531_0021 transitioned from INITING to RUNNING 2015-04-29 00:41:10,356 INFO container.Container (ContainerImpl.java:handle(999)) - Container container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING 2015-04-29 00:41:10,357 INFO containermanager.AuxServices (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId application_1428575950531_0021 2015-04-29 00:41:10,357 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar transitioned from INIT to DOWNLOADING 2015-04-29 00:41:10,357 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar transitioned from INIT to
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520961#comment-14520961 ] zhihai xu commented on YARN-2893: - Thanks [~jira.shegalov], that is a good catch. I uploaded a new patch, YARN-2893.005.patch, which fixes the double-indentation checkstyle violation. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3552) RM Web UI shows -1 running containers for completed apps
[ https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3552: - Attachment: 0001-YARN-3552.patch RM Web UI shows -1 running containers for completed apps Key: YARN-3552 URL: https://issues.apache.org/jira/browse/YARN-3552 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohith Assignee: Rohith Priority: Trivial Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG In RMServerUtils, the default values are negative numbers, which results in the RM web UI also displaying negative numbers. {code} public static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT = BuilderUtils.newApplicationResourceUsageReport(-1, -1, Resources.createResource(-1, -1), Resources.createResource(-1, -1), Resources.createResource(-1, -1), 0, 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps
[ https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521027#comment-14521027 ] Hadoop QA commented on YARN-3552: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 38s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 62m 58s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 106m 14s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729401/0001-YARN-3552.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / aa22450 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7550/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7550/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7550/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7550/console | This message was automatically generated. RM Web UI shows -1 running containers for completed apps Key: YARN-3552 URL: https://issues.apache.org/jira/browse/YARN-3552 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohith Assignee: Rohith Priority: Trivial Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG In the RMServerUtils, the default values are negative number which results in the displayiing the RM web UI also negative number. 
{code} public static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT = BuilderUtils.newApplicationResourceUsageReport(-1, -1, Resources.createResource(-1, -1), Resources.createResource(-1, -1), Resources.createResource(-1, -1), 0, 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
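For illustration only (this is not the attached patch), a hedged sketch of how a UI rendering path could avoid showing the -1 placeholder values from the dummy report; the helper class below is hypothetical.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;

/** Hypothetical display helper, not the attached patch. */
final class UsageReportDisplay {
  private UsageReportDisplay() {}

  /**
   * The dummy report uses -1 as a "not available" placeholder; clamp it so a
   * completed app shows 0 running containers in the web UI instead of -1.
   */
  static int displayableRunningContainers(ApplicationResourceUsageReport report) {
    if (report == null) {
      return 0;
    }
    int running = report.getNumUsedContainers();
    return running < 0 ? 0 : running;
  }
}
{code}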
[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
[ https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520957#comment-14520957 ] Hadoop QA commented on YARN-3271: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 7m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 8m 52s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 4m 43s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 51m 45s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 76m 33s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729398/YARN-3271.2.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / aa22450 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7549/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7549/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7549/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7549/console | This message was automatically generated. FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability --- Key: YARN-3271 URL: https://issues.apache.org/jira/browse/YARN-3271 Project: Hadoop YARN Issue Type: Improvement Reporter: Karthik Kambatla Assignee: nijel Attachments: YARN-3271.1.patch, YARN-3271.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps
[ https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520954#comment-14520954 ] Ryu Kobayashi commented on YARN-3552: - I think this problem exists even in FairScheduler. FairSchedulerAppsBlock.java: {code} .append(appInfo.getRunningContainers()).append("\",\"") {code} RM Web UI shows -1 running containers for completed apps Key: YARN-3552 URL: https://issues.apache.org/jira/browse/YARN-3552 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohith Assignee: Rohith Priority: Trivial Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG In RMServerUtils, the default values are negative numbers, which results in the RM web UI also displaying negative numbers. {code} public static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT = BuilderUtils.newApplicationResourceUsageReport(-1, -1, Resources.createResource(-1, -1), Resources.createResource(-1, -1), Resources.createResource(-1, -1), 0, 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler
[ https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated YARN-3557: -- Attachment: Support TXT in YARN high level design doc.pdf A high-level design doc is attached. Support Intel Trusted Execution Technology(TXT) in YARN scheduler - Key: YARN-3557 URL: https://issues.apache.org/jira/browse/YARN-3557 Project: Hadoop YARN Issue Type: New Feature Reporter: Dian Fu Attachments: Support TXT in YARN high level design doc.pdf Intel TXT defines platform-level enhancements that provide the building blocks for creating trusted platforms. A TXT-aware YARN scheduler can schedule security-sensitive jobs on TXT-enabled nodes only. YARN-2492 provides the capability to restrict YARN applications to run only on cluster nodes that have a specified node label. This is a good mechanism that can be utilized for a TXT-aware YARN scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps
[ https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521161#comment-14521161 ] Hadoop QA commented on YARN-3552: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 56s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 27s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 92m 35s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729420/0001-YARN-3552.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f5b3847 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7552/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7552/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7552/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7552/console | This message was automatically generated. RM Web UI shows -1 running containers for completed apps Key: YARN-3552 URL: https://issues.apache.org/jira/browse/YARN-3552 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohith Assignee: Rohith Priority: Trivial Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG In the RMServerUtils, the default values are negative number which results in the displayiing the RM web UI also negative number. {code} public static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT = BuilderUtils.newApplicationResourceUsageReport(-1, -1, Resources.createResource(-1, -1), Resources.createResource(-1, -1), Resources.createResource(-1, -1), 0, 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps
[ https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521001#comment-14521001 ] Rohith commented on YARN-3552: -- My bad, I missed it. Thanks for pointing it out; will update the patch. RM Web UI shows -1 running containers for completed apps Key: YARN-3552 URL: https://issues.apache.org/jira/browse/YARN-3552 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Rohith Assignee: Rohith Priority: Trivial Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG In RMServerUtils, the default values are negative numbers, which results in the RM web UI also displaying negative numbers. {code} public static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT = BuilderUtils.newApplicationResourceUsageReport(-1, -1, Resources.createResource(-1, -1), Resources.createResource(-1, -1), Resources.createResource(-1, -1), 0, 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521095#comment-14521095 ] Hadoop QA commented on YARN-2893: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 45s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 32s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 35s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 93m 57s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729414/YARN-2893.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / aa22450 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7551/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7551/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7551/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7551/console | This message was automatically generated. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521217#comment-14521217 ] Steve Loughran commented on YARN-3539: -- works for me Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3476: - Attachment: 0001-YARN-3476.patch Apologies for the delay in getting back to this issue. bq. Would it be better to wrap the cleanup in a finally block or something a little more broadly applicable to errors that occur? Makes sense to me. Uploading a patch that handles the exception and does the post-cleanup. Nodemanager can fail to delete local logs if log aggregation fails -- Key: YARN-3476 URL: https://issues.apache.org/jira/browse/YARN-3476 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation, nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch If log aggregation encounters an error trying to upload the file, then the underlying TFile can throw an IllegalStateException which will bubble up to the top of the thread and prevent the application logs from being deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
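To make the "wrap the cleanup in a finally block" suggestion above concrete, a minimal sketch assuming a simplified aggregator; the method names are hypothetical and this is not the attached patch.
{code}
import java.io.IOException;

/** Simplified, hypothetical shape of the per-app aggregation step. */
abstract class LogAggregatorSketch {

  /** The upload may fail, e.g. with an IllegalStateException from the underlying TFile writer. */
  abstract void uploadLogsForContainers() throws IOException;

  /** Deleting the local application logs must not depend on the upload succeeding. */
  abstract void deleteLocalAppLogs();

  void doAppLogAggregation() throws IOException {
    try {
      uploadLogsForContainers();
    } finally {
      // Runs whether the upload succeeded or threw, so a failed upload
      // no longer leaves the local logs behind on the NodeManager.
      deleteLocalAppLogs();
    }
  }
}
{code}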
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521261#comment-14521261 ] Varun Vasudev commented on YARN-3534: - Thanks for the patch, [~elgoiri]! Is it possible for you to split up the patch into two - one to record the memory and CPU utilization in NodeResourceMonitorImpl and one for the RPC changes? If you could file a separate JIRA for the NodeResourceMonitorImpl changes that you have made (they are definitely required and should go in), that would be ideal. I'm not sure if we should use RPC to expose the stats. Using RPC restricts the stats to a limited number of services, whereas using REST lets a larger number of services access the stats - which is why YARN-3332 wants to use REST to expose the NM stats. Using REST should also make it easier to add more stats with respect to utilization. Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. To that end, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521406#comment-14521406 ] Hudson commented on YARN-3517: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/170/]) YARN-3517. RM web ui for dumping scheduler logs should be for admins only (Varun Vasudev via tgraves) (tgraves: rev 2e215484bd05cd5e3b7a81d3558c6879a05dd2d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Fix For: 2.8.0 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, YARN-3517.006.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521403#comment-14521403 ] Hudson commented on YARN-3533: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/170/]) YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. Contributed by Anubhav Dhoot (jianhe: rev 4c1af156aef4f3bb1d9823d5980c59b12007dc77) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3533.001.patch MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
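A rough sketch of the waiting step the fix adds: poll the app attempt state until it reaches SCHEDULED before the NM update is sent. The helper below is hypothetical and simplified, not the committed MockRM change.
{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

/** Hypothetical helper illustrating the wait; not the committed MockRM change. */
final class MockRMWaitSketch {
  private MockRMWaitSketch() {}

  /** Poll (with a timeout) until the attempt has been scheduled. */
  static void waitForAttemptScheduled(RMAppAttempt attempt) throws InterruptedException {
    int waitedMs = 0;
    while (attempt.getAppAttemptState() != RMAppAttemptState.SCHEDULED && waitedMs < 20000) {
      Thread.sleep(100);
      waitedMs += 100;
    }
    if (attempt.getAppAttemptState() != RMAppAttemptState.SCHEDULED) {
      throw new IllegalStateException("Attempt " + attempt.getAppAttemptId()
          + " was never scheduled; cannot send the NM update yet");
    }
  }
}
{code}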
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521449#comment-14521449 ] Hudson commented on YARN-3533: -- FAILURE: Integrated in Hadoop-Yarn-trunk #913 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/913/]) YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. Contributed by Anubhav Dhoot (jianhe: rev 4c1af156aef4f3bb1d9823d5980c59b12007dc77) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3533.001.patch MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
[ https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521465#comment-14521465 ] Hadoop QA commented on YARN-3271: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 5m 32s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 56s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 74m 59s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729453/YARN-3271.3.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / de9404f | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7553/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7553/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7553/console | This message was automatically generated. FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability --- Key: YARN-3271 URL: https://issues.apache.org/jira/browse/YARN-3271 Project: Hadoop YARN Issue Type: Improvement Reporter: Karthik Kambatla Assignee: nijel Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521471#comment-14521471 ] Hadoop QA commented on YARN-3476: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 40s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 5m 58s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 48m 51s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729459/0001-YARN-3476.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / de9404f | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7554/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7554/console | This message was automatically generated. Nodemanager can fail to delete local logs if log aggregation fails -- Key: YARN-3476 URL: https://issues.apache.org/jira/browse/YARN-3476 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation, nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch If log aggregation encounters an error trying to upload the file then the underlying TFile can throw an illegalstateexception which will bubble up through the top of the thread and prevent the application logs from being deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521393#comment-14521393 ] Hudson commented on YARN-3533: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #179 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/179/]) YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. Contributed by Anubhav Dhoot (jianhe: rev 4c1af156aef4f3bb1d9823d5980c59b12007dc77) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3533.001.patch MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521396#comment-14521396 ] Hudson commented on YARN-3517: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #179 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/179/]) YARN-3517. RM web ui for dumping scheduler logs should be for admins only (Varun Vasudev via tgraves) (tgraves: rev 2e215484bd05cd5e3b7a81d3558c6879a05dd2d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Fix For: 2.8.0 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, YARN-3517.006.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status
[ https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521475#comment-14521475 ] Junping Du commented on YARN-1402: -- Forgot to mention: after this patch, keepAliveApplications in the heartbeat request is not required anymore. We should start work to remove it in another JIRA. Related Web UI, CLI changes on exposing client API to check log aggregation status -- Key: YARN-1402 URL: https://issues.apache.org/jira/browse/YARN-1402 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1402.1.patch, YARN-1402.2.patch, YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521383#comment-14521383 ] Hudson commented on YARN-3533: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2111 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2111/]) YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. Contributed by Anubhav Dhoot (jianhe: rev 4c1af156aef4f3bb1d9823d5980c59b12007dc77) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3533.001.patch MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521386#comment-14521386 ] Hudson commented on YARN-3517: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2111 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2111/]) YARN-3517. RM web ui for dumping scheduler logs should be for admins only (Varun Vasudev via tgraves) (tgraves: rev 2e215484bd05cd5e3b7a81d3558c6879a05dd2d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Fix For: 2.8.0 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, YARN-3517.006.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521452#comment-14521452 ] Hudson commented on YARN-3517: -- FAILURE: Integrated in Hadoop-Yarn-trunk #913 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/913/]) YARN-3517. RM web ui for dumping scheduler logs should be for admins only (Varun Vasudev via tgraves) (tgraves: rev 2e215484bd05cd5e3b7a81d3558c6879a05dd2d2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Fix For: 2.8.0 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, YARN-3517.006.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
[ https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3271: Attachment: YARN-3271.3.patch Updated the patch with the whitespace fix and the test fix. FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability --- Key: YARN-3271 URL: https://issues.apache.org/jira/browse/YARN-3271 Project: Hadoop YARN Issue Type: Improvement Reporter: Karthik Kambatla Assignee: nijel Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public
[ https://issues.apache.org/jira/browse/YARN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521226#comment-14521226 ] Steve Loughran commented on YARN-3559: -- I've not touched it - it is one line. What it does need, though, is agreement that the class really is public. (moving to HADOOP-) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public Key: YARN-3559 URL: https://issues.apache.org/jira/browse/YARN-3559 Project: Hadoop YARN Issue Type: Improvement Components: security Affects Versions: 2.6.0 Reporter: Steve Loughran {{org.apache.hadoop.security.token.Token}} is tagged {{@InterfaceAudience.LimitedPrivate}} for HDFS and MapReduce. However, it is used throughout YARN apps, where both the clients and the AM need to work with tokens. This class and related classes all need to be declared public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521578#comment-14521578 ] Junping Du commented on YARN-3445: -- Thanks for the comments, [~vinodkv]! bq. logAggregationReportsForApps itself is a map of ApplicationID with a nested LogAggregationReport.ApplicationID - duplicate AppID information Are you suggesting we should replace the map with a list in NodeHeartbeatRequest? I fully agree, and I will suggest doing so in YARN-3505. bq. runningApplications in this patch In the v2 patch, runningApplications is already removed. Could you kindly check the v2 patch again? bq. NodeStatus.keepAliveApplications I agree. This shouldn't be needed anymore after YARN-1402. I had a similar idea before when I synced with Xuan but forgot to put it on JIRA. Maybe we should file a separate JIRA to fix it? CC [~xgong]. Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445-v2.patch, YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add a cache of runningApps in RMNode, so the RM only sends back collectors for locally running apps. This is also needed in YARN-914 (graceful decommission): if there are no running apps on an NM that is in the decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521580#comment-14521580 ] Thomas Graves commented on YARN-3243: - [~leftnoteasy] Can we pull this back into the branch-2.7? CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
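The proposal in the description reduces to a single cap that each ParentQueue pushes down to its children on every allocation pass. A minimal sketch of that calculation, using hypothetical Resource/Queue types rather than the real CapacityScheduler classes:
{code}
import java.util.List;

// Hypothetical, simplified queue model -- not the real CapacityScheduler classes.
class Resource {
  final long memory;
  Resource(long memory) { this.memory = memory; }
  static Resource min(Resource a, Resource b) { return a.memory <= b.memory ? a : b; }
  Resource subtract(Resource other) { return new Resource(memory - other.memory); }
}

class Queue {
  Resource max;       // configured maximum capacity
  Resource used;      // current usage
  Resource headroom;  // limit pushed down by the parent
  List<Queue> children;

  // Parent sets each child's headroom to min(own headroom, own remaining room),
  // so every ancestor's limit is enforced transitively.
  void pushHeadroomToChildren() {
    Resource remaining = max.subtract(used);
    Resource childLimit = (headroom == null) ? remaining : Resource.min(headroom, remaining);
    if (children == null) {
      return;
    }
    for (Queue child : children) {
      child.headroom = childLimit;
      child.pushHeadroomToChildren();
    }
  }
}
{code}
With the numbers from the example, both A1 and A2 would receive a headroom of 55 - 54 = 1, so neither child can push queue A past its maximum even though A1's own usage is far below its configured max.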
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521924#comment-14521924 ] Karthik Kambatla commented on YARN-3534: bq. I'm not sure if we should use RPC to expose the stats. As YARN-3332 suggests, we should expose these stats through REST. However, for the RM (scheduler) to get this information, I think RPC (through node heartbeat) is better. That way, the RM doesn't have to poll the REST API for every node there is an NM heartbeat for. If the suggestion is for the NM to get this information from the REST API, I think that is fine. However, since this communication is private to NM, we can always change this later once YARN-3332 is ready. Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
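Independent of how the numbers eventually reach the RM (REST or heartbeat), the NM-side piece described here amounts to a small service that periodically samples host utilization and keeps the latest reading available for whoever reports it. A rough sketch under the assumption of a simple polling thread; the class and method names are illustrative and not those of the patch:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a node resource monitor; not the code from the patch.
class NodeResourceMonitorSketch {
  /** Immutable snapshot of node utilization. */
  static final class Utilization {
    final long usedPhysicalMemoryMB;
    final float cpuUsagePercent;
    Utilization(long mem, float cpu) { usedPhysicalMemoryMB = mem; cpuUsagePercent = cpu; }
  }

  private volatile Utilization latest = new Utilization(0, 0f);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(long intervalMs) {
    // Sample periodically; the reporting thread reads the latest snapshot lock-free.
    scheduler.scheduleAtFixedRate(() -> latest = sample(), 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  Utilization getUtilization() { return latest; }

  private Utilization sample() {
    // Placeholder: a real implementation would read /proc or a resource calculator plugin.
    return new Utilization(readUsedMemoryMB(), readCpuPercent());
  }

  private long readUsedMemoryMB() { return 0; }   // stub
  private float readCpuPercent() { return 0f; }   // stub
}
{code}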
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522021#comment-14522021 ] Thomas Graves commented on YARN-3243: - I was wanting to pull YARN-3434 back into 2.7. It kind of depends on this one. At least I think it would merge cleanly if this one was there. This is also fixing a bug which I would like to see fixed in the 2.7 line if we are going to use it. It's not a blocker since it exists in our 2.6, but it would be nice to have. If we decide it's too big then I'll just port YARN-3434 back without it. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3473) Fix RM Web UI configuration for some properties
[ https://issues.apache.org/jira/browse/YARN-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521845#comment-14521845 ] Ray Chiang commented on YARN-3473: -- RE: No new unit tests Passed visual inspection on the RM UI web interface. Fix RM Web UI configuration for some properties --- Key: YARN-3473 URL: https://issues.apache.org/jira/browse/YARN-3473 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: BB2015-05-TBR Attachments: YARN-3473.001.patch Using the RM Web UI, the Tools-Configuration page shows some properties as something like BufferedInputStream instead of the appropriate .xml file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522014#comment-14522014 ] Wangda Tan commented on YARN-3243: -- Thanks for the comment, [~vinodkv]. I would +1 for backporting this into branch-2.7; even though this patch is potentially required to support non-exclusive node labels, the patch itself is a bug fix rather than a new feature. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3564) Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly
[ https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521973#comment-14521973 ] Hudson commented on YARN-3564: -- FAILURE: Integrated in Hadoop-trunk-Commit #7706 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7706/]) YARN-3564. Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly. (Jian He via wangda) (wangda: rev e2e8f771183df798e926abc97116316a05b19c9a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly --- Key: YARN-3564 URL: https://issues.apache.org/jira/browse/YARN-3564 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.8.0 Attachments: YARN-3564.1.patch the test fails intermittently in jenkins https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2497) Changes for fair scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522001#comment-14522001 ] Viplav Madasu commented on YARN-2497: - I have the same comment as [~john.jian.fang]. [~yufeldman], any updates? Are you actively working on it? We would like to see the Fair Scheduler work with admin labels. Thanks. Changes for fair scheduler to support allocate resource respect labels -- Key: YARN-2497 URL: https://issues.apache.org/jira/browse/YARN-2497 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Yuliya Feldman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler
[ https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521999#comment-14521999 ] Wangda Tan commented on YARN-3557: -- Hi [~dian.fu], Thanks for posting the design doc. I just did a quick pass over it, and it seems to me that supporting TXT can stay outside of the YARN scheduler: the scheduler doesn't need to know if a node is trusted or not; trusted would just be a generic label on a node. And some questions for the design: bq. Currently for centralized node label configuration, it only supports admin configure node label through CLI. Need to provide a mechanism at RM side which can configure node label in the similar way as YARN-2495. Now the RM supports using the CLI or the REST API; are they enough for you to configure an NM's trusted status? bq. Currently user can configure centralized node label configuration or distributed node label configuration, but cannot configure both. Configuring both could be problematic, see my comment: https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14317048&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14317048. Please let me know your thoughts. Support Intel Trusted Execution Technology(TXT) in YARN scheduler - Key: YARN-3557 URL: https://issues.apache.org/jira/browse/YARN-3557 Project: Hadoop YARN Issue Type: New Feature Reporter: Dian Fu Attachments: Support TXT in YARN high level design doc.pdf Intel TXT defines platform-level enhancements that provide the building blocks for creating trusted platforms. A TXT aware YARN scheduler can schedule security sensitive jobs on TXT enabled nodes only. YARN-2492 provides the capacity to restrict YARN applications to run only on cluster nodes that have a specified node label. This is a good mechanism that can be utilized for a TXT aware YARN scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app
[ https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521802#comment-14521802 ] Hadoop QA commented on YARN-3544: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 0s | Pre-patch branch-2.7 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729519/YARN-3544-branch-2.7-1.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | branch-2.7 / 185a1ff | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7555/console | This message was automatically generated. AM logs link missing in the RM UI for a completed app -- Key: YARN-3544 URL: https://issues.apache.org/jira/browse/YARN-3544 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, YARN-3544-branch-2.7-1.2.patch, YARN-3544-branch-2.7-1.patch, YARN-3544.1.patch AM log links should always be present ( for both running and completed apps). Likewise node info is also empty. This is usually quite crucial when trying to debug where an AM was launched and a pointer to which NM's logs to look at if the AM failed to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521881#comment-14521881 ] Sidharta Seethana commented on YARN-3366: - hi [~hex108] , The goal here is to ensure that we have a maximum assigned bandwidth for all YARN containers when cluster admins require this. The behavior you want (where YARN containers are allowed to use the total bandwidth) is possible by simply not configuring max YARN outbound bandwidth - in which case it defaults to the total bandwidth. thanks Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Fix For: 2.8.0 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3539: - Attachment: YARN-3539-005.patch patch -005 h3. Valid entity types. Currently / and are allowed in entity types. I'm not sure that is a good idea. I'd also worry about and for security reasons. A regexp of A-Za-z0-9 + others of a limited set may be enough. We can always check with all known users to see what entity types they are using. Locking it down for V1 allows v2 to be consistent. bq. 1. For the bullet points of Current Status and Future Plans, can we organize them a bit better. For example, we partition them into the groups of a) current status and b) future plans. For bullet 4, not just history, but all timeline data. done bq. 2. Can we move Timeline Server REST API section before Generic Data REST APIs? done bq. 3. Application elements table seems to be wrongly formatted. I think that's why site compilation is failed. fixed bq. 4. Generic Data REST APIs output examples need to be slightly updated. Some more fields are added or changed. Those are the examples from YARN-1876. If you have some more up to date ones I'll replace them. done bq. 5. Timeline Server REST API output examples are not genuine. Perhaps, we can run a simple MR example job, and get the up-to-date timeline entity and application info to show as the examples. +1. Do you have this? Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
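The "limited character set" idea for entity types floated above could be enforced with a single whitelist regexp at the point where entities are accepted. A minimal sketch - the exact allowed punctuation and length limit below are assumptions, not something the patch defines:
{code}
import java.util.regex.Pattern;

// Sketch of a whitelist check for timeline entity types; the allowed set is illustrative.
final class EntityTypeValidator {
  // Letters, digits, plus a small set of separators; notably no '/', '<', '>' or '&'.
  private static final Pattern VALID_ENTITY_TYPE =
      Pattern.compile("^[A-Za-z0-9_.:-]{1,256}$");

  static boolean isValid(String entityType) {
    return entityType != null && VALID_ENTITY_TYPE.matcher(entityType).matches();
  }
}
{code}
For example, {{EntityTypeValidator.isValid("YARN_APPLICATION")}} would pass while anything containing a slash or markup character would be rejected, which is the kind of lock-down that keeps v1 and a future v2 consistent.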
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521888#comment-14521888 ] Wangda Tan commented on YARN-3565: -- [~Naganarasimha], sure, please go ahead. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Now NM HB/Register uses Set<String>; it will be hard to add new fields if we want to support specifying NodeLabel attributes such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
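The extensibility argument is easier to see with a sketch: once the register/heartbeat requests carry a structured label object instead of a bare string, attributes such as exclusivity become ordinary fields. The class below is illustrative only, not the actual YARN NodeLabel API:
{code}
// Illustrative only: a structured label object instead of a bare String.
final class NodeLabelSketch {
  private final String name;
  private final boolean exclusive;  // an attribute that Set<String> cannot express

  NodeLabelSketch(String name, boolean exclusive) {
    this.name = name;
    this.exclusive = exclusive;
  }

  String getName() { return name; }
  boolean isExclusive() { return exclusive; }
}

// A register/heartbeat request would then carry Set<NodeLabelSketch> instead of Set<String>,
// and future fields (constraints, other attributes) become additional members of the label object.
{code}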
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521893#comment-14521893 ] Jian He commented on YARN-3445: --- One other thing, the LogAggregationReport#(get/set)getNodeId can also be removed as it's not used anywhere. I'm also unsure about the usage of LogAggregationReport#(get/set)DiagnosticMessage as it's only set with an empty string. I agree we can have a separate JIRA to fix this, preferably in the same 2.8 release. Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445-v2.patch, YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add a cache for runningApps in RMNode, so the RM only sends back collectors for the apps running locally on that node. This is also needed in YARN-914 (graceful decommission): if there are no running apps on an NM that is in the decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521896#comment-14521896 ] Hadoop QA commented on YARN-3539: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 3m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | site | 1m 56s | Site compilation is broken. | | | | 5m 59s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729541/YARN-3539-005.patch | | Optional Tests | site | | git revision | trunk / de9404f | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7556/artifact/patchprocess/whitespace.txt | | site | https://builds.apache.org/job/PreCommit-YARN-Build/7556/artifact/patchprocess/patchSiteWarnings.txt | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7556/console | This message was automatically generated. Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521926#comment-14521926 ] Karthik Kambatla commented on YARN-3332: [~vinodkv] - did you start implementing this? I would like to be involved in the work here - either implementing parts of it or reviewing most of it. [Umbrella] Unified Resource Statistics Collection per node -- Key: YARN-3332 URL: https://issues.apache.org/jira/browse/YARN-3332 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: Design - UnifiedResourceStatisticsCollection.pdf Today in YARN, NodeManager collects statistics like per container resource usage and overall physical resources available on the machine. Currently this is used internally in YARN by the NodeManager for only a limited usage: automatically determining the capacity of resources on node and enforcing memory usage to what is reserved per container. This proposal is to extend the existing architecture and collect statistics for usage beyond the existing usecases. Proposal attached in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3521: -- Attachment: 0002-YARN-3521.patch Thank you [~leftnoteasy] for sharing the comments. Please find an updated patch addressing them. Please check the same and let me know your thoughts. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we should make the same change on the REST API side to make them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521936#comment-14521936 ] Karthik Kambatla commented on YARN-3481: I have been working with [~rgrandl] on YARN-2965 (he shared his Tetris code privately). YARN-2965 aims to expose more than just CPU and memory - disk in/out bandwidth and network in/out bandwidth. I think it is okay to capture CPU and memory here, and add the remaining items in the context of that JIRA. Report NM aggregated container resource utilization in heartbeat Key: YARN-3481 URL: https://issues.apache.org/jira/browse/YARN-3481 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 336h Remaining Estimate: 336h To allow the RM take better scheduling decisions, it should be aware of the actual utilization of the containers. The NM would aggregate the ContainerMetrics and report it in every heartbeat. Related to YARN-1012 but aggregated to reduce the heartbeat overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
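Restricting the report to CPU and memory keeps the aggregation itself trivial: the NM sums the per-container samples it already tracks and ships one node-level value per heartbeat. A hedged sketch with hypothetical types (the real code would work against ContainerMetrics and the heartbeat records, not these classes):
{code}
import java.util.Collection;

// Hypothetical per-container sample and node-level aggregate; names are illustrative.
final class ContainerSample {
  final long physicalMemoryMB;
  final float vcoresUsed;
  ContainerSample(long mem, float vcores) { physicalMemoryMB = mem; vcoresUsed = vcores; }
}

final class NodeAggregate {
  long memoryMB;
  float vcores;

  /** Sum the per-container samples the NM already collects into one heartbeat-sized value. */
  static NodeAggregate of(Collection<ContainerSample> samples) {
    NodeAggregate agg = new NodeAggregate();
    for (ContainerSample s : samples) {
      agg.memoryMB += s.physicalMemoryMB;
      agg.vcores += s.vcoresUsed;
    }
    return agg;
  }
}
{code}
Additional dimensions such as disk and network bandwidth could later be added as extra fields without changing the aggregation pattern, which is the split being proposed with YARN-2965.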
[jira] [Updated] (YARN-3564) Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly
[ https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3564: - Summary: Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly (was: TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly ) Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly --- Key: YARN-3564 URL: https://issues.apache.org/jira/browse/YARN-3564 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3564.1.patch the test fails intermittently in jenkins https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522008#comment-14522008 ] Vinod Kumar Vavilapalli commented on YARN-3243: --- bq. Wangda Tan Can we pull this back into the branch-2.7? I am not against it, but we need to rationalize why it needs to be pulled in, specifically given that this is a big patch. Also, it wouldn't stop here and you'll need more patches that depend on this? CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522058#comment-14522058 ] Vinod Kumar Vavilapalli commented on YARN-3445: --- bq. I agree. This shouldn't be needed anymore after YARN-1402. I had a similar idea before when I synced with Xuan but forgot to put it on JIRA. Maybe we should file a separate JIRA to fix it? keepAliveApplications cannot be removed as we need to support protocol compatibility. But the new ones you added for logs can be removed as they are new. Can you take this forward also on YARN-3505? bq. .. LogAggregationReport#(get/set)getNodeId .. LogAggregationReport#(get/set)DiagnosticMessage .. [~jianhe], can you file a ticket please? Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445-v2.patch, YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add a cache for runningApps in RMNode, so the RM only sends back collectors for the apps running locally on that node. This is also needed in YARN-914 (graceful decommission): if there are no running apps on an NM that is in the decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1451#comment-1451 ] Jian He commented on YARN-3505: --- One other thing, the LogAggregationReport#(get/set)getNodeId can also be removed as it's not used anywhere. I'm also unsure about the usage of LogAggregationReport#(get/set)DiagnosticMessage as it's only set with an empty string. Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
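The fix under discussion amounts to not retaining a per-node report once it reaches a terminal SUCCEEDED state, so the RMApp does not accumulate these entries for its whole lifetime. A minimal sketch of that pruning with a hypothetical status enum - the real code uses LogAggregationReport and its status type, not these classes:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status enum and pruning logic; illustrative of the idea only.
enum AggStatus { RUNNING, SUCCEEDED, FAILED }

final class LogAggregationReportCache {
  private final Map<String, AggStatus> reportsPerNode = new ConcurrentHashMap<>();

  void update(String nodeId, AggStatus status) {
    if (status == AggStatus.SUCCEEDED) {
      // Do not keep terminal SUCCEEDED reports around for the lifetime of the app.
      reportsPerNode.remove(nodeId);
    } else {
      reportsPerNode.put(nodeId, status);
    }
  }
}
{code}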
[jira] [Updated] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Summary: Collect node resource utilization (was: Report node resource utilization) Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522044#comment-14522044 ] Wangda Tan commented on YARN-3243: -- [~tgraves], I think YARN-3434 needs YARN-3361; it cannot merge cleanly with YARN-3243 alone. Could you check it? CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522068#comment-14522068 ] Inigo Goiri commented on YARN-3534: --- It looks like there's a consensus on splitting monitoring and reporting. If everybody is cool with it, I will rename this JIRA to Collect node resource utilization and just do the implementation of NodeResourceMonitor. The question of how to pass this information to the RM still seems open. Should I open a new JIRA for that and move the discussion about using REST, heartbeat, etc. to that one? Or should we have that discussion in one of the related JIRAs (e.g., YARN-3332)? Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app
[ https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522150#comment-14522150 ] Zhijie Shen commented on YARN-3544: --- Verified 2.7 branch patch. It works fine too. Will commit it. AM logs link missing in the RM UI for a completed app -- Key: YARN-3544 URL: https://issues.apache.org/jira/browse/YARN-3544 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, YARN-3544-branch-2.7-1.2.patch, YARN-3544-branch-2.7-1.patch, YARN-3544.1.patch AM log links should always be present ( for both running and completed apps). Likewise node info is also empty. This is usually quite crucial when trying to debug where an AM was launched and a pointer to which NM's logs to look at if the AM failed to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app
[ https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522170#comment-14522170 ] Hudson commented on YARN-3544: -- FAILURE: Integrated in Hadoop-trunk-Commit #7707 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7707/]) YARN-3544. Got back AM logs link on the RM web UI for a completed app. Contributed by Xuan Gong. (zjshen: rev 7e8639fda40c13fe163128d7a725fcd0f2fce3c5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java AM logs link missing in the RM UI for a completed app -- Key: YARN-3544 URL: https://issues.apache.org/jira/browse/YARN-3544 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.1 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, YARN-3544-branch-2.7-1.2.patch, YARN-3544-branch-2.7-1.patch, YARN-3544.1.patch AM log links should always be present ( for both running and completed apps). Likewise node info is also empty. This is usually quite crucial when trying to debug where an AM was launched and a pointer to which NM's logs to look at if the AM failed to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3566) YARN Scheduler Web UI not properly sorting through Application ID or Progress bar
Anthony Rojas created YARN-3566: --- Summary: YARN Scheduler Web UI not properly sorting through Application ID or Progress bar Key: YARN-3566 URL: https://issues.apache.org/jira/browse/YARN-3566 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.5.0 Reporter: Anthony Rojas Noticed that the progress bar web UI component of the RM Web UI Cluster scheduler is not sorting at all, whereas the RM web UI main view is sortable. The actual web URL that has the broken fields: http://resource_manager.company.com:8088/cluster/scheduler This URL however does have functional fields: http://resource_manager.company.com:8088/cluster/apps I'll attach a screenshot that shows which specific fields within the Web UI table aren't sorting when clicked on. Clicking either the Progress Bar column or the Application ID column from /cluster/scheduler did not trigger any changes at all; shouldn't it have sorted the jobs in ascending or descending order based on the Application ID or on the actual progress from the Progress bar? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522066#comment-14522066 ] Thomas Graves commented on YARN-3243: - It might not merge completely cleanly, but it wouldn't be required for functionality. It would be nice to have this in 2.7 either way though. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522066#comment-14522066 ] Thomas Graves edited comment on YARN-3243 at 4/30/15 7:02 PM: -- It might not merge completely cleanly, but it wouldn't be required for functionality. It would be nice to have this in 2.7 either way though. I'll try it out later and see. was (Author: tgraves): It might not merge completely cleanly, but it wouldn't be required for functionality. It would be nice to have this in 2.7 either way though. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522241#comment-14522241 ] Wangda Tan commented on YARN-3243: -- [~tgraves], I just merged this to branch-2.7. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue class, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3539: - Attachment: YARN-3539-006.patch Patch -006 also uprates all of the data structures and {{TimelineClient}} from {{@Unstable}} to {{@Evolving}} Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, YARN-3539-006.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522083#comment-14522083 ] Hadoop QA commented on YARN-3539: - (!) The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H3) information at https://builds.apache.org/job/PreCommit-YARN-Build/7558/ may provide some hints. Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, YARN-3539-006.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522133#comment-14522133 ] Craig Welch commented on YARN-1680: --- Hi [~airbots], any luck on this? Do you mind if I take it on again? availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) has become unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom still includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns considers the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
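The hang follows directly from the arithmetic: with 32GB total and 29GB held by reducers, the AM is told it has 3GB of headroom even if that free space sits on the blacklisted NM-4, so it never preempts a reducer to rerun the failed maps. A small sketch of the corrected calculation; the names and the assumption that all of the free memory is on NM-4 are illustrative only:
{code}
// Illustrative headroom calculation; the real fix lives in the scheduler's headroom logic.
final class HeadroomSketch {
  /**
   * Headroom the AM can actually use: the cluster's free memory minus whatever
   * free memory sits on nodes this application has blacklisted.
   */
  static long usableHeadroomMB(long clusterFreeMB, long freeOnBlacklistedNodesMB) {
    return Math.max(0, clusterFreeMB - freeOnBlacklistedNodesMB);
  }

  public static void main(String[] args) {
    // Numbers from the report: 32GB cluster, 29GB used; assume the free 3GB is on the blacklisted NM-4.
    long clusterFreeMB = 32 * 1024 - 29 * 1024;   // 3072 MB reported free
    long freeOnBlacklistedMB = 3 * 1024;          // assumption for illustration
    System.out.println(usableHeadroomMB(clusterFreeMB, freeOnBlacklistedMB)); // 0 -> AM must preempt
  }
}
{code}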
[jira] [Updated] (YARN-3566) YARN Scheduler Web UI not properly sorting through Application ID or Progress bar
[ https://issues.apache.org/jira/browse/YARN-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Rojas updated YARN-3566: Attachment: Screen Shot 2015-04-30 at 1.23.56 PM.png YARN Scheduler Web UI not properly sorting through Application ID or Progress bar - Key: YARN-3566 URL: https://issues.apache.org/jira/browse/YARN-3566 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.5.0 Reporter: Anthony Rojas Attachments: Screen Shot 2015-04-30 at 1.23.56 PM.png Noticed that the progress bar web UI component of the RM Web UI Cluster scheduler is not sorting at all, whereas the RM web UI main view is sortable. The actual web URL that has the broken fields: http://resource_manager.company.com:8088/cluster/scheduler This URL however does have functional fields: http://resource_manager.company.com:8088/cluster/apps I'll attach a screenshot that shows which specific fields within the Web UI table aren't sorting when clicked on. Clicking either the Progress Bar column or the Application ID column from /cluster/scheduler did not trigger any changes at all; shouldn't it have sorted the jobs in ascending or descending order based on the Application ID or on the actual progress from the Progress bar? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-4.patch Isolating the changes for the NodeResourceMonitorImpl. The aggregation of the data and sending it to the RM will be done in another JIRA. Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
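For context on what such a node-level monitor does, below is a minimal, illustrative sketch of a utilization sampler that a NodeManager-side service could run between heartbeats. It uses only the JDK's OperatingSystemMXBean; the class and method names are hypothetical and are not taken from the YARN-3534 patch.
{code}
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only: periodically samples node-wide utilization. */
public class NodeUtilizationSampler {
  // The com.sun.management variant exposes memory/CPU figures on HotSpot JVMs.
  private final com.sun.management.OperatingSystemMXBean os =
      (com.sun.management.OperatingSystemMXBean)
          ManagementFactory.getOperatingSystemMXBean();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  private volatile long usedPhysicalMemoryBytes;
  private volatile double cpuLoad; // 0.0 - 1.0

  public void start(long intervalMs) {
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        usedPhysicalMemoryBytes =
            os.getTotalPhysicalMemorySize() - os.getFreePhysicalMemorySize();
        cpuLoad = os.getSystemCpuLoad();
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  // A heartbeat builder would read the latest sample from these getters.
  public long getUsedPhysicalMemoryBytes() { return usedPhysicalMemoryBytes; }
  public double getCpuLoad() { return cpuLoad; }

  public void stop() { scheduler.shutdownNow(); }
}
{code}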
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522180#comment-14522180 ] Wangda Tan commented on YARN-3521: -- [~sunilg], Thanks for updating, just tried it locally, some comments: 1) It seems the structure of the REST response is not correct for NodeLabelsInfo:
{code}
<nodeLabelsInfo>
  <nodeLabelsInfo>
    <name>x</name>
    <exclusity>true</exclusity>
  </nodeLabelsInfo>
  <nodeLabelsInfo>
    <name>y</name>
    <exclusity>true</exclusity>
  </nodeLabelsInfo>
</nodeLabelsInfo>
{code}
It should be {{nodeLabelInfo}} instead of {{nodeLabelsInfo}}; could you solve this issue? 2) It's better to add a test for specifying exclusivity when adding node labels (verify the exclusivity is added to NodeLabelsManager). Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we should make the same change on the REST API side to make them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522271#comment-14522271 ] Jian He commented on YARN-3546: --- bq. Let's consider below situation, Hi [~sandflee], it's a valid situation. But waitForSchedulerAppAttemptAdded is really just a test utility method; it's not used in any production code. The whole MockRM is used for testing only. AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it - Key: YARN-3546 URL: https://issues.apache.org/jira/browse/YARN-3546 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: sandflee I'm not familiar with the scheduler; at first glance, I thought this function returns the SchedulerApplicationAttempt corresponding to the given appAttemptId, but actually it returns the current SchedulerApplicationAttempt. It seems to have misled others too, such as TestWorkPreservingRMRestart.waitForNumContainersToRecover and MockRM.waitForSchedulerAppAttemptAdded. Should I rename it to T getCurrentSchedulerApplicationAttempt(ApplicationId applicationId), or have it return null if the current attempt id does not equal the requested attempt id? Comments preferred! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522060#comment-14522060 ] Vinod Kumar Vavilapalli commented on YARN-3445: --- bq. Jian He, can you file a ticket please? Actually, we can do this too on YARN-3505 as that is related to LogAggregationReport. Please leave a comment there. Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445-v2.patch, YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add a cache of runningApps in RMNode, so the RM only sends back collectors for locally running apps. This is also needed for YARN-914 (graceful decommission): if an NM in the decommissioning stage has no running apps, it can be decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
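A minimal, illustrative sketch of the proposed runningApps cache and collector filtering follows; all class and method names here are hypothetical and are not taken from the attached patches.
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;

/** Illustrative sketch only: not the actual RMNodeImpl code. */
public class RunningAppsCache {
  // Apps believed to have live containers on this node.
  private final Set<ApplicationId> runningApps =
      Collections.newSetFromMap(new ConcurrentHashMap<ApplicationId, Boolean>());

  public void appStarted(ApplicationId appId) { runningApps.add(appId); }
  public void appFinished(ApplicationId appId) { runningApps.remove(appId); }

  /** Keep only collector addresses for apps still running on this node. */
  public Map<ApplicationId, String> filterCollectors(
      Map<ApplicationId, String> allCollectors) {
    Map<ApplicationId, String> filtered = new HashMap<ApplicationId, String>();
    for (Map.Entry<ApplicationId, String> e : allCollectors.entrySet()) {
      if (runningApps.contains(e.getKey())) {
        filtered.put(e.getKey(), e.getValue());
      }
    }
    return filtered;
  }

  /** YARN-914: a decommissioning node with no running apps can go immediately. */
  public boolean canDecommissionNow() { return runningApps.isEmpty(); }
}
{code}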
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522141#comment-14522141 ] Hadoop QA commented on YARN-3521: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 19 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 29s | The applied patch generated 3 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 63m 37s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 104m 45s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729530/0002-YARN-3521.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e2e8f77 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7557/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7557/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7557/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7557/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7557/console | This message was automatically generated. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522144#comment-14522144 ] Chen He commented on YARN-1680: --- Sure, I just assigned it back to you. I may not have free cycles recently. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Craig Welch Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 Map tasks got killed), so MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, yet the availableResource it returns still counts that free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Assignee: Craig Welch (was: Chen He) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Craig Welch Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 Map tasks got killed), so MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, yet the availableResource it returns still counts that free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated YARN-2369: -- Attachment: YARN-2369-2.patch New patch has a configurable whitelist for variables with append enabled as yarn.application.variables.with.append. The existing unit tests are passing from TestMRApps. Next up, a test to try to append to a variable not on the default white list and verify it gets replaced instead of appended to. [~jlowe], any thoughts on things that are missing from this latest patch or problems with the design? Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
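To make the whitelist idea above concrete, here is a minimal, illustrative sketch of append-vs-replace handling. The property name yarn.application.variables.with.append is taken from the comment above; the class, method names and default whitelist contents here are assumptions, not the actual patch.
{code}
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: whitelist-driven append vs. replace. */
public class EnvVarMerger {
  // Key name taken from the comment above; the default list here is a guess.
  static final String WHITELIST_KEY = "yarn.application.variables.with.append";
  static final Set<String> DEFAULT_WHITELIST = new HashSet<String>(
      Arrays.asList("PATH", "CLASSPATH", "LD_LIBRARY_PATH"));

  private final Set<String> appendable;

  public EnvVarMerger(Set<String> appendable) { this.appendable = appendable; }

  /** Append only path-like variables on the whitelist; replace everything else. */
  public void merge(Map<String, String> env, String name, String value) {
    String existing = env.get(name);
    if (existing != null && appendable.contains(name)) {
      env.put(name, existing + File.pathSeparator + value);
    } else {
      env.put(name, value);
    }
  }
}
{code}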
[jira] [Commented] (YARN-3551) Consolidate data model change according to the backend implementation
[ https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522744#comment-14522744 ] Zhijie Shen commented on YARN-3551: --- Created a new patch to change timeline metric APIs according to the aforementioned comments. In addition, change the collection to TreeMap, and sort the data points according to timestamp in descending order. Consolidate data model change according to the backend implementation - Key: YARN-3551 URL: https://issues.apache.org/jira/browse/YARN-3551 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3551-YARN-2928.4.patch, YARN-3551.1.patch, YARN-3551.2.patch, YARN-3551.3.patch Based on the comments on [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080] and [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098], we need to change the data model to restrict the data type of info/config/metric section. 1. Info: the value could be all kinds object that is able to be serialized/deserialized by jackson. 2. Config: the value will always be assumed as String. 3. Metric: single data or time series value have to be number for aggregation. Other than that, info/start time/finish time of metric seem not to be necessary for storage. They should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
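For readers following the data-model change, a minimal sketch of the newest-first ordering described above is shown here; the class and method names are illustrative and are not the actual TimelineMetric API.
{code}
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: newest-first ordering of metric data points. */
public class MetricValues {
  // Keys are timestamps; reverseOrder() keeps the most recent value first.
  private final TreeMap<Long, Number> values =
      new TreeMap<Long, Number>(Collections.reverseOrder());

  public void addValue(long timestamp, Number value) {
    values.put(timestamp, value);
  }

  /** Single-value metrics simply hold one entry. */
  public Map.Entry<Long, Number> latest() { return values.firstEntry(); }

  public Map<Long, Number> getValues() { return values; }
}
{code}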
[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522766#comment-14522766 ] Vinod Kumar Vavilapalli commented on YARN-3481: --- bq. Vinod Kumar Vavilapalli, it looks like YARN-2965 is very similar to this. Actually, this also looks like a clone of YARN-1012. Anyway, from what I understand, those JIRAs want to send utilization metrics in the heartbeat and that's pretty much what I'm targeting here. My current prototype extends ContainersMonitorImpl and puts this information into the NodeHealthStatus. I think I could do that in any of those JIRAs. Okay, I am going to assign YARN-1012 to you and close this as a dup. Will also make YARN-3534 a sub-task of YARN-1011. Report NM aggregated container resource utilization in heartbeat Key: YARN-3481 URL: https://issues.apache.org/jira/browse/YARN-3481 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 336h Remaining Estimate: 336h To allow the RM to make better scheduling decisions, it should be aware of the actual utilization of the containers. The NM would aggregate the ContainerMetrics and report it in every heartbeat. Related to YARN-1012 but aggregated to reduce the heartbeat overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
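As a rough illustration of the aggregation this JIRA describes, the sketch below sums per-container usage into one figure per heartbeat; all names are hypothetical and this is not the actual ContainersMonitorImpl extension mentioned above.
{code}
import java.util.Collection;

/** Illustrative sketch only: aggregate per-container usage into one report. */
public class ContainerUtilizationAggregator {

  /** Plain value holder standing in for whatever record the heartbeat carries. */
  public static class Utilization {
    public long physicalMemoryBytes;
    public long virtualMemoryBytes;
    public float cpuVcoresUsed;
  }

  /** Sum the latest sample of every live container into a single figure. */
  public Utilization aggregate(Collection<Utilization> perContainer) {
    Utilization total = new Utilization();
    for (Utilization u : perContainer) {
      total.physicalMemoryBytes += u.physicalMemoryBytes;
      total.virtualMemoryBytes += u.virtualMemoryBytes;
      total.cpuVcoresUsed += u.cpuVcoresUsed;
    }
    return total; // one aggregate per heartbeat keeps the overhead low
  }
}
{code}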
[jira] [Resolved] (YARN-3481) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-3481. --- Resolution: Duplicate Report NM aggregated container resource utilization in heartbeat Key: YARN-3481 URL: https://issues.apache.org/jira/browse/YARN-3481 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 336h Remaining Estimate: 336h To allow the RM to make better scheduling decisions, it should be aware of the actual utilization of the containers. The NM would aggregate the ContainerMetrics and report it in every heartbeat. Related to YARN-1012 but aggregated to reduce the heartbeat overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522774#comment-14522774 ] Vinod Kumar Vavilapalli commented on YARN-3332: --- Unfortunately, other pieces started moving in sooner than I could start on this: YARN-3534 (in progress), YARN-3334 (part of Timeline service next-gen YARN-2928). So I am planning to do a refactor once those two go into trunk. Tx for offering involvement; once they go in, I can file sub-tasks for moving forward. [Umbrella] Unified Resource Statistics Collection per node -- Key: YARN-3332 URL: https://issues.apache.org/jira/browse/YARN-3332 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: Design - UnifiedResourceStatisticsCollection.pdf Today in YARN, the NodeManager collects statistics like per-container resource usage and the overall physical resources available on the machine. Currently this is used internally in YARN by the NodeManager for only a limited purpose: automatically determining the capacity of resources on the node and enforcing memory usage to what is reserved per container. This proposal is to extend the existing architecture and collect statistics for uses beyond the existing use cases. Proposal attached in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522790#comment-14522790 ] Vinod Kumar Vavilapalli commented on YARN-3044: --- Apologies for dropping off midway through the discussion on sending info from the RM vs. from the NM. We all agree that sending information from NMs is *more scalable*. The concern isn't really about information ownership. RM and NM both form the platform, so we can rely on NMs to publish information. But it's really about potential *loss of information* in many not-so-rare cases, like when a container gets allocated but gets preempted or released by the AM before it really starts. As long as containers successfully start on NMs (which will be the vast majority assuming the cluster isn't bad), we can rely on NMs to post all sorts of information - allocation time, wait time, execution time, and information like priority, host, port, resource-usage-over-time, etc. We can just tunnel some of the RM-originated information through AMs to the NM. The missing dots occur when a container's life-cycle ends either on the RM or the AM. Could we take a dual-pronged approach here? That, or we make the RM-publisher itself a distributed push. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation
[ https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522754#comment-14522754 ] Vinod Kumar Vavilapalli commented on YARN-2619: --- Looks good to me too, +1. Checking this in.. NodeManager: Add cgroups support for disk I/O isolation --- Key: YARN-2619 URL: https://issues.apache.org/jira/browse/YARN-2619 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2619-1.patch, YARN-2619.002.patch, YARN-2619.003.patch, YARN-2619.004.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522779#comment-14522779 ] Vinod Kumar Vavilapalli commented on YARN-2942: --- That will definitely simplify things a lot more IMO; we will no longer need a ZK dependency in the core of YARN (outside of HA). Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2123) Progress bars in Web UI always at 100% due to non-US locale
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522781#comment-14522781 ] Tsuyoshi Ozawa commented on YARN-2123: -- [~ajisakaa] maybe did you forget attaching the v4 patch? Progress bars in Web UI always at 100% due to non-US locale --- Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: NaN_after_launching_RM.png, YARN-2123-001.patch, YARN-2123-002.patch, YARN-2123-003.patch, fair-scheduler-ajisaka.xml, screenshot-noPatch.png, screenshot-patch.png, screenshot.png, yarn-site-ajisaka.xml In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
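A small, self-contained demonstration of the locale problem described above, assuming the progress value is formatted with the JVM default locale somewhere in the webapp; the class name is illustrative only.
{code}
import java.util.Locale;

/** Illustrative sketch only: why a non-US default locale breaks the progress bar. */
public class LocaleFormatDemo {
  public static void main(String[] args) {
    float progress = 32.82f;
    // Locale-sensitive formatting: German (and many other) locales use a comma,
    // which browsers reject when parsing a CSS width like "width:32,8%".
    System.out.println(String.format(Locale.GERMANY, "%.1f", progress)); // 32,8
    // Pinning the locale yields the dot that the rendered HTML expects.
    System.out.println(String.format(Locale.US, "%.1f", progress));      // 32.8
  }
}
{code}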
[jira] [Updated] (YARN-3551) Consolidate data model change according to the backend implementation
[ https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3551: -- Attachment: YARN-3551-YARN-2928.4.patch Consolidate data model change according to the backend implementation - Key: YARN-3551 URL: https://issues.apache.org/jira/browse/YARN-3551 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3551-YARN-2928.4.patch, YARN-3551.1.patch, YARN-3551.2.patch, YARN-3551.3.patch Based on the comments on [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080] and [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098], we need to change the data model to restrict the data type of info/config/metric section. 1. Info: the value could be all kinds object that is able to be serialized/deserialized by jackson. 2. Config: the value will always be assumed as String. 3. Metric: single data or time series value have to be number for aggregation. Other than that, info/start time/finish time of metric seem not to be necessary for storage. They should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522784#comment-14522784 ] Hadoop QA commented on YARN-3534: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | reexec | 0m 0s | dev-support patch detected. | | {color:blue}0{color} | pre-patch | 14m 45s | Pre-patch trunk compilation is healthy. | | {color:blue}0{color} | @author | 0m 0s | Skipping @author checks as test-patch has been patched. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 6m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 7m 25s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 22m 41s | The applied patch generated 364 release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 26s | The applied patch generated 2 additional checkstyle issues. | | {color:blue}0{color} | shellcheck | 4m 26s | Shellcheck was not available. | | {color:green}+1{color} | install | 1m 12s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 30s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 4s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 23m 7s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 47s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 90m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729676/YARN-3534-5.patch | | Optional Tests | shellcheck javadoc javac unit findbugs checkstyle | | git revision | trunk / 98a6176 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7564/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7564/console | This message was automatically generated. 
Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3551) Consolidate data model change according to the backend implementation
[ https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522785#comment-14522785 ] Sangjin Lee commented on YARN-3551: --- The latest patch looks good. Thanks for addressing the feedback and updating the patch [~zjshen]! There is one nit. I would prefer method and field names that don't include timeSeries in them, as the metric can be used for either a single value or time series. How about the following? timeSeries = values, getTimeSeriesJAXB() = getValuesJAXB(), getTimeSeries() = getValues(), setTimeSeries() = setValues(), addTimeSeries() = addValues(), addTimeSeriesData() = addValue() Consolidate data model change according to the backend implementation - Key: YARN-3551 URL: https://issues.apache.org/jira/browse/YARN-3551 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3551-YARN-2928.4.patch, YARN-3551.1.patch, YARN-3551.2.patch, YARN-3551.3.patch Based on the comments on [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080] and [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098], we need to change the data model to restrict the data type of info/config/metric section. 1. Info: the value could be all kinds object that is able to be serialized/deserialized by jackson. 2. Config: the value will always be assumed as String. 3. Metric: single data or time series value have to be number for aggregation. Other than that, info/start time/finish time of metric seem not to be necessary for storage. They should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3534: -- Issue Type: Sub-task (was: New Feature) Parent: YARN-1011 Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522768#comment-14522768 ] Vinod Kumar Vavilapalli commented on YARN-3534: --- Converting this to be a sub-task of YARN-1011. See my last comment at YARN-3481. Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1012: -- Assignee: Inigo Goiri (was: Vinod Kumar Vavilapalli) YARN-3481 is closed as dup of this JIRA. Assigning this one to [~elgoiri] who was moving forward on that JIRA. NM should report resource utilization of running containers to RM in heartbeat -- Key: YARN-1012 URL: https://issues.apache.org/jira/browse/YARN-1012 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Inigo Goiri -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522325#comment-14522325 ] Hadoop QA commented on YARN-2369: - (!) The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-YARN-Build/7559/ may provide some hints. Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522418#comment-14522418 ] Zhijie Shen commented on YARN-3539: --- We perhaps need to mark the generic history API related classes/methods stable too, or else exclude them from this JIRA. Those classes are ApplicationBaseProtocol, YarnClient, ApplicationReport, ApplicationAttemptReport and ContainerReport. Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, YARN-3539-006.patch, timeline_get_api_examples.txt The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared a stable API. Clarify this by patching the compatibility document appropriately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
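For context, "marking a class stable" in Hadoop terms means annotating it as below; this is only an illustrative example with a hypothetical class name, not a claim about which classes the eventual patch actually annotates.
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Illustrative only: the Hadoop annotations that declare an API public and stable.
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class SomeGenericHistoryApi {
  // Types annotated this way fall under the compatibility guarantees spelled
  // out in the compatibility document referenced in this JIRA.
}
{code}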
[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it
[ https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522456#comment-14522456 ] sandflee commented on YARN-3546: OK, closing it now, thanks [~jianhe] AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it - Key: YARN-3546 URL: https://issues.apache.org/jira/browse/YARN-3546 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: sandflee I'm not familiar with the scheduler; at first glance, I thought this function returns the SchedulerApplicationAttempt corresponding to the given appAttemptId, but actually it returns the current SchedulerApplicationAttempt. It seems to have misled others too, such as TestWorkPreservingRMRestart.waitForNumContainersToRecover and MockRM.waitForSchedulerAppAttemptAdded. Should I rename it to T getCurrentSchedulerApplicationAttempt(ApplicationId applicationId), or have it return null if the current attempt id does not equal the requested attempt id? Comments preferred! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522569#comment-14522569 ] Sangjin Lee commented on YARN-3411: --- +1 on 1.0.1. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522556#comment-14522556 ] Li Lu commented on YARN-3411: - Awesome, thanks! [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1462: Attachment: YARN-1462-branch-2.7-1.patch AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1462: Attachment: YARN-1462.2.patch AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522678#comment-14522678 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 48s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729627/YARN-3134-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b689f5d | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7562/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7562/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7562/console | This message was automatically generated. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522724#comment-14522724 ] Hadoop QA commented on YARN-3534: - (!) A patch to test-patch or smart-apply-patch has been detected. Re-executing against the patched versions to perform further tests. The console is at https://builds.apache.org/job/PreCommit-YARN-Build/7564/console in case of problems. Collect node resource utilization - Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522737#comment-14522737 ] zhihai xu commented on YARN-2893: - TestContainerAllocation failure is not related to my change, it is just fixed at YARN-3564. Also this checkstyle issue may be caused by the import statement: {code} import org.apache.hadoop.classification.InterfaceAudience.Private; {code} but this import statement doesn't look like an issue for me. I found similar checkstyle issue at MAPREDUCE-6339, which was caused by the import statement. Hi [~jira.shegalov], Do you want me to do the same experiment as MAPREDUCE-6339 to prove the import statement cause this checkstyle issue? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522639#comment-14522639 ] Jun Gong commented on YARN-3366: [~sidharta-s] Thank you for the explanation. Could we set YARNRootClass's ceil rate to yarnBandwidthMbit when 'strictMode' (which is configured through YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE) is set to true, and otherwise set it to rootBandwidthMbit? If cluster admins need to set a maximum bandwidth for YARN, they could set strictMode to true. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Fix For: 2.8.0 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
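A minimal sketch of the conditional Jun Gong proposes, assuming a Configuration handle and the Mbit values already computed elsewhere; the configuration key is quoted from the comment above, while the class and method names and the default value are illustrative only.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/** Illustrative sketch only: choose the ceil rate for the YARN root traffic class. */
public class RootClassCeil {
  static int chooseCeilMbit(Configuration conf,
      int yarnBandwidthMbit, int rootBandwidthMbit) {
    boolean strict = conf.getBoolean(
        YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE, false);
    // Strict mode caps YARN traffic at its own allocation; otherwise the
    // YARN class may borrow up to the full interface bandwidth.
    return strict ? yarnBandwidthMbit : rootBandwidthMbit;
  }
}
{code}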
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522672#comment-14522672 ] Xuan Gong commented on YARN-1462: - Address all the latest comments. And created a patch for branch-2.7 AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522730#comment-14522730 ] Hadoop QA commented on YARN-1462: - (!) The patch artifact directory has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H3) information at https://builds.apache.org/job/PreCommit-YARN-Build/7566/ may provide some hints. AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411.poc.4.txt Hi [~gtCarrera9] Yes, sure we can use hbase 1.0.1. Attaching an updated patch that uses hbase 1.0.1. It works fine for the unit test. We will also have the hbase cluster set up with version 1.0.1. thanks Vrushali [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522642#comment-14522642 ] Robert Kanter commented on YARN-2942: - Thanks for pointing me to YARN-1376 and related. I'll have to look into the code to get a better idea, but perhaps we can take advantage of this to take a completely different approach to combining the logs. Now that we have a way of checking the status of log aggregation across all nodes in the cluster, instead of having to use ZK locks to coordinate all the NMs to append the logs, we can have a single server append the logs (maybe a small thread pool in the RM that handles this?). We'd still use append, and the new format, but we wouldn't need to use ZooKeeper, and using a single server to do the combining should simplify things. We'd probably need to add new {{LogAggregationStatus}} enum values such as COMBINING and COMBINED. I'll look into this some more; what do you think [~vinodkv], [~jlowe], [~knoguchi]? Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
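To make that proposal concrete, here is a rough, illustrative sketch of a single-server combiner. The status names COMBINING and COMBINED come from the comment above; the class, the enum, the pool size, and the commented steps are assumptions rather than the actual design.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative sketch only: a single-server log combiner, no ZooKeeper locks. */
public class LogCombinerService {
  // The comment above proposes adding values like these to LogAggregationStatus.
  enum CombineStatus { NOT_STARTED, COMBINING, COMBINED, FAILED }

  // A small pool in one daemon (e.g. the RM) replaces per-NM ZK coordination.
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  /** Schedule combining once aggregation is reported finished on every node. */
  public void submit(final String appId) {
    pool.submit(new Runnable() {
      @Override
      public void run() {
        // 1. List the per-NM aggregated log files for appId in HDFS.
        // 2. Append them, in the new combinable format, into one file.
        // 3. Flip the tracked status from COMBINING to COMBINED.
      }
    });
  }

  public void stop() { pool.shutdown(); }
}
{code}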