[jira] [Updated] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2283:
Affects Version/s: 2.4.0 (was: 2.5.0)

RM failed to release the AM container
Key: YARN-2283
URL: https://issues.apache.org/jira/browse/YARN-2283
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Environment: NM1: AM running; NM2: Map task running; mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

During a container stability test I faced this problem. While the job was running, a map task got killed. Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.
{code}
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity <memory:1024, vCores:1> on host HOST-10-18-40-153:45026, which currently has 1 containers, <memory:2048, vCores:1> used and <memory:6144, vCores:7> available, release resources=true
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=<memory:2048, vCores:1> numContainers=1 user=testos user-resources=<memory:2048, vCores:1>
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: <memory:1024, vCores:1>, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=<memory:8192, vCores:8>
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=<memory:2048, vCores:1> cluster=<memory:8192, vCores:8>
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01
{code}
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108787#comment-14108787 ] Tsuyoshi OZAWA commented on YARN-2427:
Hi [~vvasudev], how about calling rm.stop() in testGetAppQueue after testing?

Add support for moving apps between queues in RM web services
Key: YARN-2427
URL: https://issues.apache.org/jira/browse/YARN-2427
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2427.0.patch

Support for moving apps from one queue to another is now present in the CapacityScheduler and the FairScheduler. We should expose the functionality via the RM web services as well.

-- This message was sent by Atlassian JIRA (v6.2#6252)
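For readers unfamiliar with the proposal, a minimal client-side sketch of what such a move request could look like follows. The endpoint path, JSON payload shape, RM host/port, and application ID are all illustrative assumptions; the actual interface is whatever the attached patch defines.
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class MoveAppSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint and payload; the exact path and JSON shape
    // depend on the final form of the apache-yarn-2427 patch.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps/"
        + "application_1234567890123_0001/queue");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    byte[] body = "{\"queue\":\"target-queue\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);  // submit the target queue for the running app
    }
    System.out.println("HTTP status: " + conn.getResponseCode());
  }
}
{code}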
[jira] [Updated] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2427:
Attachment: apache-yarn-2427.1.patch

[~ozawa] thanks for the suggestion! I thought the tearDown method would handle it. I've uploaded a new patch with your suggestion. Hopefully, it'll fix the issue.

Add support for moving apps between queues in RM web services
Key: YARN-2427
URL: https://issues.apache.org/jira/browse/YARN-2427
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch

Support for moving apps from one queue to another is now present in the CapacityScheduler and the FairScheduler. We should expose the functionality via the RM web services as well.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2448:
Attachment: apache-yarn-2448.1.patch

I didn't do a clean build, which led to me missing an override in hadoop-tools. Uploaded a new patch with the fix. Thanks to [~leftnoteasy] for the help!

RM should expose the name of the ResourceCalculator being used when AMs register
Key: YARN-2448
URL: https://issues.apache.org/jira/browse/YARN-2448
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch

The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator.

-- This message was sent by Atlassian JIRA (v6.2#6252)
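To make the intent concrete, here is a sketch of how an AM could use such information at registration time. The accessor getResourceCalculatorClassName() is hypothetical; it stands in for whatever field the patch actually adds to RegisterApplicationMasterResponse.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class CalculatorAwareAM {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
    amRMClient.init(new Configuration());
    amRMClient.start();
    RegisterApplicationMasterResponse response =
        amRMClient.registerApplicationMaster("amhost", 0, "");
    // Hypothetical accessor; not part of the current public API.
    String calculator = response.getResourceCalculatorClassName();
    boolean cpuIsScheduled = calculator != null
        && calculator.contains("DominantResourceCalculator");
    // When CPU is actually being scheduled, size the vcore ask explicitly
    // instead of deciding on memory alone.
    Resource ask = Resource.newInstance(1024, cpuIsScheduled ? 2 : 1);
    System.out.println("Requesting " + ask);
  }
}
{code}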
[jira] [Resolved] (YARN-2435) Capacity scheduler should only allow Kill Application Requests from ADMINISTER_QUEUE users
[ https://issues.apache.org/jira/browse/YARN-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Mal resolved YARN-2435.
Resolution: Invalid

I was missing the following settings in my yarn-site.xml: yarn.acl.enable = true, and yarn.admin.acl (the default is '*', which allows everyone to be admin).

Capacity scheduler should only allow Kill Application Requests from ADMINISTER_QUEUE users
Key: YARN-2435
URL: https://issues.apache.org/jira/browse/YARN-2435
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 2.5.0, 2.4.1
Environment: Red Hat Enterprise Linux Server release 6.4 (Santiago); Linux 2.6.32-358.el6.x86_64 GNU/Linux; $JAVA_HOME/bin/java -version: java version 1.7.0_55, OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13), OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Reporter: Amir Mal

A user without the ADMINISTER_QUEUE privilege can kill applications from all queues. To replicate the bug:
1) Install a cluster with {{yarn.resourcemanager.scheduler.class}} set to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.*CapacityScheduler*
2) Create 2 users (user1, user2), each belonging to a separate group (group1, group2)
3) Set {{acl_submit_applications}} and {{acl_administer_queue}} of the {{root}} and {{root.default}} queues to group1
4) Submit a job to the {{default}} queue as user1
{quote}
[user1@htc2n3 ~]$ mapred queue -showacls
...
Queue acls for user : user1
Queue  Operations
=====================
root  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
default  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
[user1@htc2n3 ~]$ yarn jar /opt/apache/hadoop-2.5.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi -Dmapreduce.job.queuename=default 4 10
{quote}
5) Kill the application as user2
{quote}
[user2@htc2n4 ~]$ mapred queue -showacls
...
Queue acls for user : user2
Queue  Operations
=====================
root
default
[user2@htc2n4 ~]$ yarn application -kill application_1408540602935_0004
...
Killing application application_1408540602935_0004
14/08/21 14:37:54 INFO impl.YarnClientImpl: Killed application application_1408540602935_0004
{quote}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108822#comment-14108822 ] Wangda Tan commented on YARN-2448:
[~vvasudev], Thanks for working on the patch, it LGTM, +1

Wangda

RM should expose the name of the ResourceCalculator being used when AMs register
Key: YARN-2448
URL: https://issues.apache.org/jira/browse/YARN-2448
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch

The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108844#comment-14108844 ] Hadoop QA commented on YARN-2427:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664104/apache-yarn-2427.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4716//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4716//console
This message is automatically generated.

Add support for moving apps between queues in RM web services
Key: YARN-2427
URL: https://issues.apache.org/jira/browse/YARN-2427
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch

Support for moving apps from one queue to another is now present in the CapacityScheduler and the FairScheduler. We should expose the functionality via the RM web services as well.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108845#comment-14108845 ] Hadoop QA commented on YARN-2448:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664105/apache-yarn-2448.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.sls.TestSLSRunner
org.apache.hadoop.yarn.sls.nodemanager.TestNMSimulator
org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4717//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4717//console
This message is automatically generated.

RM should expose the name of the ResourceCalculator being used when AMs register
Key: YARN-2448
URL: https://issues.apache.org/jira/browse/YARN-2448
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch

The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2427:
Attachment: apache-yarn-2427.2.patch

That fixed some of the tests. I found a similar missing rm.stop() in TestFifoScheduler that was probably leading to the failing TestSchedulerUtils. I'm unsure why the other test is failing.

Add support for moving apps between queues in RM web services
Key: YARN-2427
URL: https://issues.apache.org/jira/browse/YARN-2427
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch

Support for moving apps from one queue to another is now present in the CapacityScheduler and the FairScheduler. We should expose the functionality via the RM web services as well.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Karam Singh created YARN-2449:
Summary: Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2449:
Priority: Critical (was: Major)

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Priority: Critical

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2449:
Description:
Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

was:
Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Priority: Critical

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108873#comment-14108873 ] Karam Singh commented on YARN-2449:
Similarly, if you run hadoop applications without setting hadoop.http.filter.initializers while the timelineserver is enabled, e.g.:
{code}
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0.2.2.0.0-532.jar pi 10 10
{code}
Application submission fails with the following type of exception:
{code}
org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "About" (Class org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse), not marked as ignorable
at [Source: N/A; line: -1, column: -1] (through reference chain: org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse["About"])
{code}

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Priority: Critical

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2449:
Description:
Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

was:
Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-2449:
Assignee: Varun Vasudev

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108901#comment-14108901 ] Hadoop QA commented on YARN-2427:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664114/apache-yarn-2427.2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4718//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4718//console
This message is automatically generated.

Add support for moving apps between queues in RM web services
Key: YARN-2427
URL: https://issues.apache.org/jira/browse/YARN-2427
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch

Support for moving apps from one queue to another is now present in the CapacityScheduler and the FairScheduler. We should expose the functionality via the RM web services as well.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108932#comment-14108932 ] Wangda Tan commented on YARN-1707:
Hi [~curino], Thanks for updating, I just took a look, some minor comments:

1) CapacityScheduler#removeQueue
{code}
if (disposableLeafQueue.getCapacity() > 0) {
  throw new SchedulerConfigEditException("The queue " + queueName
      + " has non-zero capacity: " + disposableLeafQueue.getCapacity());
}
{code}
removeQueue checks that disposableLeafQueue's capacity > 0, but addQueue doesn't check. In addition, after the previous check, ParentQueue#removeChildQueue/addChildQueue doesn't need to check its capacity again. And they should throw the same type of exception (both SchedulerConfigEditException or both IllegalArgumentException).

2) CS#addQueue
{code}
throw new SchedulerConfigEditException("Queue " + queue.getQueueName()
    + " is not a dynamic Queue");
{code}
Should "dynamic Queue" be "reservation queue", comparing to the similar exception thrown in removeQueue?

3) CS#setEntitlement
{code}
if (sesConf.getCapacity() > queue.getCapacity()) {
  newQueue.addCapacity((sesConf.getCapacity() - queue.getCapacity()));
} else {
  newQueue.subtractCapacity((queue.getCapacity() - sesConf.getCapacity()));
}
{code}
Maybe it's better to merge add/subtractCapacity into changeCapacity(delta), or just create a setCapacity in ReservationQueue?

4) CS#getReservableQueues
Is it better to rename it to getPlanQueues?

5) ReservationQueue#getQueueName
{code}
@Override
public String getQueueName() {
  return this.getParent().getQueueName();
}
{code}
I'm not sure why this is done, could you please elaborate? This makes this.queueName and this.getQueueName() have different semantics.

6) ReservationQueue#subtractCapacity
{code}
this.setCapacity(this.getCapacity() - capacity);
{code}
With EPSILON, it is possible that this.capacity < 0 after the subtract; it's better to cap this.capacity to the range [0,1]. Same for addCapacity.

7) DynamicQueueConf
I think unfolding it into two floats as parameters for setEntitlement may be more straightforward; is it possible that more fields will be added to DynamicQueueConf?

8) ParentQueue#setChildQueues
Since only PlanQueue needs sum of capacity <= 1, I would suggest making this method protected so that PlanQueue can override it, or adding a check in ParentQueue#setChildQueues.

Wangda

Making the CapacityScheduler more dynamic
Key: YARN-1707
URL: https://issues.apache.org/jira/browse/YARN-1707
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: capacity-scheduler
Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.patch

The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes:
* create queues dynamically
* destroy queues dynamically
* dynamically change queue parameters (e.g., capacity)
* modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100%

We limit this to LeafQueues.

-- This message was sent by Atlassian JIRA (v6.2#6252)
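To illustrate the shape of the change suggested in comments 3) and 6), here is a minimal sketch of a merged, clamped capacity update. The method and surrounding class are illustrative assumptions, not code from the actual YARN-1707 patch.
{code}
// Sketch: merge addCapacity/subtractCapacity into one clamped update.
public abstract class CappedQueueCapacity {
  protected abstract float getCapacity();
  protected abstract void setCapacity(float capacity);

  public synchronized void changeCapacity(float delta) {
    float updated = getCapacity() + delta;
    // Cap to [0, 1] so EPSILON-sized rounding in callers can't drive the
    // capacity negative or above the parent's full capacity.
    updated = Math.max(0f, Math.min(1f, updated));
    setCapacity(updated);
  }
}
{code}
With this, the branching in CS#setEntitlement collapses to a single call such as newQueue.changeCapacity(sesConf.getCapacity() - queue.getCapacity()).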
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108948#comment-14108948 ] Hadoop QA commented on YARN-1707:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663571/YARN-1707.3.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4719//console
This message is automatically generated.

Making the CapacityScheduler more dynamic
Key: YARN-1707
URL: https://issues.apache.org/jira/browse/YARN-1707
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
Labels: capacity-scheduler
Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.patch

The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes:
* create queues dynamically
* destroy queues dynamically
* dynamically change queue parameters (e.g., capacity)
* modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100%

We limit this to LeafQueues.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108975#comment-14108975 ] Varun Vasudev commented on YARN-2427:
The TestAMRestart failure is unrelated.

Add support for moving apps between queues in RM web services
Key: YARN-2427
URL: https://issues.apache.org/jira/browse/YARN-2427
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch

Support for moving apps from one queue to another is now present in the CapacityScheduler and the FairScheduler. We should expose the functionality via the RM web services as well.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2449:
Attachment: apache-yarn-2449.0.patch

Uploaded a patch with the fix.

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
Attachments: apache-yarn-2449.0.patch

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
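The patch itself is not reproduced in this digest. As a purely speculative sketch of the kind of defaulting logic such a fix could apply (the real apache-yarn-2449.0.patch may take a different approach), the timeline server could ensure its authentication filter is installed even when core-site.xml leaves the key unset. The helper below is hypothetical; real code would pass the fully-qualified class name of TimelineAuthenticationFilterInitializer.
{code}
import org.apache.hadoop.conf.Configuration;

public final class TimelineFilterDefaulting {
  // Hypothetical defaulting applied before starting the timeline web app.
  static void ensureTimelineAuthFilter(Configuration conf,
      String timelineInitializer) {
    String initializers = conf.get("hadoop.http.filter.initializers", "");
    if (!initializers.contains(timelineInitializer)) {
      // Always install the timeline authentication filter so the DT
      // endpoint is handled even when the key is absent from core-site.xml.
      conf.set("hadoop.http.filter.initializers",
          initializers.isEmpty() ? timelineInitializer
              : timelineInitializer + "," + initializers);
    }
  }
}
{code}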
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108991#comment-14108991 ] Wangda Tan commented on YARN-1198:
Hi [~cwelch], Thanks for updating, I went through your patch just now. I think the current approach makes more sense to me compared to patch#4: it avoids iterating all apps when computing headroom. But currently, CapacityHeadroomProvider#getHeadroom will recompute the headroom for each application heartbeat. Assuming we have #application > #user in a queue (the most likely case), it's still a little costly. I agree more with the method mentioned by Jason. Specifically, we can create a map of <user, headroom> for each queue; when we need to update headroom, we can update all the headrooms in the map, and each SchedulerApplicationAttempt will hold a reference to its headroom. The headroom in the map may be the same as the {{HeadroomProvider}} in your patch. I would suggest renaming {{HeadroomProvider}} to {{HeadroomReference}}, because we don't need to do any computation in it anymore. Another benefit is that we don't need to write a HeadroomProvider for each scheduler; a simple HeadroomReference with getter/setter should be enough.

Two more things we should take care of with the previous method:
1) As mentioned by Jason, currently the fair/capacity schedulers both support moving apps between queues; we should recompute and change the reference after the app move finishes.
2) In LeafQueue#assignContainers, we don't need to call
{code}
Resource userLimit = computeUserLimitAndSetHeadroom(application, clusterResource, required);
{code}
for each application; iterating and updating the map of <user, headroom> in LeafQueue#updateClusterResource should be enough.

Wangda

Capacity Scheduler headroom calculation does not work as expected
Key: YARN-1198
URL: https://issues.apache.org/jira/browse/YARN-1198
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch

Today headroom calculation (for the app) takes place only when:
* a new node is added to/removed from the cluster
* a new container is getting assigned to the application.

However there are potentially a lot of situations which are not considered for this calculation:
* If a container finishes, then the headroom for that application will change and should be notified to the AM accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
** If app1's container finishes, then not only app1's but also app2's AM should be notified about the change in headroom.
** Similarly, if a container is assigned to either application app1/app2, then both AMs should be notified about their headroom.
** To simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted in the same queue).
* If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
* Also, today headroom is an absolute number (I think it should be normalized, but then this is not going to be backward compatible).
* Also, when an admin user refreshes a queue, headroom has to be updated.

These are all potential bugs in headroom calculations.

-- This message was sent by Atlassian JIRA (v6.2#6252)
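A minimal sketch of the <user, headroom> map plus shared-reference idea described in the comment above follows. Class and method names are illustrative assumptions, not code from any attached patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.Resource;

// One mutable headroom cell per user; every SchedulerApplicationAttempt of
// that user holds the same reference, so a single set() updates all of them.
class HeadroomReference {
  private volatile Resource headroom;
  Resource get() { return headroom; }
  void set(Resource newHeadroom) { this.headroom = newHeadroom; }
}

class LeafQueueHeadrooms {
  private final Map<String, HeadroomReference> byUser =
      new ConcurrentHashMap<String, HeadroomReference>();

  // Handed to an application attempt when it is added to the queue (and
  // re-handed if the app is moved to another queue).
  HeadroomReference referenceFor(String user) {
    HeadroomReference ref = byUser.get(user);
    if (ref == null) {
      ref = new HeadroomReference();
      HeadroomReference prev = byUser.putIfAbsent(user, ref);
      if (prev != null) {
        ref = prev;
      }
    }
    return ref;
  }

  // Called from e.g. updateClusterResource: one pass over the users updates
  // the headroom visible to all of their running applications.
  void updateAll(HeadroomCalculator calc) {
    for (Map.Entry<String, HeadroomReference> e : byUser.entrySet()) {
      e.getValue().set(calc.headroomFor(e.getKey()));
    }
  }

  interface HeadroomCalculator {
    Resource headroomFor(String user);
  }
}
{code}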
[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109005#comment-14109005 ] Hadoop QA commented on YARN-2449:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664132/apache-yarn-2449.0.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4720//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4720//console
This message is automatically generated.

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Key: YARN-2449
URL: https://issues.apache.org/jira/browse/YARN-2449
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.6.0
Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml
Reporter: Karam Singh
Assignee: Varun Vasudev
Priority: Critical
Attachments: apache-yarn-2449.0.patch

Timelineserver returns an invalid delegation token in a secure kerberos-enabled cluster when hadoop.http.filter.initializers is not set. Looks like it is a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and you try to fetch a DELEGATION token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token using curl commands:
{code}
1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The returned response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-160:
Attachment: apache-yarn-160.2.patch

Comments from [~jlowe] in YARN-2440 about this feature led to some more changes. The latest patch introduces some new config variables:
1. yarn.nodemanager.containers-cpu-cores - the number of cores to be used for YARN containers. By default we use all cores.
2. yarn.nodemanager.containers-cpu-percentage - the percentage of overall CPU to be used for YARN containers. By default we use all CPU.
3. yarn.nodemanager.pcores-vcores-multiplier - a multiplier to convert pcores to vcores. By default it is 1. This can be used on clusters with heterogeneous hardware to have more containers run on faster CPUs.
4. yarn.nodemanager.count-logical-processors-as-cores - flag to determine if hyperthreads should be counted as cores. By default it is true.

There's some shared code between YARN-2440 and this patch. Depending on which one gets committed first, I'll change the patch appropriately.

nodemanagers should obtain cpu/memory values from underlying OS
Key: YARN-160
URL: https://issues.apache.org/jira/browse/YARN-160
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
Fix For: 2.6.0
Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch

As mentioned in YARN-2 (*NM memory and CPU configs*), currently these values come from the config of the NM. We should be able to obtain those values from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (an amount of mem/cpu not to be available as a YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers.

-- This message was sent by Atlassian JIRA (v6.2#6252)
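To make the interaction of these four knobs concrete, here is a rough sketch of how they could combine into a node's vcore count. The key names come from the comment above, but the arithmetic, defaults, and helper class are assumptions, not the patch's actual logic.
{code}
import org.apache.hadoop.conf.Configuration;

public final class NodeVcoresSketch {
  static int computeVcores(Configuration conf) {
    int logicalProcessors = Runtime.getRuntime().availableProcessors();
    boolean countLogicalAsCores = conf.getBoolean(
        "yarn.nodemanager.count-logical-processors-as-cores", true);
    // Assume 2 hardware threads per physical core when hyperthreading is on.
    int cores = countLogicalAsCores ? logicalProcessors
        : Math.max(1, logicalProcessors / 2);
    int configuredCores = conf.getInt(
        "yarn.nodemanager.containers-cpu-cores", cores);
    float percentage = conf.getFloat(
        "yarn.nodemanager.containers-cpu-percentage", 100f);
    float multiplier = conf.getFloat(
        "yarn.nodemanager.pcores-vcores-multiplier", 1.0f);
    // Cores reserved for containers, scaled by the CPU percentage and the
    // pcore-to-vcore multiplier; always advertise at least one vcore.
    return Math.max(1, (int) (Math.min(configuredCores, cores)
        * (percentage / 100f) * multiplier));
  }
}
{code}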
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109075#comment-14109075 ] Hadoop QA commented on YARN-160:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664137/apache-yarn-160.2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-gridmix hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4721//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4721//console
This message is automatically generated.

nodemanagers should obtain cpu/memory values from underlying OS
Key: YARN-160
URL: https://issues.apache.org/jira/browse/YARN-160
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
Fix For: 2.6.0
Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch

As mentioned in YARN-2 (*NM memory and CPU configs*), currently these values come from the config of the NM. We should be able to obtain those values from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (an amount of mem/cpu not to be available as a YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
[ https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109340#comment-14109340 ] Zhijie Shen commented on YARN-2035:
+1 for the latest patch. Will commit it.

FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
Key: YARN-2035
URL: https://issues.apache.org/jira/browse/YARN-2035
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch

Small bug that prevents the ResourceManager and the ApplicationHistoryService from coming up while the Namenode is in safemode.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang reassigned YARN-2450:
Assignee: Ray Chiang

Fix typos in log messages
Key: YARN-2450
URL: https://issues.apache.org/jira/browse/YARN-2450
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
Labels: newbie

There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2450) Fix typos in log messages
Ray Chiang created YARN-2450:
Summary: Fix typos in log messages
Key: YARN-2450
URL: https://issues.apache.org/jira/browse/YARN-2450
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Priority: Trivial

There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2450:
Attachment: YARN-2450-01.patch

First attempt for YARN-only log fixes.

Fix typos in log messages
Key: YARN-2450
URL: https://issues.apache.org/jira/browse/YARN-2450
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
Labels: newbie
Attachments: YARN-2450-01.patch

There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
[ https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109340#comment-14109340 ] Zhijie Shen edited comment on YARN-2035 at 8/25/14 6:01 PM: +1 for the latest patch. Holding off on the commit until I figure out the proper way to commit via git. was (Author: zjshen): +1 for the latest patch. Will commit it. FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode --- Key: YARN-2035 URL: https://issues.apache.org/jira/browse/YARN-2035 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch Small bug that prevents ResourceManager and ApplicationHistoryService from coming up while Namenode is in safemode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109442#comment-14109442 ] Wei Yan commented on YARN-810: -- [~vvasudev], for the cfs_quota_us and cfs_period_us settings problem, as we need to get the number of physical cores used by YARN, I'll update a patch here once your YARN-2440 committed. Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch, YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens, even though the only guarantee that YARN/CGroups is making is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN. First, you can see that CFS is in use in the CGroup, based on the file names: {noformat} [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ total 0 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us 10 [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us -1 {noformat} Oddly, it appears that the cfs_period_us is set to .1s, not 1s. We can place processes in hard limits. I have process 4370 running YARN container container_1371141151815_0003_01_03 on a host. By default, it's running at ~300% cpu usage. {noformat} CPU 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... 
{noformat} When I set the CFS quota: {noformat} echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_02/cpu.cfs_quota_us CPU 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ... {noformat} It drops to 1% usage, and you can see the box has room to spare: {noformat} Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st {noformat} Turning the quota back to -1: {noformat} echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_02/cpu.cfs_quota_us {noformat} Burns the cores again: {noformat} Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, 0.0%st CPU 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8 89:32.31 ... {noformat} On my dev box, I was testing CGroups by running a python process eight times, to burn through all the cores, since it was doing as described above (giving extra CPU to the process, even with
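To make the manual experiment above concrete: the ceiling is the ratio cpu.cfs_quota_us / cpu.cfs_period_us, so with the CFS default period of 100000µs, granting a container 1.5 cores means a quota of 150000µs, and a quota of -1 removes the ceiling. A minimal Java sketch under the assumption of the /cgroup/cpu/hadoop-yarn layout shown in the logs above (an illustration, not the eventual YARN implementation):
{code}
import java.io.FileWriter;
import java.io.IOException;

public class CfsCeilingSketch {
  private static final long PERIOD_US = 100000L; // CFS default period: 100ms

  // Hard-cap a container cgroup at the given number of cores (quota/period).
  static void setCeiling(String containerCgroup, double cores) throws IOException {
    write(containerCgroup + "/cpu.cfs_period_us", Long.toString(PERIOD_US));
    write(containerCgroup + "/cpu.cfs_quota_us",
        Long.toString((long) (PERIOD_US * cores)));
  }

  // Restore the default soft (shares-only) behavior.
  static void clearCeiling(String containerCgroup) throws IOException {
    write(containerCgroup + "/cpu.cfs_quota_us", "-1");
  }

  private static void write(String path, String value) throws IOException {
    try (FileWriter w = new FileWriter(path)) {
      w.write(value);
    }
  }
}
{code}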
[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
[ https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109479#comment-14109479 ] Karthik Kambatla commented on YARN-2377: +1 to improving the debuggability here. Can we re-use StringUtils.stringifyException, preferably in ResourceLocalizationService? Localization exception stack traces are not passed as diagnostic info - Key: YARN-2377 URL: https://issues.apache.org/jira/browse/YARN-2377 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-2377.v01.patch In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHostException: ha-nn-uri-0 {code} And then only the {{java.net.UnknownHostException: ha-nn-uri-0}} message is propagated as diagnostics. -- This message was sent by Atlassian JIRA (v6.2#6252)
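For reference, a minimal sketch of the suggestion, assuming the change lands where the diagnostics string is built in ResourceLocalizationService; the surrounding method name is illustrative only:
{code}
import org.apache.hadoop.util.StringUtils;

public class DiagnosticsSketch {
  // Instead of propagating only cause.getMessage() as diagnostics,
  // stringify the whole exception so the stack trace reaches the user.
  static String buildDiagnostics(Throwable cause) {
    return "Resource localization failed: "
        + StringUtils.stringifyException(cause);
  }
}
{code}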
[jira] [Commented] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109488#comment-14109488 ] Hadoop QA commented on YARN-2450: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664177/YARN-2450-01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4722//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4722//console This message is automatically generated. Fix typos in log messages - Key: YARN-2450 URL: https://issues.apache.org/jira/browse/YARN-2450 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie Attachments: YARN-2450-01.patch There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109490#comment-14109490 ] Karthik Kambatla commented on YARN-1326: Minor nit that I can fix at commit time: Change {code} this.rmStateStoreName = rm.getRMContext().getStateStore().getClass() .getName(); {code} to {code} this.rmStateStoreName = rm.getRMContext().getStateStore().getClass().getName(); {code} Otherwise, +1. Will commit this when the repo becomes writable. RM should log using RMStore at startup time --- Key: YARN-1326 URL: https://issues.apache.org/jira/browse/YARN-1326 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.5.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, YARN-1326.4.patch, demo.png Original Estimate: 3h Remaining Estimate: 3h Currently there is no way to know which RMStore the RM uses. It is useful to log this information at the RM's startup time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109504#comment-14109504 ] Ray Chiang commented on YARN-2450: -- Changes restricted to log messages only. Will not write tests specific to log messages. Fix typos in log messages - Key: YARN-2450 URL: https://issues.apache.org/jira/browse/YARN-2450 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie Attachments: YARN-2450-01.patch There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2451) Delete .orig files
Karthik Kambatla created YARN-2451: -- Summary: Delete .orig files Key: YARN-2451 URL: https://issues.apache.org/jira/browse/YARN-2451 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Karthik Kambatla Assignee: Karthik Kambatla Looks like we checked in a few .orig files. We should delete them. {noformat} ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java.orig ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java.orig ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java.orig ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java.orig {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109524#comment-14109524 ] Akira AJISAKA commented on YARN-2450: - Thanks [~rchiang] for splitting the patch. LGTM, +1 (non-binding). Fix typos in log messages - Key: YARN-2450 URL: https://issues.apache.org/jira/browse/YARN-2450 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie Attachments: YARN-2450-01.patch There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109577#comment-14109577 ] Karthik Kambatla commented on YARN-2448: I am not sure I understand the use case very well. The AM's requirements shouldn't change based on what the RM does internally. Shouldn't the application ask for all the resources that YARN supports? It is up to the scheduler (queue, user, app type etc.) to decide what resources it would consider for scheduling. If the app doesn't specify any resources at all for a type, we can assume zero for that type (e.g. in clusters not configured to use a particular type). RM should expose the name of the ResourceCalculator being used when AMs register Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109589#comment-14109589 ] Varun Vasudev commented on YARN-2448: - The use case that springs to mind is adding support for cpu to map-reduce. Currently the map-reduce AM only looks at memory when it is deciding things like pre-empting reducers. If we wish to add support for cpu as a resource to map-reduce, it needs to consider vcores as well. However, if the YARN scheduler is using the DefaultResourceCalculator, which ignores cpu, and the map-reduce AM doesn't know this, it leads to inefficient asks and allocations. The aim is just to let the AM know which calculator is being used and let the AM go from there. Does that make sense? RM should expose the name of the ResourceCalculator being used when AMs register Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.2#6252)
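To illustrate what an AM might do with the proposed field, here is a hedged sketch; the getter name is an assumption for illustration, not necessarily the API in the attached patch:
{code}
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class CalculatorAwareAM {
  // Returns true if the scheduler appears to account for CPU, so the AM
  // should reason about vcores (e.g. when preempting reducers).
  static boolean schedulerConsidersCpu(AMRMClient<?> amRMClient) throws Exception {
    RegisterApplicationMasterResponse response =
        amRMClient.registerApplicationMaster("am-host", -1, "");
    String calculator = response.getResourceCalculatorClassName(); // hypothetical getter
    return calculator != null && calculator.contains("DominantResourceCalculator");
  }
}
{code}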
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109603#comment-14109603 ] Karthik Kambatla commented on YARN-2448: Maybe I am missing your point. Why not have the MR AM request both CPU and memory? If the RM/scheduler doesn't consider CPU, it will just ignore it. Related, but orthogonal point: In the case of FairScheduler, the policy depends on the queue the app is submitted to. So, some queues might consider only CPU, some only memory, and some both. So, exposing the ResourceCalculator doesn't really tell the AM anything; it has to look at the queue configuration. RM should expose the name of the ResourceCalculator being used when AMs register Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109738#comment-14109738 ] Craig Welch commented on YARN-1198: --- I initially considered an approach like this one, but did not go in that direction for a couple of reasons. One is that, to avoid introducing a calculation during the heartbeat, you do end up iterating all the users in the queue with every headroom calculation. While this may generally be less than iterating all of the applications in a queue, it may still be fairly significant in some usage patterns, and in the worst case (a different user for each application) it is exactly equivalent to what we are trying to avoid. The other is the Resource required, which is application specific and included in the userlimit calculation - the comments indicate this ensures that jobs in queues with miniscule capacity (< 1 slot) make progress - I notice that updateClusterResource just provides Resources.none() for this value - so it is not being honored in all cases, but I'm concerned about breaking the case it is meant to handle by detaching it generally from the headroom calculation. Handling this value as we do today requires an application-specific calculation - hence placing it in the application path and handling it as I do in the .7 patch during the heartbeat/using an application-specific value. If we move to calculating it at the user level then we would have to choose one value for the required from one of the user's applications to avoid iterating them; otherwise we are back to iterating all applications at each go. In a practical sense that might be fine, unless different applications for the same user are passing significantly different values for required - I suppose we could use a max for that value, but then an unusually large value for required could be carried forward indefinitely (for as long as a user has active applications) - or we could just use the last one provided for that user and understand that it changes the results a bit, possibly in an undesired way. Couple of other points: -re "we don't need write HeadroomProvider for each scheduler" - we already don't need one - the base implementation I've provided maintains the current behavior for other schedulers, and it appears that other schedulers may not require the same treatment, as they do not necessarily vary their headroom as dynamically/in the interrelated way that the capacity scheduler does - in any case, the pattern I'm introducing here can be reused by them - but they would, in any case, require their own logic to effect this kind of update if they require it. -re "As mentioned by Jason, currently, fair/capacity scheduler all support moving app between queues, we should recompute and change the reference after finished moving app" - I take this to properly be a task to take on when providing support for moving between queues - not having the location in code at present where this will happen prevents me from really addressing it, it's not part of the current effort, and in any case this change is not making that any more difficult (it may be making it easier... hard to be sure until we're ready to do it... but I am sure it is not making it more difficult - The first time an application calls computeUserLimit... after it is moved it will automatically update to the proper configuration to provide headroom from then onward, with no other changes so far as I can see.
We could also effect this by simply setting the headroom provider during the move.) Provider vs Reference - I went with the more general term as I'm not sure that in all cases it will be a simple reference/will have no logic of its own - Provider is a superset/more generic term :-) -re the cost of the calculation - if you look through the code, it's factored such that everything is referring to local members of a relatively small object graph - basically, it's just doing a few member lookups and a little math (I know, you could say that about anything - but in this case, it really isn't very much) - no significant data structures have to be accessed, and while it's hidden behind calls to Resources it really is just a bit of calculation... That said, I can see benefits to avoiding some of the work being done in the heartbeat - the one hard limit is the impact on how the Resource required value is handled, possibly not a significant tradeoff. I also had some concurrency concerns - by moving this out to the heartbeat we are accessing some shared Resource values concurrently which are not at present, and I ran into some concurrency issues with LeafQueue when making the change (all resolved, but caused some alarm/required some workaround) - there could be other latent concurrency issues there which will be corner cases, where if we have all calculation happening in the calculate... call in
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109798#comment-14109798 ] Craig Welch commented on YARN-1857: --- [~jianhe] [~wangda] could you have a look at this patch? CapacityScheduler headroom doesn't account for other AM's running - Key: YARN-1857 URL: https://issues.apache.org/jira/browse/YARN-1857 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Chen He Priority: Critical Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch, YARN-1857.patch It's possible to get an application to hang forever (or a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other application masters using space in that queue. So the headroom (user limit - user consumed) can be 0 even though the cluster is 100% full, because the other space is being used by application masters from other users. For instance, if you have a cluster with 1 queue, user limit is 100%, and multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps and starts running reducers. Other users try to start applications and get their application masters started but not tasks. The very large application then gets to the point where it has consumed the rest of the cluster resources with all reduces. But at this point it still needs to finish a few maps. The headroom being sent to this application is only based on the user limit (which is 100% of the cluster capacity); it's using, let's say, 95% of the cluster for reduces and the other 5% is being used by other users running application masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a reduce in order to run a map. This can happen in other scenarios also. Generally in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109801#comment-14109801 ] Jason Lowe commented on YARN-2440: -- Thanks for updating the patch, Varun. I don't see why we need both a containers-cpu-cores and containers-cpu-percentage, and I think it leads to confusion when both exist. At first I did not realize that one overrode the other. Instead I assumed that if you set cpu-cores to X and cpu-percentage to Y then you were requesting Y% of X cores. Then there's the additional question of whether container usage is pinned to those cores, etc. Only having cpu-percentage is a simpler model that still allows the user to specify cores indirectly (e.g.: 25% of an 8 core system is 2 cores). Maybe I'm missing the use case where we really need containers-cpu-cores and the confusing (to me at least) override behavior between the two properties. Other comments on the patch: - I'm not thrilled about the name template containers-cpu-* since it could easily be misinterpreted as a per-container thing as well, but I'm currently at a loss for a better prefix. Suggestions welcome. - Does getOverallLimits need to check for a quotaUS that's too low as well? - I think minimally we need to log a warning if we're going to ignore setting up cgroups to limit CPU usage across all containers if the user specified to do so. - Related to the previous comment, I think it would be nice if we didn't try to setup any limits if none were specified. That way if there's some issue with correctly determining the number of cores on a particular system it can still work in the default, use everything scenario. - NodeManagerHardwareUtils.getContainerCores should be getContainersCores (the per-container vs. all-containers confusion again) Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
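The arithmetic behind the percentage-only model Jason describes is one line; a sketch under the assumption that the property is a plain percentage of the node's online processors (names hypothetical):
{code}
public class ContainersCpuSketch {
  // E.g. 25% of an 8-core node -> 2 cores usable by all containers combined.
  static int containersCores(int onlineProcessors, float cpuPercentage) {
    int cores = (int) Math.floor(onlineProcessors * cpuPercentage / 100f);
    return Math.max(1, cores); // never drop below one core
  }

  public static void main(String[] args) {
    System.out.println(containersCores(8, 25f)); // prints 2
  }
}
{code}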
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1857: -- Target Version/s: 2.6.0 (was: 2.4.1) CapacityScheduler headroom doesn't account for other AM's running - Key: YARN-1857 URL: https://issues.apache.org/jira/browse/YARN-1857 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Chen He Priority: Critical Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch, YARN-1857.patch It's possible to get an application to hang forever (or a long time) in a cluster with multiple users. The reason is that the headroom sent to the application is based on the user limit, but it doesn't account for other application masters using space in that queue. So the headroom (user limit - user consumed) can be 0 even though the cluster is 100% full, because the other space is being used by application masters from other users. For instance, if you have a cluster with 1 queue, user limit is 100%, and multiple users submitting applications. One very large application by user 1 starts up, runs most of its maps and starts running reducers. Other users try to start applications and get their application masters started but not tasks. The very large application then gets to the point where it has consumed the rest of the cluster resources with all reduces. But at this point it still needs to finish a few maps. The headroom being sent to this application is only based on the user limit (which is 100% of the cluster capacity); it's using, let's say, 95% of the cluster for reduces and the other 5% is being used by other users running application masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a reduce in order to run a map. This can happen in other scenarios also. Generally in a large cluster with multiple queues this shouldn't cause a hang forever, but it could cause the application to take much longer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1442) change yarn minicluster base directory via system property
[ https://issues.apache.org/jira/browse/YARN-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109849#comment-14109849 ] Ken Krugler commented on YARN-1442: --- I'm curious why we wouldn't use the existing yarn.nodemanager.xxx conf settings for controlling where to put files. That's what I had originally done, and would seem like the most consistent approach. Related, I was assuming dfs.data.dir would control where to put HDFS blocks, but instead there's an undocumented MiniDFSCluster.HDFS_MINIDFS_BASEDIR property...why? change yarn minicluster base directory via system property -- Key: YARN-1442 URL: https://issues.apache.org/jira/browse/YARN-1442 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: André Kelpe Priority: Minor Attachments: HADOOP-10122.patch The yarn minicluster used for testing uses the target directory by default. We use gradle for building our projects and we would like to see it using a different directory. This patch makes it possible to use a different directory by setting the yarn.minicluster.directory system property. -- This message was sent by Atlassian JIRA (v6.2#6252)
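For context, the patch's intended usage looks roughly like the sketch below; yarn.minicluster.directory is the property the patch proposes, so treat it as not yet part of any release:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterBaseDirSketch {
  public static void main(String[] args) {
    // Proposed property: redirect the minicluster's working files away from
    // the Maven-style target/ directory (useful under Gradle builds).
    System.setProperty("yarn.minicluster.directory", "/tmp/my-minicluster");

    MiniYARNCluster cluster = new MiniYARNCluster("test", 1, 1, 1);
    cluster.init(new YarnConfiguration());
    cluster.start();
    // ... run tests against the cluster ...
    cluster.stop();
  }
}
{code}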
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109856#comment-14109856 ] Sangjin Lee commented on YARN-2440: --- It might be good to use several fairly representative scenarios and see how we can satisfy them with clear configuration. One scenario I can see pretty common is this (just for illustration): - 8-core system - want to use only 6 cores for containers (reserving 2 for NM and DN, etc.) - want to allocate 1/2 core per container by default IMO, the simplest config is {panel} yarn.nodemanager.resource.cpu-vcores = 60 yarn.nodemanager.containers-cores-to-vcores = 10 each container asks 5 vcores {panel} Or I could have {panel} yarn.nodemanager.resource.cpu-vcores = 60 yarn.nodemanager.containers-cpu-cores = 6 (core-to-vcore ratio understood as the ratio of these two) each container asks 5 vcores {panel} I'm not sure how I can use containers-cpu-percentage to describe this scenario... Does this help? Are there other types of use cases that we should review this with? Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2182) Update ContainerId#toString() to avoid conflicts before and after RM restart
[ https://issues.apache.org/jira/browse/YARN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109943#comment-14109943 ] Jian He commented on YARN-2182: --- looks good, +1 Update ContainerId#toString() to avoid conflicts before and after RM restart Key: YARN-2182 URL: https://issues.apache.org/jira/browse/YARN-2182 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2182.1.patch, YARN-2182.2.patch ContainerId#toString() doesn't include any information about the current cluster id. This leads to conflicts between container ids. We can avoid the conflicts without breaking backward compatibility by using the epoch introduced in YARN-2052. -- This message was sent by Atlassian JIRA (v6.2#6252)
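For illustration only (the exact rendering is what the attached patch settles, so the strings below are hypothetical): ids minted before a restart keep the current form, while ids minted afterwards embed the YARN-2052 epoch, so the two can never collide:
{noformat}
container_1371141151815_0003_01_000003      (epoch 0: existing format, unchanged)
container_e17_1371141151815_0003_01_000003  (epoch 17: hypothetical rendering after 17 RM restarts)
{noformat}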
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110007#comment-14110007 ] Karthik Kambatla commented on YARN-2395: Comments on the latest patch: # Typo - should be If and not In {code} // Fair share preemption timeout for each queue in seconds. In a job in the {code} # Documentation typos - s/will inherits/will inherit in two places. # In TestAllocationFileLoaderService, can we make some changes to minShare as well and verify them. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2395: -- Attachment: YARN-2395-3.patch Update a new patch to address Karthik's comments. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110059#comment-14110059 ] Ashwin Shankar commented on YARN-2395: -- [~ywskycn], I'll post my comments soon. But quick comment on skimming through the patch - I see you have NOT made FairScheduler#isStarvedForMinShare and isStarvedForFairShare recursive. Which means starvation at parent queues would not be detected and preemption at parent will not happen. Am I missing something ? FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
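For concreteness, the recursion being asked about has roughly the shape below; this is a self-contained illustration with a stand-in queue type, not the attached patch (the real code would operate on FSQueue/FSLeafQueue inside FairScheduler):
{code}
import java.util.List;

abstract class Queue {
  abstract boolean starvedForMinShare();  // below min share past its timeout
  abstract boolean starvedForFairShare(); // below fair share past its timeout
  abstract List<Queue> children();        // empty for leaf queues

  // A queue counts as starved if it, or any queue in its subtree, is starved,
  // so starvation at parent queues is detected as well as at leaves.
  final boolean isStarvedRecursive() {
    if (starvedForMinShare() || starvedForFairShare()) {
      return true;
    }
    for (Queue child : children()) {
      if (child.isStarvedRecursive()) {
        return true;
      }
    }
    return false;
  }
}
{code}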
[jira] [Created] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler
zhihai xu created YARN-2452: --- Summary: TestRMApplicationHistoryWriter is failed for FairScheduler Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu TestRMApplicationHistoryWriter fails when run with the FairScheduler. The failure is the following: T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE! java.lang.AssertionError: expected:<1> but was:<200> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
zhihai xu created YARN-2453: --- Summary: TestProportionalCapacityPreemptionPolicy is failed for FairScheduler Key: YARN-2453 URL: https://issues.apache.org/jira/browse/YARN-2453 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu TestProportionalCapacityPreemptionPolicy fails when run with the FairScheduler. The following is the error message: Running org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy) Time elapsed: 1.61 sec FAILURE! java.lang.AssertionError: Failed to find SchedulingMonitor service, please check what happened at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469) This test should only work for the capacity scheduler, as the following source code in ResourceManager.java shows: {code} if (scheduler instanceof PreemptableResourceScheduler && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) { {code} CapacityScheduler is an instance of PreemptableResourceScheduler, while FairScheduler is not. I will upload a patch to fix this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.8.patch Patch based on my last comment which iterates/calculates headroom at the user level - which is (I believe) favored by [~jlowe] and [~wangda] (I'm comfortable with it, too...) Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2404: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-128 Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110191#comment-14110191 ] Hadoop QA commented on YARN-2395: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664263/YARN-2395-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4724//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4724//console This message is automatically generated. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110204#comment-14110204 ] Hadoop QA commented on YARN-2056: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664283/YARN-2056.201408260128.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4723//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4723//console This message is automatically generated. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
[ https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110221#comment-14110221 ] Gera Shegalov commented on YARN-2377: - Hi [~kasha], I considered {{StringUtils#stringifyException}} but discarded it due to the following disadvantages: # redundant deserialization of the exception object just for the sake of serializing it right away # as a consequence, hypothetically, when the localization service runs as a separate process with a dedicated classpath, we can encounter a {{ClassNotFoundException}} during deserialization Localization exception stack traces are not passed as diagnostic info - Key: YARN-2377 URL: https://issues.apache.org/jira/browse/YARN-2377 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-2377.v01.patch In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHostException: ha-nn-uri-0 {code} And then only the {{java.net.UnknownHostException: ha-nn-uri-0}} message is propagated as diagnostics. -- This message was sent by Atlassian JIRA (v6.2#6252)
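Gera's first point is that the exception already exists as an object at the point of failure, so its stack trace can be captured as a plain String there, with no serialize/deserialize round trip; a generic sketch:
{code}
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceAtSource {
  // Capture the stack trace where the exception is thrown; only a String
  // (not a serialized Throwable) then crosses process/classpath boundaries.
  static String stackTraceOf(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();
  }
}
{code}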
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110227#comment-14110227 ] Beckham007 commented on YARN-2440: -- Hi, all Why not use the cpuset subsystem of cgroups? The cpuset could make container to run on allocated cores, and reserving some cores for system. Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
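For comparison with the CFS-quota approach discussed above, pinning containers with the cpuset controller means writing core ids rather than quotas; a hedged sketch, with the /cgroup/cpuset mount point and layout assumed:
{code}
import java.io.FileWriter;
import java.io.IOException;

public class CpusetSketch {
  // Pin a container's cgroup to explicit cores, e.g. "2-7" to reserve
  // cores 0-1 for the NM/DN and other system daemons.
  static void pin(String containerCgroup, String cpus) throws IOException {
    try (FileWriter w = new FileWriter(containerCgroup + "/cpuset.cpus")) {
      w.write(cpus);
    }
    try (FileWriter w = new FileWriter(containerCgroup + "/cpuset.mems")) {
      w.write("0"); // cpuset also requires a memory node to be assigned
    }
  }
}
{code}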
[jira] [Updated] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2406: - Attachment: YARN-2406.1.patch Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2406.1.patch Today most recovery related proto records are defined in yarn_server_resourcemanager_service_protos.proto which is inside YARN-API module. Since these records are internally used by RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside RM-server module -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110230#comment-14110230 ] Hadoop QA commented on YARN-1198: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664285/YARN-1198.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4726//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4726//console This message is automatically generated. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-2404: Assignee: Tsuyoshi OZAWA Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2453: Attachment: YARN-2453.000.patch TestProportionalCapacityPreemptionPolicy is failed for FairScheduler Key: YARN-2453 URL: https://issues.apache.org/jira/browse/YARN-2453 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2453.000.patch TestProportionalCapacityPreemptionPolicy fails when run with the FairScheduler. The following is the error message: Running org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy) Time elapsed: 1.61 sec FAILURE! java.lang.AssertionError: Failed to find SchedulingMonitor service, please check what happened at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469) This test should only work for the capacity scheduler, as the following source code in ResourceManager.java shows: {code} if (scheduler instanceof PreemptableResourceScheduler && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) { {code} CapacityScheduler is an instance of PreemptableResourceScheduler, while FairScheduler is not. I will upload a patch to fix this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110262#comment-14110262 ] Hadoop QA commented on YARN-2406:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664302/YARN-2406.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4727//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4727//console
This message is automatically generated.
Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2406.1.patch
Today most recovery-related proto records are defined in yarn_server_resourcemanager_service_protos.proto, which is inside the YARN API module. Since these records are used internally by the RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside the RM server module. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2452) TestRMApplicationHistoryWriter fails for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2452: Attachment: YARN-2452.000.patch
TestRMApplicationHistoryWriter fails for FairScheduler Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2452.000.patch
TestRMApplicationHistoryWriter fails for FairScheduler. The failure is the following:
Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE!
java.lang.AssertionError: expected:<1> but was:<200>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110269#comment-14110269 ] Tsuyoshi OZAWA commented on YARN-2406: The test failure looks unrelated to the patch. [~jianhe], could you take a look?
Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2406.1.patch
Today most recovery-related proto records are defined in yarn_server_resourcemanager_service_protos.proto, which is inside the YARN API module. Since these records are used internally by the RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside the RM server module. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy fails for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110272#comment-14110272 ] zhihai xu commented on YARN-2453: I uploaded a patch, YARN-2453.000.patch, for review. The patch skips the test testPolicyInitializeAfterSchedulerInitialized when FairScheduler is configured.
TestProportionalCapacityPreemptionPolicy fails for FairScheduler Key: YARN-2453 URL: https://issues.apache.org/jira/browse/YARN-2453 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2453.000.patch
TestProportionalCapacityPreemptionPolicy fails for FairScheduler. The following is the error message:
Running org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy) Time elapsed: 1.61 sec FAILURE!
java.lang.AssertionError: Failed to find SchedulingMonitor service, please check what happened
at org.junit.Assert.fail(Assert.java:88)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
This test should only run for the CapacityScheduler, because the following code in ResourceManager.java shows that the monitors are started only for preemptable schedulers:
{code}
if (scheduler instanceof PreemptableResourceScheduler
    && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
{code}
CapacityScheduler is an instance of PreemptableResourceScheduler, while FairScheduler is not. I will upload a patch to fix this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
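A minimal sketch of the skip approach the comment describes, using JUnit 4's assumption mechanism (illustrative only, not the literal YARN-2453.000.patch; the helper method is hypothetical):
{code}
import static org.junit.Assume.assumeTrue;

import org.junit.Test;

public class PreemptionPolicyTestSketch {

    // Hypothetical stand-in for "does the configured scheduler support
    // preemption?"; the real check would inspect
    // yarn.resourcemanager.scheduler.class for a PreemptableResourceScheduler.
    private boolean schedulerSupportsPreemption() {
        return true;
    }

    @Test
    public void testPolicyInitializeAfterSchedulerInitialized() {
        // When the assumption is false, JUnit reports the test as skipped
        // rather than failed, which is exactly the behavior wanted for
        // FairScheduler runs.
        assumeTrue(schedulerSupportsPreemption());
        // ... the original SchedulingMonitor assertions would follow here ...
    }
}
{code}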
[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter fails for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110276#comment-14110276 ] zhihai xu commented on YARN-2452: I uploaded a patch, YARN-2452.000.patch, for review. The patch enables assignmultiple so that FairScheduler can assign multiple containers on each node heartbeat; by default, FairScheduler assigns only one container per node heartbeat.
TestRMApplicationHistoryWriter fails for FairScheduler Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2452.000.patch
TestRMApplicationHistoryWriter fails for FairScheduler. The failure is the following:
Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE!
java.lang.AssertionError: expected:<1> but was:<200>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)
-- This message was sent by Atlassian JIRA (v6.2#6252)
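A minimal sketch of the flavor of this fix (illustrative, not the literal patch, which presumably sets the flag inside the test's setup). The property yarn.scheduler.fair.assignmultiple is a real FairScheduler setting; with it enabled, the scheduler may hand out several containers in a single node heartbeat instead of exactly one, which is what the massive-history test's timing expectations rely on:
{code}
import org.apache.hadoop.conf.Configuration;

public class AssignMultipleExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Default is false: FairScheduler assigns at most one container per
        // node heartbeat. Enabling it lets many containers land per heartbeat.
        conf.setBoolean("yarn.scheduler.fair.assignmultiple", true);
        System.out.println("assignmultiple = "
            + conf.getBoolean("yarn.scheduler.fair.assignmultiple", false));
    }
}
{code}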
[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy fails for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110283#comment-14110283 ] Hadoop QA commented on YARN-2453:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664307/YARN-2453.000.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4728//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4728//console
This message is automatically generated.
TestProportionalCapacityPreemptionPolicy fails for FairScheduler Key: YARN-2453 URL: https://issues.apache.org/jira/browse/YARN-2453 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2453.000.patch
TestProportionalCapacityPreemptionPolicy fails for FairScheduler. The following is the error message:
Running org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy) Time elapsed: 1.61 sec FAILURE!
java.lang.AssertionError: Failed to find SchedulingMonitor service, please check what happened
at org.junit.Assert.fail(Assert.java:88)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
This test should only run for the CapacityScheduler, because the following code in ResourceManager.java shows that the monitors are started only for preemptable schedulers:
{code}
if (scheduler instanceof PreemptableResourceScheduler
    && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
{code}
CapacityScheduler is an instance of PreemptableResourceScheduler, while FairScheduler is not. I will upload a patch to fix this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110319#comment-14110319 ] Hadoop QA commented on YARN-2448:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664105/apache-yarn-2448.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The test build failed in hadoop-tools/hadoop-sls.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4729//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4729//console
This message is automatically generated.
RM should expose the name of the ResourceCalculator being used when AMs register Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch
The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better scheduling decisions. MapReduce, for example, only looks at memory when making its scheduling decisions, even though the RM could be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.2#6252)
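A minimal sketch of how an AM could use the proposed information (illustrative only: the actual accessor on RegisterApplicationMasterResponse is whatever the YARN-2448 patch defines, so the calculator name is modeled here as a plain String; DominantResourceCalculator itself is a real YARN class):
{code}
public class CalculatorAwareScheduling {

    // Hypothetical helper: decide whether vcores matter based on the
    // calculator class name an AM might receive at registration time.
    static boolean usesDominantResourceCalculator(String calculatorClassName) {
        return calculatorClassName != null
            && calculatorClassName.endsWith("DominantResourceCalculator");
    }

    public static void main(String[] args) {
        // Value the RM might report in the registration response.
        String calculator =
            "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator";
        if (usesDominantResourceCalculator(calculator)) {
            System.out.println("Size container requests on memory AND vcores.");
        } else {
            System.out.println("Memory-only sizing is sufficient.");
        }
    }
}
{code}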
[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter fails for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110324#comment-14110324 ] Hadoop QA commented on YARN-2452:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664310/YARN-2452.000.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4730//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4730//console
This message is automatically generated.
TestRMApplicationHistoryWriter fails for FairScheduler Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2452.000.patch
TestRMApplicationHistoryWriter fails for FairScheduler. The failure is the following:
Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE!
java.lang.AssertionError: expected:<1> but was:<200>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)
-- This message was sent by Atlassian JIRA (v6.2#6252)