[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308817#comment-14308817 ]

Hudson commented on YARN-1537:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7038 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7038/])
YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* hadoop-yarn-project/CHANGES.txt

> TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
> -----------------------------------------------------------------
>
>                 Key: YARN-1537
>                 URL: https://issues.apache.org/jira/browse/YARN-1537
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Hong Shen
>            Assignee: Xuan Gong
>             Fix For: 2.7.0
>
>         Attachments: YARN-1537.1.patch
>
>
> Here is the error log:
> {code}
> Results :
> Failed tests:
>   TestLocalResourcesTrackerImpl.testLocalResourceCache:351
> Wanted but not invoked:
> eventHandler.handle(
>     isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent)
> );
> -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
> -> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
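The failure above is a test/dispatcher race: the mock handler is verified before the AsyncDispatcher thread has delivered the event, so the interaction is recorded "elsewhere" but not yet where the test expects it. A minimal plain-Java sketch of this pattern and the usual remedy, waiting for the asynchronous work to complete before asserting (the queue layout and drain loop here are illustrative assumptions, not the actual test's code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for an async dispatcher: events are handled on a separate
// thread, so a test must wait for delivery before asserting side effects.
public class DrainBeforeVerify {
    static final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    static final AtomicInteger handled = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run();
                }
            } catch (InterruptedException e) { /* test finished */ }
        });
        dispatcher.setDaemon(true);
        dispatcher.start();

        queue.put(handled::incrementAndGet);  // "dispatch" an event

        // Without this wait, asserting on `handled` races the dispatcher
        // thread, which is exactly the flakiness described in the issue.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (handled.get() == 0 && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        System.out.println("handled=" + handled.get());
    }
}
```

The same idea applies with a mocked handler: drain or await the dispatcher before calling verify, rather than verifying immediately after dispatch.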
[jira] [Created] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
Bibin A Chundatt created YARN-3151:
--------------------------------------

             Summary: On Failover tracking url wrong in application cli for KILLED application
                 Key: YARN-3151
                 URL: https://issues.apache.org/jira/browse/YARN-3151
             Project: Hadoop YARN
          Issue Type: Bug
          Components: client, resourcemanager
    Affects Versions: 2.6.0
         Environment: 2 RM HA
            Reporter: Bibin A Chundatt
            Priority: Minor


Run an application and kill it after it has started.
Check {color:red} ./yarn application -list -appStates KILLED {color}

{quote}
Application-Id      Tracking-URL
application_1423219262738_0001      http://:PORT>/cluster/app/application_1423219262738_0001
{quote}

Shut down the active RM1, then check the same command {color:red} ./yarn application -list -appStates KILLED {color} after RM2 becomes active:

{quote}
Application-Id      Tracking-URL
application_1423219262738_0001      null
{quote}

The tracking URL for the application is shown as null.
Expected: the same URL as before failover should be shown.
ApplicationReport.getOriginalTrackingUrl() is null after failover in
org.apache.hadoop.yarn.client.cli.ApplicationCLI
listApplications(Set appTypes, EnumSet appStates)
[jira] [Assigned] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
[ https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith reassigned YARN-3151:
----------------------------

    Assignee: Rohith
[jira] [Commented] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM. And client exited before appattempt retries got over
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308862#comment-14308862 ]

Rohith commented on YARN-933:
-----------------------------

Sure, I will recheck the code for existence of the problem and update the patch.

> After an AppAttempt_1 got failed [ removal and releasing of container is done
> , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws
> Exception at RM. And client exited before appattempt retries got over
> -----------------------------------------------------------------------------
>
>                 Key: YARN-933
>                 URL: https://issues.apache.org/jira/browse/YARN-933
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: J.Andreina
>            Assignee: Rohith
>         Attachments: YARN-933.patch
>
>
> am max retries is configured as 3 on both the client and RM side.
> Step 1: Install the cluster with an NM on 2 machines.
> Step 2: Make a ping from the RM machine to the NM1 machine succeed when using the IP, but fail when using the hostname.
> Step 3: Execute a job.
> Step 4: After AM [ AppAttempt_1 ] allocation to the NM1 machine is done, a connection loss happened.
> Observation :
> ==============
> After AppAttempt_1 has moved to FAILED state, release of the container for AppAttempt_1 and application removal are successful. A new AppAttempt_2 is spawned.
> 1. Then again a retry for AppAttempt_1 happens.
> 2. Again the RM tries to launch AppAttempt_1, hence it fails with InvalidStateTransitonException.
> 3. The client exited after AppAttempt_1 finished [but actually the job is still running], while the configured appattempts is 3 and the rest of the appattempts are all spawned and running.
> RMLogs:
> ==============
> 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
> 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
> 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
> 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
> 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
> 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
> 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
> 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
> 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
> 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
> 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
> 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01. Got exception: java.lang.reflect.UndeclaredThrowableException
> 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
>         at org.apache.hadoop.yarn.server.
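The InvalidStateTransitonException in the log is a table-driven state machine rejecting an event: the attempt is already in FAILED, and no transition is registered for (FAILED, LAUNCH_FAILED), so a late launch-failure event cannot be handled. A small illustrative sketch of that pattern (the enums and transition table below are assumptions for demonstration, not RMAppAttemptImpl's actual tables):

```java
import java.util.EnumMap;
import java.util.Map;

// Table-driven state machine in the style used by the RM: each
// (state, event) pair maps to a next state; an event with no entry
// for the current state is an invalid transition.
public class AttemptStateMachine {
    enum State { SCHEDULED, ALLOCATED, LAUNCHED, FAILED }
    enum Event { ALLOCATE, LAUNCH, LAUNCH_FAILED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.SCHEDULED;

    AttemptStateMachine() {
        table.put(State.SCHEDULED, Map.of(Event.ALLOCATE, State.ALLOCATED));
        table.put(State.ALLOCATED, Map.of(
                Event.LAUNCH, State.LAUNCHED,
                Event.LAUNCH_FAILED, State.FAILED));
        // No transitions are registered for FAILED, so a duplicate
        // LAUNCH_FAILED delivered to an already-failed attempt is
        // rejected, as in the RM log above.
    }

    State handle(Event e) {
        State next = table.getOrDefault(current, Map.of()).get(e);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
        return current = next;
    }

    public static void main(String[] args) {
        AttemptStateMachine sm = new AttemptStateMachine();
        sm.handle(Event.ALLOCATE);
        sm.handle(Event.LAUNCH_FAILED);     // ALLOCATED -> FAILED
        try {
            sm.handle(Event.LAUNCH_FAILED); // duplicate event: rejected
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

The bug report describes exactly this shape: the launcher retries an attempt the state machine has already moved past, producing an event the current state has no transition for.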
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308947#comment-14308947 ]

Hudson commented on YARN-1537:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/])
YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308949#comment-14308949 ]

Hudson commented on YARN-3101:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/])
YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt

> In Fair Scheduler, fix canceling of reservations for exceeding max share
> ------------------------------------------------------------------------
>
>                 Key: YARN-3101
>                 URL: https://issues.apache.org/jira/browse/YARN-3101
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>             Fix For: 2.7.0
>
>         Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch
>
>
> YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed, so the test was still passing because the two bugs cancelled each other out.
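The description above is about a reversed boolean condition in a "fits in max share" check. A toy sketch of how such a reversal behaves (all names and signatures below are hypothetical illustrations, not the Fair Scheduler's actual code): the buggy predicate returns true exactly for the requests that do not fit, which a second compensating bug can mask in tests.

```java
// Illustrative only: a capacity check with its comparison reversed,
// next to the corrected form.
public class FitInMaxShare {
    // Buggy: accepts a reservation precisely when it would EXCEED max share.
    static boolean fitsInMaxShareBuggy(int usage, int request, int maxShare) {
        return usage + request > maxShare;
    }

    // Correct: accepts only when current usage plus the request stays
    // within the queue's max share.
    static boolean fitsInMaxShare(int usage, int request, int maxShare) {
        return usage + request <= maxShare;
    }

    public static void main(String[] args) {
        // usage=3, request=2, maxShare=4: the request does not fit.
        System.out.println(fitsInMaxShare(3, 2, 4));       // correctly rejects
        System.out.println(fitsInMaxShareBuggy(3, 2, 4));  // wrongly accepts
    }
}
```

This is why the JIRA notes the original test kept passing: a check that is both uncounted and inverted can produce the expected outcome for the wrong reason.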
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308953#comment-14308953 ]

Hudson commented on YARN-1904:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/])
YARN-1904. Ensure exceptions thrown in ClientRMService & ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java

> Uniform the NotFound messages from ClientRMService and
> ApplicationHistoryClientService
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-1904
>                 URL: https://issues.apache.org/jira/browse/YARN-1904
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>             Fix For: 2.7.0
>
>         Attachments: YARN-1904.1.patch
>
>
> It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages.
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308957#comment-14308957 ]

Hudson commented on YARN-3149:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/])
YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
* hadoop-yarn-project/CHANGES.txt

> Typo in message for invalid application id
> ------------------------------------------
>
>                 Key: YARN-3149
>                 URL: https://issues.apache.org/jira/browse/YARN-3149
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Trivial
>             Fix For: 2.7.0
>
>         Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png
>
>
> The message printed to the console is wrong when the application id format is wrong.
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308946#comment-14308946 ]

Hudson commented on YARN-1582:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/])
YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java

> Capacity Scheduler: add a maximum-allocation-mb setting per queue
> -----------------------------------------------------------------
>
>                 Key: YARN-1582
>                 URL: https://issues.apache.org/jira/browse/YARN-1582
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 3.0.0, 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 2.7.0
>
>         Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch
>
>
> We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily.
> One reason this is needed is that more application types are becoming available on YARN, and certain applications require more memory to run efficiently. While we want to allow for that, we don't want other applications to abuse it and start requesting bigger containers than what they really need.
> Note that we could have this based on application type, but that might not be totally accurate either since, for example, you might want to allow certain users on MapReduce to use larger containers, while limiting other users of MapReduce to smaller containers.
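One plausible way to read the per-queue setting described above is as a queue-level cap layered under the cluster-wide maximum. The sketch below is an assumption for illustration only, not the CapacityScheduler's actual resolution logic or configuration names: a queue override takes effect when present and can lower, but never raise, the cluster maximum.

```java
import java.util.Map;

// Hypothetical resolution of a per-queue maximum allocation (MB):
// no override -> inherit the cluster max; an override only lowers the cap.
public class QueueMaxAllocation {
    static final int CLUSTER_MAX_MB = 8192;

    static int maxAllocationMb(Map<String, Integer> perQueueOverride, String queue) {
        Integer queueMax = perQueueOverride.get(queue);
        if (queueMax == null) {
            return CLUSTER_MAX_MB;                 // no override configured
        }
        return Math.min(queueMax, CLUSTER_MAX_MB); // clamp to the cluster max
    }

    public static void main(String[] args) {
        Map<String, Integer> overrides = Map.of("small", 2048, "big", 16384);
        System.out.println(maxAllocationMb(overrides, "small")); // lowered cap
        System.out.println(maxAllocationMb(overrides, "big"));   // clamped
        System.out.println(maxAllocationMb(overrides, "other")); // inherited
    }
}
```

This matches the motivation in the issue: individual queues (e.g. for memory-hungry application types) can be granted or denied large containers without changing the cluster-wide limit for everyone.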
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308952#comment-14308952 ]

Hudson commented on YARN-3145:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/])
YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* hadoop-yarn-project/CHANGES.txt

> ConcurrentModificationException on CapacityScheduler
> ParentQueue#getQueueUserAclInfo
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-3145
>                 URL: https://issues.apache.org/jira/browse/YARN-3145
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Tsuyoshi OZAWA
>             Fix For: 2.7.0
>
>         Attachments: YARN-3145.001.patch, YARN-3145.002.patch
>
>
> {code}
> java.util.ConcurrentModificationException
>         at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
>         at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850)
>         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250)
>         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335)
> {code}
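The stack trace is the classic fail-fast iterator violation: a TreeMap of child queues is iterated while it is structurally modified. A self-contained reproduction of the failure mode, plus one common remedy of iterating over a snapshot of the keys (the actual YARN patch may use synchronization instead; this only demonstrates the mechanism):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

// Demonstrates why TreeMap iteration in getQueueUserAclInfo can throw:
// adding an entry mid-iteration invalidates the fail-fast iterator.
public class CmeDemo {
    // Returns true if mutating during iteration threw CME.
    static boolean mutateWhileIterating() {
        TreeMap<String, Integer> queues = new TreeMap<>();
        queues.put("a", 1);
        queues.put("b", 2);
        try {
            for (String q : queues.keySet()) {
                queues.put("c", 3);   // structural modification mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe variant: iterate a copied key list, mutate the map freely.
    static int mutateOverSnapshot() {
        TreeMap<String, Integer> queues = new TreeMap<>();
        queues.put("a", 1);
        queues.put("b", 2);
        for (String q : new ArrayList<>(queues.keySet())) {
            queues.put(q + "-copy", 0);
        }
        return queues.size();
    }

    public static void main(String[] args) {
        System.out.println("threw=" + mutateWhileIterating());
        System.out.println("size=" + mutateOverSnapshot());
    }
}
```

Snapshotting trades a copy per call for safety; when the map is mutated by other threads (as in a scheduler handling RPCs), proper locking around both the mutation and the iteration is the more robust fix.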
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308987#comment-14308987 ]

Chris Douglas commented on YARN-3100:
-------------------------------------

Looking through {{AbstractCSQueue}} and {{CSQueueUtils}}, it looks like there are many misconfigurations that leave queues in an inconsistent state...

> Make YARN authorization pluggable
> ---------------------------------
>
>                 Key: YARN-3100
>                 URL: https://issues.apache.org/jira/browse/YARN-3100
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-3100.1.patch, YARN-3100.2.patch
>
>
> The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger and Sentry.
> Currently, we have:
> - admin ACL
> - queue ACL
> - application ACL
> - time line domain ACL
> - service ACL
> The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. A Ranger or Sentry plug-in can implement this interface.
> Benefits:
> - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc.
> - Enable Ranger and Sentry to do authorization for YARN.
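The proposal above can be pictured as a single plug point: one provider interface, with the existing ACL managers collapsed into a default implementation that an external authorizer could replace. The sketch below is purely hypothetical; the interface, method names, and ACL layout are illustrative assumptions, not the actual YarnAuthorizationProvider API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// The pluggable seam: the RM would ask this interface, not a specific
// ACL manager. Ranger- or Sentry-backed providers would implement it too.
interface AuthorizationProvider {
    boolean checkPermission(String user, String action, String resource);
}

// Default implementation backed by in-memory ACLs, standing in for the
// existing admin/queue/application ACL managers.
class DefaultAclProvider implements AuthorizationProvider {
    private final Map<String, Set<String>> acls = new HashMap<>();

    void grant(String user, String action, String resource) {
        acls.computeIfAbsent(resource + "#" + action, k -> new HashSet<>()).add(user);
    }

    @Override
    public boolean checkPermission(String user, String action, String resource) {
        return acls.getOrDefault(resource + "#" + action, Set.of()).contains(user);
    }
}

public class PluggableAuthDemo {
    public static void main(String[] args) {
        DefaultAclProvider provider = new DefaultAclProvider();
        provider.grant("alice", "SUBMIT_APP", "queue:root.dev");
        System.out.println(provider.checkPermission("alice", "SUBMIT_APP", "queue:root.dev"));
        System.out.println(provider.checkPermission("bob", "SUBMIT_APP", "queue:root.dev"));
    }
}
```

The design benefit named in the issue follows directly: callers depend only on the interface, so swapping the backing store (local ACL tables vs. an external policy engine) requires no changes at the call sites.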
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309007#comment-14309007 ]

Hudson commented on YARN-1582:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/])
YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309008#comment-14309008 ]

Hudson commented on YARN-1537:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/])
YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309010#comment-14309010 ]

Hudson commented on YARN-3101:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/])
YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309013#comment-14309013 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > ConcurrentModificationException on CapacityScheduler > ParentQueue#getQueueUserAclInfo > > > Key: YARN-3145 > URL: https://issues.apache.org/jira/browse/YARN-3145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Tsuyoshi OZAWA > Fix For: 2.7.0 > > Attachments: YARN-3145.001.patch, YARN-3145.002.patch > > > {code} > ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) > at > 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
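The stack trace above is the classic fail-fast iterator failure: one thread walks a TreeMap's key set while another mutates the map. A minimal sketch of the failure class and one common remedy (snapshotting keys before iterating) — these names are illustrative stand-ins, not the actual ParentQueue code from the patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the YARN-3145 failure class: mutating a TreeMap while iterating
// its key set trips the fail-fast iterator's modification check.
class CmeSketch {
    static boolean iterateWhileMutating() {
        Map<String, Integer> children = new TreeMap<>();
        children.put("a", 1);
        children.put("b", 2);
        try {
            for (String name : children.keySet()) {
                children.put("c", 3); // structural modification mid-iteration
            }
            return false;
        } catch (java.util.ConcurrentModificationException e) {
            return true; // the iterator detected the concurrent modification
        }
    }

    // One common remedy: copy the keys under a lock so readers never walk
    // a live, mutable map. (Whether the actual patch does exactly this
    // should be confirmed against ParentQueue.java in rev 4641196.)
    static List<String> snapshotKeys(Map<String, Integer> children) {
        synchronized (children) {
            return new ArrayList<>(children.keySet());
        }
    }
}
```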
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309018#comment-14309018 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java > Typo in message for invalid application id > -- > > Key: YARN-3149 > URL: https://issues.apache.org/jira/browse/YARN-3149 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png > > > Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309014#comment-14309014 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-1904. Ensure exceptions thrown in ClientRMService & ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt > Uniform the NotFound messages from ClientRMService and > ApplicationHistoryClientService > -- > > Key: YARN-1904 > URL: https://issues.apache.org/jira/browse/YARN-1904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.7.0 > > Attachments: YARN-1904.1.patch > > > It's good to make ClientRMService and ApplicationHistoryClientService throw > NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
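The goal of YARN-1904 is that ClientRMService and ApplicationHistoryClientService report "not found" with the same wording. A hypothetical sketch of the idea — a single message builder both services could share; the exact strings and method names here are invented for illustration, not taken from the patch:

```java
// Hypothetical helper illustrating YARN-1904's goal: one place that formats
// "not found" messages, so the RM and the history service stay uniform.
class NotFoundMessages {
    static String application(String appId) {
        return "Application with id '" + appId + "' doesn't exist.";
    }

    static String appAttempt(String attemptId) {
        return "ApplicationAttempt with id '" + attemptId + "' doesn't exist.";
    }
}
```

Both services would then throw their NotFoundException with the shared text instead of each hand-rolling a slightly different message.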
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309173#comment-14309173 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java > In Fair Scheduler, fix canceling of reservations for exceeding max share > > > Key: YARN-3101 > URL: https://issues.apache.org/jira/browse/YARN-3101 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, > YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, > YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch > > > YARN-2811 added fitInMaxShare to validate reservations on a queue, but did > not count it during its calculations. It also had the condition reversed so > the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
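Per the description above, YARN-2811's fitInMaxShare both omitted the reservation being validated from its sum and reversed the comparison, so the two bugs masked each other. A minimal sketch of the corrected invariant (illustrative, not the actual FSAppAttempt/FairScheduler code): a reservation fits only if current usage plus the reserved resource stays within the queue's max share.

```java
// Illustrative sketch of the YARN-3101 fix: count the reservation itself
// when checking against max share, with the comparison the right way around.
class MaxShareCheck {
    static boolean fitsInMaxShare(long usedMb, long reservedMb, long maxShareMb) {
        return usedMb + reservedMb <= maxShareMb;
    }
}
```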
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309177#comment-14309177 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1904. Ensure exceptions thrown in ClientRMService & ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt > Uniform the NotFound messages from ClientRMService and > ApplicationHistoryClientService > -- > > Key: YARN-1904 > URL: https://issues.apache.org/jira/browse/YARN-1904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.7.0 > > Attachments: YARN-1904.1.patch > > > It's good to make ClientRMService and ApplicationHistoryClientService throw > NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309171#comment-14309171 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt > TestLocalResourcesTrackerImpl.testLocalResourceCache often failed > - > > Key: YARN-1537 > URL: https://issues.apache.org/jira/browse/YARN-1537 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Hong Shen >Assignee: Xuan Gong > Fix For: 2.7.0 > > Attachments: YARN-1537.1.patch > > > Here is the error log > {code} > Results : > Failed tests: > TestLocalResourcesTrackerImpl.testLocalResourceCache:351 > Wanted but not invoked: > eventHandler.handle( > > isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) > However, there were other interactions with this mock: > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
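The "Wanted but not invoked" failure above is a race: the test thread verifies the mock before the AsyncDispatcher thread has delivered the ContainerResourceLocalizedEvent. The usual remedies are to verify with a deadline (e.g. Mockito's `verify(mock, timeout(...))`) or to drain the dispatcher before asserting; whether YARN-1537.1.patch does exactly either should be confirmed against the test file. A self-contained sketch of the poll-until-deadline idea that `timeout()` encapsulates:

```java
import java.util.function.BooleanSupplier;

// Sketch of the fix pattern for YARN-1537-style races: instead of asserting
// once while a dispatcher thread races us, poll the condition until it holds
// or a deadline passes.
class AwaitSketch {
    static boolean await(BooleanSupplier done, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out waiting for the async event
            }
            try {
                Thread.sleep(10); // brief back-off between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return done.getAsBoolean();
            }
        }
        return true;
    }
}
```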
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309180#comment-14309180 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * hadoop-yarn-project/CHANGES.txt > Typo in message for invalid application id > -- > > Key: YARN-3149 > URL: https://issues.apache.org/jira/browse/YARN-3149 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png > > > Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309170#comment-14309170 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 2.7.0 > > Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, > YARN-1582.003.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is more application types are becoming available on > yarn and certain applications require more memory to run efficiently. While > we want to allow for that we don't want other applications to abuse that and > start requesting bigger containers then what they really need. > Note that we could have this based on application type, but that might not be > totally accurate either since for example you might want to allow certain > users on MapReduce to use larger containers, while limiting other users of > MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
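YARN-1582 adds a per-queue maximum-allocation-mb to the Capacity Scheduler, so e.g. a queue for memory-hungry applications can allow larger containers than the rest of the cluster. A sketch of the capacity-scheduler.xml usage — the queue names and values here are illustrative, and the exact property key and its interaction with the cluster-wide yarn.scheduler.maximum-allocation-mb should be confirmed against the CapacityScheduler.apt.vm docs updated in the patch:

```xml
<!-- Illustrative capacity-scheduler.xml fragment: per-queue container caps. -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.large.maximum-allocation-mb</name>
  <value>16384</value>
</property>
```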
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309176#comment-14309176 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > ConcurrentModificationException on CapacityScheduler > ParentQueue#getQueueUserAclInfo > > > Key: YARN-3145 > URL: https://issues.apache.org/jira/browse/YARN-3145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Tsuyoshi OZAWA > Fix For: 2.7.0 > > Attachments: YARN-3145.001.patch, YARN-3145.002.patch > > > {code} > java.util.ConcurrentModificationException(java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) > at > 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309195#comment-14309195 ] Hudson commented on YARN-1582: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 2.7.0 > > Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, > YARN-1582.003.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is more application types are becoming available on > yarn and certain applications require more memory to run efficiently. While > we want to allow for that we don't want other applications to abuse that and > start requesting bigger containers then what they really need. > Note that we could have this based on application type, but that might not be > totally accurate either since for example you might want to allow certain > users on MapReduce to use larger containers, while limiting other users of > MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309202#comment-14309202 ] Hudson commented on YARN-3145: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > ConcurrentModificationException on CapacityScheduler > ParentQueue#getQueueUserAclInfo > > > Key: YARN-3145 > URL: https://issues.apache.org/jira/browse/YARN-3145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Tsuyoshi OZAWA > Fix For: 2.7.0 > > Attachments: YARN-3145.001.patch, YARN-3145.002.patch > > > {code} > java.util.ConcurrentModificationException(java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) > at > 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309206#comment-14309206 ] Hudson commented on YARN-3149: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java > Typo in message for invalid application id > -- > > Key: YARN-3149 > URL: https://issues.apache.org/jira/browse/YARN-3149 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png > > > Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309203#comment-14309203 ] Hudson commented on YARN-1904: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1904. Ensure exceptions thrown in ClientRMService & ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/CHANGES.txt > Uniform the NotFound messages from ClientRMService and > ApplicationHistoryClientService > -- > > Key: YARN-1904 > URL: https://issues.apache.org/jira/browse/YARN-1904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.7.0 > > Attachments: YARN-1904.1.patch > > > It's good to make ClientRMService and ApplicationHistoryClientService throw > NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309196#comment-14309196 ] Hudson commented on YARN-1537: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt > TestLocalResourcesTrackerImpl.testLocalResourceCache often failed > - > > Key: YARN-1537 > URL: https://issues.apache.org/jira/browse/YARN-1537 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Hong Shen >Assignee: Xuan Gong > Fix For: 2.7.0 > > Attachments: YARN-1537.1.patch > > > Here is the error log > {code} > Results : > Failed tests: > TestLocalResourcesTrackerImpl.testLocalResourceCache:351 > Wanted but not invoked: > eventHandler.handle( > > isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) > However, there were other interactions with this mock: > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309199#comment-14309199 ] Hudson commented on YARN-3101: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java > In Fair Scheduler, fix canceling of reservations for exceeding max share > > > Key: YARN-3101 > URL: https://issues.apache.org/jira/browse/YARN-3101 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, > YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, > YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch > > > YARN-2811 added fitInMaxShare to validate reservations on a queue, but did > not count it during its calculations. It also had the condition reversed so > the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309253#comment-14309253 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > TestLocalResourcesTrackerImpl.testLocalResourceCache often failed > - > > Key: YARN-1537 > URL: https://issues.apache.org/jira/browse/YARN-1537 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Hong Shen >Assignee: Xuan Gong > Fix For: 2.7.0 > > Attachments: YARN-1537.1.patch > > > Here is the error log > {code} > Results : > Failed tests: > TestLocalResourcesTrackerImpl.testLocalResourceCache:351 > Wanted but not invoked: > eventHandler.handle( > > isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) > However, there were other interactions with this mock: > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309252#comment-14309252 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 2.7.0 > > Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, > YARN-1582.003.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is more application types are becoming available on > yarn and certain applications require more memory to run efficiently. While > we want to allow for that we don't want other applications to abuse that and > start requesting bigger containers then what they really need. > Note that we could have this based on application type, but that might not be > totally accurate either since for example you might want to allow certain > users on MapReduce to use larger containers, while limiting other users of > MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
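Per the description, the cap is configured per queue path in capacity-scheduler.xml. A sketch of what such a setting looks like — the queue names are made up, and the exact property keys should be verified against the CapacityScheduler documentation for your Hadoop release:

```xml
<!-- capacity-scheduler.xml: per-queue container-size caps (illustrative). -->
<property>
  <name>yarn.scheduler.capacity.root.large-jobs.maximum-allocation-mb</name>
  <value>16384</value> <!-- this queue may hand out containers up to 16 GB -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-allocation-mb</name>
  <value>4096</value>  <!-- all other applications stay at 4 GB -->
</property>
```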
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309259#comment-14309259 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-1904. Ensure exceptions thrown in ClientRMService & ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt > Uniform the NotFound messages from ClientRMService and > ApplicationHistoryClientService > -- > > Key: YARN-1904 > URL: https://issues.apache.org/jira/browse/YARN-1904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.7.0 > > Attachments: YARN-1904.1.patch > > > It's good to make ClientRMService and ApplicationHistoryClientService throw > NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309255#comment-14309255 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt > In Fair Scheduler, fix canceling of reservations for exceeding max share > > > Key: YARN-3101 > URL: https://issues.apache.org/jira/browse/YARN-3101 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, > YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, > YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch > > > YARN-2811 added fitInMaxShare to validate reservations on a queue, but did > not count it during its calculations. It also had the condition reversed so > the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
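The description says the max-share validation both failed to count the pending reservation in its arithmetic and had its comparison reversed, with the two mistakes masking each other in the existing test. A hypothetical sketch of the under-counting half of the bug — the method names and plain-long arithmetic are illustrative, not the Fair Scheduler code, which operates on Resource objects:

```java
public class MaxShareSketch {
    // Correct check: a new reservation fits only if current usage plus the
    // requested amount stays within the queue's max share.
    static boolean fitsInMaxShare(long usedMb, long requestedMb, long maxShareMb) {
        return usedMb + requestedMb <= maxShareMb;
    }

    // Buggy shape mirroring the report: the pending request is simply not
    // counted, so an over-committing reservation still appears to "fit".
    static boolean buggyFitsInMaxShare(long usedMb, long requestedMb, long maxShareMb) {
        return usedMb <= maxShareMb;
    }

    public static void main(String[] args) {
        // 6 GB used + 4 GB requested against an 8 GB max share: must not fit.
        System.out.println(fitsInMaxShare(6144, 4096, 8192));      // false
        System.out.println(buggyFitsInMaxShare(6144, 4096, 8192)); // true
    }
}
```

A reversed comparison on top of this (as the JIRA notes) flips both results, which is why simple tests kept passing.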
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309262#comment-14309262 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java > Typo in message for invalid application id > -- > > Key: YARN-3149 > URL: https://issues.apache.org/jira/browse/YARN-3149 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png > > > Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309258#comment-14309258 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > ConcurrentModificationException on CapacityScheduler > ParentQueue#getQueueUserAclInfo > > > Key: YARN-3145 > URL: https://issues.apache.org/jira/browse/YARN-3145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Tsuyoshi OZAWA > Fix For: 2.7.0 > > Attachments: YARN-3145.001.patch, YARN-3145.002.patch > > > {code} > ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) > at > 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
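The stack trace above shows a TreeMap iterator failing because the child-queue map is structurally modified while `getQueueUserAclInfo` walks it. A minimal reproduction of the failure, plus one common remedy (iterating over a snapshot) — whether the actual YARN-3145 patch snapshots or synchronizes should be checked against the attached patch:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

public class CmeSketch {
    // Reproduce the failure: structurally modifying a TreeMap while
    // iterating its key set invalidates the live iterator.
    static boolean naiveIterationThrows() {
        TreeMap<String, Integer> queues = new TreeMap<>();
        queues.put("a", 1);
        queues.put("b", 2);
        try {
            for (String name : queues.keySet()) {
                queues.put(name + "-child", 0); // concurrent structural change
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Remedy: iterate over a snapshot of the keys so later modifications
    // cannot disturb the traversal.
    static int snapshotIteration() {
        TreeMap<String, Integer> queues = new TreeMap<>();
        queues.put("a", 1);
        queues.put("b", 2);
        int visited = 0;
        for (String name : new ArrayList<>(queues.keySet())) {
            queues.put(name + "-child", 0); // safe: we iterate the copy
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(naiveIterationThrows()); // true
        System.out.println(snapshotIteration());    // 2
    }
}
```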
[jira] [Updated] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-2809: - Attachment: YARN-2809-v2.patch upmerge to latest trunk > Implement workaround for linux kernel panic when removing cgroup > > > Key: YARN-2809 > URL: https://issues.apache.org/jira/browse/YARN-2809 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: RHEL 6.4 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-2809-v2.patch, YARN-2809.patch > > > Some older versions of linux have a bug that can cause a kernel panic when > the LCE attempts to remove a cgroup. It is a race condition so it's a bit > rare but on a few thousand node cluster it can result in a couple of panics > per day. > This is the commit that likely (haven't verified) fixes the problem in linux: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 > Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
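The usual workaround for racy cgroup removal is to retry the directory delete with a bounded backoff rather than giving up (or tickling the kernel bug) on the first attempt. A generic sketch of that shape — not the attached YARN-2809 patch, and the timeouts are arbitrary:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RetryDeleteSketch {
    // Try to remove an (empty) directory, retrying until `timeoutMs`
    // elapses. Mirrors a cgroup-removal workaround: the delete can
    // transiently fail while the kernel still holds references.
    static boolean deleteWithRetries(Path dir, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            try {
                Files.deleteIfExists(dir);
                return true;
            } catch (IOException transientFailure) {
                try {
                    Thread.sleep(20); // brief backoff before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("cgroup-sketch");
        System.out.println(deleteWithRetries(dir, 1000)); // true
    }
}
```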
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309280#comment-14309280 ] Jason Lowe commented on YARN-2246: -- [~devaraj.k] are you still planning to address this issue? It's a benign problem with the history server UI since it ignores the extra components of the URL, but there are some use cases with Tez and other instances where this needs to be fixed. > Job History Link in RM UI is redirecting to the URL which contains Job Id > twice > --- > > Key: YARN-2246 > URL: https://issues.apache.org/jira/browse/YARN-2246 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.0.0, 0.23.11, 2.5.0 >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch > > > {code:xml} > http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
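The broken link in the description is what results when a tracking URL that already contains the `/jobhistory/job/<id>` path is joined with that path a second time. A hypothetical sketch of a defensive join — the method name is illustrative, not the actual fix in the attached patches:

```java
public class HistoryUrlSketch {
    // Append the history path only when the base does not already end
    // with it -- guards against the doubled URL shown in the report.
    static String joinHistoryUrl(String base, String jobPath) {
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        if (trimmed.endsWith(jobPath)) {
            return trimmed; // already fully qualified, do not append again
        }
        return trimmed + jobPath;
    }

    public static void main(String[] args) {
        String path = "/jobhistory/job/job_1332435449546_0001";
        // Both calls yield the same, non-duplicated URL.
        System.out.println(joinHistoryUrl("http://host:19888", path));
        System.out.println(joinHistoryUrl("http://host:19888" + path, path));
    }
}
```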
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309302#comment-14309302 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt > TestLocalResourcesTrackerImpl.testLocalResourceCache often failed > - > > Key: YARN-1537 > URL: https://issues.apache.org/jira/browse/YARN-1537 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Hong Shen >Assignee: Xuan Gong > Fix For: 2.7.0 > > Attachments: YARN-1537.1.patch > > > Here is the error log > {code} > Results : > Failed tests: > TestLocalResourcesTrackerImpl.testLocalResourceCache:351 > Wanted but not invoked: > eventHandler.handle( > > isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) > However, there were other interactions with this mock: > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309307#comment-14309307 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java > ConcurrentModificationException on CapacityScheduler > ParentQueue#getQueueUserAclInfo > > > Key: YARN-3145 > URL: https://issues.apache.org/jira/browse/YARN-3145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Tsuyoshi OZAWA > Fix For: 2.7.0 > > Attachments: YARN-3145.001.patch, YARN-3145.002.patch > > > {code} > ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) > at > 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309304#comment-14309304 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt > In Fair Scheduler, fix canceling of reservations for exceeding max share > > > Key: YARN-3101 > URL: https://issues.apache.org/jira/browse/YARN-3101 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, > YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, > YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch > > > YARN-2811 added fitInMaxShare to validate reservations on a queue, but did > not count it during its calculations. It also had the condition reversed so > the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309301#comment-14309301 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 2.7.0 > > Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, > YARN-1582.003.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is more application types are becoming available on > yarn and certain applications require more memory to run efficiently. While > we want to allow for that we don't want other applications to abuse that and > start requesting bigger containers then what they really need. > Note that we could have this based on application type, but that might not be > totally accurate either since for example you might want to allow certain > users on MapReduce to use larger containers, while limiting other users of > MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309311#comment-14309311 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java > Typo in message for invalid application id > -- > > Key: YARN-3149 > URL: https://issues.apache.org/jira/browse/YARN-3149 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png > > > Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309308#comment-14309308 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1904. Ensure exceptions thrown in ClientRMService & ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java > Uniform the NotFound messages from ClientRMService and > ApplicationHistoryClientService > -- > > Key: YARN-1904 > URL: https://issues.apache.org/jira/browse/YARN-1904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.7.0 > > Attachments: YARN-1904.1.patch > > > It's good to make ClientRMService and ApplicationHistoryClientService throw > NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reopened YARN-1126: - I'm re-opening this to commit the addendum patch from YARN-905 (https://issues.apache.org/jira/secure/attachment/12606009/YARN-905-addendum.patch) since the other jira already went out in 2.3.0. Targeting this for 2.7.0. > Add validation of users input nodes-states options to nodes CLI > --- > > Key: YARN-1126 > URL: https://issues.apache.org/jira/browse/YARN-1126 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Wei Yan > > Follow the discussion in YARN-905. > (1) case-insensitive checks for "all". > (2) validation of users input, exit with non-zero code and print all valid > states when user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
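The two requirements in the description — a case-insensitive check for "all" and a validation error listing every valid state — can be sketched with a plain enum parse. The enum values here are an illustrative subset, not YARN's full NodeState set, and the CLI wiring is omitted:

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Locale;

public class NodeStateCliSketch {
    enum NodeState { NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED }

    // Parse a user-supplied --states option: "all" is matched
    // case-insensitively, and an unknown state raises with every valid
    // value in the message (the CLI would catch this and exit non-zero).
    static EnumSet<NodeState> parseStates(String option) {
        if (option.equalsIgnoreCase("all")) {
            return EnumSet.allOf(NodeState.class);
        }
        EnumSet<NodeState> states = EnumSet.noneOf(NodeState.class);
        for (String token : option.split(",")) {
            try {
                states.add(NodeState.valueOf(token.trim().toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Invalid node state: " + token
                        + ". Valid states: " + Arrays.toString(NodeState.values()));
            }
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(parseStates("All").size());   // 6
        System.out.println(parseStates("running,lost")); // [RUNNING, LOST]
    }
}
```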
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309343#comment-14309343 ] Hadoop QA commented on YARN-2809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697032/YARN-2809-v2.patch against trunk revision 1425e3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6535//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6535//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6535//console This message is automatically generated. 
> Implement workaround for linux kernel panic when removing cgroup > > > Key: YARN-2809 > URL: https://issues.apache.org/jira/browse/YARN-2809 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: RHEL 6.4 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-2809-v2.patch, YARN-2809.patch > > > Some older versions of linux have a bug that can cause a kernel panic when > the LCE attempts to remove a cgroup. It is a race condition so it's a bit > rare but on a few thousand node cluster it can result in a couple of panics > per day. > This is the commit that likely (haven't verified) fixes the problem in linux: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 > Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309344#comment-14309344 ] Jason Lowe commented on YARN-3144: -- Thanks for updating the patch. Comments: * The added test now no longer mocks the TimelineClient as it did before? The test requires the timeline client to throw to work properly, and we could accidentally connect to a timeline server. * Nit: Does timelineServicesBestEffort need to be visible anymore? * Nit: Reading the doc string for the property in yarn-default.xml implies it should be true to make timeline operations fatal. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
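The behavior being configured — token-fetch failures against the timeline server become a logged warning instead of a job-killing exception — has this general shape. The interface and method names below are hypothetical; the actual config key and client code live in the attached patches:

```java
import java.io.IOException;

public class BestEffortSketch {
    interface TimelineCall<T> { T call() throws IOException; }

    // Run a timeline-server operation. When bestEffort is set, an IO
    // failure is logged and swallowed (null result) rather than killing
    // the job; otherwise it stays fatal.
    static <T> T callBestEffort(TimelineCall<T> op, boolean bestEffort) {
        try {
            return op.call();
        } catch (IOException e) {
            if (bestEffort) {
                System.err.println("timeline unavailable, continuing: " + e.getMessage());
                return null;
            }
            throw new RuntimeException("fatal timeline failure", e);
        }
    }

    public static void main(String[] args) {
        TimelineCall<String> failing = () -> { throw new IOException("no timeline server"); };
        System.out.println(callBestEffort(failing, true)); // null
    }
}
```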
[jira] [Updated] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3144: -- Attachment: YARN-3144.3.patch Thanks, [~jlowe]. One more patch to fix up those issues. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309390#comment-14309390 ] Devaraj K commented on YARN-2246: - [~jlowe], [~zjshen] Thanks for your inputs. [~jlowe], I have started working on this, will provide patch today. Thanks > Job History Link in RM UI is redirecting to the URL which contains Job Id > twice > --- > > Key: YARN-2246 > URL: https://issues.apache.org/jira/browse/YARN-2246 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.0.0, 0.23.11, 2.5.0 >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch > > > {code:xml} > http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla moved HADOOP-11555 to YARN-3152: - Component/s: (was: ha) resourcemanager Affects Version/s: (was: 2.6.0) 2.6.0 Key: YARN-3152 (was: HADOOP-11555) Project: Hadoop YARN (was: Hadoop Common) > Missing hadoop exclude file fails RMs in HA > --- > > Key: YARN-3152 > URL: https://issues.apache.org/jira/browse/YARN-3152 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: Debian 7 >Reporter: Neill Lima > > I have two NNs in HA, they do not fail when the exclude file is not present > (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in > HA. I didn't create the exclude file at this point as well. I applied the HA > RM settings properly and when I started both RMs I started getting this > exception: > 2015-02-06 12:25:25,326 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root > OPERATION=transitionToActiveTARGET=RMHAProtocolService > RESULT=FAILURE DESCRIPTION=Exception transitioning to active > PERMISSIONS=All users are allowed > 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException: > java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file > or directory) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) > ... 5 more > 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x44af32566180094 closed > 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 > sessionTimeout=1 > watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c > 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket > connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate > using SASL (unknown error) > 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to x.x.x.x/x.x.x.x:2181, initiating session > The issue is descriptive enough to resolve the problem - and it has been > fixed by creating the exclude file. > I just suggest an improvement: > - Should RMs ignore the missing file as the NNs did? > - Should a single RM fail even when the file is not present? > Just suggesting this improvement to keep the behavior consistent when working > in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
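The improvement suggested above (have the RM tolerate a missing exclude file the way the NNs do) could look roughly like the following. This is a minimal, self-contained sketch; the class name and helper method are hypothetical stand-ins, not Hadoop's actual HostsFileReader API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExcludeFileReader {

    // Hypothetical helper: a missing exclude file is treated as "no hosts
    // excluded" instead of surfacing FileNotFoundException and failing the
    // transition to Active.
    public static Set<String> readExcludedHosts(Path excludeFile) {
        if (!Files.exists(excludeFile)) {
            return Collections.emptySet();
        }
        try (Stream<String> lines = Files.lines(excludeFile)) {
            return lines.map(String::trim)
                        .filter(l -> !l.isEmpty() && !l.startsWith("#"))
                        .collect(Collectors.toCollection(HashSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // real I/O errors still fail loudly
        }
    }

    public static void main(String[] args) {
        // No exclude file present: failover proceeds with an empty exclude list.
        Set<String> hosts = readExcludedHosts(Paths.get("etc/hadoop/exclude"));
        System.out.println("excluded hosts: " + hosts.size());
    }
}
```

Under this behavior, a standby RM transitioning to active would simply log that no exclude file exists and refresh with an empty list, matching the NameNode's tolerance.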
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309454#comment-14309454 ] Hadoop QA commented on YARN-3144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697051/YARN-3144.3.patch against trunk revision 1425e3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6536//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6536//console This message is automatically generated. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch > > > Posting events to the timeline server is best-effort. 
However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1449) Protocol changes and implementations in NM side to support change container resource
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-1449: Assignee: Wangda Tan (was: Wangda Tan (No longer used)) > Protocol changes and implementations in NM side to support change container > resource > > > Key: YARN-1449 > URL: https://issues.apache.org/jira/browse/YARN-1449 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Wangda Tan (No longer used) >Assignee: Wangda Tan > Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, > yarn-1449.5.patch > > > As described in YARN-1197, we need add API/implementation changes, > 1) Add a "changeContainersResources" method in ContainerManagementProtocol > 2) Can get succeed/failed increased/decreased containers in response of > "changeContainersResources" > 3) Add a "new decreased containers" field in NodeStatus which can help NM > notify RM such changes > 4) Added changeContainersResources implementation in ContainerManagerImpl > 5) Added changes in ContainersMonitorImpl to support change resource limit of > containers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
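To make the five proposed changes concrete, here is a much-simplified, hypothetical sketch of a changeContainersResources call that reports succeeded and failed changes. Real Hadoop types (ContainerId, Resource, the protobuf request/response records, NodeStatus) are replaced with minimal stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResourceChangeSketch {

    static class ChangeRequest {
        final String containerId;
        final int newMemoryMB;
        ChangeRequest(String containerId, int newMemoryMB) {
            this.containerId = containerId;
            this.newMemoryMB = newMemoryMB;
        }
    }

    static class ChangeResponse {
        final List<String> succeeded = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    // Current allocation per container and the node's total capacity.
    private final Map<String, Integer> allocatedMB = new HashMap<>();
    private final int nodeCapacityMB;

    ResourceChangeSketch(int nodeCapacityMB) { this.nodeCapacityMB = nodeCapacityMB; }

    void launch(String containerId, int memoryMB) { allocatedMB.put(containerId, memoryMB); }

    // Sketch of items (1)/(2): apply each change, rejecting unknown containers
    // and increases that would exceed node capacity.
    ChangeResponse changeContainersResources(List<ChangeRequest> requests) {
        ChangeResponse response = new ChangeResponse();
        for (ChangeRequest r : requests) {
            Integer current = allocatedMB.get(r.containerId);
            int others = allocatedMB.values().stream().mapToInt(Integer::intValue).sum()
                    - (current == null ? 0 : current);
            if (current == null || others + r.newMemoryMB > nodeCapacityMB) {
                response.failed.add(r.containerId);
            } else {
                allocatedMB.put(r.containerId, r.newMemoryMB);
                response.succeeded.add(r.containerId);
            }
        }
        return response;
    }

    public static void main(String[] args) {
        ResourceChangeSketch nm = new ResourceChangeSketch(8192);
        nm.launch("c1", 1024);
        nm.launch("c2", 2048);
        ChangeResponse resp = nm.changeContainersResources(List.of(
                new ChangeRequest("c1", 4096),     // fits: 2048 + 4096 <= 8192
                new ChangeRequest("c2", 8192)));   // rejected: 4096 + 8192 > 8192
        System.out.println("succeeded=" + resp.succeeded + " failed=" + resp.failed);
    }
}
```

Item (3) from the list above would then be analogous to carrying `response.succeeded` for decreases inside the NM's heartbeat NodeStatus so the RM learns about them.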
[jira] [Commented] (YARN-1449) Protocol changes and implementations in NM side to support change container resource
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309516#comment-14309516 ] Wangda Tan commented on YARN-1449: -- Canceled patch. > Protocol changes and implementations in NM side to support change container > resource > > > Key: YARN-1449 > URL: https://issues.apache.org/jira/browse/YARN-1449 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan (No longer used) >Assignee: Wangda Tan > Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, > yarn-1449.5.patch > > > As described in YARN-1197, we need add API/implementation changes, > 1) Add a "changeContainersResources" method in ContainerManagementProtocol > 2) Can get succeed/failed increased/decreased containers in response of > "changeContainersResources" > 3) Add a "new decreased containers" field in NodeStatus which can help NM > notify RM such changes > 4) Added changeContainersResources implementation in ContainerManagerImpl > 5) Added changes in ContainersMonitorImpl to support change resource limit of > containers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3147) Clean up RM web proxy code
[ https://issues.apache.org/jira/browse/YARN-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309555#comment-14309555 ] Xuan Gong commented on YARN-3147: - Thanks for the patch, [~ste...@apache.org]. I will take a look shortly. > Clean up RM web proxy code > --- > > Key: YARN-3147 > URL: https://issues.apache.org/jira/browse/YARN-3147 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-3147-001.patch, YARN-3147-002.patch > > > YARN-2084 covers fixing up the RM proxy & filter for REST support. > Before doing that, prepare for it by cleaning up the codebase: factoring out > the redirect logic into a single method, some minor reformatting, moving to > SLF4J and Java 7 code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3033) implement NM starting the ATS writer companion
[ https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309574#comment-14309574 ] Sangjin Lee commented on YARN-3033: --- [~djp], I think it'd be good to support *both* options. I do see that some may want to run it as an aux service for simplicity of deployments (one fewer daemon to start), especially in a small setup. However, we do need to address the web app issue at YARN-3087 to avoid the undesirable module dependency. A standalone daemon is probably safer as it would affect the node manager less. We still need to poke this daemon for the AM lifecycle (thus the service part). > implement NM starting the ATS writer companion > -- > > Key: YARN-3033 > URL: https://issues.apache.org/jira/browse/YARN-3033 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Per design in YARN-2928, implement node managers starting the ATS writer > companion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309585#comment-14309585 ] Jason Lowe commented on YARN-3144: -- Thanks, Jon! We're almost there, but on the final review before commit I found one last thing that I think should be fixed. My apologies for not catching it sooner: {code} +} catch (Exception e ) { + if (timelineServiceBestEffort) { +LOG.warn("Failed to get delegation token from the timeline server"); +return null; + } {code} I think it's important to log something about the exception that was received, otherwise it can be very frustrating to debug. Not sure if we should log the full exception stack or just the message, but I think we should say more than just it didn't work. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
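The fix being requested in the review above could look roughly like this. The sketch uses java.util.logging and a stubbed token fetch; apart from the timelineServiceBestEffort flag and the warning text quoted from the patch, all names are illustrative.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class BestEffortTokenFetch {

    private static final Logger LOG = Logger.getLogger(BestEffortTokenFetch.class.getName());

    private final boolean timelineServiceBestEffort;

    BestEffortTokenFetch(boolean bestEffort) {
        this.timelineServiceBestEffort = bestEffort;
    }

    // Stand-in for the real timeline client call, which may fail many ways.
    private String fetchTokenFromTimelineServer() throws Exception {
        throw new java.net.ConnectException("timeline server unreachable");
    }

    String getDelegationToken() {
        try {
            return fetchTokenFromTimelineServer();
        } catch (Exception e) {
            if (timelineServiceBestEffort) {
                // Include the cause (message and stack trace), per the review,
                // so the swallowed failure is still debuggable.
                LOG.log(Level.WARNING,
                        "Failed to get delegation token from the timeline server", e);
                return null;
            }
            // Not best-effort: the failure stays fatal (wrapped unchecked here
            // only to keep this sketch free of checked exceptions).
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String token = new BestEffortTokenFetch(true).getDelegationToken();
        System.out.println("token = " + token);  // prints "token = null"; the job proceeds
    }
}
```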
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309612#comment-14309612 ] Sangjin Lee commented on YARN-3041: --- [~rkanter], [~Naganarasimha], IMO it might make sense to define all YARN system entities as explicit types. It would include flow runs, YARN apps, app attempts, and containers. They have well-defined meaning and relationship, so it seems natural to me? Thoughts? > create the ATS entity/event API > --- > > Key: YARN-3041 > URL: https://issues.apache.org/jira/browse/YARN-3041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Robert Kanter > Attachments: YARN-3041.preliminary.001.patch > > > Per design in YARN-2928, create the ATS entity and events API. > Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, > flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309639#comment-14309639 ] Jian He commented on YARN-3021: --- bq. Explicitly have an external renewer system that has the right permissions to renew these tokens. I think this is the correct long-term solution. RM today happens to be the renewer. But we need a central renewer component so that we can do cross-cluster renewals. bq. RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right? > YARN's delegation-token handling disallows certain trust setups to operate > properly > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). 
> In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
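The renewer-inspection idea discussed above (the RM skips scheduling renewal for tokens whose recorded renewer is not the RM itself) can be sketched as follows. The Token type and the renewer strings are stand-ins, not Hadoop's delegation-token identifier API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RenewerFilterSketch {

    static class Token {
        final String kind;
        final String renewer;  // e.g. "rm/rm-host-a@REALM-A"
        Token(String kind, String renewer) { this.kind = kind; this.renewer = renewer; }
    }

    // Keep only the tokens this RM is actually responsible for renewing;
    // tokens naming some other renewer are submitted but never scheduled
    // for renewal, so cross-realm submissions no longer fail upfront.
    static List<Token> tokensToRenew(List<Token> appTokens, String thisRmRenewer) {
        return appTokens.stream()
                .filter(t -> thisRmRenewer.equals(t.renewer))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("HDFS_DELEGATION_TOKEN", "rm/rm-a@REALM-A"),
                // Token from cluster B, renewable only by a principal B trusts.
                new Token("HDFS_DELEGATION_TOKEN", "external-renewer@COMMON"));
        List<Token> mine = tokensToRenew(tokens, "rm/rm-a@REALM-A");
        System.out.println("renewing " + mine.size() + " of " + tokens.size());
    }
}
```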
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309650#comment-14309650 ] Jian He commented on YARN-3021: --- bq. the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config) If this is the case, the assumption here is problematic, why would I request a token from B but let untrusted 3rd party A renew my token in the first place? > YARN's delegation-token handling disallows certain trust setups to operate > properly > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309663#comment-14309663 ] Sangjin Lee commented on YARN-1142: --- Some more info on this at https://issues.apache.org/jira/browse/YARN-3087?focusedCommentId=14307614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14307614 > MiniYARNCluster web ui does not work properly > - > > Key: YARN-1142 > URL: https://issues.apache.org/jira/browse/YARN-1142 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur > Fix For: 2.7.0 > > > When going to the RM http port, the NM web ui is displayed. It seems there is > a singleton somewhere that breaks things when RM & NMs run in the same > process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309662#comment-14309662 ] Sangjin Lee commented on YARN-3087: --- Thanks for looking into this [~devaraj.k]! Doesn't sound like there is a quick resolution, then. :( > the REST server (web server) for per-node aggregator does not work if it runs > inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Devaraj K > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309703#comment-14309703 ] Yongjun Zhang commented on YARN-3021: - Hi [~vinodkv] and [~jianhe], Thank you so much for the review and comments! I will try to respond to part of your comments here and keep looking into the rest. {quote} RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address. This way, we don't need an explicit API in the submission context. {quote} It seems that, regardless of this jira, we could make the above change, right? Any catch? {quote} Apologies for going back and forth on this one. {quote} I appreciate the insight you provided, and we are trying to figure out the best solution together. All the points you provided are reasonable, so absolutely no need for apologies here. {quote} Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely {quote} I did test in a secure env and it worked. Would you please elaborate? {quote} I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right? {quote} I think that's the case. The problem is that there is no trust between A and B. So "common" should be the one to renew the token. Thanks. 
> YARN's delegation-token handling disallows certain trust setups to operate > properly > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3144: -- Attachment: YARN-3144.4.patch No problem, [~jlowe]. Uploaded patch to add the exception message. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, > YARN-3144.4.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2694: -- Target Version/s: 2.7.0 (was: 2.6.0) > Ensure only single node labels specified in resource request / host, and node > label expression only specified when resourceName=ANY > --- > > Key: YARN-2694 > URL: https://issues.apache.org/jira/browse/YARN-2694 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.7.0 > > Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, > YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, > YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, > YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, > YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, > YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, > YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, > YARN-2694-20150205-3.patch > > > Currently, node label expression supporting in capacity scheduler is partial > completed. Now node label expression specified in Resource Request will only > respected when it specified at ANY level. And a ResourceRequest/host with > multiple node labels will make user limit, etc. computation becomes more > tricky. > Now we need temporarily disable them, changes include, > - AMRMClient > - ApplicationMasterService > - RMAdminCLI > - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309752#comment-14309752 ] Hudson commented on YARN-2694: -- FAILURE: Integrated in Hadoop-trunk-Commit #7042 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7042/]) YARN-2694. Ensure only single node label specified in ResourceRequest. Contributed by Wangda Tan (jianhe: rev c1957fef29b07fea70938e971b30532a1e131fd0) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java > Ensure only single node labels specified in resource request / host, and node > label expression only specified when resourceName=ANY > --- > > Key: YARN-2694 > URL: https://issues.apache.org/jira/browse/YARN-2694 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.7.0 > > Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, > YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, > YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, > YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, > YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, > YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, > YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, > YARN-2694-20150205-3.patch > > > Currently, node label expression supporting in capacity scheduler is partial > completed. 
Now node label expression specified in Resource Request will only > respected when it specified at ANY level. And a ResourceRequest/host with > multiple node labels will make user limit, etc. computation becomes more > tricky. > Now we need temporarily disable them, changes include, > - AMRMClient > - ApplicationMasterService > - RMAdminCLI > - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
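A sketch of the validation this change enforces: at most one node label per expression, and a non-empty expression only when resourceName is ANY. The "&&"/whitespace splitting below is an assumption about expression syntax for illustration, not the committed SchedulerUtils code.

```java
public class NodeLabelValidation {

    static final String ANY = "*";

    // Reject multi-label expressions and labels attached to a specific
    // host/rack, which is what makes user-limit computation tricky.
    static void validate(String resourceName, String nodeLabelExpression) {
        if (nodeLabelExpression == null || nodeLabelExpression.trim().isEmpty()) {
            return;  // no label requested: always fine
        }
        if (!ANY.equals(resourceName)) {
            throw new IllegalArgumentException(
                "Node label expression only allowed when resourceName=ANY, got: " + resourceName);
        }
        if (nodeLabelExpression.trim().split("\\s*&&\\s*|\\s+").length > 1) {
            throw new IllegalArgumentException(
                "Only a single node label may be specified, got: " + nodeLabelExpression);
        }
    }

    public static void main(String[] args) {
        validate("*", "gpu");          // ok: single label at ANY
        try {
            validate("host1", "gpu");  // rejected: label on a specific host
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```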
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309757#comment-14309757 ] Jian He commented on YARN-3100: --- bq. AbstractCSQueue and CSQueueUtils Maybe I missed something, I think these two are mostly fine. As we create the new queue hierarchy first and then update the old queues. If certain methods fail in these two classes, the new queue creation will fail upfront and so will not update the old queue. Anyway, we can address this separately. > Make YARN authorization pluggable > - > > Key: YARN-3100 > URL: https://issues.apache.org/jira/browse/YARN-3100 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3100.1.patch, YARN-3100.2.patch > > > The goal is to have YARN acl model pluggable so as to integrate other > authorization tool such as Apache Ranger, Sentry. > Currently, we have > - admin ACL > - queue ACL > - application ACL > - time line domain ACL > - service ACL > The proposal is to create a YarnAuthorizationProvider interface. Current > implementation will be the default implementation. Ranger or Sentry plug-in > can implement this interface. > Benefit: > - Unify the code base. With the default implementation, we can get rid of > each specific ACL manager such as AdminAclManager, ApplicationACLsManager, > QueueAclsManager etc. > - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
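The proposed YarnAuthorizationProvider could be shaped roughly like this. The interface, the AccessType values, and the in-memory default implementation are all illustrative, not the committed API.

```java
import java.util.Map;
import java.util.Set;

public class PluggableAuthSketch {

    enum AccessType { ADMINISTER_QUEUE, SUBMIT_APP, VIEW_APP }

    // The single pluggable seam: the default impl below, or a Ranger/Sentry
    // plug-in, sits behind this interface.
    interface YarnAuthorizationProvider {
        boolean checkPermission(AccessType access, String entity, String user);
    }

    // Default implementation: per-entity ACLs kept in memory, analogous to
    // what QueueAclsManager / ApplicationACLsManager do today.
    static class DefaultAuthorizer implements YarnAuthorizationProvider {
        private final Map<String, Set<String>> aclsByEntity;
        DefaultAuthorizer(Map<String, Set<String>> aclsByEntity) {
            this.aclsByEntity = aclsByEntity;
        }
        @Override
        public boolean checkPermission(AccessType access, String entity, String user) {
            Set<String> allowed = aclsByEntity.get(access + ":" + entity);
            return allowed != null && (allowed.contains(user) || allowed.contains("*"));
        }
    }

    public static void main(String[] args) {
        YarnAuthorizationProvider auth = new DefaultAuthorizer(Map.of(
                "SUBMIT_APP:root.dev", Set.of("alice", "bob"),
                "ADMINISTER_QUEUE:root.dev", Set.of("alice")));
        System.out.println(auth.checkPermission(AccessType.SUBMIT_APP, "root.dev", "bob"));       // true
        System.out.println(auth.checkPermission(AccessType.ADMINISTER_QUEUE, "root.dev", "bob")); // false
    }
}
```

The unification benefit described in the issue falls out of this shape: every caller checks permissions through the one interface, and swapping the implementation swaps the policy engine.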
[jira] [Resolved] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-281. - Resolution: Won't Fix Release Note: I think this may not be needed since we already have tests in TestSchedulerUtils, which verify minimum/maximum resource normalization and validation. And SchedulerUtils runs before any scheduler can see such resource requests. Resolved it as Won't Fix. > Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits > - > > Key: YARN-281 > URL: https://issues.apache.org/jira/browse/YARN-281 > Project: Hadoop YARN > Issue Type: Test > Components: scheduler >Affects Versions: 2.0.0-alpha >Reporter: Harsh J >Assignee: Wangda Tan > Labels: test > > We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler > and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test > to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
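For context, the min/max normalization that runs before any scheduler sees a request (the reason the resolution above points at the SchedulerUtils tests) amounts to round-up-then-clamp. A sketch with illustrative numbers:

```java
public class NormalizeSketch {

    // Round the request up to a multiple of the minimum allocation, then
    // clamp into [minMB, maxMB]. Capping at the maximum is exactly the
    // behavior YARN-281 asked to have covered by a MAXIMUM_ALLOCATION test.
    static int normalizeMemory(int requestedMB, int minMB, int maxMB) {
        int rounded = ((requestedMB + minMB - 1) / minMB) * minMB;
        return Math.min(Math.max(rounded, minMB), maxMB);
    }

    public static void main(String[] args) {
        // min=1024, max=8192 are illustrative values here
        System.out.println(normalizeMemory(1500, 1024, 8192));   // prints 2048
        System.out.println(normalizeMemory(0, 1024, 8192));      // prints 1024
        System.out.println(normalizeMemory(20000, 1024, 8192));  // prints 8192
    }
}
```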
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309779#comment-14309779 ] Jason Lowe commented on YARN-3144: -- +1 pending Jenkins. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, > YARN-3144.4.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-2809: - Attachment: YARN-2809-v3.patch > Implement workaround for linux kernel panic when removing cgroup > > > Key: YARN-2809 > URL: https://issues.apache.org/jira/browse/YARN-2809 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: RHEL 6.4 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch > > > Some older versions of linux have a bug that can cause a kernel panic when > the LCE attempts to remove a cgroup. It is a race condition so it's a bit > rare but on a few thousand node cluster it can result in a couple of panics > per day. > This is the commit that likely (haven't verified) fixes the problem in linux: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 > Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309816#comment-14309816 ] Hadoop QA commented on YARN-3144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697097/YARN-3144.4.patch against trunk revision eaab959. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6537//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6537//console This message is automatically generated. 
> Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, > YARN-3144.4.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309865#comment-14309865 ] Wei Yan commented on YARN-3126: --- [~Xia Hu], I checked the latest trunk version and the problem is still there. Could you rebase the patch against trunk? Normally we fix the problem in trunk, rather than in a previously released version. And we may need to get YARN-2083 committed first. Hey, [~kasha], do you have time to look at YARN-2083? > FairScheduler: queue's usedResource is always more than the maxResource limit > - > > Key: YARN-3126 > URL: https://issues.apache.org/jira/browse/YARN-3126 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.3.0 > Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. >Reporter: Xia Hu > Labels: assignContainer, fairscheduler, resources > Attachments: resourcelimit.patch > > > When submitting a Spark application (both spark-on-yarn-cluster and > spark-on-yarn-client modes), the queue's usedResources assigned by the > fairscheduler can exceed the queue's maxResources limit. > Reading the fairscheduler code, I believe this happens because the requested > resources are not checked when assigning a container. > Here is the detail: > 1. Choose a queue. In this process, it checks whether the queue's usedResource > exceeds its max, via assignContainerPreCheck. > 2. Then choose an app in that queue. > 3. Then choose a container. And here is the problem: there is no check > whether this container would push the queue's resources over its max limit. If a > queue's usedResource is 13G and the maxResource limit is 16G, a container > asking for 4G resources may be assigned successfully. > This problem always shows up with Spark applications, because we can ask for > different container resources in different applications. > By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
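The three steps in the YARN-3126 report can be sketched as follows. This is a hedged illustration with simplified types, not FairScheduler's actual Resource or assignContainer code; the point is that the limit check must also run at container-assignment time (step 3), not only at queue selection (step 1):

```java
// A minimal sketch (not FairScheduler's actual code) of the check the report
// says is missing in step 3: before assigning a container, verify that the
// queue's current usage plus the new request still fits under maxResources.
final class Resource {
    final int memoryMb;
    final int vcores;

    Resource(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    Resource plus(Resource other) {
        return new Resource(memoryMb + other.memoryMb, vcores + other.vcores);
    }

    boolean fitsIn(Resource limit) {
        return memoryMb <= limit.memoryMb && vcores <= limit.vcores;
    }
}

final class QueueMaxResourceCheck {
    // True only when assigning `request` keeps the queue within `max`.
    static boolean canAssign(Resource used, Resource request, Resource max) {
        return used.plus(request).fitsIn(max);
    }
}
```

With the numbers from the report (13G used, 16G max, a 4G request), `canAssign` returns false, so the container would be rejected instead of pushing the queue to 17G.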
[jira] [Commented] (YARN-3120) YarnException on windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dirnm-local-dir, which was marked as good.
[ https://issues.apache.org/jira/browse/YARN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309867#comment-14309867 ] vaidhyanathan commented on YARN-3120: - Hi Varun, Thanks for responding. I started running the yarn cmd files as administrator and it worked. I also opened the command prompt and ran it in administrator mode. The word count example worked fine the first time, but now I'm facing a different issue: when I run it with the earlier setup, the job doesn't proceed after this step, '15/02/06 15:38:26 INFO mapreduce.Job: Running job: job_1423255041751_0001', and when I check the console the status is 'Accepted' and the final status is 'Undefined'. > YarnException on windows + > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local > dirnm-local-dir, which was marked as good. > --- > > Key: YARN-3120 > URL: https://issues.apache.org/jira/browse/YARN-3120 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 > Environment: Windows 8 , Hadoop 2.6.0 >Reporter: vaidhyanathan > > Hi, > I tried to follow the instructions in > http://wiki.apache.org/hadoop/Hadoop2OnWindows and have setup > hadoop-2.6.0.jar in my windows system. > I was able to start everything properly but when i try to run the job > wordcount as given in the above URL , the job fails with the below exception . > 15/01/30 12:56:09 INFO localizer.ResourceLocalizationService: Localizer failed > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local > di > r /tmp/hadoop-haremangala/nm-local-dir, which was marked as good. > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. > ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService. > java:1372) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. 
> ResourceLocalizationService.access$900(ResourceLocalizationService.java:137) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. > ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java > :1085) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309886#comment-14309886 ] Jason Lowe commented on YARN-3144: -- Committing this. The test failures appear to be unrelated, and they both pass for me locally with the patch applied. > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, > YARN-3144.4.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309909#comment-14309909 ] Chris Douglas commented on YARN-3100: - Agreed; definitely a separate JIRA. As state is copied from the old queues, some of the methods called in {{CSQueueUtils}} throw exceptions, similar to the case you found in {{LeafQueue}}. > Make YARN authorization pluggable > - > > Key: YARN-3100 > URL: https://issues.apache.org/jira/browse/YARN-3100 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3100.1.patch, YARN-3100.2.patch > > > The goal is to have YARN acl model pluggable so as to integrate other > authorization tool such as Apache Ranger, Sentry. > Currently, we have > - admin ACL > - queue ACL > - application ACL > - time line domain ACL > - service ACL > The proposal is to create a YarnAuthorizationProvider interface. Current > implementation will be the default implementation. Ranger or Sentry plug-in > can implement this interface. > Benefit: > - Unify the code base. With the default implementation, we can get rid of > each specific ACL manager such as AdminAclManager, ApplicationACLsManager, > QueueAclsManager etc. > - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309907#comment-14309907 ] Hudson commented on YARN-3144: -- FAILURE: Integrated in Hadoop-trunk-Commit #7043 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7043/]) YARN-3144. Configuration for making delegation token failures to timeline server not-fatal. Contributed by Jonathan Eagles (jlowe: rev 6f10434a5ad965d50352602ce31a9fce353cb90c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Configuration for making delegation token failures to timeline server > not-fatal > --- > > Key: YARN-3144 > URL: https://issues.apache.org/jira/browse/YARN-3144 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 2.7.0 > > Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, > YARN-3144.4.patch > > > Posting events to the timeline server is best-effort. However, getting the > delegation tokens from the timeline server will kill the job. This patch adds > a configuration to make get delegation token operations "best-effort". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309906#comment-14309906 ] Hudson commented on YARN-3089: -- FAILURE: Integrated in Hadoop-trunk-Commit #7043 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7043/]) YARN-3089. LinuxContainerExecutor does not handle file arguments to deleteAsUser. Contributed by Eric Payne (jlowe: rev 4c484320b430950ce195cfad433a97099e117bad) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c > LinuxContainerExecutor does not handle file arguments to deleteAsUser > - > > Key: YARN-3089 > URL: https://issues.apache.org/jira/browse/YARN-3089 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Eric Payne >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt, YARN-3089.v3.txt > > > YARN-2468 added the deletion of individual logs that are aggregated, but this > fails to delete log files when the LCE is being used. The LCE native > executable assumes the paths being passed are paths and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309905#comment-14309905 ] Hadoop QA commented on YARN-2809: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697110/YARN-2809-v3.patch against trunk revision c1957fe. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6538//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6538//console This message is automatically generated. > Implement workaround for linux kernel panic when removing cgroup > > > Key: YARN-2809 > URL: https://issues.apache.org/jira/browse/YARN-2809 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: RHEL 6.4 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch > > > Some older versions of linux have a bug that can cause a kernel panic when > the LCE attempts to remove a cgroup. 
It is a race condition so it's a bit > rare but on a few thousand node cluster it can result in a couple of panics > per day. > This is the commit that likely (haven't verified) fixes the problem in linux: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 > Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Mazzucchelli updated YARN-2664: -- Attachment: YARN-2664.10.patch In this patch I set *N/A* instead of _(best effort)_ > Improve RM webapp to expose info about reservations. > > > Key: YARN-2664 > URL: https://issues.apache.org/jira/browse/YARN-2664 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Matteo Mazzucchelli > Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, > YARN-2664.10.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, > YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.8.patch, > YARN-2664.9.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf > > > YARN-1051 provides a new functionality in the RM to ask for reservation on > resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309920#comment-14309920 ] Jason Lowe commented on YARN-2809: -- +1 lgtm. Will commit this early next week if there are no objections. > Implement workaround for linux kernel panic when removing cgroup > > > Key: YARN-2809 > URL: https://issues.apache.org/jira/browse/YARN-2809 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: RHEL 6.4 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch > > > Some older versions of linux have a bug that can cause a kernel panic when > the LCE attempts to remove a cgroup. It is a race condition so it's a bit > rare but on a few thousand node cluster it can result in a couple of panics > per day. > This is the commit that likely (haven't verified) fixes the problem in linux: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 > Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309922#comment-14309922 ] Kihwal Lee commented on YARN-3143: -- +1 the patch looks good. > RM Apps REST API can return NPE or entries missing id and other fields > -- > > Key: YARN-3143 > URL: https://issues.apache.org/jira/browse/YARN-3143 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.2 >Reporter: Kendall Thrapp >Assignee: Jason Lowe > Attachments: YARN-3143.001.patch > > > I'm seeing intermittent null pointer exceptions being returned by > the YARN Apps REST API. > For example: > {code} > http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED > {code} > JSON Response was: > {code} > {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} > {code} > At a glance appears to be only when we query for unfinished apps (i.e. > finalStatus=UNDEFINED). > Possibly related, when I do get back a list of apps, sometimes one or more of > the apps will be missing most of the fields, like id, name, user, etc., and > the fields that are present all have zero for the value. > For example: > {code} > {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} > {code} > Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
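The symptoms above (an NPE on some queries, and entries with only zero-valued fields on others) both point at app reports being read mid-transition before all fields are populated. The sketch below illustrates the kind of guard such a fix needs; `AppEntry` and its fields are hypothetical stand-ins, not the RM webapp's real classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: when building the apps list, skip entries whose report is
// not yet fully populated instead of dereferencing them. That avoids both
// the NullPointerException and the all-zero JSON entries described above.
final class AppEntry {
    final String id; // may still be null while the app is initializing

    AppEntry(String id) {
        this.id = id;
    }
}

final class AppsResponseBuilder {
    static List<AppEntry> presentable(List<AppEntry> all) {
        List<AppEntry> out = new ArrayList<>();
        for (AppEntry app : all) {
            if (app != null && app.id != null) { // drop half-built entries
                out.add(app);
            }
        }
        return out;
    }
}
```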
[jira] [Updated] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1126: Attachment: YARN-905-addendum.patch Uploading patch from YARN-905 on behalf of [~ywskycn]. > Add validation of users input nodes-states options to nodes CLI > --- > > Key: YARN-1126 > URL: https://issues.apache.org/jira/browse/YARN-1126 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-905-addendum.patch > > > Follow the discussion in YARN-905. > (1) case-insensitive checks for "all". > (2) validation of users input, exit with non-zero code and print all valid > states when user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309949#comment-14309949 ] Jason Lowe commented on YARN-3143: -- Thanks for the review, Kihwal! Committing this. > RM Apps REST API can return NPE or entries missing id and other fields > -- > > Key: YARN-3143 > URL: https://issues.apache.org/jira/browse/YARN-3143 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.2 >Reporter: Kendall Thrapp >Assignee: Jason Lowe > Attachments: YARN-3143.001.patch > > > I'm seeing intermittent null pointer exceptions being returned by > the YARN Apps REST API. > For example: > {code} > http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED > {code} > JSON Response was: > {code} > {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} > {code} > At a glance appears to be only when we query for unfinished apps (i.e. > finalStatus=UNDEFINED). > Possibly related, when I do get back a list of apps, sometimes one or more of > the apps will be missing most of the fields, like id, name, user, etc., and > the fields that are present all have zero for the value. > For example: > {code} > {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} > {code} > Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309952#comment-14309952 ] Vinod Kumar Vavilapalli commented on YARN-3089: --- bq. Currently, even we are running a MR job, it will upload the partial logs which does not sound right. And we need to fix it. Wow, this is a huge blocker. We should fix it in 2.6.1. [~xgong], can you please file a ticket and link it here? Tx. > LinuxContainerExecutor does not handle file arguments to deleteAsUser > - > > Key: YARN-3089 > URL: https://issues.apache.org/jira/browse/YARN-3089 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Eric Payne >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt, YARN-3089.v3.txt > > > YARN-2468 added the deletion of individual logs that are aggregated, but this > fails to delete log files when the LCE is being used. The LCE native > executable assumes the paths being passed are paths and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3153) Capacity Scheduler max AM resource percentage is mis-used as ratio
Wangda Tan created YARN-3153: Summary: Capacity Scheduler max AM resource percentage is mis-used as ratio Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical The existing Capacity Scheduler can limit the max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used as a ratio: the implementation assumes the input will be \[0,1\]. So a user can currently specify it up to 100, which lets AMs use 100x the queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3153: - Summary: Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio (was: Capacity Scheduler max AM resource percentage is mis-used as ratio) > Capacity Scheduler max AM resource limit for queues is defined as percentage > but used as ratio > -- > > Key: YARN-3153 > URL: https://issues.apache.org/jira/browse/YARN-3153 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > > In existing Capacity Scheduler, it can limit max applications running within > a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, > but actually, it is used as "ratio", in implementation, it assumes input will > be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x > of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications
Xuan Gong created YARN-3154: --- Summary: Should not upload partial logs for MR jobs or other "short-running' applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we run an MR job and do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. We should upload partial logs only for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309973#comment-14309973 ] Xuan Gong commented on YARN-3154: - We can add a parameter in logAggregationContext to indicate whether this app is an LRS app. Based on this flag, the NM can decide whether it needs to upload the partial logs for this app. > Should not upload partial logs for MR jobs or other "short-running' > applications > - > > Key: YARN-3154 > URL: https://issues.apache.org/jira/browse/YARN-3154 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > > Currently, if we are running a MR job, and we do not set the log interval > properly, we will have their partial logs uploaded and then removed from the > local filesystem which is not right. > We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309974#comment-14309974 ] Wangda Tan commented on YARN-3153: -- We basically have 3 options: 1) Keep the config name (...percentage) and continue to use it as a ratio, adding a check to make sure it fits in the range \[0,1\]. 2) Keep the config name but use it as a percentage; this needs an update to yarn-default as well, and will have some impact on existing deployments when they upgrade. 3) Change the config name to (...ratio); this would be an incompatible change. Thoughts? [~vinodkv], [~jianhe] > Capacity Scheduler max AM resource limit for queues is defined as percentage > but used as ratio > -- > > Key: YARN-3153 > URL: https://issues.apache.org/jira/browse/YARN-3153 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > > In existing Capacity Scheduler, it can limit max applications running within > a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, > but actually, it is used as "ratio", in implementation, it assumes input will > be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x > of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309986#comment-14309986 ] Jason Lowe commented on YARN-3154: -- Note that even LRS apps have issues if they don't do their own log rolling. If I remember correctly, stdout and stderr files are setup by the container executor, and we'll have partial logs uploaded then deleted from the local filesystem, losing any subsequent logs to these files or any other files that aren't explicitly log rolled and filtered via a log aggregation context. IMHO we need to make sure we do _not_ delete anything for a running app _unless_ it has a log aggregation context filter to tell us what is safe to upload and delete. Without that information, we cannot tell if a log file is "live" and therefore going to be deleted too early. > Should not upload partial logs for MR jobs or other "short-running' > applications > - > > Key: YARN-3154 > URL: https://issues.apache.org/jira/browse/YARN-3154 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > > Currently, if we are running a MR job, and we do not set the log interval > properly, we will have their partial logs uploaded and then removed from the > local filesystem which is not right. > We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
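The safety rule in the comment above can be written down compactly. This is an illustrative encoding only; the names are hypothetical and not the NodeManager's actual log-aggregation API:

```java
import java.util.regex.Pattern;

// Hedged sketch of the rule described above: while an application is still
// running, the NM may upload-and-delete a log file only if the app's
// LogAggregationContext supplied a rolled-logs pattern and the file matches
// it. Without that filter, a file may still be "live" (e.g. stdout/stderr)
// and must be left alone until the app finishes.
final class PartialLogUploadPolicy {
    static boolean mayUploadAndDelete(boolean appFinished,
                                      String rolledLogsIncludePattern,
                                      String fileName) {
        if (appFinished) {
            return true; // all files are final once the app completes
        }
        if (rolledLogsIncludePattern == null || rolledLogsIncludePattern.isEmpty()) {
            return false; // no filter: cannot tell which files are safe
        }
        return Pattern.matches(rolledLogsIncludePattern, fileName);
    }
}
```

Under this policy, an MR job with no rolled-logs filter never has partial logs deleted early, while an LRS app that declares a pattern still gets its rolled logs aggregated incrementally.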
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310004#comment-14310004 ] Jason Lowe commented on YARN-3143: -- My apologies, I also meant to thank Eric for the original review! > RM Apps REST API can return NPE or entries missing id and other fields > -- > > Key: YARN-3143 > URL: https://issues.apache.org/jira/browse/YARN-3143 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.2 >Reporter: Kendall Thrapp >Assignee: Jason Lowe > Fix For: 2.7.0 > > Attachments: YARN-3143.001.patch > > > I'm seeing intermittent null pointer exceptions being returned by > the YARN Apps REST API. > For example: > {code} > http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED > {code} > JSON Response was: > {code} > {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} > {code} > At a glance it appears to happen only when we query for unfinished apps (i.e. > finalStatus=UNDEFINED). > Possibly related, when I do get back a list of apps, sometimes one or more of > the apps will be missing most of the fields, like id, name, user, etc., and > the fields that are present all have zero for the value. > For example: > {code} > {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} > {code} > Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310023#comment-14310023 ] Jian He commented on YARN-3153: --- As the [doc|http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html] already explicitly mentions "specified as float", to keep it compatible, we may choose to do 1) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310026#comment-14310026 ] Zhijie Shen commented on YARN-3041: --- bq. IMO it might make sense to define all YARN system entities as explicit types Makes sense to me. > create the ATS entity/event API > --- > > Key: YARN-3041 > URL: https://issues.apache.org/jira/browse/YARN-3041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Robert Kanter > Attachments: YARN-3041.preliminary.001.patch > > > Per design in YARN-2928, create the ATS entity and events API. > Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, > flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310029#comment-14310029 ] Vinod Kumar Vavilapalli commented on YARN-3153: --- This is a hard one to solve. +1 for option (1) for now. In addition, we could choose to deprecate this configuration completely and introduce a new one with the right semantics under a changed name: say yarn.scheduler.capacity.maximum-am-resources-percentage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3154: -- Target Version/s: 2.7.0, 2.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310032#comment-14310032 ] Wangda Tan commented on YARN-3153: -- Thanks for your feedback; I agree we should do 1) first. I think deprecating plus a name change is not graceful enough: users will get confused when they find one option deprecated but the system suggests a very similar one. Will upload a patch for #1 shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310033#comment-14310033 ] Hadoop QA commented on YARN-1126: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697127/YARN-905-addendum.patch against trunk revision 5c79439. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6540//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6540//console This message is automatically generated. > Add validation of users input nodes-states options to nodes CLI > --- > > Key: YARN-1126 > URL: https://issues.apache.org/jira/browse/YARN-1126 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-905-addendum.patch > > > Follow the discussion in YARN-905. > (1) case-insensitive checks for "all". > (2) validation of users input, exit with non-zero code and print all valid > states when user gives an invalid state. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310034#comment-14310034 ] Sandy Ryza commented on YARN-2990: -- +1. Sorry for the delay in getting to this. > FairScheduler's delay-scheduling always waits for node-local and rack-local > delays, even for off-rack-only requests > --- > > Key: YARN-2990 > URL: https://issues.apache.org/jira/browse/YARN-2990 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-2990-0.patch, yarn-2990-1.patch, yarn-2990-2.patch, > yarn-2990-test.patch > > > Looking at the FairScheduler, it appears the node/rack locality delays are > used for all requests, even those that are only off-rack. > More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310042#comment-14310042 ] Vinod Kumar Vavilapalli commented on YARN-3154: --- Does having two separate notions work? - Today's LogAggregationContext include/exclude patterns, for the app to indicate which log files need to be aggregated explicitly at app finish. This works for regular apps. - A new include/exclude pattern, for the app to indicate which log files need to be aggregated in a rolling fashion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
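A hedged sketch of the second notion above: a separate pattern pair that marks which log files are safe to upload and delete while the app is still running. The class and method names are hypothetical; the real LogAggregationContext API is not shown here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a dedicated include/exclude pattern pair deciding
// which files may be uploaded and deleted mid-run. Names are illustrative,
// not the actual YARN LogAggregationContext API.
public class RollingLogFilter {
  private final Pattern rolledInclude;
  private final Pattern rolledExclude;

  public RollingLogFilter(String includeRegex, String excludeRegex) {
    this.rolledInclude = Pattern.compile(includeRegex);
    this.rolledExclude = Pattern.compile(excludeRegex);
  }

  // A file is eligible for rolling upload-and-delete only if it matches
  // the rolling include pattern and not the exclude pattern; everything
  // else (stdout, stderr, ...) waits until app finish.
  public boolean safeToUploadWhileRunning(String fileName) {
    return rolledInclude.matcher(fileName).matches()
        && !rolledExclude.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    RollingLogFilter f =
        new RollingLogFilter(".*\\.log\\.\\d+", ".*debug.*");
    System.out.println(f.safeToUploadWhileRunning("app.log.1"));
    System.out.println(f.safeToUploadWhileRunning("stdout"));
  }
}
```

With such a filter, stdout and stderr never match the rolling include pattern, so they would not be deleted while the app runs, which addresses the concern Jason raises above.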
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310045#comment-14310045 ] Vinod Kumar Vavilapalli commented on YARN-3153: --- We could instead pick a radically different name. Or maybe two radically different ones - one for the ratio and one for the percentage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310049#comment-14310049 ] Wangda Tan commented on YARN-3153: -- Good suggestion. I think we can deprecate the percent one, make sure its value is within \[0, 1\], and use a ratio/factor name for the new option. Sounds good? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
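A minimal stand-in (not Hadoop's real Configuration class) for the deprecation idea discussed above: reads of a hypothetical new "factor" key fall back to the old "percent" key when only the old one is set. The new key name is illustrative, not a decided name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy configuration with a deprecated-key fallback. This is a sketch of
// the deprecation pattern, not Hadoop's Configuration.addDeprecation;
// the "factor" key name is hypothetical.
public class DeprecatedKeyConfig {
  static final String OLD_KEY =
      "yarn.scheduler.capacity.maximum-am-resource-percent";
  static final String NEW_KEY =
      "yarn.scheduler.capacity.maximum-am-resource-factor";

  private final Map<String, String> props = new HashMap<>();

  public void set(String key, String value) {
    props.put(key, value);
  }

  // The new key wins when present; otherwise the deprecated key is
  // honored, so existing deployments keep working after an upgrade.
  public String get(String key) {
    if (NEW_KEY.equals(key) && !props.containsKey(NEW_KEY)) {
      return props.get(OLD_KEY);
    }
    return props.get(key);
  }

  public static void main(String[] args) {
    DeprecatedKeyConfig conf = new DeprecatedKeyConfig();
    conf.set(OLD_KEY, "0.1");
    System.out.println(conf.get(NEW_KEY));
  }
}
```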
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310055#comment-14310055 ] Hudson commented on YARN-3143: -- FAILURE: Integrated in Hadoop-trunk-Commit #7045 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7045/]) YARN-3143. RM Apps REST API can return NPE or entries missing id and other fields. Contributed by Jason Lowe (jlowe: rev da2fb2bc46bddf42d79c6d7664cbf0311973709e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
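The NPE and zero-filled entries described in YARN-3143 suggest that app reports can be null or only partially populated while an app is still transitioning. A hedged sketch of the kind of defensive filtering such a fix might apply; AppReportStub and buildAppList are made-up stand-ins, not the real RMWebServices types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip apps whose report is null or not yet
// populated instead of serializing them into NPEs or zero-filled JSON
// entries. AppReportStub is illustrative, not actual YARN code.
public class AppListBuilder {
  static class AppReportStub {
    final String id;
    AppReportStub(String id) { this.id = id; }
  }

  // Only apps with a usable id make it into the response; apps caught
  // mid-transition (null report or null id) are filtered out.
  public static List<AppReportStub> buildAppList(List<AppReportStub> reports) {
    List<AppReportStub> out = new ArrayList<>();
    for (AppReportStub r : reports) {
      if (r != null && r.id != null) {
        out.add(r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<AppReportStub> in = new ArrayList<>();
    in.add(new AppReportStub("application_1"));
    in.add(null);
    in.add(new AppReportStub(null));
    System.out.println(buildAppList(in).size());
  }
}
```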