[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3896: --- Attachment: YARN-3896.02.patch RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset --- Key: YARN-3896 URL: https://issues.apache.org/jira/browse/YARN-3896 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3896.01.patch, YARN-3896.02.patch
{noformat}
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 10.208.132.153 to /default-rack
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at: 10.208.132.153
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 10.208.132.153:8041
2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node 10.208.132.153:8041 as it is now REBOOTED
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
{noformat}
The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the RM reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's heartbeat arrived before the RM had finished resetting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
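A simplified model of the RM-side check behind the "Too far behind" message and the RUNNING to REBOOTED transition makes the race easier to see. The class and field names below are illustrative stand-ins, not the actual ResourceTrackerService code:
{code:java}
// Illustrative model of the heartbeat response-id check; the real logic
// lives in ResourceTrackerService#nodeHeartbeat and RMNodeImpl.
public class ResponseIdCheck {
    public static void main(String[] args) {
        int nmResponseId = 0;           // a restarted NM always starts at 0
        int lastRmResponseId = 2506413; // stale RM-side id: the asynchronous
                                        // reset to 0 has not been processed yet

        if (nmResponseId + 1 == lastRmResponseId) {
            System.out.println("duplicate heartbeat, resend last response");
        } else if (nmResponseId + 1 < lastRmResponseId) {
            // The branch the log above shows firing:
            System.out.println("Too far behind rm response id:" + lastRmResponseId
                + " nm response id:" + nmResponseId + " -> node marked REBOOTED");
        } else {
            System.out.println("normal heartbeat, id advances to " + (nmResponseId + 1));
        }
    }
}
{code}
Because the reset of the stored response id is handled on a separate RECONNECTED event, a heartbeat processed before that event still sees the stale id and falls into the resync branch above.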
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620597#comment-14620597 ] Hudson commented on YARN-2194: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2178 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2178/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
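To see why the comma breaks container launch: per the discussion on this JIRA, cgroup paths are handed around as comma-delimited lists on the way to the native container-executor, so a RHEL7 mount point such as /sys/fs/cgroup/cpu,cpuacct splits into two bogus paths. The snippet below is a standalone illustration of that failure mode, not the actual container-executor parsing code:
{code:java}
import java.util.Arrays;

public class CommaSplitDemo {
    public static void main(String[] args) {
        // RHEL6-style mount: the controller name contains no comma.
        String rhel6 = "/sys/fs/cgroup/cpu/hadoop-yarn/container_01/tasks";
        // RHEL7 mounts cpu and cpuacct together as "cpu,cpuacct".
        String rhel7 = "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_01/tasks";

        // A naive comma split leaves the RHEL6 path intact...
        System.out.println(Arrays.toString(rhel6.split(",")));
        // -> [/sys/fs/cgroup/cpu/hadoop-yarn/container_01/tasks]

        // ...but turns the single RHEL7 path into two bogus fragments.
        System.out.println(Arrays.toString(rhel7.split(",")));
        // -> [/sys/fs/cgroup/cpu, cpuacct/hadoop-yarn/container_01/tasks]
    }
}
{code}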
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0019-YARN-2003.patch Updating the patch after addressing a few more comments. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620643#comment-14620643 ] Arun Suresh commented on YARN-3453: --- The test case failure is unrelated. Jenkins had passed when I kicked it off manually [here|https://issues.apache.org/jira/browse/YARN-3453?focusedCommentId=14620218&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14620218] Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places:
1. {code:title=FSLeafQueue.java|borderStyle=solid}
private boolean isStarved(Resource share)
{code}
A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare.
2. {code:title=FairScheduler.java|borderStyle=solid}
protected Resource resToPreempt(FSLeafQueue sched, long curTime)
{code}
-- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
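To make the DRF point concrete, here is a hedged, standalone sketch of a dominant-share starvation check along the lines suggested above; the resource math is simplified and all names are illustrative rather than the actual FairScheduler API:
{code:java}
public class DrfStarvation {
    // Dominant share of a (memory, vcores) pair relative to the cluster.
    static double dominantShare(long mem, long vcores,
                                long clusterMem, long clusterVcores) {
        return Math.max((double) mem / clusterMem, (double) vcores / clusterVcores);
    }

    static boolean isStarved(long usedMem, long usedVcores,
                             long shareMem, long shareVcores,
                             long clusterMem, long clusterVcores) {
        // Starved only if the dominant resource usage is below the
        // dominant share of the fair/min share.
        return dominantShare(usedMem, usedVcores, clusterMem, clusterVcores)
             < dominantShare(shareMem, shareVcores, clusterMem, clusterVcores);
    }

    public static void main(String[] args) {
        // Queue uses 10GB/40 vcores; its share is 20GB/20 vcores; the
        // cluster has 100GB/100 vcores. Memory alone says "starved"
        // (10 < 20), but the dominant resource (vcores: 0.4 >= 0.2)
        // says the queue already has its share.
        System.out.println(isStarved(10, 40, 20, 20, 100, 100)); // false under DRF
    }
}
{code}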
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620703#comment-14620703 ] Hudson commented on YARN-2194: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2197 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2197/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620672#comment-14620672 ] Jun Gong commented on YARN-3896: [~devaraj.k], a test case is added in the new patch. Thanks for reviewing. RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset --- Key: YARN-3896 URL: https://issues.apache.org/jira/browse/YARN-3896 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3896.01.patch, YARN-3896.02.patch
{noformat}
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 10.208.132.153 to /default-rack
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at: 10.208.132.153
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 10.208.132.153:8041
2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node 10.208.132.153:8041 as it is now REBOOTED
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
{noformat}
The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the RM reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's heartbeat arrived before the RM had finished resetting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620578#comment-14620578 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #239 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/239/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620680#comment-14620680 ] Hudson commented on YARN-2194: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #249 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/249/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620779#comment-14620779 ] Konstantinos Karanasos commented on YARN-3116: -- [~giovanni.fumarola], [~xgong], [~zjshen]: Given you are thinking of substituting the boolean with an enum that indicates the container type, I think this is becoming very related to YARN-2882, which is part of the more general YARN-2877 (that introduces distributed scheduling in YARN). In YARN-2882, we introduce container types to differentiate between GUARANTEED containers allocated by the central RM, and QUEUEABLE containers allocated by one of the distributed schedulers. We already have a patch available for this JIRA. What would be interesting to see is whether the AM_CONTAINER should become yet another type of container or whether it should be a separate field within the container type. The former would probably make more sense in the current implementation, as an AM_CONTAINER can only be allocated by the central RM (we cannot have a QUEUEABLE container that is also an AM_CONTAINER). The latter, however, would probably give more flexibility. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container or not (we can do it on the RM). This information is missing, such that we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620836#comment-14620836 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-trunk-Commit #8140 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8140/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* hadoop-yarn-project/CHANGES.txt
AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under:
# RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown.
# As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain the queue on stop).
# This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for the dispatcher event queue to drain till the JVM exits.
*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
 at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
 at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
 at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}
*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for 0x000700b79250 (a
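A minimal model of the stop-path wait described above, with illustrative names rather than the real AsyncDispatcher fields: the dispatcher thread is the only thing that flips {{drained}} after it dequeues the last event, so once an event is lost to the InterruptedException on {{put()}}, nothing ever satisfies the condition the stop path waits on. The loop is bounded to three checks here so the demo terminates:
{code:java}
import java.util.concurrent.LinkedBlockingQueue;

public class DrainWaitModel {
    final LinkedBlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
    volatile boolean drained = false;
    final Object waitForDrained = new Object();

    void serviceStop(boolean drainEventsOnStop) throws InterruptedException {
        if (drainEventsOnStop) {
            synchronized (waitForDrained) {
                int checks = 0;
                while (!drained && checks++ < 3) { // the real code loops unconditionally
                    waitForDrained.wait(1000);
                    System.out.println("still waiting: drained=" + drained
                        + " queueEmpty=" + eventQueue.isEmpty());
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // No dispatcher thread is running (it died to the interrupt), so
        // nothing ever sets drained = true or notifies waitForDrained.
        new DrainWaitModel().serviceStop(true);
        System.out.println("a real AsyncDispatcher would still be waiting here");
    }
}
{code}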
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620718#comment-14620718 ] MENG DING commented on YARN-3866: - Thanks [~jianhe] for the review! bq. Mark all getters/setters unstable for now Will do bq. DecreasedContainer.java/IncreasedContainer.java - how about reusing the Container.java object? This seems to be a better approach, and does simplify the code quite a bit. I can't think of anything wrong about it. If nobody else opposes this, I will go ahead and change it. bq. increaseRequests/decreaseRequests - We may just pass one list of changeResourceRequests instead of differentiating whether it’s increase or decrease? as the underlying implementations are the same. IMO, this also saves application writers from differentiating them programmatically. Actually we thought about using a single changeResourceRequests; the main reasons that we separate them are:
* We do want application writers to make a conscious decision about whether they are making an increase request or a decrease request, and tell the Resource Manager explicitly, such that if they make a mistake, the RM will be able to catch it. For example, if a user intends to increase a container's resource, but makes a mistake by passing in a resource value smaller than the current resource allocation, the RM will catch this and will NOT actually decrease the resource. If the user had sent a changeResourceRequest to the RM, the RM would not know the original intention, and would go ahead and decrease the resource. As a result, the container might be killed if memory enforcement is enabled.
* Reduce the logic in the RM to check if a request is for increase or decrease (less of a concern).
Let me know if the above concerns make sense to you or not. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
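A sketch of the RM-side safety check that the separate increase/decrease lists make possible, per the first bullet above. The types and method names are illustrative, not the actual AllocateRequest API:
{code:java}
public class ResizeValidation {
    record ChangeRequest(String containerId, long currentMemMB, long targetMemMB) {}

    static void validateIncrease(ChangeRequest r) {
        // Because the AM said "increase" explicitly, the RM can reject a
        // target smaller than the current allocation instead of silently
        // shrinking (and possibly killing) the container.
        if (r.targetMemMB() <= r.currentMemMB()) {
            throw new IllegalArgumentException("increase request for "
                + r.containerId() + " targets " + r.targetMemMB()
                + "MB <= current " + r.currentMemMB() + "MB");
        }
    }

    public static void main(String[] args) {
        validateIncrease(new ChangeRequest("container_1_0001_01_000002", 4096, 8192)); // accepted
        try {
            validateIncrease(new ChangeRequest("container_1_0001_01_000003", 4096, 2048));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
{code}
With a single merged list, the second request would be indistinguishable from a legitimate decrease, which is exactly the mistake the separate lists let the RM catch.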
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620799#comment-14620799 ] Sangjin Lee commented on YARN-3901: --- Hi [~zjshen], I was going to file JIRAs that cover splitting the application table and creating the app-to-flow table as well as the flow-version table, and work on them. Would you like to work on the app-to-flow table? I could then cover the others. Let me know. Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this jira to track creation and population of data in the flow run table. Some points that are being considered:
- Stores per flow run information aggregated across applications, flow version
- RM’s collector writes to it on app creation and app completion
- Per App collector writes to it for metric updates at a slower frequency than the metric updates to the application table
- primary key: cluster ! user ! flow ! flow run id
- Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented on app completion.
- For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values.
-- Upon flush and compactions, the min value between all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded.
- Ditto for the max_end_time, but then the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don’t want to re-aggregate for those upon replay
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
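The primary key layout above can be made concrete with a small sketch. The separator handling and the run-id inversion (so newer runs sort first in an HBase scan) are assumptions modeled on common HBase row-key practice, not the final YARN-3815 schema code:
{code:java}
public class FlowRunRowKey {
    static final char SEP = '!';

    static String rowKey(String cluster, String user, String flow, long flowRunId) {
        // Invert the run id so that a scan returns the newest run first
        // (an assumption; the schema may choose a different encoding).
        long inverted = Long.MAX_VALUE - flowRunId;
        return cluster + SEP + user + SEP + flow + SEP + inverted;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("yarn-cluster", "vrushali", "wordcount", 1436303121575L));
    }
}
{code}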
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620814#comment-14620814 ] Hadoop QA commented on YARN-3445: -
| (/) *{color:green}+1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 38s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 4s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. |
| {color:green}+1{color} | yarn tests | 51m 2s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 91m 24s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744509/YARN-3445-v5.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fffb15b |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8480/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8480/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8480/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8480/console |
This message was automatically generated. Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collectors info from RM in the heartbeat response. Our proposal is to add a cache for runningApps in RMNode, so RM only sends collectors for local running apps back. This is also needed in YARN-914 (graceful decommission): if there are no running apps in an NM which is in the decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3902) Fair scheduler preempts ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3902: --- Assignee: Arun Suresh Fair scheduler preempts ApplicationMaster - Key: YARN-3902 URL: https://issues.apache.org/jira/browse/YARN-3902 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Assignee: Arun Suresh Original Estimate: 72h Remaining Estimate: 72h YARN-2022 fixed a similar issue for CapacityScheduler. However, FairScheduler still suffers from it, preempting the AM while other normal containers are running. I think we should take the same approach and avoid the AM being preempted unless there is no container running other than the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
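A hedged sketch of the CapacityScheduler-style behavior (YARN-2022) transplanted to FairScheduler victim selection: order the candidates so AM containers are considered last, so an AM is only preempted when nothing else is running. The Candidate type is illustrative, not the FairScheduler API:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AmAwareVictimSelection {
    record Candidate(String id, boolean isAmContainer) {}

    static List<Candidate> orderVictims(List<Candidate> running) {
        List<Candidate> victims = new ArrayList<>(running);
        // false sorts before true, so non-AM containers come first.
        victims.sort(Comparator.comparing(Candidate::isAmContainer));
        return victims;
    }

    public static void main(String[] args) {
        List<Candidate> running = List.of(
            new Candidate("container_..._000001", true),   // the AM
            new Candidate("container_..._000002", false),
            new Candidate("container_..._000003", false));
        // Prints 000002 and 000003 before the AM, which would be
        // preempted only if it were the sole remaining candidate.
        System.out.println(orderVictims(running));
    }
}
{code}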
[jira] [Assigned] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-3903: -- Assignee: Karthik Kambatla Disable preemption at Queue level for Fair Scheduler Key: YARN-3903 URL: https://issues.apache.org/jira/browse/YARN-3903 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Assignee: Karthik Kambatla Priority: Trivial Original Estimate: 72h Remaining Estimate: 72h YARN-2056 supports disabling preemption at queue level for CapacityScheduler. As for fair scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620730#comment-14620730 ] Hadoop QA commented on YARN-2003: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 9s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 18 new or modified test files. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 28s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:red}-1{color} | whitespace | 0m 52s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 49s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 0m 54s | Tests passed in hadoop-sls. |
| {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 49m 57s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 94m 21s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs |
| | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| | hadoop.yarn.server.resourcemanager.TestResourceManager |
| | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744501/0019-YARN-2003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fffb15b |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/whitespace.txt |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8479/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8479/console |
This message was automatically generated. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0019-YARN-2003.patch Fixed a few test failures. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620861#comment-14620861 ] Li Lu commented on YARN-3836: - bq. Regarding metric, can't id uniquely identify a metric? Do we expect two metrics to share same id for different types? This is a tricky point, and I'm thinking out loud... Under normal circumstances it's fine to only check the id of metrics. However, since we're making different assumptions on the internal data of different types, is it possible that under some use cases users may mistakenly or accidentally confuse them? If this is possible we may want to check both types and ids. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched
[ https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-3754. --- Resolution: Not A Problem I am closing this issue as it is not happening in trunk. [~bibinchundatt] please reopen otherwise. Race condition when the NodeManager is shutting down and container is launched -- Key: YARN-3754 URL: https://issues.apache.org/jira/browse/YARN-3754 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Sunil G Priority: Critical Attachments: NM.log A container is launched and returned to ContainerImpl after the NodeManager has closed the DB connection, which results in {{org.iq80.leveldb.DBException: Closed}}. *Attaching the exception trace*
{code}
2015-05-30 02:11:49,122 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update state store diagnostics for container_e310_1432817693365_3338_01_02
java.io.IOException: org.iq80.leveldb.DBException: Closed
 at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
 at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.iq80.leveldb.DBException: Closed
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
 at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
 ... 15 more
{code}
We can add a check for whether the DB is closed while we move the container from the ACQUIRED state. As per the discussion in YARN-3585, the same has been added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
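A sketch of the guard discussed above: check whether the state store has been closed (NM shutting down) before attempting the leveldb put, and degrade to a warning instead of throwing from the launch path. The names are illustrative, not the actual NMLeveldbStateStoreService API:
{code:java}
import java.io.IOException;

public class GuardedStateStore {
    private volatile boolean closed = false;

    void close() { closed = true; }

    void storeContainerDiagnostics(String containerId, String diagnostics)
            throws IOException {
        if (closed) {
            // Race with shutdown: drop the update instead of surfacing
            // org.iq80.leveldb.DBException: Closed to the container launch.
            System.err.println("State store closed; skipping diagnostics for "
                + containerId);
            return;
        }
        // db.put(key(containerId), diagnostics.getBytes()) would go here.
        System.out.println("stored diagnostics for " + containerId);
    }

    public static void main(String[] args) throws IOException {
        GuardedStateStore store = new GuardedStateStore();
        store.storeContainerDiagnostics("container_e310_1432817693365_3338_01_02", "launched");
        store.close();
        store.storeContainerDiagnostics("container_e310_1432817693365_3338_01_02", "exited"); // skipped
    }
}
{code}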
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: (was: 0019-YARN-2003.patch) Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620868#comment-14620868 ] Sangjin Lee commented on YARN-3836: --- I tend to think that using type + id is probably a slightly better idea. Currently the type distinguishes between single data and a time series. For the most part, the id should be unique across the board. One interesting scenario is if a metric changes from single data to a time series (or vice versa). Again, this is probably not something that should happen often, if ever. But if it should happen, I happen to think that they need to be considered two different metrics. My 2 cents. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
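The "type + id" identity can be written down as a plain value class to make the two positions concrete; this is an illustrative sketch, not the actual TimelineMetric source. Under it, a SINGLE_VALUE metric and a TIME_SERIES metric with the same id compare as two different metrics:
{code:java}
import java.util.Objects;

public final class MetricKey {
    enum Type { SINGLE_VALUE, TIME_SERIES }

    private final Type type;
    private final String id;

    MetricKey(Type type, String id) { this.type = type; this.id = id; }

    @Override
    public boolean equals(Object obj) {
        // instanceof already returns false for null, so no explicit
        // null check is needed (the same nit raised in the v.2 review).
        if (!(obj instanceof MetricKey)) {
            return false;
        }
        MetricKey other = (MetricKey) obj;
        // Check the id first: it discriminates in the common case.
        return id.equals(other.id) && type == other.type;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, type);
    }

    public static void main(String[] args) {
        MetricKey a = new MetricKey(Type.SINGLE_VALUE, "cpu");
        MetricKey b = new MetricKey(Type.TIME_SERIES, "cpu");
        System.out.println(a.equals(b)); // false: same id, different type
    }
}
{code}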
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621023#comment-14621023 ] Hadoop QA commented on YARN-3534: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 13s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:red}-1{color} | javac | 2m 59s | The patch appears to cause the build to fail. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744528/YARN-3534-14.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ac60483 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8483/console |
This message was automatically generated. Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-14.patch Updated to trunk (using ResourceUtilization already there). Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
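A stand-alone sketch of the kind of node-level sampling this patch wires into the NM heartbeat. The real patch uses Hadoop's ResourceCalculatorPlugin and the ResourceUtilization record already in trunk; the JDK's com.sun.management.OperatingSystemMXBean stands in here purely for illustration:
{code:java}
import java.lang.management.ManagementFactory;

public class NodeUtilizationSample {
    public static void main(String[] args) {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();

        // Physical memory currently in use on the node, in MB.
        long usedPmemMB = (os.getTotalPhysicalMemorySize()
            - os.getFreePhysicalMemorySize()) / (1024 * 1024);
        // System-wide CPU load in [0.0, 1.0], or negative if unavailable.
        float cpuUsage = (float) os.getSystemCpuLoad();

        // A ResourceUtilization-shaped sample the NM could report:
        System.out.printf("pmem=%dMB cpu=%.2f%n", usedPmemMB, cpuUsage);
    }
}
{code}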
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620965#comment-14620965 ] Anubhav Dhoot commented on YARN-2005: - [~jlowe] appreciate your review of the updated patch Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621010#comment-14621010 ] Sangjin Lee commented on YARN-3836: --- Thanks for updating the patch [~gtCarrera9]. I went over the latest patch (v.2), and here is my input: (TimelineEntity.java)
- l.109: Nit: actually {{obj instanceof Identifier}} returns false if {{obj}} is {{null}}. Therefore, you can safely omit the {{obj == null}} check. The same goes for the other classes.
- l.533: Shouldn't we check for null from {{getIdentifier()}}? We cannot guarantee that it will be called only by callers who checked {{isValid()}}
- l.545: same here
- l.550: It sounds like now the type takes precedence over the created time in the sort order in this version. Is this intended? If not (timestamp is supposed to be first), it might be a good idea to have {{Identifier}} implement {{Comparable}} as well and use that in {{TimelineEntity.compareTo()}}.
(TimelineMetric.java)
- l.149-155: it would perform a little faster to check the id first and then the type
add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
Li Lu created YARN-3904: --- Summary: Adopt PhoenixTimelineWriter into time-based aggregation storage Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621033#comment-14621033 ] Sangjin Lee commented on YARN-3836: --- I take back my comment about the null check for {{getIdentifier()}}. Looking at it, I see that {{getIdentifier()}} will never return null. Sorry for the confusion. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621034#comment-14621034 ] Zhijie Shen commented on YARN-3116: --- [~kkaranasos], thanks for notifying us of YARN-2882. I took a quick look at the JIRA. Our approaches seem to be similar, but it seems that we're on parallel tracks. While YARN-2882 defines two container types in the container-related API so as to distinguish whether a container request goes to the RM or the NM, the label we want to attach to a container here aims to let the NM know whether the container hosts the AM or not. This is completely internal information: users are blind to this type and are also not able to set/change it. And this is why we propose to pass this information via ContainerTokenIdentifier instead of ContainerLaunchContext. Thoughts? [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container or not (we can do it on the RM). This information is missing, such that we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
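A sketch of the approach described above: carry a container type inside the RM-signed ContainerTokenIdentifier so the NM can trust it, rather than inside the user-visible ContainerLaunchContext. The types here are illustrative stand-ins for the real token classes:
{code:java}
public class AmContainerCheck {
    enum ContainerType { APPLICATION_MASTER, TASK }

    // Stand-in for the fields the RM would sign into the container token.
    record ContainerTokenInfo(String containerId, ContainerType type) {}

    static boolean isAmContainer(ContainerTokenInfo token) {
        // The NM derives this from a token the RM minted; applications
        // never set it, which is the security property discussed above.
        return token.type() == ContainerType.APPLICATION_MASTER;
    }

    public static void main(String[] args) {
        System.out.println(isAmContainer(new ContainerTokenInfo(
            "container_1436303121575_0001_01_000001", ContainerType.APPLICATION_MASTER))); // true
        System.out.println(isAmContainer(new ContainerTokenInfo(
            "container_1436303121575_0001_01_000002", ContainerType.TASK))); // false
    }
}
{code}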
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621051#comment-14621051 ] Zhijie Shen commented on YARN-3901: --- Yeah, I have a dependency on this table for the reader. If nobody is working on this table, I can take care of it. Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this jira to track creation and population of data in the flow run table. Some points that are being considered:
- Stores per flow run information aggregated across applications, flow version
- RM’s collector writes to it on app creation and app completion
- Per App collector writes to it for metric updates at a slower frequency than the metric updates to the application table
- primary key: cluster ! user ! flow ! flow run id
- Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented on app completion.
- For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values.
-- Upon flush and compactions, the min value between all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded.
- Ditto for the max_end_time, but then the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don’t want to re-aggregate for those upon replay
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621089#comment-14621089 ] Giovanni Matteo Fumarola commented on YARN-3116: [~kkaranasos], thanks for the observation. As [~zjshen] rightly pointed out, YARN-2882 and YARN-3116 are complementary. We are adding the container type enum to notify the NM whether it's an AM container or not. This is purely internal, and we deliberately don't want to expose it to the application for security reasons, while in YARN-2882 you want to expose the container type to the application. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container or not (we can do it on the RM). This information is missing, such that we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620925#comment-14620925 ] Hadoop QA commented on YARN-2003: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 18 new or modified test files. | | {color:green}+1{color} | javac | 8m 9s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 49s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | whitespace | 0m 37s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 51s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 48m 49s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 96m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744516/0019-YARN-2003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fffb15b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/whitespace.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8482/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8482/console | This message was automatically generated. 
Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620947#comment-14620947 ] Ray Chiang commented on YARN-3069: -- Thanks Akira! I'll be happy to see one of these XML verifiers pushed all the way through. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test. Any comments for any of the properties below are welcome.
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
security.applicationhistory.protocol.acl
yarn.app.container.log.backups
yarn.app.container.log.dir
yarn.app.container.log.filesize
yarn.client.app-submission.poll-interval
yarn.client.application-client-protocol.poll-timeout-ms
yarn.is.minicluster
yarn.log.server.url
yarn.minicluster.control-resource-monitoring
yarn.minicluster.fixed.ports
yarn.minicluster.use-rpc
yarn.node-labels.fs-store.retry-policy-spec
yarn.node-labels.fs-store.root-dir
yarn.node-labels.manager-class
yarn.nodemanager.container-executor.os.sched.priority.adjustment
yarn.nodemanager.container-monitor.process-tree.class
yarn.nodemanager.disk-health-checker.enable
yarn.nodemanager.docker-container-executor.image-name
yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.log.deletion-threads-count
yarn.nodemanager.user-home-dir
yarn.nodemanager.webapp.https.address
yarn.nodemanager.webapp.spnego-keytab-file
yarn.nodemanager.webapp.spnego-principal
yarn.nodemanager.windows-secure-container-executor.group
yarn.resourcemanager.configuration.file-system-based-store
yarn.resourcemanager.delegation-token-renewer.thread-count
yarn.resourcemanager.delegation.key.update-interval
yarn.resourcemanager.delegation.token.max-lifetime
yarn.resourcemanager.delegation.token.renew-interval
yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
yarn.resourcemanager.metrics.runtime.buckets
yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
yarn.resourcemanager.reservation-system.class
yarn.resourcemanager.reservation-system.enable
yarn.resourcemanager.reservation-system.plan.follower
yarn.resourcemanager.reservation-system.planfollower.time-step
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
yarn.resourcemanager.webapp.spnego-keytab-file
yarn.resourcemanager.webapp.spnego-principal
yarn.scheduler.include-port-in-node-name
yarn.timeline-service.delegation.key.update-interval
yarn.timeline-service.delegation.token.max-lifetime
yarn.timeline-service.delegation.token.renew-interval
yarn.timeline-service.generic-application-history.enabled
yarn.timeline-service.generic-application-history.fs-history-store.compression-type
yarn.timeline-service.generic-application-history.fs-history-store.uri
yarn.timeline-service.generic-application-history.store-class
yarn.timeline-service.http-cross-origin.enabled
yarn.tracking.url.generator
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620946#comment-14620946 ] Jian He commented on YARN-3866: --- bq. increaseRequests/decreaseRequests - We may just pass one list of changeResourceRequests [~sandyr], I would like to hear some thoughts from an application writer's perspective. Mind sharing some thoughts here? In the case of Spark, do you think two separate increase/decrease requests in the AllocateRequest are better, or a single changeResourceRequests list? AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM protocol changes to support container resizing according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
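For readers following the API debate, here is a sketch of the two shapes being compared; all type and method names are assumptions for illustration, not the committed protocol.
{code}
// Illustrative sketch of the two alternatives: two separate lists versus a
// single combined list of change requests. Names are assumed, not final.
import java.util.List;

interface AllocateRequestAlternatives {
  // Alternative 1: separate increase and decrease lists on AllocateRequest.
  List<ContainerResourceChangeRequest> getIncreaseRequests();
  List<ContainerResourceChangeRequest> getDecreaseRequests();

  // Alternative 2: one combined list; whether an entry is an increase or a
  // decrease is derived from comparing the target and current capabilities.
  List<ContainerResourceChangeRequest> getChangeResourceRequests();
}

// Hypothetical request carrying a container id and its desired capability.
class ContainerResourceChangeRequest {
  String containerId;
  int targetMemoryMb;
  int targetVCores;
}
{code}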
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621057#comment-14621057 ] Sunil G commented on YARN-2005: --- Hi [~adhoot] Thank you for sharing the patch. I have a couple of doubts.
- DEFAULT_FAILURE_THRESHOLD: the default is now 0.8; I feel we can make this a configurable limit. Based on the number of nodes, the user can decide up to which threshold AM blacklisting should be supported.
- Below code from CS#allocate {code} application.updateBlacklist(blacklistAdditions, blacklistRemovals); {code} Assume a case where app1's AM is running on {{node1}}. Due to a failure there, the app is relaunched on {{node2}} and {{node1}} is marked for blacklisting by SimpleBlacklistManager. Since node1 is added as blacklisted, all containers of this app will be blacklisted on node1. Is this intended? Please correct me if I am wrong.
Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
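On the first doubt above, a minimal sketch of what a configurable threshold could look like; the configuration key below is a hypothetical placeholder, not an existing YARN property.
{code}
// Sketch of replacing the hard-coded 0.8 with a configurable value.
import org.apache.hadoop.conf.Configuration;

public class BlacklistThreshold {
  // Hypothetical key name, used here only for illustration.
  static final String FAILURE_THRESHOLD_KEY =
      "yarn.resourcemanager.am-blacklisting.failure-threshold";
  static final float DEFAULT_FAILURE_THRESHOLD = 0.8f;

  // Read the threshold from the config, falling back to the current default.
  static float getFailureThreshold(Configuration conf) {
    return conf.getFloat(FAILURE_THRESHOLD_KEY, DEFAULT_FAILURE_THRESHOLD);
  }
}
{code}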
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621545#comment-14621545 ] Zhijie Shen commented on YARN-3836: --- +1 LGTM add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch, YARN-3836-YARN-2928.004.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
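For context, the pattern the patch adds looks roughly like the following; the Identifier class here is a simplified stand-in with assumed fields, not the actual YARN-2928 code.
{code}
// Generic illustration of the equals()/hashCode() contract being added so
// that these objects behave correctly in a HashSet or HashMap.
import java.util.Objects;

public final class Identifier {
  private final String type;
  private final String id;

  public Identifier(String type, String id) {
    this.type = type;
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Identifier)) {
      return false;
    }
    Identifier other = (Identifier) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals(), or hash-based collections break.
    return Objects.hash(type, id);
  }
}
{code}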
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621573#comment-14621573 ] Subru Krishnan commented on YARN-3116: -- [~kkaranasos], thanks for taking a look at this JIRA. We felt _ContainerTokenIdentifier_ is a reasonably secure way to propagate the _containerType_ to the NM from the RM. The _containerType_ is set in the _ContainerContext_ in the NM so that it is available to auxiliary services. [~kishorch] is already integrating this with YARN-2884, so it should be aligned with what you are trying to achieve in YARN-2877. Additionally, based on [~xgong]'s feedback, we updated _containerType_ to be an enum instead of the earlier boolean flag, so it should cover your future requirement of adding additional container types. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether a container is an AM container or not (we can do it on the RM). This information is missing, so we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621592#comment-14621592 ] Zhijie Shen commented on YARN-3908: --- 1. TimelineEvent has a timestamp associated with it. It tells us when the event happened. We should have this information persisted, but unfortunately it seems it is not. 2. Metric doesn't have a timestamp because the timestamp is associated with each individual value. 3. I also realized that the metric type is not persisted either. In the reader implementation I currently assume that if size(metric) > 1 it is a time series, else a single value. But that may not be guaranteed. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
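Point 3 above describes a size-based guess; a minimal sketch of that heuristic follows. The enum and method names are assumptions for illustration, and the point of the comment is precisely that this inference is not guaranteed to be correct.
{code}
// Sketch of the reader-side heuristic: with the metric type not persisted,
// the type is guessed from how many values were stored.
import java.util.Map;

public class MetricTypeGuess {
  enum TimelineMetricType { SINGLE_VALUE, TIME_SERIES }

  // values: timestamp -> metric value, as read back from storage.
  static TimelineMetricType guessType(Map<Long, Number> values) {
    // Heuristic only: a single-value metric written more than once would be
    // misclassified as a time series, which is the concern raised above.
    return values.size() > 1
        ? TimelineMetricType.TIME_SERIES
        : TimelineMetricType.SINGLE_VALUE;
  }
}
{code}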
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621739#comment-14621739 ] Varun Saxena commented on YARN-3893: Maybe set the HA service state in the RM context to STANDBY upon throwing the exception. Or do not set it to ACTIVE till all the active services have actually started. We primarily check the RM context to decide whether the RM is in standby or active state. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621703#comment-14621703 ] Xuan Gong commented on YARN-3893: - How about adding rm.transitionToStandby(true) before we throw the ServiceFailedException in the catch block?
{code}
try {
  rm.transitionToActive();
  // call all refresh*s for active RM to get the updated configurations.
  refreshAll();
  RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
      "RMHAProtocolService");
} catch (Exception e) {
  RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive", "",
      "RMHAProtocolService", "Exception transitioning to active");
  throw new ServiceFailedException(
      "Error when transitioning to Active mode", e);
}
{code}
In that case, we could transition the RM to standby, and since we throw the ServiceFailedException, this RM will rejoin the leader election process. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
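A self-contained sketch of the suggestion, with stubbed types; this shows the proposed control flow only, not the committed fix.
{code}
// Sketch: on a refresh failure, drop back to standby before rethrowing so
// this RM cleanly rejoins leader election. Types are stubbed for brevity.
public class TransitionSketch {
  interface RM {
    void transitionToActive() throws Exception;
    void transitionToStandby(boolean initialize) throws Exception;
  }

  void transitionToActive(RM rm) throws Exception {
    try {
      rm.transitionToActive();
      refreshAll(); // may throw, e.g. on a bad capacity-scheduler.xml
    } catch (Exception e) {
      rm.transitionToStandby(true); // the proposed addition
      throw new Exception("Error when transitioning to Active mode", e);
    }
  }

  void refreshAll() throws Exception {
    // stand-in for refreshing queues, ACLs and user-group mappings
  }
}
{code}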
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621707#comment-14621707 ] Xuan Gong commented on YARN-3888: - +1 LGTM. Will commit ApplicationMaster link is broken in RM WebUI when appstate is NEW -- Key: YARN-3888 URL: https://issues.apache.org/jira/browse/YARN-3888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch When the application state is NEW in RM Web UI *Application Master* link is broken. {code} 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW {code} *URL formed* http://HOSTNAME:45020/cluster/app/application_1436191509558_0003 The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621734#comment-14621734 ] Akira AJISAKA commented on YARN-3381: - bq. Findbugs (version 3.0.0) appears to be broken on trunk. Reported by MAPREDUCE-6421. bq. The applied patch generated 1 new checkstyle issues (total was 48, now 49). Hi [~brahmareddy], would you add a javadoc to specify what class should be used instead of the old class? The test failures look unrelated to the patch. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381.patch It appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
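One common way to fix such a typo compatibly is sketched below, under the assumption that the old name must keep compiling; this is illustrative, not necessarily the exact shape of the patch.
{code}
// Sketch of a backward-compatible rename: introduce the correctly spelled
// class and keep the misspelled one as a deprecated subclass.
public class InvalidStateTransitionException extends Exception {
  public InvalidStateTransitionException(String message) {
    super(message);
  }
}

/**
 * @deprecated Use {@link InvalidStateTransitionException} instead; this
 * class only exists so code using the old, misspelled name keeps compiling.
 */
@Deprecated
class InvalidStateTransitonException extends InvalidStateTransitionException {
  public InvalidStateTransitonException(String message) {
    super(message);
  }
}
{code}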
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621786#comment-14621786 ] Varun Saxena commented on YARN-3878: Thanks [~kasha] for the commit and review. Thanks to [~jianhe] and [~devaraj.k] for the reviews as well. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING
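The hang above boils down to an unconditional wait on a condition that can no longer become true. A minimal sketch of the failure and the fix direction follows, with assumed names; the actual patch may differ in detail.
{code}
// Sketch: serviceStop() waits for "drained", but the event-handling thread
// was interrupted before it could empty the queue, so an unconditional
// wait() never returns. Bounding the wait and rechecking that the handler
// thread is still alive lets stop() terminate.
public class DrainSketch {
  private volatile boolean drained = false;   // set by the handler thread
  private final Object waitForDrained = new Object();
  private final Thread eventHandlingThread;   // the dispatcher thread

  DrainSketch(Thread eventHandlingThread) {
    this.eventHandlingThread = eventHandlingThread;
  }

  void serviceStop() throws InterruptedException {
    synchronized (waitForDrained) {
      // Bounded wait + liveness check instead of waiting forever.
      while (!drained && eventHandlingThread.isAlive()) {
        waitForDrained.wait(1000);
      }
    }
  }
}
{code}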
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621709#comment-14621709 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] thank you for the review. {quote} 1. It looks like retry is added twice when we do a retry with a new connection. Should we move ++retry to the if statement where we check shouldRetry? {quote} It works as expected, meaning that retry won't be incremented doubly, since the loop will call continue after shouldRetry() and shouldRetryWithNewConnection(). However, I think it's a bit tricky for readers of the code and it's worth a fix. Updating. {quote} Should we call cb.latch.await with timeout zkSessionTimeout? Since we do a sync for the new session, will it be reasonable not to use the leftover timeout value from the old session for the new session? {quote} Agree. {quote} Based on the document http://zookeeper.apache.org/doc/r3.3.2/api/org/apache/zookeeper/KeeperException.html#getPath(), ke.getPath() may return null. Should we check if ke.getPath() is null and handle it differently? {quote} Okay. I'll also add error-handling code to the callback for when rc != Code.OK.intValue(). ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at
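The retry bookkeeping discussed in this thread is easy to get subtly wrong, so here is a compact sketch of the shape being converged on: one increment per attempt, and a callback wait bounded by the session timeout. All names are assumptions; this is not the ZKRMStateStore code.
{code}
// Sketch: a single increment point per attempt, and a bounded latch wait
// for the new session's sync() callback instead of an unbounded one.
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RetrySketch {
  private final int maxRetries = 10;
  private final long zkSessionTimeoutMs = 10000;

  <T> T runWithRetries(Callable<T> action) throws Exception {
    for (int retry = 1; ; ++retry) { // the only place retry is incremented
      try {
        return action.call();
      } catch (Exception e) {
        if (retry >= maxRetries) {
          throw e;
        }
        CountDownLatch latch = new CountDownLatch(1);
        startNewSessionAndSync(latch);
        // Bounded wait, per the review comment: don't block past the
        // session timeout waiting for the sync callback.
        latch.await(zkSessionTimeoutMs, TimeUnit.MILLISECONDS);
      }
    }
  }

  // Stand-in for opening a new ZK session and issuing a sync() whose
  // callback counts down the latch.
  void startNewSessionAndSync(CountDownLatch latch) {
    latch.countDown();
  }
}
{code}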
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621710#comment-14621710 ] Xuan Gong commented on YARN-3888: - Committed into trunk/branch-2. Thanks, Bibin A Chundatt ApplicationMaster link is broken in RM WebUI when appstate is NEW -- Key: YARN-3888 URL: https://issues.apache.org/jira/browse/YARN-3888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Fix For: 2.8.0 Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch When the application state is NEW in RM Web UI *Application Master* link is broken. {code} 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW {code} *URL formed* http://HOSTNAME:45020/cluster/app/application_1436191509558_0003 The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621713#comment-14621713 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-trunk-Commit #8145 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8145/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * hadoop-yarn-project/CHANGES.txt ApplicationMaster link is broken in RM WebUI when appstate is NEW -- Key: YARN-3888 URL: https://issues.apache.org/jira/browse/YARN-3888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Fix For: 2.8.0 Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch When the application state is NEW in RM Web UI *Application Master* link is broken. {code} 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW {code} *URL formed* http://HOSTNAME:45020/cluster/app/application_1436191509558_0003 The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621720#comment-14621720 ] Sunil G commented on YARN-3893: --- Hi [~xgong] Thank you for the update. I have a doubt here. If we call rm.transitionToStandby(true), it will result in a call to ResourceManager#createAndInitActiveServices(). So is it possible that we hit the same exception we got from the earlier refreshAll() call, specifically during queue reinitialization? Currently CS#serviceInit will call parseQueues. As mentioned here, [~bibinchundatt] used a wrong CS xml file. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621534#comment-14621534 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-trunk-Commit #8143 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8143/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java Reduce storage footprint for ReservationAllocation -- Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, YARN-3800.005.patch Instead of storing the ReservationRequest we store the Resource for allocations, as thats the only thing we need. 
Ultimately we convert everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621679#comment-14621679 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 20s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 10s | The applied patch generated 1 new checkstyle issues (total was 48, now 49). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 11s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 0s | Tests passed in hadoop-mapreduce-client-app. | | {color:red}-1{color} | yarn tests | 6m 50s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 0s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 50m 52s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 122m 24s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-mapreduce-client-app | | Failed unit tests | hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart | | | hadoop.yarn.server.nodemanager.TestDeletionService | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744462/YARN-3381-008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1a0752d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8489/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8489/console | This message was automatically generated. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch,
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621527#comment-14621527 ] Subru Krishnan commented on YARN-3800: -- Thanks [~adhoot] for the patch and [~curino] for reviewing and committing it! Reduce storage footprint for ReservationAllocation -- Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, YARN-3800.005.patch Instead of storing the ReservationRequest we store the Resource for allocations, as that's the only thing we need. Ultimately, we convert everything to resources anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621559#comment-14621559 ] Sangjin Lee commented on YARN-3836: --- Great. I'll commit the patch this evening unless there are further comments. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch, YARN-3836-YARN-2928.004.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621585#comment-14621585 ] Hadoop QA commented on YARN-3116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 25s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 108m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744597/YARN-3116.v9.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f4ca530 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8487/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8487/console | This message was automatically generated. 
[Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether a container is an AM container or not (we can do it on the RM). This information is missing, so we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by
[jira] [Updated] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mujunchao updated YARN-3857: Attachment: YARN-3857-1.patch Added a test case. Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Attachments: YARN-3857-1.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey so that a client does not hold an invalid ClientToken after the RM restarts. In SIMPLE mode we register a Pair of (ApplicationAttemptId, null), but we never remove it from the HashMap: unregistration only runs in secure mode, so the map leaks memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
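A minimal sketch of the leak and the obvious fix, with assumed names; the real code lives around the client-to-AM token master key bookkeeping in the RM.
{code}
// Sketch: registration inserts an entry in both SIMPLE and secure mode,
// but removal was gated on security, so SIMPLE-mode entries pile up.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClientTokenKeyLeakSketch {
  // ConcurrentHashMap forbids null values, so SIMPLE mode gets a placeholder.
  private final Map<String, byte[]> attemptToMasterKey = new ConcurrentHashMap<>();
  private boolean securityEnabled = false;

  void registerApplication(String attemptId) {
    attemptToMasterKey.put(attemptId,
        securityEnabled ? newMasterKey() : new byte[0]);
  }

  void unregisterApplication(String attemptId) {
    // Fix: remove unconditionally, not only when security is enabled.
    attemptToMasterKey.remove(attemptId);
  }

  private byte[] newMasterKey() {
    return new byte[32];
  }
}
{code}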
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621571#comment-14621571 ] Vrushali C commented on YARN-3908: -- Hi [~zjshen] I see that event#info is not being stored, but which event timestamp is being referred to? Metrics do store a timestamp per value. (Also, I will be on vacation starting tomorrow through next week, so I am checking with Sangjin offline about this.) Thanks, Vrushali Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621621#comment-14621621 ] Hadoop QA commented on YARN-3534: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 7m 46s | The applied patch generated 2 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 5 new checkstyle issues (total was 211, now 215). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 6s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 48m 0s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744609/YARN-3534-15.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1a0752d | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8488/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8488/console | This message was automatically generated. 
Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-15.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. To that end, this task will implement the collection of memory/CPU usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621630#comment-14621630 ] zhihai xu commented on YARN-3798: - Thanks for the new patch [~ozawa]! # It looks like {{retry}} is incremented twice when we retry with a new connection. Should we move {{++retry}} into the if statement where we check {{shouldRetry}}? # Should we call {{cb.latch.await}} with the timeout {{zkSessionTimeout}}? Since we sync for the new session, would it be reasonable not to reuse the remaining timeout from the old session for the new one? # Based on the document http://zookeeper.apache.org/doc/r3.3.2/api/org/apache/zookeeper/KeeperException.html#getPath(), {{ke.getPath()}} may return null. Should we check whether {{ke.getPath()}} is null and handle that case differently? ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR
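To make the first review point above concrete, here is a hedged sketch (hypothetical names, not the actual ZKRMStateStore code) of a retry loop in which the counter is incremented in exactly one place:
{code:title=RetrySketch.java|borderStyle=solid}
import org.apache.zookeeper.KeeperException;

public abstract class RetrySketch<T> {
  private final int numRetries;

  protected RetrySketch(int numRetries) { this.numRetries = numRetries; }

  protected abstract T runZkOperation() throws KeeperException;
  protected abstract boolean shouldRetry(KeeperException.Code code);

  public T runWithRetries() throws KeeperException {
    int retry = 0;
    while (true) {
      try {
        return runZkOperation();
      } catch (KeeperException ke) {
        // the counter is bumped here and nowhere else, so a retry with a
        // fresh connection is not double-counted
        if (shouldRetry(ke.code()) && ++retry < numRetries) {
          continue;
        }
        throw ke;
      }
    }
  }
}
{code}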
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621631#comment-14621631 ] Peng Zhang commented on YARN-3453: -- Thanks [~asuresh] for working on this. Comments: # Why not change all usages of the calculator in FairScheduler to the policy-related one? In the code below, RESOURCE_CALCULATOR only calculates memory, so the check may return false when resToPreempt is (0, non-zero) under the DRF policy: {code:title=FairScheduler.java|borderStyle=solid} if (Resources.greaterThan(RESOURCE_CALCULATOR, clusterResource, resToPreempt, Resources.none())) { {code} Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
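A hedged sketch of the direction suggested above; the wiring through the per-queue policy's calculator is an assumption for illustration, not the patch itself. Under DRF the policy supplies a DominantResourceCalculator, so a (0 memory, non-zero vCores) deficit is still detected:
{code:title=DeficitCheckSketch.java|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class DeficitCheckSketch {
  /**
   * True if the deficit is non-zero under the given policy's calculator,
   * e.g. policyCalc obtained from something like
   * queue.getPolicy().getResourceCalculator() rather than the memory-only
   * DefaultResourceCalculator.
   */
  static boolean hasDeficit(ResourceCalculator policyCalc, Resource cluster,
      Resource resToPreempt) {
    return Resources.greaterThan(policyCalc, cluster, resToPreempt,
        Resources.none());
  }
}
{code}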
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621686#comment-14621686 ] zhihai xu commented on YARN-3857: - thanks for the updated patch [~mujunchao]! The patch looks mostly good to me; some nits: # Add {{@VisibleForTesting}} before the function {{hasMasterKey}} to mark it as used for tests only; then you can remove the comment {{// Only for test}}. # It looks like the code in {{testNoSecureNoRegistClientToken}} is similar to {{testRegistClientTokenInSecure}}. Can we merge {{testNoSecureNoRegistClientToken}} with {{testRegistClientTokenInSecure}} into one test? We can rename the test to {{testApplicationAttemptMasterKey}}. You can check {{isMasterKeyExisted}} based on {{isSecurityEnabled}}, and change your comments {{can not get ClientToken}}/{{can get ClientToken}} to {{can not get MasterKey}}/{{can get MasterKey}}. Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Attachments: YARN-3857-1.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey to avoid the client holding an invalid ClientToken after RM restarts. In SIMPLE mode, we register a Pair of (ApplicationAttemptId, null), but we never remove it from the HashMap, as unregistration only runs in security mode, so a memory leak results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
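As an illustration of the leak described above (class and method names here are hypothetical, not the ResourceManager code), the essence is a registration that happens in both modes paired with a removal that happens in only one:
{code:title=MasterKeyLeakSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

public class MasterKeyLeakSketch {
  private final Map<ApplicationAttemptId, byte[]> masterKeys = new HashMap<>();

  public synchronized void registerApplication(ApplicationAttemptId id,
      byte[] key) {
    masterKeys.put(id, key); // in SIMPLE mode the stored value is null
  }

  public synchronized void unregisterApplication(ApplicationAttemptId id) {
    // the fix: always remove the entry, instead of removing only when
    // security is enabled, so SIMPLE-mode entries do not accumulate forever
    masterKeys.remove(id);
  }
}
{code}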
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621581#comment-14621581 ] Zhijie Shen commented on YARN-3116: --- [~kkaranasos], I haven't looked at the details of YARN-2884, but it seems to be an API change that needs to be exposed to users. In that case, a user-facing object, i.e., ContainerLaunchContext, is the better choice for you. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine if the container is an AM container or not from the context in NM (we can do it on RM). This information is missing, so we worked around it by considering the container with ID _01 to be the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621611#comment-14621611 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] could you kick off Jenkins on the 007 patch? Something seems to be wrong in the Jenkins report. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621634#comment-14621634 ] Jian He commented on YARN-1449: --- Looks good to me overall; could you mark the newly added APIs as unstable too? AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620122#comment-14620122 ] Brahma Reddy Battula commented on YARN-3381: {quote}issue because the pros small and cons is much larger.{quote} AFAIK the impact should not be large, as we are extending the class. This doesn't need to be incompatible though; can you elaborate more if I am wrong? A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620311#comment-14620311 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #251 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/251/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620329#comment-14620329 ] Hadoop QA commented on YARN-3885: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 7s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 56s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744423/YARN-3885.04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8475/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8475/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8475/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8475/console | This message was automatically generated. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch When the preemption policy is ProportionalCapacityPreemptionPolicy, the piece of code in {{cloneQueues}} that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
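A hedged sketch of the shape of the fix; the point is only the traversal depth, not the exact accounting formula, and the class below is an illustrative stand-in rather than the actual policy code:
{code:title=DescendantWalkSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;

class QueueNode {
  List<QueueNode> children = new ArrayList<>();

  /**
   * Collects every descendant, so per-queue accounting (e.g. the untouchable
   * computation) sees D and E under C, not only A's immediate children.
   */
  List<QueueNode> allDescendants() {
    List<QueueNode> out = new ArrayList<>();
    for (QueueNode c : children) {
      out.add(c);
      out.addAll(c.allDescendants()); // recurse past the immediate children
    }
    return out;
  }
}
{code}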
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620333#comment-14620333 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744468/YARN-3798-branch-2.7.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fffb15b | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8477/console | This message was automatically generated. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: YARN-3798-branch-2.7.004.patch Attaching a new patch. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620040#comment-14620040 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks again for the quick review. Updated the patch to address the above comments (I missed these in the earlier patch). A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-006.patch A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619997#comment-14619997 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] Sorry for the delay, I missed your comment. Agreed; fixing it shortly. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620011#comment-14620011 ] Akira AJISAKA commented on YARN-3381: - Thanks Brahma for updating the patch. Two comments: 1. In the old class, would you call {{super(currentState, event)}} in the constructor? That way we can drop private variables and overriding getter methods. 2. {{serialVersionUID}} should be unique for each serializable class. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
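A hedged sketch of what the two comments add up to; the constructor signature is assumed from the class's usage and the serialVersionUID value is arbitrary, so this is not necessarily the final patch:
{code:title=InvalidStateTransitonException.java|borderStyle=solid}
/**
 * The old, misspelled class simply extends the corrected one and delegates
 * everything to super, so it keeps no fields or getters of its own.
 */
@Deprecated
public class InvalidStateTransitonException
    extends InvalidStateTransitionException {

  // comment 2: serialVersionUID must be unique for each serializable class
  private static final long serialVersionUID = 8610511075306640714L;

  public InvalidStateTransitonException(Enum<?> currentState, Enum<?> event) {
    super(currentState, event); // comment 1: no duplicated state needed
  }
}
{code}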
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620049#comment-14620049 ] Hadoop QA commented on YARN-3836: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 49s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 6s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 43m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744429/YARN-3836-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 4c5f88f | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8469/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8469/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8469/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8469/console | This message was automatically generated. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620056#comment-14620056 ] Akira AJISAKA commented on YARN-3381: - {code:title=InvalidStateTransitionException.java} public InvalidStateTransitionException(String message) { super(message); } {code} Would you remove the unused constructor? I'm +1 if that is addressed. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3902) Fair scheduler preempts ApplicationMaster
He Tianyi created YARN-3902: --- Summary: Fair scheduler preempts ApplicationMaster Key: YARN-3902 URL: https://issues.apache.org/jira/browse/YARN-3902 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi YARN-2022 fixed the similar issue for CapacityScheduler. However, FairScheduler still suffers from it, preempting the AM while other normal containers are running. I think we should take the same approach: avoid preempting the AM unless no container other than the AM is running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
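A hedged sketch of the proposed behavior; the abstract {{preempt}} hook and the overall shape are illustrative, not FairScheduler's actual preemption code:
{code:title=AmPreemptionGuardSketch.java|borderStyle=solid}
import java.util.Collection;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

public abstract class AmPreemptionGuardSketch {

  protected abstract void preempt(RMContainer container);

  /**
   * Preempts from the given containers of one application, leaving the AM
   * container alone while any other container is still running.
   */
  public void preemptFrom(Collection<RMContainer> liveContainers) {
    for (RMContainer container : liveContainers) {
      if (container.isAMContainer() && liveContainers.size() > 1) {
        continue; // the AM goes only as a last resort
      }
      preempt(container);
    }
  }
}
{code}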
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619984#comment-14619984 ] nijel commented on YARN-3813: - Thanks [~sunilg] and [~devaraj.k] for the comments bq.How frequently are you going to check this condition for each application? The plan is to have a configurable interval defaulting to 30 seconds (yarn.app.timeout.monitor.interval). bq.Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we may not need a flag. bq.I feel having a TIMEOUT state for RMAppImpl would be proper here. OK. We will add a TIMEOUT state and handle the changes. Due to this there will be a few changes in the app transitions, the client package, and the web UI. bq.I have a suggestion here.We can have a BasicAppMonitoringManager which can keep an entry of appId, app.getSubmissionTime. bq. when the application gets submitted to RM then we can register the application with RMAppTimeOutMonitor using the user specified timeout. Yes, good suggestion. We will implement this as a registration mechanism. But since each application can have its own timeout period, code reusability looks minimal. The intended behavior, in outline (a Java sketch follows below):
{code}
RMAppTimeOutMonitor
  - local map of (appId, timeout)
  - add/register(appId, timeout)  -- called from RMAppImpl
  - run: if an app is running/submitted and its time has elapsed, kill it;
         if it has already completed, remove it from the map.
  - no delete/unregister method -- the application is removed from the map by the run method
{code} Support Application timeout feature in YARN. - Key: YARN-3813 URL: https://issues.apache.org/jira/browse/YARN-3813 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: nijel Attachments: YARN Application Timeout .pdf It will be useful to support Application Timeout in YARN. Some use cases are not worried about the output of an application if it does not complete in a specific time. *Background:* The requirement is to show the CDR statistics of the last few minutes, say every 5 minutes. The same job will run continuously with different datasets, so one job will be started every 5 minutes. The estimated time for this task is 2 minutes or less. If the application does not complete in the given time, the output is not useful. *Proposal* So the idea is to support an application timeout, where a timeout parameter is given while submitting the job. Here, the user expects the application to finish (complete or be killed) in the given time. One option is to move this logic to the application client (which submits the job), but it would be nice if it can be generic logic and made more robust. Kindly provide your suggestions/opinion on this feature. If it sounds good, I will update the design doc and prototype patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
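A hedged Java sketch of the monitor outlined above; the class, method, and config names are assumptions drawn from this discussion, not the eventual patch:
{code:title=RMAppTimeOutMonitorSketch.java|borderStyle=solid}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class RMAppTimeOutMonitorSketch implements Runnable {
  private final Map<ApplicationId, Long> deadlines = new ConcurrentHashMap<>();

  /** Called on app submission; the timeout is per-application. */
  public void register(ApplicationId appId, long submissionTime, long timeoutMs) {
    deadlines.put(appId, submissionTime + timeoutMs);
  }

  @Override
  public void run() {
    // runs every yarn.app.timeout.monitor.interval (default 30s in the proposal)
    long now = System.currentTimeMillis();
    for (Map.Entry<ApplicationId, Long> e : deadlines.entrySet()) {
      if (isFinished(e.getKey())) {
        deadlines.remove(e.getKey());   // no unregister call needed
      } else if (now >= e.getValue()) {
        killApplication(e.getKey());    // fire the TIMEOUT event / kill
        deadlines.remove(e.getKey());
      }
    }
  }

  // placeholders standing in for RM state lookups and the kill path
  private boolean isFinished(ApplicationId appId) { return false; }
  private void killApplication(ApplicationId appId) { }
}
{code}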
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619988#comment-14619988 ] Varun Vasudev commented on YARN-2194: - My apologies for missing the failing unit test [~sidharta-s]. I've committed the fix for the failing unit test. Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619989#comment-14619989 ] Ajith S commented on YARN-3885: --- Queue hierarchy:
{noformat}
   root
    |
    A
   / \
  C   B
 / \
D   E
{noformat}
+*Before fix:*+
NAME: queueA CUR: memory:209, vCores:0 PEN: memory:0, vCores:0 GAR: memory:200, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:209, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *{color:red}UNTOUCHABLE: memory:9, vCores:0 PREEMPTABLE: memory:0, vCores:0{color}*
NAME: queueB CUR: memory:60, vCores:0 PEN: memory:0, vCores:0 GAR: memory:60, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:60, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueC CUR: memory:150, vCores:0 PEN: memory:0, vCores:0 GAR: memory:139, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:149, vCores:1 IDEAL_PREEMPT: memory:1, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:1, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueD CUR: memory:100, vCores:0 PEN: memory:0, vCores:0 GAR: memory:100, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:100, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueE CUR: memory:50, vCores:0 PEN: memory:0, vCores:0 GAR: memory:40, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:49, vCores:1 IDEAL_PREEMPT: memory:1, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:10, vCores:0*
+*After:*+
NAME: queueA CUR: memory:209, vCores:0 PEN: memory:0, vCores:0 GAR: memory:200, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:201, vCores:1 IDEAL_PREEMPT: memory:8, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *{color:green}UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:10, vCores:0{color}*
NAME: queueB CUR: memory:60, vCores:0 PEN: memory:0, vCores:0 GAR: memory:60, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:60, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueC CUR: memory:150, vCores:0 PEN: memory:0, vCores:0 GAR: memory:139, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:141, vCores:1 IDEAL_PREEMPT: memory:9, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:1, vCores:0 PREEMPTABLE: memory:10, vCores:0*
NAME: queueD CUR: memory:100, vCores:0 PEN: memory:0, vCores:0 GAR: memory:100, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:100, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueE CUR: memory:50, vCores:0 PEN: memory:0, vCores:0 GAR: memory:40, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:41, vCores:1 IDEAL_PREEMPT: memory:9, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:10, vCores:0*
ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch When the preemption policy is ProportionalCapacityPreemptionPolicy, the piece of code in {{cloneQueues}} that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619992#comment-14619992 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks a lot for taking a look at this issue. Updated the patch based on your comment. Kindly review. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-005.patch A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3836: Attachment: YARN-3836-YARN-2928.002.patch Hi [~sjlee0], thanks for the prompt feedback! I updated the patch according to your comments. Specifically: bq. What I would prefer is to override equals() and hashCode() for Identifier instead, and have simple equals() and hashCode() implementations for TimelineEntity that mostly delegate to Identifier. The rationale is that Identifier can be useful as keys to collections in its own right, and thus should override those methods. That's a nice suggestion! Fixed. bq. One related question for your use case of putting entities into a map: I notice that you're using the TimelineEntity instances directly as keys to maps. Wouldn't it be better to use their Identifier instances as keys instead? Identifier instances are easier and cheaper to construct and compare. I think I used an inappropriate example here. I meant to say HashSet but not HashMap. bq. We should make isValid() a proper javadoc hyperlink Fixed. bq. Since we're checking the entity type and the id, wouldn't it be sufficient to check whether the object is an instance of TimelineEntity? I agree. Fixed all related ones. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
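A hedged sketch of the delegation pattern being discussed above; the class shapes are simplified stand-ins, not the actual TimelineEntity code:
{code:title=IdentifierEqualsSketch.java|borderStyle=solid}
import java.util.Objects;

class Identifier {
  final String type;
  final String id;

  Identifier(String type, String id) { this.type = type; this.id = id; }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Identifier)) return false;
    Identifier other = (Identifier) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override public int hashCode() { return Objects.hash(type, id); }
}

class Entity {
  final Identifier identifier;

  Entity(Identifier identifier) { this.identifier = identifier; }

  // equals()/hashCode() mostly delegate to Identifier, which is also usable
  // on its own as a cheap key in a HashSet or HashMap
  @Override public boolean equals(Object o) {
    return o instanceof Entity && identifier.equals(((Entity) o).identifier);
  }

  @Override public int hashCode() { return identifier.hashCode(); }
}
{code}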
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620045#comment-14620045 ] Varun Saxena commented on YARN-3836: Regarding the metric, can't the id uniquely identify a metric? Do we expect two metrics to share the same id with different types? add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Attachment: YARN-3453.3.patch Uploading updated patch: * Added unit tests * Cleaned up code based on comments [~kasha], bq. Nit: In each of the policies, my preference would be not make the calculator and comparator members static unless required. We have had cases where our tests would invoke multiple instances of the class leading to issues. Not that I foresee multiple instantiations for these classes, but would like to avoid it if we can. If it is OK with you, I feel we should in fact make it static. I am of the opinion that the code reads better and is a lot cleaner and more efficient, since only one instance is ever created. We are always at liberty to override the getComparator/Calculator method in tests (and possibly subclasses). bq. .. think we will have to fix YARN-2154 too. On further thought, and after consultation with [~kasha], I think we can decouple from that JIRA, given its larger scope. Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-007.patch A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3453) Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Attachment: YARN-3453.4.patch New patch: * Cleaned up some docs * Changed the name of {{resToPreempt}} to {{resourceDeficit}}. I feel {{resToPreempt}} is not just confusing but somewhat wrong, given that the method technically does not find resources to preempt from the given queue. It actually finds the resource deficit that would bring the queue back to min/fair share. Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
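As a rough sketch of what the rename implies, the method answers "how much is this queue short of its share?" rather than selecting containers. The timeout fields and getters below are assumptions for illustration, not the real API:
{code:title=FairScheduler.java (illustrative sketch)|borderStyle=solid}
// Sketch only: compute the deficit that would bring the queue back to
// its min share or fair share. Field and getter names are illustrative.
protected Resource resourceDeficit(FSLeafQueue sched, long curTime) {
  Resource deficit = Resources.none();
  if (curTime - sched.getLastTimeAtMinShare() > minShareTimeout) {
    // Below min share for too long: deficit relative to min share.
    deficit = Resources.subtract(sched.getMinShare(),
        sched.getResourceUsage());
  } else if (curTime - sched.getLastTimeAtFairShare() > fairShareTimeout) {
    // Below fair share for too long: deficit relative to fair share.
    deficit = Resources.subtract(sched.getFairShare(),
        sched.getResourceUsage());
  }
  // Clamp a negative deficit to zero using the configured calculator,
  // so DRF mode compares dominant shares here as well.
  return Resources.max(calculator, clusterResource,
      deficit, Resources.none());
}
{code}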
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620139#comment-14620139 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 15s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 28s | The applied patch generated 2 new checkstyle issues (total was 48, now 50). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 8s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 4s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 51m 1s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 123m 52s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744425/YARN-3381-005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8468/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8468/console | This message was automatically generated. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S reassigned YARN-3885: - Assignee: Ajith S ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
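For context, a hedged sketch of the fix direction (the queue type and method names here are hypothetical, not the actual patch): the per-queue quantity has to be accumulated over all descendants, not only the immediate children:
{code:title=illustrative sketch|borderStyle=solid}
// Hypothetical illustration of the reported bug: summing a per-queue
// quantity over only the immediate children under-counts whenever the
// hierarchy is deeper than two levels; a recursive walk counts every
// descendant.
private Resource untouchableOf(TempQueue q) {
  if (q.getChildren().isEmpty()) {
    // Leaf queue: contribute its own untouchable amount.
    return q.getUntouchable();
  }
  Resource total = Resource.newInstance(0, 0);
  for (TempQueue child : q.getChildren()) {
    // Recurse so grandchildren and deeper queues are included too.
    Resources.addTo(total, untouchableOf(child));
  }
  return total;
}
{code}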
[jira] [Created] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
He Tianyi created YARN-3903: --- Summary: Disable preemption at Queue level for Fair Scheduler Key: YARN-3903 URL: https://issues.apache.org/jira/browse/YARN-3903 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Priority: Trivial YARN-2056 supports disabling preemption at the queue level for CapacityScheduler. As for the Fair Scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-3885: -- Attachment: YARN-3885.04.patch ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619971#comment-14619971 ] Ajith S commented on YARN-3885: --- Hi [~sunilg], sorry for the delay, I have added the test case. ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619994#comment-14619994 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-trunk-Commit #8138 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8138/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620022#comment-14620022 ] Sidharta Seethana commented on YARN-2194: - Thanks [~vvasudev] - Jenkins wasn't triggered, so we all missed it. Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620084#comment-14620084 ] Tsuyoshi Ozawa commented on YARN-3381: -- Hmm, I'm still thinking about whether we should fix this or not. I know that this is a typo, but fixing it makes an incompatible change for YARN apps. Currently, I prefer to preserve the typo as a won't-fix issue, because the pros are small and the cons are much larger. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
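For what it's worth, the usual way to get the fix without breaking existing apps is to keep the misspelled class as a deprecated subclass of the corrected one. This is a sketch of that pattern under stated assumptions, not necessarily what the attached patches do:
{code:title=compatibility sketch|borderStyle=solid}
// Sketch only: the correctly spelled exception becomes the real class,
// and the old misspelled name survives as a deprecated alias so code
// that still catches InvalidStateTransitonException keeps compiling
// and running. The constructor signature is assumed.
@Deprecated
public class InvalidStateTransitonException
    extends InvalidStateTransitionException {

  private static final long serialVersionUID = 1L;

  public InvalidStateTransitonException(Enum<?> currentState,
      Enum<?> event) {
    super(currentState, event);
  }
}
{code}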
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620372#comment-14620372 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 4s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 0s | The applied patch generated 1 new checkstyle issues (total was 48, now 49). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 5s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 5s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 25m 24s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 98m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.TestRMNodeTransitions | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | | hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.webapp.TestNodesPage | | | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingUpdate | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestMoveApplication | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.reservation.TestCapacityReservationSystem | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings | | |
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: (was: YARN-3798-branch-2.7.004.patch) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch The RM goes down with a NoNode exception during creation of the znode for an app attempt. *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
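The distinction the title draws can be sketched as follows. {{ZKAction}} and {{runWithRetries}} appear in the stack trace above; the retry fields and {{createConnection()}} are assumptions for illustration. The idea: retry CONNECTIONLOSS on the existing session, and only re-create the session on SESSIONEXPIRED:
{code:title=retry-policy sketch|borderStyle=solid}
// Sketch only: a connection loss is transient, so the operation should
// be retried on the same session; creating a brand-new session on every
// failure can let a retry observe the effects of its own earlier,
// silently-successful attempt (e.g. the NoNode above).
private <T> T runWithRetries(ZKAction<T> action) throws Exception {
  for (int retry = 0; retry < numRetries; retry++) {
    try {
      return action.run();
    } catch (KeeperException.ConnectionLossException e) {
      // Transient: back off and retry with the existing session.
      Thread.sleep(zkRetryInterval);
    } catch (KeeperException.SessionExpiredException e) {
      // Only now is the session truly gone: re-create it, then retry.
      createConnection();
    }
  }
  throw new IOException("Maxed out ZK retries. Giving up!");
}
{code}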
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: YARN-3798-branch-2.7.004.patch ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch The RM goes down with a NoNode exception during creation of the znode for an app attempt. *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620353#comment-14620353 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-Yarn-trunk #981 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/981/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620180#comment-14620180 ] Varun Saxena commented on YARN-3047: Thanks [~sjlee0] for the review and commit. And thanks to [~zjshen], [~gtCarrera9] and [~vrushalic] as well for the reviews. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Fix For: YARN-2928 Attachments: Timeline_Reader(draft).pdf, YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, YARN-3047.04.patch Per the design in YARN-2938, set up the ATS reader as a service and implement its basic structure, including lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620187#comment-14620187 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 31s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 24s | The applied patch generated 2 new checkstyle issues (total was 48, now 50). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 6s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 51s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 0s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 50m 51s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 124m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744432/YARN-3381-006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8470/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8470/console | This message was automatically generated. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620189#comment-14620189 ] Akira AJISAKA commented on YARN-3069: - +1, looks good to me. Thanks [~rchiang] for updating the patch. I'll commit it on July 13 JST if there are no objections. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type 
yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
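For option B above, a minimal sketch of how such an exception is usually registered in the comparison test. The member name {{configurationPropsToSkipCompare}} follows the common pattern of the Hadoop configuration-field tests, but treat it as an assumption; the two properties shown are from the list above:
{code:title=TestYarnConfigurationFields.java (sketch)|borderStyle=solid}
// Sketch only: properties deliberately left out of yarn-default.xml are
// added to a skip set, with a comment explaining why, so the
// xml-vs-YarnConfiguration comparison does not flag them.
configurationPropsToSkipCompare = new HashSet<String>();
// Internal use only; set automatically for minicluster runs:
configurationPropsToSkipCompare.add("yarn.is.minicluster");
// Documented elsewhere (log-server setup docs) rather than in
// yarn-default.xml:
configurationPropsToSkipCompare.add("yarn.log.server.url");
{code}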
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3069: Target Version/s: 2.8.0 Hadoop Flags: Reviewed Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class 
yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620218#comment-14620218 ] Hadoop QA commented on YARN-3453: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 5s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 58s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 1s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/1276/YARN-3453.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8474/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8474/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8474/console | This message was automatically generated. Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620228#comment-14620228 ] Hadoop QA commented on YARN-3453: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 61m 14s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 99m 10s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/1276/YARN-3453.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8473/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8473/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8473/console | This message was automatically generated. Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)