[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3896: --- Attachment: YARN-3896.02.patch RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset --- Key: YARN-3896 URL: https://issues.apache.org/jira/browse/YARN-3896 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3896.01.patch, YARN-3896.02.patch
{noformat}
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 10.208.132.153 to /default-rack
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at: 10.208.132.153
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 10.208.132.153:8041
2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node 10.208.132.153:8041 as it is now REBOOTED
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
{noformat}
The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the RM reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's heartbeat arrived before the RM had finished resetting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
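A simplified model of the RM-side check behind the "Too far behind" message and the RUNNING to REBOOTED transition makes the race easier to see. The class and field names below are illustrative stand-ins, not the actual ResourceTrackerService code:
{code:java}
// Illustrative model of the heartbeat response-id check; the real logic
// lives in ResourceTrackerService#nodeHeartbeat and RMNodeImpl.
public class ResponseIdCheck {
    public static void main(String[] args) {
        int nmResponseId = 0;           // a restarted NM always starts at 0
        int lastRmResponseId = 2506413; // stale RM-side id: the asynchronous
                                        // reset to 0 has not been processed yet

        if (nmResponseId + 1 == lastRmResponseId) {
            System.out.println("duplicate heartbeat, resend last response");
        } else if (nmResponseId + 1 < lastRmResponseId) {
            // The branch the log above shows firing:
            System.out.println("Too far behind rm response id:" + lastRmResponseId
                + " nm response id:" + nmResponseId + " -> node marked REBOOTED");
        } else {
            System.out.println("normal heartbeat, id advances to " + (nmResponseId + 1));
        }
    }
}
{code}
Because the reset of the stored response id is handled on a separate RECONNECTED event, a heartbeat processed before that event still sees the stale id and falls into the resync branch above.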
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620597#comment-14620597 ] Hudson commented on YARN-2194: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2178 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2178/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
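To see why the comma breaks container launch: per the discussion on this JIRA, cgroup paths are handed around as comma-delimited lists on the way to the native container-executor, so a RHEL7 mount point such as /sys/fs/cgroup/cpu,cpuacct splits into two bogus paths. The snippet below is a standalone illustration of that failure mode, not the actual container-executor parsing code:
{code:java}
import java.util.Arrays;

public class CommaSplitDemo {
    public static void main(String[] args) {
        // RHEL6-style mount: the controller name contains no comma.
        String rhel6 = "/sys/fs/cgroup/cpu/hadoop-yarn/container_01/tasks";
        // RHEL7 mounts cpu and cpuacct together as "cpu,cpuacct".
        String rhel7 = "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_01/tasks";

        // A naive comma split leaves the RHEL6 path intact...
        System.out.println(Arrays.toString(rhel6.split(",")));
        // -> [/sys/fs/cgroup/cpu/hadoop-yarn/container_01/tasks]

        // ...but turns the single RHEL7 path into two bogus fragments.
        System.out.println(Arrays.toString(rhel7.split(",")));
        // -> [/sys/fs/cgroup/cpu, cpuacct/hadoop-yarn/container_01/tasks]
    }
}
{code}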
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0019-YARN-2003.patch Updating the patch after addressing a few more comments. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620643#comment-14620643 ] Arun Suresh commented on YARN-3453: --- The test case failure is unrelated. Jenkins had passed when I kicked it off manually [here|https://issues.apache.org/jira/browse/YARN-3453?focusedCommentId=14620218&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14620218] Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places:
1. {code:title=FSLeafQueue.java|borderStyle=solid}
private boolean isStarved(Resource share)
{code}
A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare.
2. {code:title=FairScheduler.java|borderStyle=solid}
protected Resource resToPreempt(FSLeafQueue sched, long curTime)
{code}
-- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
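To make the DRF point concrete, here is a hedged, standalone sketch of a dominant-share starvation check along the lines suggested above; the resource math is simplified and all names are illustrative rather than the actual FairScheduler API:
{code:java}
public class DrfStarvation {
    // Dominant share of a (memory, vcores) pair relative to the cluster.
    static double dominantShare(long mem, long vcores,
                                long clusterMem, long clusterVcores) {
        return Math.max((double) mem / clusterMem, (double) vcores / clusterVcores);
    }

    static boolean isStarved(long usedMem, long usedVcores,
                             long shareMem, long shareVcores,
                             long clusterMem, long clusterVcores) {
        // Starved only if the dominant resource usage is below the
        // dominant share of the fair/min share.
        return dominantShare(usedMem, usedVcores, clusterMem, clusterVcores)
             < dominantShare(shareMem, shareVcores, clusterMem, clusterVcores);
    }

    public static void main(String[] args) {
        // Queue uses 10GB/40 vcores; its share is 20GB/20 vcores; the
        // cluster has 100GB/100 vcores. Memory alone says "starved"
        // (10 < 20), but the dominant resource (vcores: 0.4 >= 0.2)
        // says the queue already has its share.
        System.out.println(isStarved(10, 40, 20, 20, 100, 100)); // false under DRF
    }
}
{code}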
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620703#comment-14620703 ] Hudson commented on YARN-2194: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2197 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2197/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620672#comment-14620672 ] Jun Gong commented on YARN-3896: [~devaraj.k], a test case is added in the new patch. Thanks for reviewing. RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset --- Key: YARN-3896 URL: https://issues.apache.org/jira/browse/YARN-3896 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3896.01.patch, YARN-3896.02.patch
{noformat}
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 10.208.132.153 to /default-rack
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at: 10.208.132.153
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 10.208.132.153:8041
2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node 10.208.132.153:8041 as it is now REBOOTED
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
{noformat}
The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the RM reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's heartbeat arrived before the RM had finished resetting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620578#comment-14620578 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #239 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/239/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620680#comment-14620680 ] Hudson commented on YARN-2194: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #249 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/249/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620779#comment-14620779 ] Konstantinos Karanasos commented on YARN-3116: -- [~giovanni.fumarola], [~xgong], [~zjshen]: Given you are thinking of substituting the boolean with an enum that indicates the container type, I think this is becoming very related to YARN-2882, which is part of the more general YARN-2877 (that introduces distributed scheduling in YARN). In YARN-2882, we introduce container types to differentiate between GUARANTEED containers allocated by the central RM, and QUEUEABLE containers allocated by one of the distributed schedulers. We already have a patch available for this JIRA. What would be interesting to see is whether the AM_CONTAINER should become yet another type of container or whether it should be a separate field within the container type. The former would probably make more sense in the current implementation, as an AM_CONTAINER can only be allocated by the central RM (we cannot have a QUEUEABLE container that is also an AM_CONTAINER). The latter, however, would probably give more flexibility. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container or not (we can do it on the RM). This information is missing, such that we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620836#comment-14620836 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-trunk-Commit #8140 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8140/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* hadoop-yarn-project/CHANGES.txt
AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under:
# RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown.
# As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain the queue on stop).
# This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for the dispatcher event queue to drain till the JVM exits.
*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
 at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
 at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
 at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
 at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}
*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for 0x000700b79250 (a
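A minimal model of the stop-path wait described above, with illustrative names rather than the real AsyncDispatcher fields: the dispatcher thread is the only thing that flips {{drained}} after it dequeues the last event, so once an event is lost to the InterruptedException on {{put()}}, nothing ever satisfies the condition the stop path waits on. The loop is bounded to three checks here so the demo terminates:
{code:java}
import java.util.concurrent.LinkedBlockingQueue;

public class DrainWaitModel {
    final LinkedBlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
    volatile boolean drained = false;
    final Object waitForDrained = new Object();

    void serviceStop(boolean drainEventsOnStop) throws InterruptedException {
        if (drainEventsOnStop) {
            synchronized (waitForDrained) {
                int checks = 0;
                while (!drained && checks++ < 3) { // the real code loops unconditionally
                    waitForDrained.wait(1000);
                    System.out.println("still waiting: drained=" + drained
                        + " queueEmpty=" + eventQueue.isEmpty());
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // No dispatcher thread is running (it died to the interrupt), so
        // nothing ever sets drained = true or notifies waitForDrained.
        new DrainWaitModel().serviceStop(true);
        System.out.println("a real AsyncDispatcher would still be waiting here");
    }
}
{code}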
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620718#comment-14620718 ] MENG DING commented on YARN-3866: - Thanks [~jianhe] for the review! bq. Mark all getters/setters unstable for now Will do bq. DecreasedContainer.java/IncreasedContainer.java - how about reusing the Container.java object? This seems to be a better approach, and does simplify the code quite a bit. I can't think of anything wrong about it. If nobody else opposes this, I will go ahead and change it. bq. increaseRequests/decreaseRequests - We may just pass one list of changeResourceRequests instead of differentiating whether it’s increase or decrease? as the underlying implementations are the same. IMO, this also saves application writers from differentiating them programmatically. Actually we thought about using a single changeResourceRequests; the main reasons that we separate them are:
* We do want application writers to make a conscious decision about whether they are making an increase request or a decrease request, and tell the Resource Manager explicitly, such that if they make a mistake, the RM will be able to catch it. For example, if a user intends to increase a container's resource, but makes a mistake by passing in a resource value smaller than the current resource allocation, the RM will catch this and will NOT actually decrease the resource. If the user had sent a changeResourceRequest to the RM, the RM would not know the original intention, and would go ahead and decrease the resource. As a result, the container might be killed if memory enforcement is enabled.
* Reduce the logic in the RM to check if a request is for increase or decrease (less of a concern).
Let me know if the above concerns make sense to you or not. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
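A sketch of the RM-side safety check that the separate increase/decrease lists make possible, per the first bullet above. The types and method names are illustrative, not the actual AllocateRequest API:
{code:java}
public class ResizeValidation {
    record ChangeRequest(String containerId, long currentMemMB, long targetMemMB) {}

    static void validateIncrease(ChangeRequest r) {
        // Because the AM said "increase" explicitly, the RM can reject a
        // target smaller than the current allocation instead of silently
        // shrinking (and possibly killing) the container.
        if (r.targetMemMB() <= r.currentMemMB()) {
            throw new IllegalArgumentException("increase request for "
                + r.containerId() + " targets " + r.targetMemMB()
                + "MB <= current " + r.currentMemMB() + "MB");
        }
    }

    public static void main(String[] args) {
        validateIncrease(new ChangeRequest("container_1_0001_01_000002", 4096, 8192)); // accepted
        try {
            validateIncrease(new ChangeRequest("container_1_0001_01_000003", 4096, 2048));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
{code}
With a single merged list, the second request would be indistinguishable from a legitimate decrease, which is exactly the mistake the separate lists let the RM catch.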
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620799#comment-14620799 ] Sangjin Lee commented on YARN-3901: --- Hi [~zjshen], I was going to file JIRAs that cover splitting the application table and creating the app-to-flow table as well as the flow-version table, and work on them. Would you like to work on the app-to-flow table? I could then cover the others. Let me know. Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this jira to track creation and population of data in the flow run table. Some points that are being considered:
- Stores per flow run information aggregated across applications, flow version
- RM’s collector writes to it on app creation and app completion
- Per App collector writes to it for metric updates at a slower frequency than the metric updates to the application table
- primary key: cluster ! user ! flow ! flow run id
- Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented on app completion.
- For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values.
-- Upon flush and compactions, the min value between all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded.
- Ditto for the max_end_time, but then the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don’t want to re-aggregate for those upon replay
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
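The primary key layout above can be made concrete with a small sketch. The separator handling and the run-id inversion (so newer runs sort first in an HBase scan) are assumptions modeled on common HBase row-key practice, not the final YARN-3815 schema code:
{code:java}
public class FlowRunRowKey {
    static final char SEP = '!';

    static String rowKey(String cluster, String user, String flow, long flowRunId) {
        // Invert the run id so that a scan returns the newest run first
        // (an assumption; the schema may choose a different encoding).
        long inverted = Long.MAX_VALUE - flowRunId;
        return cluster + SEP + user + SEP + flow + SEP + inverted;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("yarn-cluster", "vrushali", "wordcount", 1436303121575L));
    }
}
{code}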
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620814#comment-14620814 ] Hadoop QA commented on YARN-3445: -
| (/) *{color:green}+1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 38s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 4s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. |
| {color:green}+1{color} | yarn tests | 51m 2s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 91m 24s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744509/YARN-3445-v5.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fffb15b |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8480/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8480/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8480/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8480/console |
This message was automatically generated. Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collectors info from RM in the heartbeat response. Our proposal is to add a cache for runningApps in RMNode, so RM only sends collectors for local running apps back. This is also needed in YARN-914 (graceful decommission): if there are no running apps in an NM which is in the decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3902) Fair scheduler preempts ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3902: --- Assignee: Arun Suresh Fair scheduler preempts ApplicationMaster - Key: YARN-3902 URL: https://issues.apache.org/jira/browse/YARN-3902 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Assignee: Arun Suresh Original Estimate: 72h Remaining Estimate: 72h YARN-2022 fixed a similar issue for CapacityScheduler. However, FairScheduler still suffers from it, preempting the AM while other normal containers are running. I think we should take the same approach and avoid the AM being preempted unless there is no container running other than the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
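A hedged sketch of the CapacityScheduler-style behavior (YARN-2022) transplanted to FairScheduler victim selection: order the candidates so AM containers are considered last, so an AM is only preempted when nothing else is running. The Candidate type is illustrative, not the FairScheduler API:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AmAwareVictimSelection {
    record Candidate(String id, boolean isAmContainer) {}

    static List<Candidate> orderVictims(List<Candidate> running) {
        List<Candidate> victims = new ArrayList<>(running);
        // false sorts before true, so non-AM containers come first.
        victims.sort(Comparator.comparing(Candidate::isAmContainer));
        return victims;
    }

    public static void main(String[] args) {
        List<Candidate> running = List.of(
            new Candidate("container_..._000001", true),   // the AM
            new Candidate("container_..._000002", false),
            new Candidate("container_..._000003", false));
        // Prints 000002 and 000003 before the AM, which would be
        // preempted only if it were the sole remaining candidate.
        System.out.println(orderVictims(running));
    }
}
{code}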
[jira] [Assigned] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-3903: -- Assignee: Karthik Kambatla Disable preemption at Queue level for Fair Scheduler Key: YARN-3903 URL: https://issues.apache.org/jira/browse/YARN-3903 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Assignee: Karthik Kambatla Priority: Trivial Original Estimate: 72h Remaining Estimate: 72h YARN-2056 supports disabling preemption at queue level for CapacityScheduler. As for fair scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620730#comment-14620730 ] Hadoop QA commented on YARN-2003: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 9s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 18 new or modified test files. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 28s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:red}-1{color} | whitespace | 0m 52s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 49s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 0m 54s | Tests passed in hadoop-sls. |
| {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 49m 57s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 94m 21s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs |
| | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| | hadoop.yarn.server.resourcemanager.TestResourceManager |
| | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744501/0019-YARN-2003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fffb15b |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/whitespace.txt |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8479/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8479/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8479/console |
This message was automatically generated. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0019-YARN-2003.patch Fixed a few test failures. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620861#comment-14620861 ] Li Lu commented on YARN-3836: - bq. Regarding metric, can't id uniquely identify a metric? Do we expect two metrics to share same id for different types? This is a tricky point, and I'm thinking out loud... Under normal circumstances it's fine to only check the id of metrics. However, since we're making different assumptions on the internal data of different types, is it possible that under some use cases users may mistakenly or accidentally confuse them? If this is possible we may want to check both types and ids. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched
[ https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-3754. --- Resolution: Not A Problem I am closing this issue as it is not happening in trunk. [~bibinchundatt] please reopen otherwise. Race condition when the NodeManager is shutting down and container is launched -- Key: YARN-3754 URL: https://issues.apache.org/jira/browse/YARN-3754 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Sunil G Priority: Critical Attachments: NM.log A container is launched and returned to ContainerImpl after the NodeManager has closed the DB connection, which results in {{org.iq80.leveldb.DBException: Closed}}. *Attaching the exception trace*
{code}
2015-05-30 02:11:49,122 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update state store diagnostics for container_e310_1432817693365_3338_01_02
java.io.IOException: org.iq80.leveldb.DBException: Closed
 at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
 at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.iq80.leveldb.DBException: Closed
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
 at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
 ... 15 more
{code}
We can add a check for whether the DB is closed while we move the container from the ACQUIRED state. As per the discussion in YARN-3585, the same has been added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
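A sketch of the guard discussed above: check whether the state store has been closed (NM shutting down) before attempting the leveldb put, and degrade to a warning instead of throwing from the launch path. The names are illustrative, not the actual NMLeveldbStateStoreService API:
{code:java}
import java.io.IOException;

public class GuardedStateStore {
    private volatile boolean closed = false;

    void close() { closed = true; }

    void storeContainerDiagnostics(String containerId, String diagnostics)
            throws IOException {
        if (closed) {
            // Race with shutdown: drop the update instead of surfacing
            // org.iq80.leveldb.DBException: Closed to the container launch.
            System.err.println("State store closed; skipping diagnostics for "
                + containerId);
            return;
        }
        // db.put(key(containerId), diagnostics.getBytes()) would go here.
        System.out.println("stored diagnostics for " + containerId);
    }

    public static void main(String[] args) throws IOException {
        GuardedStateStore store = new GuardedStateStore();
        store.storeContainerDiagnostics("container_e310_1432817693365_3338_01_02", "launched");
        store.close();
        store.storeContainerDiagnostics("container_e310_1432817693365_3338_01_02", "exited"); // skipped
    }
}
{code}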
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: (was: 0019-YARN-2003.patch) Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620868#comment-14620868 ] Sangjin Lee commented on YARN-3836: --- I tend to think that using type + id is probably a slightly better idea. Currently the type distinguishes between single data and a time series. For the most part, the id should be unique across the board. One interesting scenario is if a metric changes from single data to a time series (or vice versa). Again, this is probably not something that should happen often, if ever. But if it should happen, I happen to think that they need to be considered two different metrics. My 2 cents. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
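The "type + id" identity can be written down as a plain value class to make the two positions concrete; this is an illustrative sketch, not the actual TimelineMetric source. Under it, a SINGLE_VALUE metric and a TIME_SERIES metric with the same id compare as two different metrics:
{code:java}
import java.util.Objects;

public final class MetricKey {
    enum Type { SINGLE_VALUE, TIME_SERIES }

    private final Type type;
    private final String id;

    MetricKey(Type type, String id) { this.type = type; this.id = id; }

    @Override
    public boolean equals(Object obj) {
        // instanceof already returns false for null, so no explicit
        // null check is needed (the same nit raised in the v.2 review).
        if (!(obj instanceof MetricKey)) {
            return false;
        }
        MetricKey other = (MetricKey) obj;
        // Check the id first: it discriminates in the common case.
        return id.equals(other.id) && type == other.type;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, type);
    }

    public static void main(String[] args) {
        MetricKey a = new MetricKey(Type.SINGLE_VALUE, "cpu");
        MetricKey b = new MetricKey(Type.TIME_SERIES, "cpu");
        System.out.println(a.equals(b)); // false: same id, different type
    }
}
{code}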
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621023#comment-14621023 ] Hadoop QA commented on YARN-3534: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 13s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:red}-1{color} | javac | 2m 59s | The patch appears to cause the build to fail. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744528/YARN-3534-14.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ac60483 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8483/console |
This message was automatically generated. Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-14.patch Updated to trunk (using ResourceUtilization already there). Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
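A stand-alone sketch of the kind of node-level sampling this patch wires into the NM heartbeat. The real patch uses Hadoop's ResourceCalculatorPlugin and the ResourceUtilization record already in trunk; the JDK's com.sun.management.OperatingSystemMXBean stands in here purely for illustration:
{code:java}
import java.lang.management.ManagementFactory;

public class NodeUtilizationSample {
    public static void main(String[] args) {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();

        // Physical memory currently in use on the node, in MB.
        long usedPmemMB = (os.getTotalPhysicalMemorySize()
            - os.getFreePhysicalMemorySize()) / (1024 * 1024);
        // System-wide CPU load in [0.0, 1.0], or negative if unavailable.
        float cpuUsage = (float) os.getSystemCpuLoad();

        // A ResourceUtilization-shaped sample the NM could report:
        System.out.printf("pmem=%dMB cpu=%.2f%n", usedPmemMB, cpuUsage);
    }
}
{code}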
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620965#comment-14620965 ] Anubhav Dhoot commented on YARN-2005: - [~jlowe] appreciate your review of the updated patch Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621010#comment-14621010 ] Sangjin Lee commented on YARN-3836: --- Thanks for updating the patch [~gtCarrera9]. I went over the latest patch (v.2), and here is my input: (TimelineEntity.java)
- l.109: Nit: actually {{obj instanceof Identifier}} returns false if {{obj}} is {{null}}. Therefore, you can safely omit the {{obj == null}} check. The same goes for the other classes.
- l.533: Shouldn't we check for null from {{getIdentifier()}}? We cannot guarantee that it will be called only by callers who checked {{isValid()}}
- l.545: same here
- l.550: It sounds like now the type takes precedence over the created time in the sort order in this version. Is this intended? If not (timestamp is supposed to be first), it might be a good idea to have {{Identifier}} implement {{Comparable}} as well and use that in {{TimelineEntity.compareTo()}}.
(TimelineMetric.java)
- l.149-155: it would perform a little faster to check the id first and then the type
add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
Li Lu created YARN-3904: --- Summary: Adopt PhoenixTimelineWriter into time-based aggregation storage Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621033#comment-14621033 ] Sangjin Lee commented on YARN-3836: --- I take back my comment about the null check for {{getIdentifier()}}. Looking at it, I see that {{getIdentifier()}} will never return null. Sorry for the confusion. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621034#comment-14621034 ] Zhijie Shen commented on YARN-3116: --- [~kkaranasos], thanks for notifying us of YARN-2882. I took a quick look at the JIRA. Our approaches seem to be similar, but it seems that we're on parallel tracks. While YARN-2882 defines two container types in the container-related API so as to distinguish whether a container request goes to the RM or the NM, the label we want to attach to a container here aims to let the NM know whether the container hosts the AM or not. This is completely internal information: users are blind to this type and are also not able to set/change it. And this is why we propose to pass this information via ContainerTokenIdentifier instead of ContainerLaunchContext. Thoughts? [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container or not (we can do it on the RM). This information is missing, such that we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
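A sketch of the approach described above: carry a container type inside the RM-signed ContainerTokenIdentifier so the NM can trust it, rather than inside the user-visible ContainerLaunchContext. The types here are illustrative stand-ins for the real token classes:
{code:java}
public class AmContainerCheck {
    enum ContainerType { APPLICATION_MASTER, TASK }

    // Stand-in for the fields the RM would sign into the container token.
    record ContainerTokenInfo(String containerId, ContainerType type) {}

    static boolean isAmContainer(ContainerTokenInfo token) {
        // The NM derives this from a token the RM minted; applications
        // never set it, which is the security property discussed above.
        return token.type() == ContainerType.APPLICATION_MASTER;
    }

    public static void main(String[] args) {
        System.out.println(isAmContainer(new ContainerTokenInfo(
            "container_1436303121575_0001_01_000001", ContainerType.APPLICATION_MASTER))); // true
        System.out.println(isAmContainer(new ContainerTokenInfo(
            "container_1436303121575_0001_01_000002", ContainerType.TASK))); // false
    }
}
{code}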
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621051#comment-14621051 ] Zhijie Shen commented on YARN-3901: --- Yeah, I have a dependency on this table for the reader. If nobody is working on this table, I can take care of it. Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this jira to track creation and population of data in the flow run table. Some points that are being considered:
- Stores per flow run information aggregated across applications, flow version
- RM’s collector writes to it on app creation and app completion
- Per App collector writes to it for metric updates at a slower frequency than the metric updates to the application table
- primary key: cluster ! user ! flow ! flow run id
- Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented on app completion.
- For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values.
-- Upon flush and compactions, the min value between all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded.
- Ditto for the max_end_time, but then the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don’t want to re-aggregate for those upon replay
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621089#comment-14621089 ] Giovanni Matteo Fumarola commented on YARN-3116: [~kkaranasos], thanks for the observation. As [~zjshen] rightly pointed out, YARN-2882 and YARN-3116 are complementary. We are adding the container type enum to notify the NM whether it's an AM container or not. This is purely internal, and we deliberately don't want to expose it to the application for security reasons, while in YARN-2882 you want to expose the container type to the application. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container or not (we can do it on the RM). This information is missing, such that we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620925#comment-14620925 ] Hadoop QA commented on YARN-2003: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 18 new or modified test files. | | {color:green}+1{color} | javac | 8m 9s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 49s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | whitespace | 0m 37s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 51s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 48m 49s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 96m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744516/0019-YARN-2003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fffb15b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/whitespace.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8482/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8482/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8482/console | This message was automatically generated. 
Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620947#comment-14620947 ] Ray Chiang commented on YARN-3069: -- Thanks Akira! I'll be happy to see one of these XML verifiers pushed all the way through. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test. Any comments for any of the properties below are welcome.
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
security.applicationhistory.protocol.acl
yarn.app.container.log.backups
yarn.app.container.log.dir
yarn.app.container.log.filesize
yarn.client.app-submission.poll-interval
yarn.client.application-client-protocol.poll-timeout-ms
yarn.is.minicluster
yarn.log.server.url
yarn.minicluster.control-resource-monitoring
yarn.minicluster.fixed.ports
yarn.minicluster.use-rpc
yarn.node-labels.fs-store.retry-policy-spec
yarn.node-labels.fs-store.root-dir
yarn.node-labels.manager-class
yarn.nodemanager.container-executor.os.sched.priority.adjustment
yarn.nodemanager.container-monitor.process-tree.class
yarn.nodemanager.disk-health-checker.enable
yarn.nodemanager.docker-container-executor.image-name
yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.log.deletion-threads-count
yarn.nodemanager.user-home-dir
yarn.nodemanager.webapp.https.address
yarn.nodemanager.webapp.spnego-keytab-file
yarn.nodemanager.webapp.spnego-principal
yarn.nodemanager.windows-secure-container-executor.group
yarn.resourcemanager.configuration.file-system-based-store
yarn.resourcemanager.delegation-token-renewer.thread-count
yarn.resourcemanager.delegation.key.update-interval
yarn.resourcemanager.delegation.token.max-lifetime
yarn.resourcemanager.delegation.token.renew-interval
yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
yarn.resourcemanager.metrics.runtime.buckets
yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
yarn.resourcemanager.reservation-system.class
yarn.resourcemanager.reservation-system.enable
yarn.resourcemanager.reservation-system.plan.follower
yarn.resourcemanager.reservation-system.planfollower.time-step
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
yarn.resourcemanager.webapp.spnego-keytab-file
yarn.resourcemanager.webapp.spnego-principal
yarn.scheduler.include-port-in-node-name
yarn.timeline-service.delegation.key.update-interval
yarn.timeline-service.delegation.token.max-lifetime
yarn.timeline-service.delegation.token.renew-interval
yarn.timeline-service.generic-application-history.enabled
yarn.timeline-service.generic-application-history.fs-history-store.compression-type
yarn.timeline-service.generic-application-history.fs-history-store.uri
yarn.timeline-service.generic-application-history.store-class
yarn.timeline-service.http-cross-origin.enabled
yarn.tracking.url.generator
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620946#comment-14620946 ] Jian He commented on YARN-3866: --- bq. increaseRequests/decreaseRequests - We may just pass one list of changeResourceRequests [~sandyr], I would like to hear some thoughts from an application writer's perspective. Mind sharing some thoughts here? In the case of Spark, do you think two separate increase/decrease requests in the AllocateRequest are better, or a single changeResourceRequests list? AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM protocol changes to support container resizing according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
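For readers following the API debate, here is a sketch of the two shapes being compared; all type and method names are assumptions for illustration, not the committed protocol.
{code}
// Illustrative sketch of the two alternatives: two separate lists versus a
// single combined list of change requests. Names are assumed, not final.
import java.util.List;

interface AllocateRequestAlternatives {
  // Alternative 1: separate increase and decrease lists on AllocateRequest.
  List<ContainerResourceChangeRequest> getIncreaseRequests();
  List<ContainerResourceChangeRequest> getDecreaseRequests();

  // Alternative 2: one combined list; whether an entry is an increase or a
  // decrease is derived from comparing the target and current capabilities.
  List<ContainerResourceChangeRequest> getChangeResourceRequests();
}

// Hypothetical request carrying a container id and its desired capability.
class ContainerResourceChangeRequest {
  String containerId;
  int targetMemoryMb;
  int targetVCores;
}
{code}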
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621057#comment-14621057 ] Sunil G commented on YARN-2005: --- Hi [~adhoot] Thank you for sharing the patch. I have a couple of doubts.
- DEFAULT_FAILURE_THRESHOLD: the default is now 0.8; I feel we can make this a configurable limit. Based on the number of nodes, the user can decide up to which threshold AM blacklisting should be supported.
- Below code from CS#allocate {code} application.updateBlacklist(blacklistAdditions, blacklistRemovals); {code} Assume a case where app1's AM is running on {{node1}}. Due to a failure there, the app is relaunched on {{node2}} and {{node1}} is marked for blacklisting by SimpleBlacklistManager. Since node1 is added as blacklisted, all containers of this app will be blacklisted on node1. Is this intended? Please correct me if I am wrong.
Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
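On the first doubt above, a minimal sketch of what a configurable threshold could look like; the configuration key below is a hypothetical placeholder, not an existing YARN property.
{code}
// Sketch of replacing the hard-coded 0.8 with a configurable value.
import org.apache.hadoop.conf.Configuration;

public class BlacklistThreshold {
  // Hypothetical key name, used here only for illustration.
  static final String FAILURE_THRESHOLD_KEY =
      "yarn.resourcemanager.am-blacklisting.failure-threshold";
  static final float DEFAULT_FAILURE_THRESHOLD = 0.8f;

  // Read the threshold from the config, falling back to the current default.
  static float getFailureThreshold(Configuration conf) {
    return conf.getFloat(FAILURE_THRESHOLD_KEY, DEFAULT_FAILURE_THRESHOLD);
  }
}
{code}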
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621545#comment-14621545 ] Zhijie Shen commented on YARN-3836: --- +1 LGTM add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch, YARN-3836-YARN-2928.004.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
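For context, the pattern the patch adds looks roughly like the following; the Identifier class here is a simplified stand-in with assumed fields, not the actual YARN-2928 code.
{code}
// Generic illustration of the equals()/hashCode() contract being added so
// that these objects behave correctly in a HashSet or HashMap.
import java.util.Objects;

public final class Identifier {
  private final String type;
  private final String id;

  public Identifier(String type, String id) {
    this.type = type;
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Identifier)) {
      return false;
    }
    Identifier other = (Identifier) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals(), or hash-based collections break.
    return Objects.hash(type, id);
  }
}
{code}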
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621573#comment-14621573 ] Subru Krishnan commented on YARN-3116: -- [~kkaranasos], thanks for taking a look at this JIRA. We felt _ContainerTokenIdentifier_ is a reasonably secure way to propagate the _containerType_ to the NM from the RM. The _containerType_ is set in the _ContainerContext_ in the NM so that it is available to auxiliary services. [~kishorch] is already integrating this with YARN-2884, so it should be aligned with what you are trying to achieve in YARN-2877. Additionally, based on [~xgong]'s feedback, we updated _containerType_ to be an enum instead of the earlier boolean flag, so it should cover your future requirement of adding additional container types. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether a container is an AM container or not (we can do it on the RM). This information is missing, so we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621592#comment-14621592 ] Zhijie Shen commented on YARN-3908: --- 1. TimelineEvent has a timestamp associated with it. It tells us when the event happened. We should have this information persisted, but unfortunately it seems it is not. 2. Metric doesn't have a timestamp because the timestamp is associated with each individual value. 3. I also realized that the metric type is not persisted either. In the reader implementation I currently assume that if size(metric) > 1 it is a time series, else a single value. But that may not be guaranteed. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
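Point 3 above describes a size-based guess; a minimal sketch of that heuristic follows. The enum and method names are assumptions for illustration, and the point of the comment is precisely that this inference is not guaranteed to be correct.
{code}
// Sketch of the reader-side heuristic: with the metric type not persisted,
// the type is guessed from how many values were stored.
import java.util.Map;

public class MetricTypeGuess {
  enum TimelineMetricType { SINGLE_VALUE, TIME_SERIES }

  // values: timestamp -> metric value, as read back from storage.
  static TimelineMetricType guessType(Map<Long, Number> values) {
    // Heuristic only: a single-value metric written more than once would be
    // misclassified as a time series, which is the concern raised above.
    return values.size() > 1
        ? TimelineMetricType.TIME_SERIES
        : TimelineMetricType.SINGLE_VALUE;
  }
}
{code}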
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621739#comment-14621739 ] Varun Saxena commented on YARN-3893: Maybe set the HA service state in the RM context to STANDBY upon throwing the exception. Or do not set it to ACTIVE till all the active services have actually started. We primarily check the RM context to decide whether the RM is in standby or active state. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621703#comment-14621703 ] Xuan Gong commented on YARN-3893: - How about adding rm.transitionToStandby(true) before we throw the ServiceFailedException in the catch block?
{code}
try {
  rm.transitionToActive();
  // call all refresh*s for active RM to get the updated configurations.
  refreshAll();
  RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
      "RMHAProtocolService");
} catch (Exception e) {
  RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive", "",
      "RMHAProtocolService", "Exception transitioning to active");
  throw new ServiceFailedException(
      "Error when transitioning to Active mode", e);
}
{code}
In that case, we could transition the RM to standby, and since we throw the ServiceFailedException, this RM will rejoin the leader election process. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
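A self-contained sketch of the suggestion, with stubbed types; this shows the proposed control flow only, not the committed fix.
{code}
// Sketch: on a refresh failure, drop back to standby before rethrowing so
// this RM cleanly rejoins leader election. Types are stubbed for brevity.
public class TransitionSketch {
  interface RM {
    void transitionToActive() throws Exception;
    void transitionToStandby(boolean initialize) throws Exception;
  }

  void transitionToActive(RM rm) throws Exception {
    try {
      rm.transitionToActive();
      refreshAll(); // may throw, e.g. on a bad capacity-scheduler.xml
    } catch (Exception e) {
      rm.transitionToStandby(true); // the proposed addition
      throw new Exception("Error when transitioning to Active mode", e);
    }
  }

  void refreshAll() throws Exception {
    // stand-in for refreshing queues, ACLs and user-group mappings
  }
}
{code}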
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621707#comment-14621707 ] Xuan Gong commented on YARN-3888: - +1 LGTM. Will commit ApplicationMaster link is broken in RM WebUI when appstate is NEW -- Key: YARN-3888 URL: https://issues.apache.org/jira/browse/YARN-3888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch When the application state is NEW in RM Web UI *Application Master* link is broken. {code} 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW {code} *URL formed* http://HOSTNAME:45020/cluster/app/application_1436191509558_0003 The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621734#comment-14621734 ] Akira AJISAKA commented on YARN-3381: - bq. Findbugs (version 3.0.0) appears to be broken on trunk. Reported by MAPREDUCE-6421. bq. The applied patch generated 1 new checkstyle issues (total was 48, now 49). Hi [~brahmareddy], would you add a javadoc to specify what class should be used instead of the old class? The test failures look unrelated to the patch. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381.patch It appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
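One common way to fix such a typo compatibly is sketched below, under the assumption that the old name must keep compiling; this is illustrative, not necessarily the exact shape of the patch.
{code}
// Sketch of a backward-compatible rename: introduce the correctly spelled
// class and keep the misspelled one as a deprecated subclass.
public class InvalidStateTransitionException extends Exception {
  public InvalidStateTransitionException(String message) {
    super(message);
  }
}

/**
 * @deprecated Use {@link InvalidStateTransitionException} instead; this
 * class only exists so code using the old, misspelled name keeps compiling.
 */
@Deprecated
class InvalidStateTransitonException extends InvalidStateTransitionException {
  public InvalidStateTransitonException(String message) {
    super(message);
  }
}
{code}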
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621786#comment-14621786 ] Varun Saxena commented on YARN-3878: Thanks [~kasha] for the commit and review. Thanks to [~jianhe] and [~devaraj.k] for the reviews as well. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING
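The hang above boils down to an unconditional wait on a condition that can no longer become true. A minimal sketch of the failure and the fix direction follows, with assumed names; the actual patch may differ in detail.
{code}
// Sketch: serviceStop() waits for "drained", but the event-handling thread
// was interrupted before it could empty the queue, so an unconditional
// wait() never returns. Bounding the wait and rechecking that the handler
// thread is still alive lets stop() terminate.
public class DrainSketch {
  private volatile boolean drained = false;   // set by the handler thread
  private final Object waitForDrained = new Object();
  private final Thread eventHandlingThread;   // the dispatcher thread

  DrainSketch(Thread eventHandlingThread) {
    this.eventHandlingThread = eventHandlingThread;
  }

  void serviceStop() throws InterruptedException {
    synchronized (waitForDrained) {
      // Bounded wait + liveness check instead of waiting forever.
      while (!drained && eventHandlingThread.isAlive()) {
        waitForDrained.wait(1000);
      }
    }
  }
}
{code}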
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621709#comment-14621709 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] thank you for the review. {quote} 1. It looks like retry is added twice when we do a retry with a new connection. Should we move ++retry to the if statement where we check shouldRetry? {quote} It works as expected, meaning that retry won't be incremented doubly, since the loop will call continue after shouldRetry() and shouldRetryWithNewConnection(). However, I think it's a bit tricky for readers of the code and it's worth a fix. Updating. {quote} Should we call cb.latch.await with timeout zkSessionTimeout? Since we do a sync for the new session, will it be reasonable not to use the leftover timeout value from the old session for the new session? {quote} Agree. {quote} Based on the document http://zookeeper.apache.org/doc/r3.3.2/api/org/apache/zookeeper/KeeperException.html#getPath(), ke.getPath() may return null. Should we check if ke.getPath() is null and handle it differently? {quote} Okay. I'll also add error-handling code to the callback for when rc != Code.OK.intValue(). ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at
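The retry bookkeeping discussed in this thread is easy to get subtly wrong, so here is a compact sketch of the shape being converged on: one increment per attempt, and a callback wait bounded by the session timeout. All names are assumptions; this is not the ZKRMStateStore code.
{code}
// Sketch: a single increment point per attempt, and a bounded latch wait
// for the new session's sync() callback instead of an unbounded one.
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RetrySketch {
  private final int maxRetries = 10;
  private final long zkSessionTimeoutMs = 10000;

  <T> T runWithRetries(Callable<T> action) throws Exception {
    for (int retry = 1; ; ++retry) { // the only place retry is incremented
      try {
        return action.call();
      } catch (Exception e) {
        if (retry >= maxRetries) {
          throw e;
        }
        CountDownLatch latch = new CountDownLatch(1);
        startNewSessionAndSync(latch);
        // Bounded wait, per the review comment: don't block past the
        // session timeout waiting for the sync callback.
        latch.await(zkSessionTimeoutMs, TimeUnit.MILLISECONDS);
      }
    }
  }

  // Stand-in for opening a new ZK session and issuing a sync() whose
  // callback counts down the latch.
  void startNewSessionAndSync(CountDownLatch latch) {
    latch.countDown();
  }
}
{code}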
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621710#comment-14621710 ] Xuan Gong commented on YARN-3888: - Committed into trunk/branch-2. Thanks, Bibin A Chundatt ApplicationMaster link is broken in RM WebUI when appstate is NEW -- Key: YARN-3888 URL: https://issues.apache.org/jira/browse/YARN-3888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Fix For: 2.8.0 Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch When the application state is NEW in RM Web UI *Application Master* link is broken. {code} 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW {code} *URL formed* http://HOSTNAME:45020/cluster/app/application_1436191509558_0003 The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621713#comment-14621713 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-trunk-Commit #8145 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8145/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * hadoop-yarn-project/CHANGES.txt ApplicationMaster link is broken in RM WebUI when appstate is NEW -- Key: YARN-3888 URL: https://issues.apache.org/jira/browse/YARN-3888 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Fix For: 2.8.0 Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch When the application state is NEW in RM Web UI *Application Master* link is broken. {code} 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1436191509558_0003 is still in NEW {code} *URL formed* http://HOSTNAME:45020/cluster/app/application_1436191509558_0003 The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621720#comment-14621720 ] Sunil G commented on YARN-3893: --- Hi [~xgong] Thank you for the update. I have a doubt here. If we call rm.transitionToStandby(true), it will result in a call to ResourceManager#createAndInitActiveServices(). So is it possible that we hit the same exception we got from the earlier refreshAll() call, specifically during queue reinitialization? Currently CS#serviceInit will call parseQueues. As mentioned here, [~bibinchundatt] used a wrong CS xml file. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621534#comment-14621534 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-trunk-Commit #8143 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8143/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java Reduce storage footprint for ReservationAllocation -- Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, YARN-3800.005.patch Instead of storing the ReservationRequest we store the Resource for allocations, as thats the only thing we need. 
Ultimately we convert everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621679#comment-14621679 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 20s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 10s | The applied patch generated 1 new checkstyle issues (total was 48, now 49). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 11s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 0s | Tests passed in hadoop-mapreduce-client-app. | | {color:red}-1{color} | yarn tests | 6m 50s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 0s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 50m 52s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 122m 24s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-mapreduce-client-app | | Failed unit tests | hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart | | | hadoop.yarn.server.nodemanager.TestDeletionService | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744462/YARN-3381-008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1a0752d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8489/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8489/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8489/console | This message was automatically generated. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch,
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621527#comment-14621527 ] Subru Krishnan commented on YARN-3800: -- Thanks [~adhoot] for the patch and [~curino] for reviewing and committing it! Reduce storage footprint for ReservationAllocation -- Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, YARN-3800.005.patch Instead of storing the ReservationRequest we store the Resource for allocations, as that's the only thing we need. Ultimately, we convert everything to resources anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621559#comment-14621559 ] Sangjin Lee commented on YARN-3836: --- Great. I'll commit the patch this evening unless there are further comments. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch, YARN-3836-YARN-2928.004.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621585#comment-14621585 ] Hadoop QA commented on YARN-3116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 25s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 108m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744597/YARN-3116.v9.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f4ca530 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8487/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8487/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8487/console | This message was automatically generated. 
[Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether a container is an AM container or not (we can do it on the RM). This information is missing, so we worked around it by considering the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by
[jira] [Updated] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mujunchao updated YARN-3857: Attachment: YARN-3857-1.patch Added a test case. Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Attachments: YARN-3857-1.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey so that a client does not hold an invalid ClientToken after the RM restarts. In SIMPLE mode we register a Pair of (ApplicationAttemptId, null), but we never remove it from the HashMap: unregistration only runs in secure mode, so the map leaks memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
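A minimal sketch of the leak and the obvious fix, with assumed names; the real code lives around the client-to-AM token master key bookkeeping in the RM.
{code}
// Sketch: registration inserts an entry in both SIMPLE and secure mode,
// but removal was gated on security, so SIMPLE-mode entries pile up.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClientTokenKeyLeakSketch {
  // ConcurrentHashMap forbids null values, so SIMPLE mode gets a placeholder.
  private final Map<String, byte[]> attemptToMasterKey = new ConcurrentHashMap<>();
  private boolean securityEnabled = false;

  void registerApplication(String attemptId) {
    attemptToMasterKey.put(attemptId,
        securityEnabled ? newMasterKey() : new byte[0]);
  }

  void unregisterApplication(String attemptId) {
    // Fix: remove unconditionally, not only when security is enabled.
    attemptToMasterKey.remove(attemptId);
  }

  private byte[] newMasterKey() {
    return new byte[32];
  }
}
{code}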
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621571#comment-14621571 ] Vrushali C commented on YARN-3908: -- Hi [~zjshen] I see that event#info is not being stored, but which event timestamp is being referred to? Metrics do store a timestamp per value. (Also, I will be on vacation starting tomorrow through next week, so I am checking with Sangjin offline about this.) Thanks, Vrushali Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621621#comment-14621621 ] Hadoop QA commented on YARN-3534: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 7m 46s | The applied patch generated 2 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 5 new checkstyle issues (total was 211, now 215). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 6s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 48m 0s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744609/YARN-3534-15.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1a0752d | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8488/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8488/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8488/console | This message was automatically generated. 
Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, YARN-3534-15.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. To that end, this task will implement the collection of memory/CPU usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621630#comment-14621630 ] zhihai xu commented on YARN-3798: - Thanks for the new patch [~ozawa]! # It looks like {{retry}} is incremented twice when we retry with a new connection. Should we move {{++retry}} into the if statement where we check {{shouldRetry}}? # Should we call {{cb.latch.await}} with the timeout {{zkSessionTimeout}}? Since we sync for the new session, would it be reasonable not to reuse the remaining timeout from the old session for the new one? # Based on the document http://zookeeper.apache.org/doc/r3.3.2/api/org/apache/zookeeper/KeeperException.html#getPath(), {{ke.getPath()}} may return null. Should we check whether {{ke.getPath()}} is null and handle that case differently? ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR
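To make the first review point above concrete, here is a hedged sketch (hypothetical names, not the actual ZKRMStateStore code) of a retry loop in which the counter is incremented in exactly one place:
{code:title=RetrySketch.java|borderStyle=solid}
import org.apache.zookeeper.KeeperException;

public abstract class RetrySketch<T> {
  private final int numRetries;

  protected RetrySketch(int numRetries) { this.numRetries = numRetries; }

  protected abstract T runZkOperation() throws KeeperException;
  protected abstract boolean shouldRetry(KeeperException.Code code);

  public T runWithRetries() throws KeeperException {
    int retry = 0;
    while (true) {
      try {
        return runZkOperation();
      } catch (KeeperException ke) {
        // the counter is bumped here and nowhere else, so a retry with a
        // fresh connection is not double-counted
        if (shouldRetry(ke.code()) && ++retry < numRetries) {
          continue;
        }
        throw ke;
      }
    }
  }
}
{code}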
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621631#comment-14621631 ] Peng Zhang commented on YARN-3453: -- Thanks [~asuresh] for working on this. Comments: # Why not change all usages of the calculator in FairScheduler to the policy-related one? In the code below, RESOURCE_CALCULATOR only calculates memory, so the check may return false when resToPreempt is (0, non-zero) under the DRF policy: {code:title=FairScheduler.java|borderStyle=solid} if (Resources.greaterThan(RESOURCE_CALCULATOR, clusterResource, resToPreempt, Resources.none())) { {code} Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
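A hedged sketch of the direction suggested above; the wiring through the per-queue policy's calculator is an assumption for illustration, not the patch itself. Under DRF the policy supplies a DominantResourceCalculator, so a (0 memory, non-zero vCores) deficit is still detected:
{code:title=DeficitCheckSketch.java|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class DeficitCheckSketch {
  /**
   * True if the deficit is non-zero under the given policy's calculator,
   * e.g. policyCalc obtained from something like
   * queue.getPolicy().getResourceCalculator() rather than the memory-only
   * DefaultResourceCalculator.
   */
  static boolean hasDeficit(ResourceCalculator policyCalc, Resource cluster,
      Resource resToPreempt) {
    return Resources.greaterThan(policyCalc, cluster, resToPreempt,
        Resources.none());
  }
}
{code}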
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621686#comment-14621686 ] zhihai xu commented on YARN-3857: - thanks for the updated patch [~mujunchao]! The patch looks mostly good to me; some nits: # Add {{@VisibleForTesting}} before the function {{hasMasterKey}} to mark it as used for tests only; then you can remove the comment {{// Only for test}}. # It looks like the code in {{testNoSecureNoRegistClientToken}} is similar to {{testRegistClientTokenInSecure}}. Can we merge {{testNoSecureNoRegistClientToken}} with {{testRegistClientTokenInSecure}} into one test? We can rename the test to {{testApplicationAttemptMasterKey}}. You can check {{isMasterKeyExisted}} based on {{isSecurityEnabled}}, and change your comments {{can not get ClientToken}}/{{can get ClientToken}} to {{can not get MasterKey}}/{{can get MasterKey}}. Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Attachments: YARN-3857-1.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey to avoid the client holding an invalid ClientToken after RM restarts. In SIMPLE mode, we register a Pair of (ApplicationAttemptId, null), but we never remove it from the HashMap, as unregistration only runs in security mode, so a memory leak results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
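As an illustration of the leak described above (class and method names here are hypothetical, not the ResourceManager code), the essence is a registration that happens in both modes paired with a removal that happens in only one:
{code:title=MasterKeyLeakSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

public class MasterKeyLeakSketch {
  private final Map<ApplicationAttemptId, byte[]> masterKeys = new HashMap<>();

  public synchronized void registerApplication(ApplicationAttemptId id,
      byte[] key) {
    masterKeys.put(id, key); // in SIMPLE mode the stored value is null
  }

  public synchronized void unregisterApplication(ApplicationAttemptId id) {
    // the fix: always remove the entry, instead of removing only when
    // security is enabled, so SIMPLE-mode entries do not accumulate forever
    masterKeys.remove(id);
  }
}
{code}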
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621581#comment-14621581 ] Zhijie Shen commented on YARN-3116: --- [~kkaranasos], I haven't looked at the details of YARN-2884, but it seems to be an API change that needs to be exposed to users. In that case, a user-facing object, i.e., ContainerLaunchContext, is the better choice for you. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine if the container is an AM container or not from the context in NM (we can do it on RM). This information is missing, so we worked around it by considering the container with ID _01 to be the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need to have a way to determine if a container is an AM container on NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621611#comment-14621611 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] could you kick off Jenkins on the 007 patch? Something seems to be wrong in the Jenkins report. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621634#comment-14621634 ] Jian He commented on YARN-1449: --- Looks good to me overall; could you mark the newly added APIs as unstable too? AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620122#comment-14620122 ] Brahma Reddy Battula commented on YARN-3381: {quote}issue because the pros small and cons is much larger.{quote} AFAIK the impact should not be large, as we are extending the class. This doesn't need to be incompatible though; can you elaborate more if I am wrong? A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620311#comment-14620311 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #251 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/251/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620329#comment-14620329 ] Hadoop QA commented on YARN-3885: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 7s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 56s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744423/YARN-3885.04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8475/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8475/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8475/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8475/console | This message was automatically generated. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch When the preemption policy is ProportionalCapacityPreemptionPolicy, the piece of code in {{cloneQueues}} that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
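A hedged sketch of the shape of the fix; the point is only the traversal depth, not the exact accounting formula, and the class below is an illustrative stand-in rather than the actual policy code:
{code:title=DescendantWalkSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;

class QueueNode {
  List<QueueNode> children = new ArrayList<>();

  /**
   * Collects every descendant, so per-queue accounting (e.g. the untouchable
   * computation) sees D and E under C, not only A's immediate children.
   */
  List<QueueNode> allDescendants() {
    List<QueueNode> out = new ArrayList<>();
    for (QueueNode c : children) {
      out.add(c);
      out.addAll(c.allDescendants()); // recurse past the immediate children
    }
    return out;
  }
}
{code}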
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620333#comment-14620333 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744468/YARN-3798-branch-2.7.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fffb15b | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8477/console | This message was automatically generated. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: YARN-3798-branch-2.7.004.patch Attaching a new patch. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620040#comment-14620040 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks again for the quick review. Updated the patch to address the above comments (I missed these in the earlier patch). A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-006.patch A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619997#comment-14619997 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] Sorry for the delay, I missed your comment. Agreed; fixing it shortly. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620011#comment-14620011 ] Akira AJISAKA commented on YARN-3381: - Thanks Brahma for updating the patch. Two comments: 1. In the old class, would you call {{super(currentState, event)}} in the constructor? That way we can drop private variables and overriding getter methods. 2. {{serialVersionUID}} should be unique for each serializable class. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
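A hedged sketch of what the two comments add up to; the constructor signature is assumed from the class's usage and the serialVersionUID value is arbitrary, so this is not necessarily the final patch:
{code:title=InvalidStateTransitonException.java|borderStyle=solid}
/**
 * The old, misspelled class simply extends the corrected one and delegates
 * everything to super, so it keeps no fields or getters of its own.
 */
@Deprecated
public class InvalidStateTransitonException
    extends InvalidStateTransitionException {

  // comment 2: serialVersionUID must be unique for each serializable class
  private static final long serialVersionUID = 8610511075306640714L;

  public InvalidStateTransitonException(Enum<?> currentState, Enum<?> event) {
    super(currentState, event); // comment 1: no duplicated state needed
  }
}
{code}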
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620049#comment-14620049 ] Hadoop QA commented on YARN-3836: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 49s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 6s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 43m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744429/YARN-3836-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 4c5f88f | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8469/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8469/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8469/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8469/console | This message was automatically generated. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620056#comment-14620056 ] Akira AJISAKA commented on YARN-3381: - {code:title=InvalidStateTransitionException.java} public InvalidStateTransitionException(String message) { super(message); } {code} Would you remove the unused constructor? I'm +1 if that is addressed. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3902) Fair scheduler preempts ApplicationMaster
He Tianyi created YARN-3902: --- Summary: Fair scheduler preempts ApplicationMaster Key: YARN-3902 URL: https://issues.apache.org/jira/browse/YARN-3902 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi YARN-2022 fixed the similar issue for CapacityScheduler. However, FairScheduler still suffers from it, preempting the AM while other normal containers are running. I think we should take the same approach: avoid preempting the AM unless no container other than the AM is running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
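A hedged sketch of the proposed behavior; the abstract {{preempt}} hook and the overall shape are illustrative, not FairScheduler's actual preemption code:
{code:title=AmPreemptionGuardSketch.java|borderStyle=solid}
import java.util.Collection;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

public abstract class AmPreemptionGuardSketch {

  protected abstract void preempt(RMContainer container);

  /**
   * Preempts from the given containers of one application, leaving the AM
   * container alone while any other container is still running.
   */
  public void preemptFrom(Collection<RMContainer> liveContainers) {
    for (RMContainer container : liveContainers) {
      if (container.isAMContainer() && liveContainers.size() > 1) {
        continue; // the AM goes only as a last resort
      }
      preempt(container);
    }
  }
}
{code}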
[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.
[ https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619984#comment-14619984 ] nijel commented on YARN-3813: - Thanks [~sunilg] and [~devaraj.k] for the comments bq.How frequently are you going to check this condition for each application? The plan is to have a configurable interval defaulting to 30 seconds (yarn.app.timeout.monitor.interval). bq.Could we have a new TIMEOUT event in RMAppImpl for this. In that case, we may not need a flag. bq.I feel having a TIMEOUT state for RMAppImpl would be proper here. OK. We will add a TIMEOUT state and handle the changes. Due to this there will be a few changes in the app transitions, the client package, and the web UI. bq.I have a suggestion here.We can have a BasicAppMonitoringManager which can keep an entry of appId, app.getSubmissionTime. bq. when the application gets submitted to RM then we can register the application with RMAppTimeOutMonitor using the user specified timeout. Yes, good suggestion. We will implement this as a registration mechanism. But since each application can have its own timeout period, code reusability looks minimal. The intended behavior, in outline (a Java sketch follows below):
{code}
RMAppTimeOutMonitor
  - local map of (appId, timeout)
  - add/register(appId, timeout)  -- called from RMAppImpl
  - run: if an app is running/submitted and its time has elapsed, kill it;
         if it has already completed, remove it from the map.
  - no delete/unregister method -- the application is removed from the map by the run method
{code} Support Application timeout feature in YARN. - Key: YARN-3813 URL: https://issues.apache.org/jira/browse/YARN-3813 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: nijel Attachments: YARN Application Timeout .pdf It will be useful to support Application Timeout in YARN. Some use cases are not worried about the output of an application if it does not complete in a specific time. *Background:* The requirement is to show the CDR statistics of the last few minutes, say every 5 minutes. The same job will run continuously with different datasets, so one job will be started every 5 minutes. The estimated time for this task is 2 minutes or less. If the application does not complete in the given time, the output is not useful. *Proposal* So the idea is to support an application timeout, where a timeout parameter is given while submitting the job. Here, the user expects the application to finish (complete or be killed) in the given time. One option is to move this logic to the application client (which submits the job), but it would be nice if it can be generic logic and made more robust. Kindly provide your suggestions/opinion on this feature. If it sounds good, I will update the design doc and prototype patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
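A hedged Java sketch of the monitor outlined above; the class, method, and config names are assumptions drawn from this discussion, not the eventual patch:
{code:title=RMAppTimeOutMonitorSketch.java|borderStyle=solid}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class RMAppTimeOutMonitorSketch implements Runnable {
  private final Map<ApplicationId, Long> deadlines = new ConcurrentHashMap<>();

  /** Called on app submission; the timeout is per-application. */
  public void register(ApplicationId appId, long submissionTime, long timeoutMs) {
    deadlines.put(appId, submissionTime + timeoutMs);
  }

  @Override
  public void run() {
    // runs every yarn.app.timeout.monitor.interval (default 30s in the proposal)
    long now = System.currentTimeMillis();
    for (Map.Entry<ApplicationId, Long> e : deadlines.entrySet()) {
      if (isFinished(e.getKey())) {
        deadlines.remove(e.getKey());   // no unregister call needed
      } else if (now >= e.getValue()) {
        killApplication(e.getKey());    // fire the TIMEOUT event / kill
        deadlines.remove(e.getKey());
      }
    }
  }

  // placeholders standing in for RM state lookups and the kill path
  private boolean isFinished(ApplicationId appId) { return false; }
  private void killApplication(ApplicationId appId) { }
}
{code}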
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619988#comment-14619988 ] Varun Vasudev commented on YARN-2194: - My apologies for missing the failing unit test [~sidharta-s]. I've committed the fix for the failing unit test. Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619989#comment-14619989 ] Ajith S commented on YARN-3885: --- Queue hierarchy:
{noformat}
   root
    |
    A
   / \
  C   B
 / \
D   E
{noformat}
+*Before fix:*+
NAME: queueA CUR: memory:209, vCores:0 PEN: memory:0, vCores:0 GAR: memory:200, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:209, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *{color:red}UNTOUCHABLE: memory:9, vCores:0 PREEMPTABLE: memory:0, vCores:0{color}*
NAME: queueB CUR: memory:60, vCores:0 PEN: memory:0, vCores:0 GAR: memory:60, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:60, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueC CUR: memory:150, vCores:0 PEN: memory:0, vCores:0 GAR: memory:139, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:149, vCores:1 IDEAL_PREEMPT: memory:1, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:1, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueD CUR: memory:100, vCores:0 PEN: memory:0, vCores:0 GAR: memory:100, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:100, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueE CUR: memory:50, vCores:0 PEN: memory:0, vCores:0 GAR: memory:40, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:49, vCores:1 IDEAL_PREEMPT: memory:1, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:10, vCores:0*
+*After:*+
NAME: queueA CUR: memory:209, vCores:0 PEN: memory:0, vCores:0 GAR: memory:200, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:201, vCores:1 IDEAL_PREEMPT: memory:8, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *{color:green}UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:10, vCores:0{color}*
NAME: queueB CUR: memory:60, vCores:0 PEN: memory:0, vCores:0 GAR: memory:60, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:60, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueC CUR: memory:150, vCores:0 PEN: memory:0, vCores:0 GAR: memory:139, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:141, vCores:1 IDEAL_PREEMPT: memory:9, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:1, vCores:0 PREEMPTABLE: memory:10, vCores:0*
NAME: queueD CUR: memory:100, vCores:0 PEN: memory:0, vCores:0 GAR: memory:100, vCores:0 NORM: NaN IDEAL_ASSIGNED: memory:100, vCores:0 IDEAL_PREEMPT: memory:0, vCores:0 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:0, vCores:0*
NAME: queueE CUR: memory:50, vCores:0 PEN: memory:0, vCores:0 GAR: memory:40, vCores:0 NORM: 1.0 IDEAL_ASSIGNED: memory:41, vCores:1 IDEAL_PREEMPT: memory:9, vCores:-1 ACTUAL_PREEMPT: memory:0, vCores:0 *UNTOUCHABLE: memory:0, vCores:0 PREEMPTABLE: memory:10, vCores:0*
ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch When the preemption policy is ProportionalCapacityPreemptionPolicy, the piece of code in {{cloneQueues}} that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619992#comment-14619992 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks a lot for taking a look at this issue. Updated the patch based on your comment. Kindly review. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-005.patch A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3836: Attachment: YARN-3836-YARN-2928.002.patch Hi [~sjlee0], thanks for the prompt feedback! I updated the patch according to your comments. Specifically: bq. What I would prefer is to override equals() and hashCode() for Identifier instead, and have simple equals() and hashCode() implementations for TimelineEntity that mostly delegate to Identifier. The rationale is that Identifier can be useful as keys to collections in its own right, and thus should override those methods. That's a nice suggestion! Fixed. bq. One related question for your use case of putting entities into a map: I notice that you're using the TimelineEntity instances directly as keys to maps. Wouldn't it be better to use their Identifier instances as keys instead? Identifier instances are easier and cheaper to construct and compare. I think I used an inappropriate example here. I meant to say HashSet but not HashMap. bq. We should make isValid() a proper javadoc hyperlink Fixed. bq. Since we're checking the entity type and the id, wouldn't it be sufficient to check whether the object is an instance of TimelineEntity? I agree. Fixed all related ones. add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
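A hedged sketch of the delegation pattern being discussed above; the class shapes are simplified stand-ins, not the actual TimelineEntity code:
{code:title=IdentifierEqualsSketch.java|borderStyle=solid}
import java.util.Objects;

class Identifier {
  final String type;
  final String id;

  Identifier(String type, String id) { this.type = type; this.id = id; }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Identifier)) return false;
    Identifier other = (Identifier) o;
    return Objects.equals(type, other.type) && Objects.equals(id, other.id);
  }

  @Override public int hashCode() { return Objects.hash(type, id); }
}

class Entity {
  final Identifier identifier;

  Entity(Identifier identifier) { this.identifier = identifier; }

  // equals()/hashCode() mostly delegate to Identifier, which is also usable
  // on its own as a cheap key in a HashSet or HashMap
  @Override public boolean equals(Object o) {
    return o instanceof Entity && identifier.equals(((Entity) o).identifier);
  }

  @Override public int hashCode() { return identifier.hashCode(); }
}
{code}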
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620045#comment-14620045 ] Varun Saxena commented on YARN-3836: Regarding the metric, can't the id uniquely identify a metric? Do we expect two metrics to share the same id with different types? add equals and hashCode to TimelineEntity and other classes in the data model - Key: YARN-3836 URL: https://issues.apache.org/jira/browse/YARN-3836 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Li Lu Attachments: YARN-3836-YARN-2928.001.patch, YARN-3836-YARN-2928.002.patch Classes in the data model API (e.g. {{TimelineEntity}}, {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or {{hashCode()}}. This can cause problems when these objects are used in a collection such as a {{HashSet}}. We should implement these methods wherever appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Attachment: YARN-3453.3.patch Uploading updated patch: * Added unit tests * Cleaned up code based on comments [~kasha], bq. Nit: In each of the policies, my preference would be not make the calculator and comparator members static unless required. We have had cases where our tests would invoke multiple instances of the class leading to issues. Not that I foresee multiple instantiations for these classes, but would like to avoid it if we can. If it is OK with you, I feel we should in fact make it static. I am of the opinion that the code reads better and is a lot cleaner and more efficient, since only one instance is ever created. We are always at liberty to override the getComparator/Calculator method in tests (and possibly subclasses). bq. .. think we will have to fix YARN-2154 too. On further thought, and after consultation with [~kasha], I think we can decouple from that JIRA, given its larger scope. Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-007.patch A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3453) Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Attachment: YARN-3453.4.patch New patch: * Cleaned up some docs * Changed the name of {{resToPreempt}} to {{resourceDeficit}}. I feel {{resToPreempt}} is not just confusing but somewhat wrong, given that the method technically does not find resources to preempt from the given queue. It actually finds the resource deficit that would bring the queue back to min/fair share. Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
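As a rough sketch of what the rename implies, the method answers "how much is this queue short of its share?" rather than selecting containers. The timeout fields and getters below are assumptions for illustration, not the real API:
{code:title=FairScheduler.java (illustrative sketch)|borderStyle=solid}
// Sketch only: compute the deficit that would bring the queue back to
// its min share or fair share. Field and getter names are illustrative.
protected Resource resourceDeficit(FSLeafQueue sched, long curTime) {
  Resource deficit = Resources.none();
  if (curTime - sched.getLastTimeAtMinShare() > minShareTimeout) {
    // Below min share for too long: deficit relative to min share.
    deficit = Resources.subtract(sched.getMinShare(),
        sched.getResourceUsage());
  } else if (curTime - sched.getLastTimeAtFairShare() > fairShareTimeout) {
    // Below fair share for too long: deficit relative to fair share.
    deficit = Resources.subtract(sched.getFairShare(),
        sched.getResourceUsage());
  }
  // Clamp a negative deficit to zero using the configured calculator,
  // so DRF mode compares dominant shares here as well.
  return Resources.max(calculator, clusterResource,
      deficit, Resources.none());
}
{code}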
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620139#comment-14620139 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 15s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 28s | The applied patch generated 2 new checkstyle issues (total was 48, now 50). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 8s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 4s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 51m 1s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 123m 52s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744425/YARN-3381-005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8468/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8468/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8468/console | This message was automatically generated. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S reassigned YARN-3885: - Assignee: Ajith S ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
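For context, a hedged sketch of the fix direction (the queue type and method names here are hypothetical, not the actual patch): the per-queue quantity has to be accumulated over all descendants, not only the immediate children:
{code:title=illustrative sketch|borderStyle=solid}
// Hypothetical illustration of the reported bug: summing a per-queue
// quantity over only the immediate children under-counts whenever the
// hierarchy is deeper than two levels; a recursive walk counts every
// descendant.
private Resource untouchableOf(TempQueue q) {
  if (q.getChildren().isEmpty()) {
    // Leaf queue: contribute its own untouchable amount.
    return q.getUntouchable();
  }
  Resource total = Resource.newInstance(0, 0);
  for (TempQueue child : q.getChildren()) {
    // Recurse so grandchildren and deeper queues are included too.
    Resources.addTo(total, untouchableOf(child));
  }
  return total;
}
{code}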
[jira] [Created] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
He Tianyi created YARN-3903: --- Summary: Disable preemption at Queue level for Fair Scheduler Key: YARN-3903 URL: https://issues.apache.org/jira/browse/YARN-3903 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 Reporter: He Tianyi Priority: Trivial YARN-2056 supports disabling preemption at the queue level for CapacityScheduler. As for the Fair Scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-3885: -- Attachment: YARN-3885.04.patch ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619971#comment-14619971 ] Ajith S commented on YARN-3885: --- Hi [~sunilg], sorry for the delay, I have added the test case. ProportionalCapacityPreemptionPolicy doesn't preempt if the queue is more than 2 levels deep -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Priority: Critical Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619994#comment-14619994 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-trunk-Commit #8138 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8138/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620022#comment-14620022 ] Sidharta Seethana commented on YARN-2194: - Thanks [~vvasudev] - Jenkins wasn't triggered, so we all missed it. Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620084#comment-14620084 ] Tsuyoshi Ozawa commented on YARN-3381: -- Hmm, I'm still thinking about whether we should fix this or not. I know that this is a typo, but fixing it makes an incompatible change for YARN apps. Currently, I prefer to preserve the typo as a won't-fix issue, because the pros are small and the cons are much larger. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
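For what it's worth, the usual way to get the fix without breaking existing apps is to keep the misspelled class as a deprecated subclass of the corrected one. This is a sketch of that pattern under stated assumptions, not necessarily what the attached patches do:
{code:title=compatibility sketch|borderStyle=solid}
// Sketch only: the correctly spelled exception becomes the real class,
// and the old misspelled name survives as a deprecated alias so code
// that still catches InvalidStateTransitonException keeps compiling
// and running. The constructor signature is assumed.
@Deprecated
public class InvalidStateTransitonException
    extends InvalidStateTransitionException {

  private static final long serialVersionUID = 1L;

  public InvalidStateTransitonException(Enum<?> currentState,
      Enum<?> event) {
    super(currentState, event);
  }
}
{code}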
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620372#comment-14620372 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 4s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 0s | The applied patch generated 1 new checkstyle issues (total was 48, now 49). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 5s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 5s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 25m 24s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 98m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.TestRMNodeTransitions | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | | hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.webapp.TestNodesPage | | | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingUpdate | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestMoveApplication | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.reservation.TestCapacityReservationSystem | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings | | |
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: (was: YARN-3798-branch-2.7.004.patch) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch The RM goes down with a NoNode exception during creation of the znode for an app attempt. *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
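The distinction the title draws can be sketched as follows. {{ZKAction}} and {{runWithRetries}} appear in the stack trace above; the retry fields and {{createConnection()}} are assumptions for illustration. The idea: retry CONNECTIONLOSS on the existing session, and only re-create the session on SESSIONEXPIRED:
{code:title=retry-policy sketch|borderStyle=solid}
// Sketch only: a connection loss is transient, so the operation should
// be retried on the same session; creating a brand-new session on every
// failure can let a retry observe the effects of its own earlier,
// silently-successful attempt (e.g. the NoNode above).
private <T> T runWithRetries(ZKAction<T> action) throws Exception {
  for (int retry = 0; retry < numRetries; retry++) {
    try {
      return action.run();
    } catch (KeeperException.ConnectionLossException e) {
      // Transient: back off and retry with the existing session.
      Thread.sleep(zkRetryInterval);
    } catch (KeeperException.SessionExpiredException e) {
      // Only now is the session truly gone: re-create it, then retry.
      createConnection();
    }
  }
  throw new IOException("Maxed out ZK retries. Giving up!");
}
{code}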
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: YARN-3798-branch-2.7.004.patch ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.patch The RM goes down with a NoNode exception during creation of the znode for an app attempt. *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620353#comment-14620353 ] Hudson commented on YARN-2194: -- FAILURE: Integrated in Hadoop-Yarn-trunk #981 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/981/]) YARN-2194. Addendum patch to fix failing unit test in TestPrivilegedOperationExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 63d0365088ff9fca0baaf3c4c3c01f80c72d3281) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/TestPrivilegedOperationExecutor.java Cgroups cease to work in RHEL7 -- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Wei Yan Assignee: Sidharta Seethana Priority: Critical Fix For: 2.8.0 Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch In RHEL7, the CPU controller is named cpu,cpuacct. The comma in the controller name leads to container launch failure. RHEL7 deprecates libcgroup and recommends the use of systemd. However, systemd has certain shortcomings as identified in this JIRA (see comments). This JIRA only fixes the failure, and doesn't try to use systemd. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620180#comment-14620180 ] Varun Saxena commented on YARN-3047: Thanks [~sjlee0] for the review and commit. And thanks to [~zjshen], [~gtCarrera9] and [~vrushalic] as well for the reviews. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Fix For: YARN-2928 Attachments: Timeline_Reader(draft).pdf, YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, YARN-3047.04.patch Per the design in YARN-2938, set up the ATS reader as a service and implement its basic structure, including lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620187#comment-14620187 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 31s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 24s | The applied patch generated 2 new checkstyle issues (total was 48, now 50). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 6s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 51s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 0s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 50m 51s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 124m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744432/YARN-3381-006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8470/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8470/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8470/console | This message was automatically generated. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620189#comment-14620189 ] Akira AJISAKA commented on YARN-3069: - +1, looks good to me. Thanks [~rchiang] for updating the patch. I'll commit it on July 13 JST if there are no objections. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type 
yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
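For option B above, a minimal sketch of how such an exception is usually registered in the comparison test. The member name {{configurationPropsToSkipCompare}} follows the common pattern of the Hadoop configuration-field tests, but treat it as an assumption; the two properties shown are from the list above:
{code:title=TestYarnConfigurationFields.java (sketch)|borderStyle=solid}
// Sketch only: properties deliberately left out of yarn-default.xml are
// added to a skip set, with a comment explaining why, so the
// xml-vs-YarnConfiguration comparison does not flag them.
configurationPropsToSkipCompare = new HashSet<String>();
// Internal use only; set automatically for minicluster runs:
configurationPropsToSkipCompare.add("yarn.is.minicluster");
// Documented elsewhere (log-server setup docs) rather than in
// yarn-default.xml:
configurationPropsToSkipCompare.add("yarn.log.server.url");
{code}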
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3069: Target Version/s: 2.8.0 Hadoop Flags: Reviewed Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class 
yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620218#comment-14620218 ] Hadoop QA commented on YARN-3453: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 5s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 58s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 1s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/1276/YARN-3453.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8474/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8474/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8474/console | This message was automatically generated. Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620228#comment-14620228 ] Hadoop QA commented on YARN-3453: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 61m 14s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 99m 10s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/1276/YARN-3453.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 63d0365 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8473/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8473/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8473/console | This message was automatically generated. Fair Scheduler: Parts of preemption logic use DefaultResourceCalculator even in DRF mode, causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)