[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327117#comment-14327117 ] Sunil G commented on YARN-2004: --- Thank you [~jlowe] and [~leftnoteasy] for the input. Yes, there are alternate ways we can achieve scenario 1. Also, for scenario 2, YARN-2009 will help. Hence this JIRA can now focus on the basic priority addition to the schedulers. bq. Priority is only considered if both applications have a priority that was set. If a set of priorities is loaded to the RM and one is chosen as the default priority for a queue, it can be any priority from lowest to highest. So all the applications running w/o priority will be given this default priority, and hence some lower-priority applications will end up with lower preference than an application running w/o priority. But this is also a matter of user perception: if the user understands that all applications running w/o priority fall back to the default chosen per queue, then the behavior will be as expected. On that note, I also feel we can consider all applications running w/o priority to be of the default priority. [~jlowe], please share your thoughts w.r.t. the above scenario. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic, such as app ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
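A minimal sketch of the comparator change described above, assuming FiCaSchedulerApp exposes an application-level getPriority() and that a higher integer value means higher priority (both are assumptions; the attached patch is authoritative):
{code}
import java.util.Comparator;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp;

// Sketch only: compares by application priority first, then falls back to
// the existing application-ID ordering (which embeds submission order).
public class PriorityAwareApplicationComparator
    implements Comparator<FiCaSchedulerApp> {
  @Override
  public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
    // 1. Check application priority: the higher-priority app sorts first.
    int byPriority = Integer.compare(
        a2.getPriority().getPriority(), a1.getPriority().getPriority());
    if (byPriority != 0) {
      return byPriority;
    }
    // 2. Otherwise keep the existing ordering by application ID.
    return a1.getApplicationAttemptId().getApplicationId()
        .compareTo(a2.getApplicationAttemptId().getApplicationId());
  }
}
{code}
With this ordering, applications submitted w/o a priority would simply carry the queue's default priority, per the discussion above.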
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328071#comment-14328071 ] Jason Lowe commented on YARN-1963: -- I'd like to see changing app priorities addressed as it is a common ask from users. In many cases jobs are submitted to the cluster via some workflow/pipeline, and they would like to change the priority of apps already submitted. Otherwise they have to update their workflow/pipeline to change the submit-time priority, kill the active jobs, and resubmit the apps for the priority to take effect. Then eventually they need to change it all back to normal priorities later. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327981#comment-14327981 ] Wangda Tan commented on YARN-2693: -- [~sunilg], After thinking about this, I feel this part may not be required before adding the major functionalities. I found the existing implementation of the priority label manager is very similar to the node label manager, but they're two different use cases. In the node label manager, each node can be assigned labels, so there are lots of mappings in the cluster. However, priority labels will be much simpler: fewer than two dozen text-based priority labels should satisfy most use cases, and priority labels are not likely to change frequently. So what I suggest now is making simple configuration-based labels first; if RM HA needs to be supported, the admin can put the same priority-label configuration item on different RM nodes. Since we don't have a centralized configuration for Hadoop daemons, we assume different RM nodes have the same yarn-site.xml settings. After the major functionality is completed (say, RM / scheduler / API / client side), more time could be spent on this part :). Ideas? Priority Label Manager in RM to manage priority labels -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete a priority label for a specified queue * Manage the integer mapping associated with each priority label * Support managing the default priority label of a given queue * ACL support at queue level for priority labels * Expose an interface to the RM to validate priority labels Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabel * FileSystem based: persistent across RM restart * Memory based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
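A minimal sketch of the configuration-based approach suggested above; the property name yarn.scheduler.priority.labels and the label:integer syntax are hypothetical placeholders, not actual keys:
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

// Sketch only: load a label -> integer mapping from yarn-site.xml, e.g.
//   yarn.scheduler.priority.labels = low:1,normal:5,high:10
// Each RM node would carry the same setting, per the HA note above.
public class ConfigBasedPriorityLabels {
  public static Map<String, Integer> load(Configuration conf) {
    Map<String, Integer> labelToPriority = new HashMap<>();
    for (String entry : conf.getTrimmedStrings("yarn.scheduler.priority.labels")) {
      String[] parts = entry.split(":");
      labelToPriority.put(parts[0], Integer.parseInt(parts[1]));
    }
    return labelToPriority;
  }
}
{code}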
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327987#comment-14327987 ] Sangjin Lee commented on YARN-3166: --- [~gtCarrera9], sorry for my delayed response. It looks good mostly. I have some feedback: - As [~zjshen] mentioned, we need to sort out the RM/NM dependency on the timelineservice module. The NM dependency is more of a fluke, but we need to think about the RM dependency because it needs to start its own aggregator service. I believe [~Naganarasimha] mentioned this in another JIRA. Perhaps this is unavoidable if RM is going to start the aggregator? I am not aware of any clean pluggable service mechanism for RM (like the aux services for NM). Another idea if we don't want that is to move the base aggregator class into yarn-server-common. - I think as a rule, it would be good to make sure not to disturb the old ATS classes. IIUC we're deprecating the old ATS classes, but we're not going to modify them in an incompatible way (e.g. moving classes, removing classes, changing interfaces, etc.), as that would be extremely disruptive once this is merged. - What is the difference between TimelineStorage and TimelineStorageImpl? [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327995#comment-14327995 ] Wangda Tan commented on YARN-1963: -- One more question: I didn't see an API proposed to update app priority. I think it may be very useful when a job has run for some time and needs to get completed as soon as we can. Is this a valid use case that we need to handle within the YARN-1963 scope? Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
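For reference, a hypothetical client-side signature for the update API being asked about; no such method existed at the time of this comment:
{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical sketch of what a priority-update entry point could look
// like, so a running job's priority can be raised without resubmission.
public interface ApplicationPriorityUpdater {
  void updateApplicationPriority(ApplicationId applicationId,
      Priority newPriority) throws YarnException, IOException;
}
{code}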
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327319#comment-14327319 ] Sunil G commented on YARN-3197: --- Yes. The remark from [~rohithsharma] makes sense. I also came across scenarios where the NM was slightly delayed in reporting its status, and the application completed in the meantime. A lot of this log gets printed at that time. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327243#comment-14327243 ] Sunil G commented on YARN-1963: --- Thank you [~devaraj.k] for the input. I have updated the sub-JIRAs and uploaded a patch that uses integers rather than label names. As mentioned, we can have the enums supported from the MR side (can try using enums). But a translation table is needed for the same, and it is better to keep it on the YarnClient side. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327274#comment-14327274 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Yarn-trunk #843 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/843/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327279#comment-14327279 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Yarn-trunk #843 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/843/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registered 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregistered 5) Get node-to-label mapping still returns host1:123 Probably we should remove host1:123 when it becomes deactivated and no direct label is assigned to it (direct assignment means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
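A simplified sketch of the removal rule in this change (the real fix touches CommonNodeLabelsManager and RMNodeLabelsManager; the data structures here are stand-ins):
{code}
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

// Sketch only: drop a NodeId from the node-to-label map on deactivation
// unless a label was directly assigned to that host:port.
public class NodeLabelDeactivation {
  public static void onNodeDeactivated(NodeId nodeId,
      Map<NodeId, Set<String>> nodeToLabels, Set<NodeId> directlyAssigned) {
    // Labels inherited from the host (e.g. "host1 has x") disappear with the
    // node; only a direct "host1:123 has x" assignment survives deactivation.
    if (!directlyAssigned.contains(nodeId)) {
      nodeToLabels.remove(nodeId);
    }
  }
}
{code}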
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327281#comment-14327281 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Yarn-trunk #843 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/843/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster does a failover, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327295#comment-14327295 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #109 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/109/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registered 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregistered 5) Get node-to-label mapping still returns host1:123 Probably we should remove host1:123 when it becomes deactivated and no direct label is assigned to it (direct assignment means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327297#comment-14327297 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #109 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/109/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java * hadoop-project/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster does a failover, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327290#comment-14327290 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #109 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/109/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327196#comment-14327196 ] Devaraj K commented on YARN-1963: - I would also agree with numbers rather than labels, so as not to make it more complex. If we are moving with numbers, I think we can just use the existing priority API, ApplicationSubmissionContext.setPriority(Priority priority), and no new APIs are required to expose to clients. We may need to think about the M/R job priority case: an M/R job supports enums for priority (i.e. VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW) and we need to have some mechanism to map these enums to priority numbers. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
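A sketch of the translation-table idea for the M/R case; the specific integer values are illustrative only, not a decided scheme:
{code}
import org.apache.hadoop.mapreduce.JobPriority;
import org.apache.hadoop.yarn.api.records.Priority;

// Sketch only: map the MR JobPriority enum onto integer YARN priorities
// (higher number = higher priority in this illustration).
public class JobPriorityMapper {
  public static Priority toYarnPriority(JobPriority jobPriority) {
    switch (jobPriority) {
      case VERY_HIGH: return Priority.newInstance(4);
      case HIGH:      return Priority.newInstance(3);
      case NORMAL:    return Priority.newInstance(2);
      case LOW:       return Priority.newInstance(1);
      case VERY_LOW:  return Priority.newInstance(0);
      default:        return Priority.newInstance(2); // treat unknown as NORMAL
    }
  }
}
{code}
Keeping this table on the YarnClient side, as suggested above, means the RM only ever sees integers.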
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327164#comment-14327164 ] Devaraj K commented on YARN-3087: - Thanks [~gtCarrera9] for the link. The patch is trying to move the static member 'pipeline' to instance level, but there are still other places accessing the static members that have not been fixed, which I mentioned in the above comment. The patch owner also says they are experiencing other issues with the same patch; it could probably be due to the other static references. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Devaraj K This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
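For illustration, a minimal sketch of why shared static state breaks two web apps in one JVM; the class names here are stand-ins, not the actual org.apache.hadoop.yarn.webapp code:
{code}
// Sketch only: a static routing field is shared by every web app in the
// JVM, so the NM web app and the per-node aggregator web app clobber each
// other's routes; an instance field keeps each web app's state separate.
public class WebDispatcher {
  // private static RoutingTable pipeline; // shared: second web app clobbers it
  private final RoutingTable pipeline;     // per-instance: no collision

  public WebDispatcher(RoutingTable pipeline) {
    this.pipeline = pipeline;
  }

  public RoutingTable pipeline() {
    return pipeline;
  }
}

class RoutingTable { /* controller mappings, elided */ }
{code}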
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327210#comment-14327210 ] Brahma Reddy Battula commented on YARN-3217: Manually I executed test cases, all are passing .. From Jenkins also all are passing Please check following for same {noformat}All Tests Test name Duration Status{noformat} {noformat} testWebAppProxyServerMainMethod1.4 sec Passed{noformat} {noformat}testWebAppProxyServlet 0.63 sec Passed{noformat} {{TestWebAppProxyServer}} not executed in jenkins Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327610#comment-14327610 ] Junping Du commented on YARN-914: - Breaking this feature down into sub-JIRAs. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-914: Summary: (Umbrella) Support graceful decommission of nodemanager (was: Support graceful decommission of nodemanager) (Umbrella) Support graceful decommission of nodemanager --- Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327620#comment-14327620 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2060 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2060/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registered 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregistered 5) Get node-to-label mapping still returns host1:123 Probably we should remove host1:123 when it becomes deactivated and no direct label is assigned to it (direct assignment means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327622#comment-14327622 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2060 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2060/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-project/pom.xml Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster does a failover, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327615#comment-14327615 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2060 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2060/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/CHANGES.txt Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3194) After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327630#comment-14327630 ] Jason Lowe commented on YARN-3194: -- +1 lgtm. Will commit this tomorrow if there are no further comments. After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM during registration. The registration can be treated by the RM as a new node or a reconnecting node. The RM triggers the corresponding event on the basis of node-added or node-reconnected state. # Node added event: Again here 2 scenarios can occur ## New node is registering with a different ip:port – NOT A PROBLEM ## Old node is re-registering because of a RESYNC command from the RM when the RM restarts – NOT A PROBLEM # Node reconnected event: ## Existing node is re-registering, i.e. the RM treats it as a reconnecting node when the RM is not restarted ### NM RESTART NOT Enabled – NOT A PROBLEM ### NM RESTART is Enabled Some applications are running on this node – *Problem is here* Zero applications are running on this node – NOT A PROBLEM Since the NMContainerStatuses are not handled, the RM never gets to know about completed containers and never releases the resources held by those containers. The RM will not allocate new containers for pending resource requests until the completedContainer event is triggered. This results in applications waiting indefinitely because their pending containers are not served by the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
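A minimal sketch of the fix direction described above (the handler and facade names are assumptions, not the actual patch): replay the registration's NMContainerStatus list on reconnect so the scheduler learns about containers that completed while the NM was down.
{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.server.api.protocolrecords.NMContainerStatus;

// Sketch only: on a node-reconnected event with NM restart enabled, route
// completed container statuses through the same path used for heartbeats.
public class ReconnectedNodeHandler {
  private final SchedulerFacade scheduler; // hypothetical wrapper

  ReconnectedNodeHandler(SchedulerFacade scheduler) {
    this.scheduler = scheduler;
  }

  void onNodeReconnected(List<NMContainerStatus> statuses) {
    for (NMContainerStatus status : statuses) {
      if (status.getContainerState() == ContainerState.COMPLETE) {
        // Release the resources the RM still thinks this container holds.
        scheduler.releaseCompletedContainer(status);
      }
    }
  }

  interface SchedulerFacade {
    void releaseCompletedContainer(NMContainerStatus status);
  }
}
{code}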
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327650#comment-14327650 ] Sunil G commented on YARN-2004: --- Yes [~jlowe], I agree with your point. As of now, I have given a configuration to specify the default priority in a queue. That can be applied for those applications which are submitted w/o priority. A cluster-wide config will also be added; given a queue-level config, it can override the cluster-wide default value. I will update the patch as per this understanding. Thank you. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic, such as app ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
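A minimal sketch of the override order described above; both property keys are hypothetical placeholders, not actual CapacityScheduler configuration names:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: a queue-level default priority, when set, overrides the
// cluster-wide default; apps submitted w/o a priority get the result.
public class DefaultPriorityResolver {
  static final String CLUSTER_DEFAULT_KEY =
      "yarn.scheduler.default-application-priority";
  static final String QUEUE_DEFAULT_KEY_FMT =
      "yarn.scheduler.capacity.%s.default-application-priority";

  public static int getEffectiveDefaultPriority(Configuration conf,
      String queuePath) {
    int clusterDefault = conf.getInt(CLUSTER_DEFAULT_KEY, 0);
    return conf.getInt(String.format(QUEUE_DEFAULT_KEY_FMT, queuePath),
        clusterDefault);
  }
}
{code}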
[jira] [Updated] (YARN-3204) Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3204: --- Attachment: YARN-3204-001.patch Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3195) [YARN] Missing uniformity in Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated YARN-3195: --- Attachment: (was: YARN-3195.patch) [YARN] Missing uniformity in Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help command inside ./yarn queue for uniformity with respect to other commands {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3226) UI changes for decommissioning node
Junping Du created YARN-3226: Summary: UI changes for decommissioning node Key: YARN-3226 URL: https://issues.apache.org/jira/browse/YARN-3226 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du An initial thought: decommissioning nodes should still show up in the active-nodes list, since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jQuery table to sort/search for nodes in that state from the active-nodes list if it's too crowded to add yet another node-state tab (or maybe get rid of some effectively dead tabs, like the reboot-state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-3225: --- Assignee: Devaraj K New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327941#comment-14327941 ] Devaraj K commented on YARN-3225: - What would be the timeout units here? Are we thinking of any constrained range for the timeout value? Thanks New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3194) After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327937#comment-14327937 ] Jian He commented on YARN-3194: --- lgtm too After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM during registration. The registration can be treated by the RM as a new node or a reconnecting node. The RM triggers the corresponding event on the basis of node-added or node-reconnected state. # Node added event: Again here 2 scenarios can occur ## New node is registering with a different ip:port – NOT A PROBLEM ## Old node is re-registering because of a RESYNC command from the RM when the RM restarts – NOT A PROBLEM # Node reconnected event: ## Existing node is re-registering, i.e. the RM treats it as a reconnecting node when the RM is not restarted ### NM RESTART NOT Enabled – NOT A PROBLEM ### NM RESTART is Enabled Some applications are running on this node – *Problem is here* Zero applications are running on this node – NOT A PROBLEM Since the NMContainerStatuses are not handled, the RM never gets to know about completed containers and never releases the resources held by those containers. The RM will not allocate new containers for pending resource requests until the completedContainer event is triggered. This results in applications waiting indefinitely because their pending containers are not served by the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3076: - Summary: Add API/Implementation to YarnClient to retrieve label-to-node mapping (was: YarnClient implementation to retrieve label to node mapping) Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3229) Incorrect processing of container as LOST on Interruption during NM shutdown
Anubhav Dhoot created YARN-3229: --- Summary: Incorrect processing of container as LOST on Interruption during NM shutdown Key: YARN-3229 URL: https://issues.apache.org/jira/browse/YARN-3229 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot YARN-2846 fixed the issue of incorrectly writing to the state store that the process is LOST. But even after that we still process the ContainerExitEvent. If notInterrupted is false in RecoveredContainerLaunch#call, we should skip the following: {noformat} if (retCode != 0) { LOG.warn("Recovered container exited with a non-zero exit code " + retCode); this.dispatcher.getEventHandler().handle(new ContainerExitEvent( containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode, "Container exited with a non-zero exit code " + retCode)); return retCode; } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
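A sketch of the proposed guard, mirroring the snippet above (variable names follow the snippet; the actual patch may structure this differently):
{code}
// Sketch only: raise the exit event only when the wait was not interrupted
// by NM shutdown, so an interrupted recovery is not reported as a failure.
if (retCode != 0 && notInterrupted) {
  LOG.warn("Recovered container exited with a non-zero exit code " + retCode);
  this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
      containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
      "Container exited with a non-zero exit code " + retCode));
  return retCode;
}
{code}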
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327969#comment-14327969 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-trunk-Commit #7157 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7157/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: YARN-3230.1.patch Uploaded a patch to add more text to clarify the application states. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This JIRA is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3229) Incorrect processing of container as LOST on Interruption during NM shutdown
[ https://issues.apache.org/jira/browse/YARN-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-3229: --- Assignee: Anubhav Dhoot Incorrect processing of container as LOST on Interruption during NM shutdown Key: YARN-3229 URL: https://issues.apache.org/jira/browse/YARN-3229 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot YARN-2846 fixed the issue of incorrectly writing to the state store that the process is LOST. But even after that we still process the ContainerExitEvent. If notInterrupted is false in RecoveredContainerLaunch#call, we should skip the following: {noformat} if (retCode != 0) { LOG.warn("Recovered container exited with a non-zero exit code " + retCode); this.dispatcher.getEventHandler().handle(new ContainerExitEvent( containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode, "Container exited with a non-zero exit code " + retCode)); return retCode; } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: YARN-3230.2.patch Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This JIRA is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: application page.png Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This JIRA is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: (was: application page.png) Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: application page.png Uploaded an application page screenshot. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application page.png Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328593#comment-14328593 ] Tsuyoshi OZAWA commented on YARN-2820: -- I'll take a look. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch Do retry in FileSystemRMStateStore for better error recovery when an update/store failure is due to an IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM to shut down.
{code}
2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01
2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01
2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:744)
{code}
As discussed at YARN-1778, the TestFSRMStateStore failure is also due to an IOException in storeApplicationStateInternal. Stack trace from the TestFSRMStateStore failure:
{code}
2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971)
	at
{code}
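The fix direction here is a bounded retry around the state-store write. A minimal sketch, assuming hypothetical names (MAX_RETRIES, RETRY_INTERVAL_MS) and the existing FileSystemRMStateStore#writeFile; this illustrates the idea, not the attached patches:
{code}
// Sketch only: retry the store/update operation on IOException a bounded
// number of times before surfacing the failure to the RM.
private void writeFileWithRetries(org.apache.hadoop.fs.Path path, byte[] data)
    throws java.io.IOException, InterruptedException {
  final int MAX_RETRIES = 5;            // assumed value; would be configurable
  final long RETRY_INTERVAL_MS = 1000;  // assumed value; would be configurable
  for (int retry = 0; ; retry++) {
    try {
      writeFile(path, data);  // existing FileSystemRMStateStore#writeFile
      return;
    } catch (java.io.IOException e) {
      if (retry >= MAX_RETRIES) {
        throw e;  // retries exhausted; let the caller handle the failure
      }
      Thread.sleep(RETRY_INTERVAL_MS);
    }
  }
}
{code}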
[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3236: Issue Type: Improvement (was: Bug) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3236: Attachment: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328619#comment-14328619 ] Hadoop QA commented on YARN-3195: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699685/YARN-3195.patch against trunk revision c0d9b93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.cli.TestLogsCLI org.apache.hadoop.yarn.client.cli.TestYarnCLI The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6679//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6679//console This message is automatically generated. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help option inside ./yarn queue for uniformity with respect to other commands:
{code}
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help
15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue
15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Invalid Command Usage :
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena resolved YARN-3003. Resolution: Fixed Thanks [~tedyu] for reporting. Resolving it as fixed by YARN-3075 and YARN-3076. Not sure if it needs to be marked as Duplicate or some other resolution status. Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch, YARN-3003.002.patch Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. Clients (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
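For reference, a short usage sketch of the inverse mapping as exposed by YARN-3075/YARN-3076; the getLabelsToNodes name follows those patches, but treat the exact signature as an assumption:
{code}
// Hedged usage sketch: query the RM for the label-to-node mapping,
// the inverse of YarnClient#getNodeToLabels().
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class LabelsToNodesExample {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      Map<String, Set<NodeId>> labelsToNodes = client.getLabelsToNodes();
      for (Map.Entry<String, Set<NodeId>> e : labelsToNodes.entrySet()) {
        System.out.println(e.getKey() + " -> " + e.getValue());
      }
    } finally {
      client.stop();
    }
  }
}
{code}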
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328629#comment-14328629 ] Devaraj K commented on YARN-3225: - Thanks [~djp] for the clarification. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328627#comment-14328627 ] Devaraj K commented on YARN-3225: - I see the same mentioned in the design doc https://issues.apache.org/jira/secure/attachment/12699496/GracefullyDecommissionofNodeManagerv3.pdf {quote} Before NMs get decommissioned, the timeout can be updated to shorter or longer. e.g. admin can terminate the CLI and resubmit it with a different timeout value.{quote} New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328641#comment-14328641 ] zhihai xu commented on YARN-3236: - This is a code cleanup (removing an unused variable), so I think a test case is not needed. cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: cleanup, maintenance Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3236: Labels: cleanup maintenance (was: maintenance) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: cleanup, maintenance Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
zhihai xu created YARN-3236: --- Summary: cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328631#comment-14328631 ] Sunil G commented on YARN-3225: --- Yes [~devaraj.k]. Thank you for the clarification. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328576#comment-14328576 ] zhihai xu commented on YARN-2820: - All these 5 findbugs warnings are not related to my change. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch Do retry in FileSystemRMStateStore for better error recovery when an update/store failure is due to an IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM to shut down.
{code}
2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01
2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01
2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:744)
{code}
As discussed at YARN-1778, the TestFSRMStateStore failure is also due to an IOException in storeApplicationStateInternal. Stack trace from the TestFSRMStateStore failure:
{code}
2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971)
{code}
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328582#comment-14328582 ] Sunil G commented on YARN-2693: --- Hi [~leftnoteasy], thank you for the update. The NodeLabels and AppPriority managers are more or less the same, but we can't merge them any closer as we have different PBs for each operation. However, a plan can be laid to merge most of the FileSystem and Manager classes so that more of the common code can be shared. As mentioned, I will move the parsing and config support changes to RMAppManager (as a separate class), and will have a minimal implementation. I will still keep this JIRA open so as to handle the same after the major scheduler changes and API support are done. Priority Label Manager in RM to manage priority labels -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels, supporting operations such as:
* Add/delete a priority label to a specified queue
* Manage the integer mapping associated with each priority label
* Support managing the default priority label of a given queue
* ACL support at queue level for priority labels
* Expose an interface to the RM to validate priority labels
Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabel:
* FileSystem based: persistent across RM restart
* Memory based: non-persistent across RM restart
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
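A hypothetical sketch of the manager surface the description above calls for; every name and signature here is illustrative only, not code from the attached patches:
{code}
// Illustrative interface for a centralized priority-label service;
// all names are assumptions, not taken from 0005-YARN-2693.patch.
import java.util.Map;

public interface PriorityLabelManager {
  void addPriorityLabel(String queue, String label, int priority);  // add a label to a queue
  void removePriorityLabel(String queue, String label);             // delete a label from a queue
  Map<String, Integer> getPriorityMapping(String queue);            // label -> integer priority
  String getDefaultPriorityLabel(String queue);                     // per-queue default label
  boolean isValidPriorityLabel(String queue, String label);         // used by the RM to validate
}
{code}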
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328617#comment-14328617 ] Varun Saxena commented on YARN-3076: Thanks [~leftnoteasy] for the review and commit. Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328638#comment-14328638 ] Hadoop QA commented on YARN-3236: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699819/YARN-3236.000.patch against trunk revision c0d9b93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6680//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6680//console This message is automatically generated. cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: cleanup, maintenance Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Tiwari updated YARN-2556: -- Attachment: YARN-2556.patch Hi guys, I've made the following enhancements to the previously posted patches: 1) Earlier, the payload was being set as the entityId. Since the entityId is used as a key by LevelDB, it was crashing under moderate loads because each key was ~2MB. Hence I've changed it to send the payload as part of OtherInfo, which is handled well. 2) Instead of posting a string of repeated 'a's as the payload, I choose from a set of characters. This ensures that LevelDB does not get away easily with compression (since algorithms can easily compress a string that comprises a single repeated character). Here are some of the performance numbers I've got: I run 20 concurrent jobs with the arguments -m 300 -s 10 -t 20. On a 36-node cluster, this results in ~830 concurrent containers (e.g. maps), each firing 10KB of payload, 20 times. LevelDB seems to hold up fine. Would you have other ways that I could stress/load the system even more? thanks --amit Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second and I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
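A minimal sketch of the payload change in point 2, assuming a hypothetical generator class: drawing characters from a varied alphabet keeps LevelDB's compression from trivially collapsing the payload.
{code}
// Sketch only: build a harder-to-compress payload by sampling a varied
// character set instead of repeating a single character.
import java.util.Random;

public final class PayloadGenerator {
  private static final char[] ALPHABET =
      "abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
  private static final Random RANDOM = new Random();

  public static String payload(int sizeBytes) {
    StringBuilder sb = new StringBuilder(sizeBytes);
    for (int i = 0; i < sizeBytes; i++) {
      sb.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
    }
    return sb.toString();  // stored under OtherInfo, not as the entityId
  }
}
{code}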
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: YARN-3230.3.patch Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328346#comment-14328346 ] Li Lu commented on YARN-3034: - I agree that the RM may have a derived type of aggregator. Meanwhile, maybe we'd like to consider reusing the code for the web server/data storage layer connections? BTW, I've done a simple write-up for app-level aggregators and their relationships with the RM/NMs, posted in YARN-3033. To make sure we're on the same page, could one of you take a look at it? Thanks! [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328386#comment-14328386 ] Wangda Tan commented on YARN-3230: -- Since the new_saving issue seems hard to fit in this ticket, I suggest filing a separate one to track it. Patch looks good to me; the findbugs warning is not related to this patch. I will commit it today. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, YARN-3230.3.patch, application page.png Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328388#comment-14328388 ] Zhijie Shen commented on YARN-2423: --- Sure, I'll review the last patch. Whether the java client lib exists or not, we have exposed the REST getter APIs and have users that depend on them. Having a java client lib may put more burden on the backward compatibility of TS v2, but hopefully it's not going to be a big addition, as we anyway need to keep the REST APIs compatible, which is the internal stuff within the java wrapper. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the JSON response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
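As a rough illustration of what such a GET wrapper could look like, a hedged sketch: the REST path follows the v1 timeline endpoint, but the use of Jersey here and the method shape are assumptions, not the patch under review:
{code}
// Hypothetical wrapper: fetch one entity over REST and let Jersey
// deserialize the JSON response into the TimelineEntity POJO.
import com.sun.jersey.api.client.Client;
import javax.ws.rs.core.MediaType;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class TimelineGetExample {
  public static TimelineEntity getEntity(String baseUrl, String entityType,
      String entityId) {
    Client client = Client.create();
    try {
      return client.resource(baseUrl + "/ws/v1/timeline/" + entityType + "/" + entityId)
          .accept(MediaType.APPLICATION_JSON)
          .get(TimelineEntity.class);
    } finally {
      client.destroy();
    }
  }
}
{code}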
[jira] [Updated] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2986: - Summary: (Umbrella) Support hierarchical and unified scheduler configuration (was: Support hierarchical and unified scheduler configuration) (Umbrella) Support hierarchical and unified scheduler configuration --- Key: YARN-2986 URL: https://issues.apache.org/jira/browse/YARN-2986 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Wangda Tan Attachments: YARN-2986.1.patch Today's scheduler configuration is fragmented and non-intuitive, and needs to be improved. Details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328450#comment-14328450 ] Sunil G commented on YARN-1963: --- Thank you Wangda and Jason for the input. Yes, it's good to change the priority of an application at runtime. I had mentioned it in the design doc. I have already created a user API jira, and its client part can be handled there. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327662#comment-14327662 ] Sunil G commented on YARN-3225: --- Hi [~djp], to understand the idea correctly: do you mean a command is to be added so that a given node can be decommissioned, and it can be given a timeout to gracefully verify the same is done? So something like ./yarn -node nodeID -timeout 200 -decommission. Pls help to clarify, and I would like to pursue this if you are not assigning it to yourself. :) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327711#comment-14327711 ] Varun Saxena commented on YARN-3223: Junping Du, pls reassign if you plan to work on this. Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource when containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
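A hypothetical sketch of the bookkeeping the description above asks for; the class and method names are illustrative, not the eventual patch:
{code}
// Sketch only: remember the node's original capacity for rollback, report
// zero available capacity while decommissioning, and shrink used capacity
// as containers finish.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class DecommissioningNodeResource {
  private Resource originalTotal;  // kept for possible rollback
  private Resource used;

  void startDecommissioning(Resource currentTotal, Resource currentUsed) {
    originalTotal = currentTotal;
    used = currentUsed;
    // From here on, the scheduler sees 0 available: no new containers land here.
  }

  void containerFinished(Resource released) {
    used = Resources.subtract(used, released);  // used shrinks; available stays 0
  }

  Resource rollbackTotal() {
    return originalTotal;  // restored if the decommission is cancelled
  }
}
{code}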
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327668#comment-14327668 ] Sunil G commented on YARN-3224: --- We have an event named PREEMPT_CONTAINER which is used in ProportionalPolicy preemption to notify the AM; it can be used here. Do you mind if I also participate in this JIRA? Thank you [~djp] Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327673#comment-14327673 ] Hadoop QA commented on YARN-3204: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699676/YARN-3204-001.patch against trunk revision 2fd02af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6668//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6668//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6668//console This message is automatically generated. Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3223: -- Assignee: Varun Saxena Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource when containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327680#comment-14327680 ] Sunil G commented on YARN-3225: --- Sorry, I slightly misunderstood earlier. You meant the rmadmin command with a new option such as -g. So one doubt here: can a timeout also be passed here? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3204) Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3204: --- Attachment: YARN-3204-002.patch Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0002-YARN-2004.patch Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to an application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
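A minimal sketch of that comparator, assuming FiCaSchedulerApp exposes a getPriority() accessor and that a larger integer means higher priority (both assumptions here); priority wins when both apps carry one, otherwise the existing app-id ordering applies:
{code}
// Sketch only: order by application priority first, then fall back to the
// existing application-id comparison.
import java.util.Comparator;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp;

Comparator<FiCaSchedulerApp> applicationComparator =
    new Comparator<FiCaSchedulerApp>() {
      @Override
      public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
        // 1. If both applications carry a priority, the higher one comes first
        //    (assuming a larger integer means higher priority).
        if (a1.getPriority() != null && a2.getPriority() != null) {
          int byPriority = Integer.compare(
              a2.getPriority().getPriority(), a1.getPriority().getPriority());
          if (byPriority != 0) {
            return byPriority;
          }
        }
        // 2. Otherwise keep the existing ordering by application id.
        return a1.getApplicationId().compareTo(a2.getApplicationId());
      }
    };
{code}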
[jira] [Created] (YARN-3228) Deadlock altering user resource queue
Christian Hott created YARN-3228: Summary: Deadlock altering user resource queue Key: YARN-3228 URL: https://issues.apache.org/jira/browse/YARN-3228 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.0.1-alpha Environment: hadoop yarn, postgresql Reporter: Christian Hott Priority: Blocker Let me introduce my problem: all of this began after we created some resource queues on PostgreSQL. We created them, assigned them to the users, and all was fine... until we ran a process (a large iterative query) and I did an ALTER ROLE over the user and the resource queue he was using. After that I couldn't log in with the user and got a message saying "deadlock detection, locking against self". Do you have any idea why this happens? Or is there any comprehensible log I can search for more information? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327452#comment-14327452 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2041 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2041/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registers 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregisters 5) Get node-to-label mapping still returns host1:123. Probably we should remove host1:123 when it becomes deactivated and has no label directly assigned to it (directly assigned means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327454#comment-14327454 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2041 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2041/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-project/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
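A rough sketch of the measurement at the heart of such a utility: time a single loadState call against a populated store. Setting up and starting the store is elided, and the surrounding harness is an assumption, not TestZKRMStateStorePerf itself:
{code}
// Fragment only, not the committed benchmark: time ZKRMStateStore#loadState
// once; 'store' is assumed to be an already-started ZKRMStateStore.
long start = System.currentTimeMillis();
RMStateStore.RMState state = store.loadState();
long elapsedMs = System.currentTimeMillis() - start;
System.out.println("loadState took " + elapsedMs + " ms for "
    + state.getApplicationState().size() + " applications");
{code}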
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327447#comment-14327447 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2041 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2041/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327489#comment-14327489 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/110/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* hadoop-project/pom.xml
* hadoop-yarn-project/CHANGES.txt
Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327487#comment-14327487 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/110/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registers 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregisters 5) Get node-to-label mapping still returns host1:123. Probably we should remove host1:123 when it becomes deactivated and has no label directly assigned to it (directly assigned means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated YARN-3195: --- Attachment: YARN-3195.patch Attached the patch after the fix. Please check. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help option inside ./yarn queue for uniformity with respect to other commands:
{code}
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help
15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue
15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Invalid Command Usage :
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
Junping Du created YARN-3224: Summary: Notify AM with containers (on decommissioning node) could be preempted after timeout. Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327445#comment-14327445 ] Varun Saxena commented on YARN-3197: [~devaraj.k] and others, I meant that printing unknown container or unknown application alongside their respective IDs might be deemed confusing by some too. Can't we say something like "Non-alive container containerId"? The AppID can probably be derived from the ContainerID. Thoughts? Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327464#comment-14327464 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/100/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327514#comment-14327514 ] Brahma Reddy Battula commented on YARN-3204: updateInterval is read during initialization of FairScheduler, so it will not change. Hence it need not be protected by a lock for the following piece of code. I want to add this to the findbugs-exclude file: {code} public void run() { while (!Thread.currentThread().isInterrupted()) { try { Thread.sleep(updateInterval); long start = getClock().getTime(); update(); {code} Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
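An alternative to a findbugs exclusion, sketched under the assumption Brahma states (the value is fixed at initialization and never reassigned): making the field final makes the unsynchronized read provably safe. Names here are illustrative, not the actual FairScheduler members.
{code}
// Sketch of the pattern at issue: a value set once during service init and
// then only read by the update thread. Since the field is final, the read
// inside run() needs no lock, which is what the findbugs warning is about.
class UpdateLoop implements Runnable {
  private final long updateIntervalMs; // fixed at construction, so no lock needed

  UpdateLoop(long updateIntervalMs) {
    this.updateIntervalMs = updateIntervalMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(updateIntervalMs); // safe unsynchronized read: value never changes
        // ... recompute fair shares here ...
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and exit the loop
      }
    }
  }
}
{code}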
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327527#comment-14327527 ] Jason Lowe commented on YARN-2004: -- My thoughts are as I stated above. We should not ignore priorities if one of the apps does not have a priority specified. A lack of a specified priority on an application should imply a default priority value and still be compared to the other application's priority rather than skipping the priority comparison. That would be the expected behavior. We can come up with all sorts of schemes to determine what the default priority value should be (e.g.: hardcoded default value, cluster-wide configurable, queue-specific configurable, etc.). The important part is to not skip the priority comparison completely as that would be unexpected behavior for users. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
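A minimal sketch of the comparison Jason describes: a missing priority is mapped to a configurable default instead of short-circuiting the check. The App interface below is a stand-in, not the FiCaSchedulerApp API.
{code}
import java.util.Comparator;

// Stand-in for FiCaSchedulerApp: only the two fields this comparison needs.
interface App {
  Integer getPriority();     // null when the submitter set no priority
  long getApplicationId();
}

// Missing priorities map to a default value instead of being skipped, so
// the priority comparison always happens.
class AppPriorityComparator implements Comparator<App> {
  private final int defaultPriority;

  AppPriorityComparator(int defaultPriority) {
    this.defaultPriority = defaultPriority;
  }

  @Override
  public int compare(App a, App b) {
    int pa = a.getPriority() != null ? a.getPriority() : defaultPriority;
    int pb = b.getPriority() != null ? b.getPriority() : defaultPriority;
    if (pa != pb) {
      return Integer.compare(pb, pa); // higher priority sorts first
    }
    return Long.compare(a.getApplicationId(), b.getApplicationId()); // existing tie-break
  }
}
{code}
Whether the default comes from a hardcoded value, a cluster-wide setting, or per-queue configuration only changes where defaultPriority is read from; the comparator itself is unaffected.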
[jira] [Created] (YARN-3223) Resource update during NM graceful decommission
Junping Du created YARN-3223: Summary: Resource update during NM graceful decommission Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource as containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
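A hedged sketch of the bookkeeping this description calls for; the class and field names are hypothetical, not the actual RMNode implementation.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Remember the pre-decommission total for rollback, advertise zero
// available capacity, and let "used" shrink as containers finish.
class DecommissioningNodeResources {
  private final Resource originalTotal; // kept for possible recommission/rollback
  private Resource used;                // updated as containers complete

  DecommissioningNodeResources(Resource total, Resource used) {
    this.originalTotal = total;
    this.used = used;
  }

  Resource available() {
    return Resources.none(); // schedule nothing new on a decommissioning node
  }

  void containerFinished(Resource released) {
    used = Resources.subtract(used, released); // drains toward zero over time
  }

  Resource rollbackTotal() {
    return originalTotal; // restored if the decommission is cancelled
  }
}
{code}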
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327471#comment-14327471 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/100/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-project/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over; therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
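A minimal timing harness in the spirit of the utility described above, assuming an already-configured store; populating the state (apps, attempts, tokens) before measuring is omitted, and the real benchmark shipped as TestZKRMStateStorePerf.
{code}
import org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore;

class LoadStateBench {
  // Mean nanoseconds per loadState() call over the given number of iterations.
  static long meanLoadNanos(RMStateStore store, int iterations) throws Exception {
    long total = 0;
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      store.loadState(); // the call whose latency dominates RM-HA failover
      total += System.nanoTime() - start;
    }
    return total / iterations;
  }
}
{code}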
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327469#comment-14327469 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/100/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch An example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registers 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregisters 5) Get node-to-label mapping still returns host1:123 We should probably remove host1:123 when it becomes deactivated and no label is directly assigned to it (directly assigned means the admin specified host1:123 has x, instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327482#comment-14327482 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/110/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
Junping Du created YARN-3225: Summary: New parameter or CLI for decommissioning node gracefully in RMAdmin CLI Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327604#comment-14327604 ] Devaraj K commented on YARN-3197: - I am not completely convinced about changing the log level to debug; even if there are many such logs, those would be one log per container. If we change the log level to debug then we would miss the update for those containers after an NM restart in the usual case where the log level is INFO. Also, there is a debug log in the caller method that would probably serve the same purpose, and the (rmContainer == null) log wouldn't be required if you decide to make the log level debug. {code:xml} LOG.debug("Container FINISHED: " + containerId); {code} IMO, we don't need to explicitly derive and print the application id from the container id; just logging the container id would be enough, and users can derive the application id from it if they really want. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
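One possible shape of the reworded message, for illustration only; LOG, containerId and rmContainer are assumed to be the ones in CapacityScheduler#completedContainer, and the wording the JIRA finally settled on may differ.
{code}
// Still one line per container, but it now says which container is affected
// and why the scheduler no longer tracks it.
if (rmContainer == null) {
  LOG.info("Completed event for container " + containerId
      + " which is no longer tracked by the scheduler"
      + " (already completed, or recovered after an NM restart); ignoring");
  return;
}
{code}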
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327716#comment-14327716 ] Craig Welch commented on YARN-2495: --- So, here's my proposal [~Naganarasimha] [~leftnoteasy]: take a minute and consider whether or not DECENTRALIZED_CONFIGURATION_ENABLED is more likely to cause difficulty than prevent it, as I'm suggesting, and then you all can decide to keep it or not as you wish - I don't want to hold up the way forward over something which is, on the whole, a detail... Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327718#comment-14327718 ] Hadoop QA commented on YARN-2495: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685787/YARN-2495.20141208-1.patch against trunk revision 2fd02af. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6670//console This message is automatically generated. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired
Jonathan Eagles created YARN-3227: - Summary: Timeline renew delegation token fails when RM user's TGT is expired Key: YARN-3227 URL: https://issues.apache.org/jira/browse/YARN-3227 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Priority: Critical When the RM user's kerberos TGT is expired, the RM renew delegation token operation fails as part of job submission. Expected behavior is that RM will relogin to get a new TGT. {quote} 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: timelineserver.example.com:4080, Ident: (owner=user, renewer=rmuser, realUser=oozie, issueDate=1423248845528, maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: HTTP status [401], message [Unauthorized] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529) {quote} -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
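A sketch of the expected behavior described above, assuming a keytab login: refresh the RM's TGT before attempting the renewal so an expired ticket does not surface as HTTP 401. Where exactly this call belongs (DelegationTokenRenewer vs. the timeline client) was still open on this JIRA.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

class RenewWithRelogin {
  static long renew(Token<?> token, Configuration conf) throws IOException {
    // No-op unless the ticket is close to expiry; requires a keytab login.
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
    try {
      return token.renew(conf);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while renewing " + token.getKind(), e);
    }
  }
}
{code}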
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327730#comment-14327730 ] Sunil G commented on YARN-2004: --- As per YARN-2003, RMAppManager#submitApplication processes input from the submissionContext. I will add a case there to handle the scenario where the priority from the submission context is NULL; it can be updated with the default priority of the queue. As for this patch, I can remove the NULL check and have only a direct compareTo check for priority. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
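A minimal sketch of the submission-time handling Sunil outlines, so the scheduler comparator can rely on every application carrying a non-null priority. The queue-default parameter is an assumption here; where it is read from (queue configuration, cluster default) was part of the design discussion.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;

class SubmissionPriorityDefaults {
  // Substitute the queue's default when the client submitted no priority.
  static void applyDefault(ApplicationSubmissionContext context,
                           int queueDefaultPriority) {
    if (context.getPriority() == null) {
      context.setPriority(Priority.newInstance(queueDefaultPriority));
    }
  }
}
{code}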
[jira] [Resolved] (YARN-3228) Deadlock altering user resource queue
[ https://issues.apache.org/jira/browse/YARN-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-3228. --- Resolution: Incomplete Not sure how/why this is related to Hadoop. In any case, please first try to resolve user issues in the user mailing lists (http://hadoop.apache.org/mailing_lists.html). The JIRA is a place to address existing bugs/new features in the project. Closing this for now. Thanks. Deadlock altering user resource queue - Key: YARN-3228 URL: https://issues.apache.org/jira/browse/YARN-3228 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.0.1-alpha Environment: hadoop yarn, postgresql Reporter: Christian Hott Priority: Blocker Labels: newbie Original Estimate: 203h Remaining Estimate: 203h Let me introduce you to my problem: all of this began after we created some resource queues on postgresql. We created them, assigned them to the users, and all was fine... until we ran a process (a large iterative query) and I did an ALTER ROLE over the user and the resource queue he was using. After that I can't log in with the user and get a message saying deadlock detected, locking against self. Do you have any idea what causes this? Or is there any comprehensible log I can search for more information? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired
[ https://issues.apache.org/jira/browse/YARN-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327798#comment-14327798 ] Vinod Kumar Vavilapalli commented on YARN-3227: --- Is it only the Timeline delegation token that fails renewal or all the tokens? Timeline renew delegation token fails when RM user's TGT is expired --- Key: YARN-3227 URL: https://issues.apache.org/jira/browse/YARN-3227 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Priority: Critical When the RM user's kerberos TGT is expired, the RM renew delegation token operation fails as part of job submission. Expected behavior is that RM will relogin to get a new TGT. {quote} 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: timelineserver.example.com:4080, Ident: (owner=user, renewer=rmuser, realUser=oozie, issueDate=1423248845528, maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: HTTP status [401], message [Unauthorized] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327807#comment-14327807 ] Junping Du commented on YARN-3225: -- Thanks [~sunilg] for the comments! Yes, I mean the rmadmin command line. I think it would be better to pass a timeout by adding a parameter, something like -t. Without this parameter, it will decommission nodes forcefully just like the old behavior. Thoughts? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327811#comment-14327811 ] Junping Du commented on YARN-3224: -- Sure. Please go ahead to take on this JIRA. Thanks [~sunilg]! Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327816#comment-14327816 ] Hadoop QA commented on YARN-3204: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699701/YARN-3204-002.patch against trunk revision 2fd02af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6669//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6669//console This message is automatically generated. Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3204: --- Summary: Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) (was: Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) -- Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328264#comment-14328264 ] Wangda Tan commented on YARN-3230: -- [~jianhe], thanks for working on this, generally looks good to me, some minor comments: 1) FinalStatus from Application's POV: to Final State Reported by Application Master? 2) NEW_SAVING: is not necessary to be seen by client? 3) RUNNING: AM container has registered to RM and started running. Wangda Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application page.png Today, application states are simply surfaced as single words on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between the application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: (was: application page.png) Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as single words on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between the application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3231: --- Priority: Critical (was: Major) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch When a queue piles up with a lot of pending jobs due to the maxRunningApps limit, we want to increase this property on the fly to make some of the pending jobs active. However, once we increase the limit, the pending jobs are not assigned any resources and are stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
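Purely illustrative sketch of one fix direction for the bug described above: when queueMaxRunningApps is raised, promote pending applications up to the new headroom instead of leaving them waiting for an app-removal event that may never come. The SchedulerQueue type and its methods are hypothetical, not the FairScheduler API, and the attached patch may take a different approach.
{code}
class MaxRunningAppsSketch {
  interface SchedulerQueue {
    int runnableAppCount();
    boolean hasPendingApps();
    void promoteNextPendingApp(); // moves one app from pending to runnable
  }

  static void onMaxRunningAppsIncreased(SchedulerQueue queue, int newMax) {
    // Each promotion consumes headroom, so re-check the count every step.
    while (queue.runnableAppCount() < newMax && queue.hasPendingApps()) {
      queue.promoteNextPendingApp();
    }
  }
}
{code}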
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328350#comment-14328350 ] Hadoop QA commented on YARN-3131: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699763/yarn_3131_v1.patch against trunk revision d49ae72. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6676//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6676//console This message is automatically generated. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch Just ran into an issue when submitting a job to a non-existent queue: YarnClient raises no exception. Though the job indeed gets submitted successfully and just fails immediately after, it would be better if YarnClient could handle the immediate-failure situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
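A client-side sketch of the check this JIRA asks for; the actual patch changes YarnClientImpl#submitApplication itself, and the poll count and sleep interval below are illustrative.
{code}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

class CheckedSubmit {
  private static final EnumSet<YarnApplicationState> TERMINAL =
      EnumSet.of(YarnApplicationState.FAILED, YarnApplicationState.KILLED);

  static ApplicationId submit(YarnClient client, ApplicationSubmissionContext ctx)
      throws IOException, YarnException, InterruptedException {
    ApplicationId appId = client.submitApplication(ctx);
    // An app submitted to a non-existent queue fails almost immediately,
    // so a short poll is enough to surface it as an exception.
    for (int i = 0; i < 10; i++) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (TERMINAL.contains(state)) {
        throw new YarnException("Application " + appId + " reached " + state
            + " right after submission: " + report.getDiagnostics());
      }
      if (state == YarnApplicationState.ACCEPTED
          || state == YarnApplicationState.RUNNING) {
        break; // the submission took effect
      }
      Thread.sleep(200);
    }
    return appId;
  }
}
{code}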