[jira] [Updated] (YARN-2161) Fix build on macosx: YARN parts
[ https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2161: Attachment: YARN-2161.v1.patch Fix build on macosx: YARN parts --- Key: YARN-2161 URL: https://issues.apache.org/jira/browse/YARN-2161 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2161.v1.patch When compiling on macosx with -Pnative, there are several warnings and errors; fixing these would help Hadoop developers working in a macosx environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts
[ https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032164#comment-14032164 ] Binglin Chang commented on YARN-2161: - Changes: container-executor.c: 1. make mkdirs more compatible by removing the usage of mkdirat/openat; 2. use sysconf() to get LOGIN_NAME_MAX; 3. macosx doesn't have fcloseall, so close all opened fds explicitly on macosx; 4. disable cgroups on macosx. test-container-executor.c: 1. macosx does not have the user 'bin', so skip that check; 2. change /etc/passwd (which does not exist on mac) to /bin/ls. Fix build on macosx: YARN parts --- Key: YARN-2161 URL: https://issues.apache.org/jira/browse/YARN-2161 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2161.v1.patch When compiling on macosx with -Pnative, there are several warnings and errors; fixing these would help Hadoop developers working in a macosx environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032167#comment-14032167 ] Hadoop QA commented on YARN-2032: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644451/YARN-2032-branch-2-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3992//console This message is automatically generated. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-branch-2-1.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: trust.patch This patch file is based on version 2.2.0; it works on my computer. If you have any questions, please tell me. On the web UI, the function is not yet complete (but it seems to work well). Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: trust.patch Original Estimate: 1m Remaining Estimate: 1m Because of a critical computing environment, we must test every node's TRUST status in the cluster (we can get the TRUST status via the API of the OAT server), so I added this feature into Hadoop's scheduling. Through the TRUST check service, a node can get its own TRUST status and then, through the heartbeat, send the TRUST status to the resource manager for scheduling. In the scheduling step, if a node's TRUST status is 'false', it will be abandoned until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node's health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
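The attached trust.patch is not reproduced in this digest. As a rough, standalone sketch of the design described above (the node periodically asks the OAT server for its TRUST status, caches it, and reports it on the next heartbeat), something along the following lines would capture the idea; OatClient, NodeTrustChecker and the polling interval are illustrative assumptions, not classes from the patch.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Illustrative sketch of a node-side TRUST checker, modeled loosely on the
 * NodeManager's health-check service. OatClient is a hypothetical wrapper
 * around the OAT server API; it is not part of Hadoop or of the patch.
 */
public class NodeTrustChecker {

  /** Hypothetical client for the OAT attestation server. */
  public interface OatClient {
    boolean isTrusted(String hostname) throws Exception;
  }

  private final OatClient oatClient;
  private final String hostname;
  private final AtomicBoolean trusted = new AtomicBoolean(false);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public NodeTrustChecker(OatClient oatClient, String hostname) {
    this.oatClient = oatClient;
    this.hostname = hostname;
  }

  /** Poll the OAT server periodically, caching the last known status. */
  public void start(long intervalMs) {
    scheduler.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        try {
          trusted.set(oatClient.isTrusted(hostname));
        } catch (Exception e) {
          // Fail closed: treat the node as untrusted if the OAT server
          // cannot be reached.
          trusted.set(false);
        }
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** Value that would be piggybacked on the NM-to-RM heartbeat. */
  public boolean isTrusted() {
    return trusted.get();
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}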
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032173#comment-14032173 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650522/trust.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3993//console This message is automatically generated. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: trust.patch Original Estimate: 1m Remaining Estimate: 1m Because of a critical computing environment, we must test every node's TRUST status in the cluster (we can get the TRUST status via the API of the OAT server), so I added this feature into Hadoop's scheduling. Through the TRUST check service, a node can get its own TRUST status and then, through the heartbeat, send the TRUST status to the resource manager for scheduling. In the scheduling step, if a node's TRUST status is 'false', it will be abandoned until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node's health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts
[ https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032175#comment-14032175 ] Hadoop QA commented on YARN-2161: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650514/YARN-2161.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3991//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3991//console This message is automatically generated. Fix build on macosx: YARN parts --- Key: YARN-2161 URL: https://issues.apache.org/jira/browse/YARN-2161 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2161.v1.patch When compiling on macosx with -Pnative, there are several warning and errors, fix this would help hadoop developers with macosx env. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1782) CLI should let users to query cluster metrics
[ https://issues.apache.org/jira/browse/YARN-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1782: -- Attachment: YARN-1782.patch Attached a patch. This patch introduces yarn metrics -status command which outputs like this. {noformat} $ yarn metrics -status 14/06/16 17:13:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/06/16 17:13:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Cluster Metrics : appsSubmmitted : 2 appsCompleted : 1 appsPending : 0 appsRunning : 1 appsFailed : 0 appsKilled : 0 reservedMB : 0 availableMB : 3072 allocatedMB : 5120 totalMB : 8192 containersAllocated : 4 containersReserved : 0 containersPending : 0 totalNodes : 1 activeNodes : 1 lostNodes : 0 unhealthyNodes : 0 decommissionedNodes : 0 rebootedNodes : 0 {noformat} CLI should let users to query cluster metrics - Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Attachments: YARN-1782.patch Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1782) CLI should let users to query cluster metrics
[ https://issues.apache.org/jira/browse/YARN-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima reassigned YARN-1782: - Assignee: Kenji Kikushima CLI should let users to query cluster metrics - Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1782.patch Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032226#comment-14032226 ] Zhijie Shen commented on YARN-2032: --- [~mayank_bansal], thanks for the patch! Here are some quick comments after a first glance. 1. Why does this patch target branch-2 instead of trunk? 2. The packages of the newly added classes need to be updated after YARN-2107. 3. The TTL configuration should be put in YarnConfiguration. Another concern is that the data retention policy is different between HBase and LevelDB. In LevelDB, we determine whether an entity is old enough according to TTL, and then delete it as well as its events. However, in the HBase impl, it seems that deletion depends on each column family's TTL individually. In this case, it is possible that the entity is deleted, but its events (or part of them) are still there. 4. fromId and fromTs seem not to be implemented yet. 5. Why do ENTITY_TABLE and INDEX_TABLE have the same schema? If I remember correctly, we only index against the primary filters. 6. Query parameters need to be fully functional, such as the secondary filters. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-branch-2-1.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
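On point 3 above: HBase enforces TTL per column family rather than per logical entity, which is why an entity's data and its events can expire independently. A minimal sketch of how a column-family TTL is set with the HBase client API; the table and family names here are made up for illustration and are not taken from the attached patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TimelineEntityTableSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table and column-family names, for illustration only.
    HTableDescriptor entityTable =
        new HTableDescriptor(TableName.valueOf("timeline.entity"));

    HColumnDescriptor info = new HColumnDescriptor("i");
    HColumnDescriptor events = new HColumnDescriptor("e");

    // HBase applies TTL per column family, in seconds. If the two families
    // carry different TTLs (or compactions run at different times), the
    // entity's "info" cells and its "events" cells can expire independently,
    // which is the consistency concern raised above.
    int ttlSeconds = 7 * 24 * 60 * 60;
    info.setTimeToLive(ttlSeconds);
    events.setTimeToLive(ttlSeconds);

    entityTable.addFamily(info);
    entityTable.addFamily(events);
    admin.createTable(entityTable);
    admin.close();
  }
}
{code}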
[jira] [Commented] (YARN-1564) add some basic workflow YARN services
[ https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032234#comment-14032234 ] Tsuyoshi OZAWA commented on YARN-1564: -- Resubmitted to kick Jenkins CI. add some basic workflow YARN services - Key: YARN-1564 URL: https://issues.apache.org/jira/browse/YARN-1564 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1564-001.patch Original Estimate: 24h Time Spent: 48h Remaining Estimate: 0h I've been using some alternative composite services to help build workflows of process execution in a YARN AM. They and their tests could be moved into YARN for use by others - this would make it easier to build aggregate services in an AM -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1782) CLI should let users to query cluster metrics
[ https://issues.apache.org/jira/browse/YARN-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032253#comment-14032253 ] Hadoop QA commented on YARN-1782: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650527/YARN-1782.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3994//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3994//console This message is automatically generated. CLI should let users to query cluster metrics - Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1782.patch Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: AM-page-preemption-info.png Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: YARN-2144.patch I've attached a patch that contains changes to show preemption information on the RM app page and in the RM log. 1) log style: {code} 2014-06-16 10:45:22,247 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Non-AM container preempted, appId=appattempt_1402886643897_0002_01, containerId=container_1402886643897_0002_01_04 {code} {code} 2014-06-16 10:45:22,247 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: AM container preempted, appId=appattempt_1402886643897_0002_01, containerId=container_1402886643897_0002_01_01 {code} 2) Info on the app page: see AM-page-preemption-info.png. Not included: 1) Persisting preemption info across RM restart/HA. 2) FairScheduler-related changes to show preemption info on the RM app page are not covered in this patch. Any feedback is welcome! Thanks Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032287#comment-14032287 ] Hadoop QA commented on YARN-2144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650540/AM-page-preemption-info.png against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3995//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: YARN-2144.patch Re-add patch to trigger jenkins building. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2163) WebUI: AppId should be treated as text when sort by AppId in Applications table
Wangda Tan created YARN-2163: Summary: WebUI: AppId should be treated as text when sort by AppId in Applications table Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by id (not including the cluster timestamp), see attached screenshot. This is incorrect when multiple cluster timestamps exist. The AppId should be treated as text; we need to sort AppIds alphabetically. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2163) WebUI: AppId should be treated as text when sort by AppId in Applications table
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Description: Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by the int-typed id only (not including the cluster timestamp), see attached screenshot. The order of AppId on the web page should be consistent with ApplicationId.compareTo(). (was: Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by id (not including the cluster timestamp), see attached screenshot. This is incorrect when multiple cluster timestamps exist. The AppId should be treated as text; we need to sort AppIds alphabetically.) WebUI: AppId should be treated as text when sort by AppId in Applications table --- Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by the int-typed id only (not including the cluster timestamp), see attached screenshot. The order of AppId on the web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
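For reference, ApplicationId.compareTo() orders by cluster timestamp first and then by the integer id. A small standalone sketch of a comparator over appId strings that matches that ordering; the sample IDs below are made up, and the actual fix in YARN-2163.patch changes the web UI's table sorting rather than adding a helper like this.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AppIdOrdering {

  /**
   * Orders application ID strings of the form
   * application_<clusterTimestamp>_<id> the same way ApplicationId.compareTo()
   * does: by cluster timestamp first, then by the integer id.
   */
  static final Comparator<String> APP_ID_ORDER = new Comparator<String>() {
    @Override
    public int compare(String a, String b) {
      String[] pa = a.split("_");
      String[] pb = b.split("_");
      int byTimestamp = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
      if (byTimestamp != 0) {
        return byTimestamp;
      }
      return Integer.compare(Integer.parseInt(pa[2]), Integer.parseInt(pb[2]));
    }
  };

  public static void main(String[] args) {
    // With two cluster timestamps present, a purely numeric sort on the
    // trailing id would interleave the two clusters' applications.
    List<String> ids = new ArrayList<String>();
    ids.add("application_1402886643897_0002");
    ids.add("application_1401000000000_0010");
    ids.add("application_1402886643897_0001");
    Collections.sort(ids, APP_ID_ORDER);
    System.out.println(ids);
  }
}
{code}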
[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Summary: WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). (was: WebUI: AppId should be treated as text when sort by AppId in Applications table) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Currently, AppId is treated as numeric, so the sort result in applications table is sorted by int typed id only (not included cluster timestamp), see attached screenshot. Order of AppId in web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Attachment: YARN-2163.patch WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Attachments: YARN-2163.patch Currently, AppId is treated as numeric, so the sort result in applications table is sorted by int typed id only (not included cluster timestamp), see attached screenshot. Order of AppId in web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Attachment: apps page.png Attached a screenshot of the apps table and a simple fix for it. WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Attachments: YARN-2163.patch, apps page.png Currently, AppId is treated as numeric, so the sort result in applications table is sorted by int typed id only (not included cluster timestamp), see attached screenshot. Order of AppId in web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032333#comment-14032333 ] Hadoop QA commented on YARN-2144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650541/YARN-2144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3996//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3996//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: YARN-2147-v2.patch client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v2.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032441#comment-14032441 ] Chen He commented on YARN-2147: --- Thank you for the comment, [~ozawa]. Patch updated. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v2.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: (was: YARN-2147-v2.patch) client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: YARN-2144.patch Attached a new patch that fixes the test failures. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch Fixed the patch. I generated it from the wrong directory. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
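To make the proposed improvement concrete, the idea is roughly: when scanning an incoming node that has enough free space for a request that is currently only reserved on some other node, drop the stale reservation and allocate for real. A toy sketch of that decision under those assumptions; the types below are stand-ins for illustration, not the actual CapacityScheduler classes touched by the patch.

{code}
/**
 * Toy model of "swap a reservation for an actual allocation".
 */
public class ReservationSwapSketch {

  static class Node {
    int availableMB;
    Integer reservedRequestId; // request currently reserved on this node, if any
  }

  static class Request {
    int id;
    int requestedMB;
  }

  /** Returns true if the request ended up allocated on {@code node}. */
  static boolean tryAllocate(Node node, Request req, Node nodeHoldingReservation) {
    if (node.availableMB >= req.requestedMB) {
      // Enough room here: release the old reservation (if any) so it no
      // longer counts against queue limits, then allocate for real.
      if (nodeHoldingReservation != null
          && Integer.valueOf(req.id).equals(nodeHoldingReservation.reservedRequestId)) {
        nodeHoldingReservation.reservedRequestId = null;
      }
      node.availableMB -= req.requestedMB;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Node nodeWithReservation = new Node();
    nodeWithReservation.availableMB = 512;
    nodeWithReservation.reservedRequestId = 1;

    Node incomingNode = new Node();
    incomingNode.availableMB = 4096;

    Request req = new Request();
    req.id = 1;
    req.requestedMB = 3072;

    // The incoming node can satisfy the request directly, so the stale
    // reservation is released instead of blocking further scheduling.
    System.out.println(tryAllocate(incomingNode, req, nodeWithReservation)); // true
    System.out.println(nodeWithReservation.reservedRequestId);               // null
  }
}
{code}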
[jira] [Created] (YARN-2164) Add switch 'restart' for yarn-daemon.sh
Jun Gong created YARN-2164: -- Summary: Add switch 'restart' for yarn-daemon.sh Key: YARN-2164 URL: https://issues.apache.org/jira/browse/YARN-2164 Project: Hadoop YARN Issue Type: Improvement Reporter: Jun Gong Priority: Minor For convenience, add a 'restart' switch to yarn-daemon.sh. E.g. we could use yarn-daemon.sh restart nodemanager instead of yarn-daemon.sh stop nodemanager; yarn-daemon.sh start nodemanager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032603#comment-14032603 ] Tassapol Athiapinya commented on YARN-2144: --- [~leftnoteasy] Can you please clarify these points for me? - On the AM page, does Resource Preempted from Current Attempt mean Total Resource Preempted from the Latest AM Attempt? Can it show only the data point from the current (is it the latest?) attempt? - Can you change #Container Preempted from Current Attempt: to Number of Containers Preempted from Current (Latest) Attempt? The # syntax may be hard to comprehend for a wider group of users. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command
[ https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-853: Fix Version/s: 0.23.11 Thanks, Devaraj! I committed this to branch-0.23 as well. maximum-am-resource-percent doesn't work after refreshQueues command Key: YARN-853 URL: https://issues.apache.org/jira/browse/YARN-853 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.1.0-beta, 0.23.11 Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, YARN-853-4.patch, YARN-853.patch If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.queue-path.maximum-am-resource-percent configuration and then do the refreshQueues, it uses the new config value to calculate Max Active Applications and Max Active Applications Per User. If we add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Applications Per User. -- This message was sent by Atlassian JIRA (v6.2#6252)
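For context, the limits in question are derived from the configured percent and the current cluster size, so the refreshed percent has to be picked up whenever the computation re-runs, for example when a new node registers. A rough sketch of that calculation, which approximates rather than reproduces the CapacityScheduler's CSQueueUtils logic:

{code}
/**
 * Rough sketch (not the exact CapacityScheduler code) of how the maximum
 * number of active applications is derived from the AM resource percent,
 * the queue's absolute capacity, and the current cluster size.
 */
public class MaxActiveAppsSketch {

  static int computeMaxActiveApplications(long clusterMemoryMB,
                                          long minAllocationMB,
                                          float maxAMResourcePercent,
                                          float absoluteQueueCapacity) {
    double containers = (double) clusterMemoryMB / minAllocationMB;
    return Math.max(
        (int) Math.ceil(containers * maxAMResourcePercent * absoluteQueueCapacity), 1);
  }

  public static void main(String[] args) {
    // Example: a 10% AM limit on a queue with 50% absolute capacity.
    // 16 GB cluster, 1 GB minimum allocation -> prints 1.
    System.out.println(computeMaxActiveApplications(16 * 1024, 1024, 0.1f, 0.5f));
    // After a new node doubles the cluster, recomputing with the same
    // (refreshed) percent -> prints 2.
    System.out.println(computeMaxActiveApplications(32 * 1024, 1024, 0.1f, 0.5f));
  }
}
{code}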
[jira] [Updated] (YARN-2164) Add switch 'restart' for yarn-daemon.sh
[ https://issues.apache.org/jira/browse/YARN-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-2164: --- Attachment: YARN-2164.patch Add switch 'restart' for yarn-daemon.sh Key: YARN-2164 URL: https://issues.apache.org/jira/browse/YARN-2164 Project: Hadoop YARN Issue Type: Improvement Reporter: Jun Gong Priority: Minor Attachments: YARN-2164.patch For convenience, add a 'restart' switch to yarn-daemon.sh. E.g. we could use yarn-daemon.sh restart nodemanager instead of yarn-daemon.sh stop nodemanager; yarn-daemon.sh start nodemanager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032611#comment-14032611 ] Hadoop QA commented on YARN-2144: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650581/YARN-2144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3998//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3998//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032646#comment-14032646 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650590/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3999//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3999//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3999//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
Karam Singh created YARN-2165: - Summary: Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if we set yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400, the Timeline Server starts successfully without complaining: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup the timeline server should validate that yarn.timeline-service.ttl-ms is greater than zero; otherwise, especially for negative values, the discard-old-entities timestamp will be set to a future value, which may lead to inconsistency in behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
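A sketch of the kind of start-up validation being requested, reading the property by name and failing fast on non-positive values. Wiring this into the timeline store's init is omitted, so treat it as illustrative only; the default value shown is also just an example.

{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineTtlValidation {

  static final String TTL_MS = "yarn.timeline-service.ttl-ms";
  static final long EXAMPLE_DEFAULT_TTL_MS = 7 * 24 * 60 * 60 * 1000L; // 7 days, illustrative

  /** Returns the configured TTL, rejecting zero or negative values. */
  static long getValidatedTtl(Configuration conf) {
    long ttl = conf.getLong(TTL_MS, EXAMPLE_DEFAULT_TTL_MS);
    if (ttl <= 0) {
      // Fail fast instead of silently starting a deletion thread whose
      // cut-off timestamp lies in the future.
      throw new IllegalArgumentException(
          TTL_MS + " must be greater than zero, but was " + ttl);
    }
    return ttl;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setLong(TTL_MS, -86400L);
    System.out.println(getValidatedTtl(conf)); // throws IllegalArgumentException
  }
}
{code}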
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032821#comment-14032821 ] Hadoop QA commented on YARN-1769: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650615/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4001//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4001//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store
Karam Singh created YARN-2166: - Summary: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UnCaughtExcpetion -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store
[ https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2166: -- Description: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UncaughtException -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} was: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UnCaughtExcpetion -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store - Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UncaughtException -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM
[ https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032909#comment-14032909 ] Karthik Kambatla commented on YARN-1898: I agree with Robert here. We wouldn't be able to track metrics and jmx of the Standby RM if we redirect them. [~xgong], [~acmurthy] - what do you think? Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM - Key: YARN-1898 URL: https://issues.apache.org/jira/browse/YARN-1898 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Yesha Vora Assignee: Xuan Gong Fix For: 2.4.1 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, YARN-1898.addendum.patch, YARN-1898.addendum.patch Standby RM links /conf, /stacks, /logLevel, /metrics, /jmx is redirected to Active RM. It should not be redirected to Active RM -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032913#comment-14032913 ] Karthik Kambatla commented on YARN-2159: +1. Committing this. allocateContainer() in SchedulerNode needs a clearer LOG.info message - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info("Assigned container " + container.getId() + " of capacity " + container.getResource() + " on host " + rmNode.getNodeAddress() + ", which currently has " + numContainers + " containers, " + getUsedResource() + " used and " + getAvailableResource() + " available"); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity <memory:1536, vCores:1> on host machine.host.domain.com:8041, which currently has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
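Rendering the suggested phrasing as code, with plain strings standing in for the YARN ContainerId/Resource types; this illustrates the wording only and is not necessarily the exact committed change.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AllocationLogSketch {

  private static final Log LOG = LogFactory.getLog(AllocationLogSketch.class);

  /** Plain strings stand in for ContainerId/Resource toString() output. */
  static void logAllocation(String containerId, String capacity, String host,
                            int numContainers, String used, String available) {
    // "after allocation" makes it clear the figures are post-assignment,
    // so a fully used node no longer reads oddly.
    LOG.info("Assigned container " + containerId + " of capacity " + capacity
        + " on host " + host + ", which has " + numContainers + " containers, "
        + used + " used and " + available + " available after allocation");
  }

  public static void main(String[] args) {
    logAllocation("container_1400000000000_0009_01_000018",
        "<memory:1536, vCores:1>", "machine.host.domain.com:8041",
        18, "<memory:27648, vCores:18>", "<memory:3072, vCores:0>");
  }
}
{code}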
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032918#comment-14032918 ] Mayank Bansal commented on YARN-2022: - HI [~sunilg] Thanks for the patch. Overall looks ok however I think we need to add the test case for AM percentage per queue as well. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
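A small self-contained sketch of the ordering idea discussed here, preferring task containers as preemption victims and touching AM containers only last; the Candidate type and its fields are purely illustrative, not the ProportionalCapacityPreemptionPolicy classes:
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Candidate {
  final String containerId;
  final boolean isAMContainer;

  Candidate(String containerId, boolean isAMContainer) {
    this.containerId = containerId;
    this.isAMContainer = isAMContainer;
  }
}

class PreemptionOrderSketch {
  // Task containers sort before AM containers, so an AM is only picked as a
  // victim once no ordinary task containers are left to preempt.
  static List<Candidate> order(List<Candidate> candidates) {
    List<Candidate> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparing((Candidate c) -> c.isAMContainer));
    return sorted;
  }
}
{code}
In the scenario above, such an ordering would take map containers from J3 and J2 before ever considering J3's AM.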
[jira] [Updated] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2159: --- Summary: Better logging in SchedulerNode#allocateContainer (was: allocateContainer() in SchedulerNode needs a clearer LOG.info message) Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032930#comment-14032930 ] Hudson commented on YARN-2159: -- FAILURE: Integrated in Hadoop-trunk-Commit #5712 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5712/]) YARN-2159. Better logging in SchedulerNode#allocateContainer. (Ray Chiang via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603003) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032986#comment-14032986 ] Tsuyoshi OZAWA commented on YARN-2159: -- Thanks Ray for the contribution, and thanks Karthik for the review. Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033069#comment-14033069 ] Jian He commented on YARN-1885: --- lgtm, +1. other than a minor code comment, fixed myself, waiting for jenkins to commit.. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1885: -- Attachment: YARN-1885.patch RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033101#comment-14033101 ] Jian He commented on YARN-2052: --- Application itself may possibly use Container.getId to differentiate the containers, two containers allocated by two RMs may have the same id integer, then the application logic will break. will this be fine? If we are taking this approach of adding a new field to differentiate the containerId, we should at least document that ContainerId.getid is not the way to differentiate containers. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
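To illustrate the concern about Container.getId, here is a toy model of an id that is only unique together with an epoch-style field; the class and field names are illustrative, not the real ContainerId API:
{code}
import java.util.Objects;

// Toy model: after a work-preserving restart, the integer id alone can repeat,
// so identity has to include the RM epoch as well.
final class EpochContainerId {
  final long epoch; // bumped on every RM restart
  final int id;     // monotonically increasing sequence within one epoch

  EpochContainerId(long epoch, int id) {
    this.epoch = epoch;
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EpochContainerId)) {
      return false;
    }
    EpochContainerId other = (EpochContainerId) o;
    return epoch == other.epoch && id == other.id; // the id alone is not sufficient
  }

  @Override
  public int hashCode() {
    return Objects.hash(epoch, id);
  }
}
{code}
Two containers allocated before and after a restart can then share the same id while differing in epoch, which is exactly why the comment asks that ContainerId.getId not be documented as a way to differentiate containers.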
[jira] [Commented] (YARN-1339) Recover DeletionService state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033106#comment-14033106 ] Junping Du commented on YARN-1339: -- Thanks for addressing my comments, [~jlowe]! +1. The v6 patch LGTM, will commit it shortly. Recover DeletionService state upon nodemanager restart -- Key: YARN-1339 URL: https://issues.apache.org/jira/browse/YARN-1339 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1339.patch, YARN-1339v2.patch, YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, YARN-1339v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033109#comment-14033109 ] Ray Chiang commented on YARN-2159: -- Great. Thanks! Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Fix For: 2.5.0 Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1885: -- Attachment: YARN-1885.patch RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
Junping Du created YARN-2167: Summary: LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du In NMLeveldbStateStoreService#loadLocalizationState(), we have a LeveldbIterator to read the NM's localization state, but it does not get closed in a finally block. We should close this connection to the DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
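The shape of the proposed fix, sketched rather than quoted from the patch; the iterator construction and the loop body are placeholders, only the try/finally pattern is the point:
{code}
LeveldbIterator iter = new LeveldbIterator(db); // construction shown for illustration only
try {
  // ... existing code that walks the localization keys and rebuilds NM state ...
} finally {
  iter.close(); // always release the leveldb handle, even if loading throws
}
{code}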
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033117#comment-14033117 ] Hadoop QA commented on YARN-1885: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650661/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4002//console This message is automatically generated. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2167: - Attachment: YARN-2167.patch Upload a quick patch to fix it. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033127#comment-14033127 ] Jason Lowe commented on YARN-2167: -- +1 pending Jenkins. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033157#comment-14033157 ] Hadoop QA commented on YARN-2167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650677/YARN-2167.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4004//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4004//console This message is automatically generated. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033165#comment-14033165 ] Jian He commented on YARN-2157: --- Thanks for the patch! Some suggestions on the patch: ClusterMetrics shows the metrics of the YARN cluster? such as… {code} ClusterMetrics shows the statistics of NodeManagers from the + ResourceManager's perspective {code} Do you mean the queue name? if so, we can use queue name. {code} queue identifier {code} Can you clarify more about what this format means? {code} running_num {code} Can you please clarify the definition of pending applications? i.e. an application that has not yet been assigned any containers. Total number of applications killed - Total number of killed applications, similarly for “Total number of applications failed” can you clarify the meaning of PendingMB, PendingVCores, PendingContainers also? i.e. the pending resource requests that are not yet fulfilled by the scheduler. allocatedContainers can be put before allocatedGB for consistency. {code} +*-+--+ +|allocatedGB | Current allocated memory in GB +*-+--+ +|allocatedContainers | Current number of allocated containers +*-+--+ {code} Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1339) Recover DeletionService state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033178#comment-14033178 ] Hadoop QA commented on YARN-1339: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649673/YARN-1339v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4005//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4005//console This message is automatically generated. Recover DeletionService state upon nodemanager restart -- Key: YARN-1339 URL: https://issues.apache.org/jira/browse/YARN-1339 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1339.patch, YARN-1339v2.patch, YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, YARN-1339v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033185#comment-14033185 ] Hadoop QA commented on YARN-1885: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650674/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4003//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4003//console This message is automatically generated. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM
[ https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1898. --- Resolution: Fixed Unfortunately jmx is a mess right now - it includes both machine metrics together with what should usually belong to /metrics. So you would get some stale metrics related to YARN if we don't redirect it to the active. Not sure what the right fix is without explicitly listing down and reasoning about all the stuff that is exposed in /jmx. IAC, let's open a new ticket and link to this one. Tx. Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM - Key: YARN-1898 URL: https://issues.apache.org/jira/browse/YARN-1898 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Yesha Vora Assignee: Xuan Gong Fix For: 2.4.1 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, YARN-1898.addendum.patch, YARN-1898.addendum.patch Standby RM links /conf, /stacks, /logLevel, /metrics, /jmx is redirected to Active RM. It should not be redirected to Active RM -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033190#comment-14033190 ] Jian He commented on YARN-2144: --- Haven't looked at the patch. YARN-1809 is adding the attempt UI, Maybe the app UI should show the total preempted containers info, and attempt UI should show each attempt's preempted containers info ? Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
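For illustration, the kind of INFO-level line the description asks for might look like the sketch below; the method and variable names are assumptions, and LOG stands for the class's existing logger:
{code}
// Illustration only: record each preemption with the container id and whether it
// was the AM container, so the event can be found while the application still runs.
void logPreemption(String containerId, String applicationId, boolean isAMContainer) {
  LOG.info("Preempting " + (isAMContainer ? "AM container " : "task container ")
      + containerId + " of application " + applicationId);
}
{code}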
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033233#comment-14033233 ] Hudson commented on YARN-1885: -- FAILURE: Integrated in Hadoop-trunk-Commit #5714 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5714/]) YARN-1885. Fixed a bug that RM may not send application-clean-up signal to NMs where the completed applications previously ran in case of RM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603028) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppRunningOnNodeEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptContainerAcquiredEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStartedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java *
[jira] [Created] (YARN-2168) SCM/Client/NM/Admin protocols
Chris Trezzo created YARN-2168: -- Summary: SCM/Client/NM/Admin protocols Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Define and implement the following protocols and protocol messages using protobufs: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.2#6252)
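For orientation, a rough Java sketch of what the client-facing protocol listed above might look like; the empty interfaces stand in for the protobuf-backed records, and none of this is the committed API:
{code}
// Illustrative stand-ins for the protobuf-backed request/response records.
interface UseSharedCacheResourceRequest {}
interface UseSharedCacheResourceResponse {}
interface ReleaseSharedCacheResourceRequest {}
interface ReleaseSharedCacheResourceResponse {}

// Sketch of ClientSCMProtocol: how a YARN client could claim and release
// resources in the shared cache manager.
interface ClientSCMProtocol {
  UseSharedCacheResourceResponse use(UseSharedCacheResourceRequest request);

  ReleaseSharedCacheResourceResponse release(ReleaseSharedCacheResourceRequest request);
}
{code}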
[jira] [Updated] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2168: --- Attachment: YARN-2168-trunk-v1.patch Attached is v1 patch based off of trunk. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2168-trunk-v1.patch Define and implement the following protocols and protocol messages using protobufs: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033286#comment-14033286 ] Junping Du commented on YARN-2167: -- The patch is very tiny and straight-forward, so no need for additional unit test. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033293#comment-14033293 ] Hadoop QA commented on YARN-2168: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650693/YARN-2168-trunk-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4006//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4006//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4006//console This message is automatically generated. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2168-trunk-v1.patch Define and implement the following protocols and protocol messages using protobufs: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1339) Recover DeletionService state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033295#comment-14033295 ] Hudson commented on YARN-1339: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5715 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5715/]) YARN-1339. Recover DeletionService state upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603036) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java Recover DeletionService state upon nodemanager restart -- Key: YARN-1339 URL: https://issues.apache.org/jira/browse/YARN-1339 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1339.patch, YARN-1339v2.patch, YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, YARN-1339v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033299#comment-14033299 ] Wangda Tan commented on YARN-1885: -- Thanks [~vinodkv] and [~jianhe] for review and commit! RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033306#comment-14033306 ] Wangda Tan commented on YARN-2144: -- Hi [~tassapola], bq. In AM page, Does Resource Preempted from Current Attempt mean Total Resource Preempted from Latest AM attempt? Can it show only data point from current (is it latest?) attempt? Yes, Yes, it can only show data point from current(latest) attempt. bq. Can you change #Container Preempted from Current Attempt: to Number of Containers Preempted from Current(Latest) Attempt? # syntax maybe hard to comprehend for wider group of user. Thanks for this comment, I agree with you. I'll address this comment later. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2157: Attachment: YARN-2157.2.patch Thanks [~jianhe] for the suggestions! Updated the patch. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.2.patch, YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033326#comment-14033326 ] Wangda Tan commented on YARN-2144: -- bq. Haven't looked at the patch. YARN-1809 is adding the attempt UI, Maybe the app UI should show the total preempted containers info, and attempt UI should show each attempt's preempted containers info? Thanks for pointing the attempt UI JIRA. I think RM will cleanup application's resource usage at the beginning of each attempt start, so it should make sense to show latest attempt's preempted containers info on app UI. And after we can persist preemption info across RM restart and YARN-1807 committed, we can show each attempt's preempted containers info on attempts UI. Do you agree? Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033327#comment-14033327 ] Wangda Tan commented on YARN-2074: -- Hi Jian, I've reviewed your patch, one question, Is following a bug? {code} int exitStatus = ContainerExitStatus.PREEMPTED; switch (event.getType()) { case LAUNCH_FAILED: RMAppAttemptLaunchFailedEvent launchFaileEvent = (RMAppAttemptLaunchFailedEvent) event; diags = launchFaileEvent.getMessage(); break; {code} amContainerExitStatus will be set to ContainerExitStatus.PREEMPTED in any case. If it's a bug, I think we should cover a AM completed/fail case and it shouldn't be treated as preempted. Thanks, Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
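To make the concern concrete, one possible shape of a fix is sketched below: default to a neutral exit status and only use PREEMPTED when the event actually indicates preemption. The event types and the diags variable come from the quoted code; the finishedContainerStatus accessor is a placeholder, and this is not the committed patch:
{code}
// Sketch only: do not default every path to PREEMPTED.
int exitStatus = ContainerExitStatus.INVALID;
switch (event.getType()) {
  case LAUNCH_FAILED:
    RMAppAttemptLaunchFailedEvent launchFailedEvent =
        (RMAppAttemptLaunchFailedEvent) event;
    diags = launchFailedEvent.getMessage();
    break;
  case CONTAINER_FINISHED:
    // Placeholder: carry the real exit status reported for the finished AM container.
    exitStatus = finishedContainerStatus.getExitStatus();
    break;
  default:
    break;
}
{code}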
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033328#comment-14033328 ] Jason Lowe commented on YARN-2167: -- +1 lgtm. Committing this. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1405#comment-1405 ] Tsuyoshi OZAWA commented on YARN-2074: -- {quote} amContainerExitStatus will be set to ContainerExitStatus.PREEMPTED in any case. If it's a bug, I think we should cover a AM completed/fail case and it shouldn't be treated as preempted. Thanks, {quote} [~wangda], thank you for pointing it out. I'll check it. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033342#comment-14033342 ] Junping Du commented on YARN-2167: -- Thanks [~jlowe] for review and commit! LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Fix For: 3.0.0, 2.5.0 Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033346#comment-14033346 ] Hudson commented on YARN-2167: -- FAILURE: Integrated in Hadoop-trunk-Commit #5716 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5716/]) YARN-2167. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block. Contributed by Junping Du (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603039) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Fix For: 3.0.0, 2.5.0 Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2074: -- Attachment: YARN-2074.6.patch thanks for pointing out! fixed it. One thing to note here is that AM ContainerExitStatus for succeeded app is not saved in state store. only containerExitStatus for failed apps is saved. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033356#comment-14033356 ] Hadoop QA commented on YARN-2074: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650717/YARN-2074.6.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4008//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033358#comment-14033358 ] Hadoop QA commented on YARN-2157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650712/YARN-2157.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4007//console This message is automatically generated. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.2.patch, YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033377#comment-14033377 ] Zhijie Shen commented on YARN-2165: --- [~karams], how about gathering the similar validation issues (YARN-2166) here? Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400 is set, the timeline server starts successfully, merely logging: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0; otherwise, especially for a negative value, the timestamp used to discard old entities will be set to a future value, which may lead to inconsistent behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
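A sketch of the validation being requested, assuming it runs when the store initializes (the property name comes from this issue; the default value and surrounding class are illustrative, not the actual patch):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: fail fast at init time instead of starting the deletion thread
// with a zero or negative TTL.
public class TtlValidationSketch {
  static long validatedTtlMs(Configuration conf) {
    long ttlMs = conf.getLong("yarn.timeline-service.ttl-ms", 7L * 24 * 60 * 60 * 1000);
    if (ttlMs <= 0) {
      throw new IllegalArgumentException(
          "yarn.timeline-service.ttl-ms must be greater than zero, but was " + ttlMs);
    }
    return ttlMs;
  }
}
{code}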
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033396#comment-14033396 ] Hadoop QA commented on YARN-2074: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650719/YARN-2074.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4009//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4009//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2162) Fair Scheduler :ability to configure minResources and maxResources in terms of percentage
[ https://issues.apache.org/jira/browse/YARN-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033399#comment-14033399 ] Maysam Yabandeh commented on YARN-2162: --- If we add this feature, it should certainly be optional. Think of the following scenario, which is quite common: # The cluster is divided among 100 queues # Queue 1 requires more resources and asks for more capacity # More machines are added to the cluster to meet Queue 1's new demand # The min and max resources of Queue 1 are updated accordingly If the queues' min and max resources are expressed in terms of percentages, then every queue has to update its percentages. Fair Scheduler :ability to configure minResources and maxResources in terms of percentage - Key: YARN-2162 URL: https://issues.apache.org/jira/browse/YARN-2162 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Labels: scheduler minResources and maxResources in fair scheduler configs are expressed in terms of absolute numbers (X mb, Y vcores). As a result, when we expand or shrink our hadoop cluster, we need to recalculate and change minResources/maxResources accordingly, which is pretty inconvenient. We can circumvent this problem if we can (optionally) configure these properties in terms of percentage of cluster capacity. -- This message was sent by Atlassian JIRA (v6.2#6252)
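A small worked example of the concern in the comment above (all numbers are made up): with percentage-based minResources, expanding the cluster silently changes every queue's absolute guarantee unless every percentage is recomputed.
{code}
public class PercentMinResourceExample {
  public static void main(String[] args) {
    double minPercent = 0.10;   // a queue configured with 10% as its minResources
    int clusterGb = 1000;       // cluster memory before expansion
    int expandedGb = 1200;      // after adding machines for one busy queue
    System.out.println("guarantee before: " + (minPercent * clusterGb) + " GB");   // 100.0 GB
    System.out.println("guarantee after:  " + (minPercent * expandedGb) + " GB");  // 120.0 GB
    // Every queue's absolute guarantee shifts, so all percentages must be revisited.
  }
}
{code}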
[jira] [Updated] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2165: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400 is set, the timeline server starts successfully, merely logging: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0; otherwise, especially for a negative value, the timestamp used to discard old entities will be set to a future value, which may lead to inconsistent behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store
[ https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2166: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store - Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used for the timeline store. Otherwise, if we start the timeline server with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000, the timeline server starts, but the Thread.sleep call in EntityDeletionThread.run keeps throwing an uncaught exception for the negative value: {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
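A standalone illustration of the failure mode quoted above (not YARN code): Thread.sleep rejects negative timeouts, which is exactly what the deletion thread keeps hitting, so the interval should be validated before the thread is started.
{code}
public class NegativeSleepDemo {
  public static void main(String[] args) throws InterruptedException {
    long ttlIntervalMs = -5000L;  // e.g. a misconfigured ttl-interval-ms
    // Throws java.lang.IllegalArgumentException: timeout value is negative,
    // the same exception the EntityDeletionThread keeps hitting.
    Thread.sleep(ttlIntervalMs);
  }
}
{code}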
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033402#comment-14033402 ] Wangda Tan commented on YARN-2074: -- [~jianhe], the changes almost LGTM; one comment on the tests: could you add a test case where an app has several attempts, some failed and some preempted, so we can check that RMAppAttemptImpl.isLastAttempt is set properly? Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033403#comment-14033403 ] Wangda Tan commented on YARN-2152: -- Thanks [~jianhe] for the update, LGTM, +1. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch Container information such as container priority and container start time cannot be recovered, because the NM today does not send such container information across on NM registration when RM recovery happens. -- This message was sent by Atlassian JIRA (v6.2#6252)
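A hypothetical illustration of the kind of per-container information the NM would need to report on re-registration for the RM to recover it; the class and field names below are invented for illustration and are not the actual YARN API.
{code}
// Hypothetical data carrier, not the actual YARN-2152 change.
public class RecoveredContainerInfo {
  private final int priority;       // container priority at allocation time
  private final long creationTime;  // container start time on the NM

  public RecoveredContainerInfo(int priority, long creationTime) {
    this.priority = priority;
    this.creationTime = creationTime;
  }

  public int getPriority() { return priority; }
  public long getCreationTime() { return creationTime; }
}
{code}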
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033421#comment-14033421 ] Jian He commented on YARN-2074: --- testPreemptedAMRestartOnRMRestart is already doing this with multiple attempts? Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033423#comment-14033423 ] Jian He commented on YARN-2074: --- sorry, I meant testAMPreemptedNotCountedForAMFailures Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.6.patch Thank you [~mayank_bansal]. I have updated the patch with a new test case named testAMResourcePercentForSkippedAMContainers for the AMResourcePercent check. Kindly review. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, job J3 will get killed, including its AM. It is better if the AM can be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted instead. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
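A sketch of the ordering idea behind this issue (not ProportionalCapacityPreemptionPolicy itself; the Candidate type and method are invented for illustration): when several containers are marked for preemption, visit non-AM containers first so AM containers are only killed as a last resort.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AmLastOrderingSketch {
  static class Candidate {
    final String containerId;
    final boolean isAmContainer;
    Candidate(String containerId, boolean isAmContainer) {
      this.containerId = containerId;
      this.isAmContainer = isAmContainer;
    }
  }

  // Order candidates so non-AM containers are considered for preemption first.
  static List<Candidate> orderForPreemption(List<Candidate> candidates) {
    List<Candidate> ordered = new ArrayList<>(candidates);
    // Boolean ordering puts false before true, so AM containers sort to the tail.
    ordered.sort(Comparator.comparing((Candidate c) -> c.isAmContainer));
    return ordered;
  }
}
{code}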