[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Assignee: (was: Wangda Tan) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed for its lifetime. When users want to change the resources of an allocated container, the only way is to release it and allocate a new container of the expected size. Allowing run-time resource changes to an allocated container would give applications better control of their resource usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934673#comment-13934673 ] Wangda Tan commented on YARN-1197: -- I'm leaving my current company next week and will no longer be involved in YARN-1197; one of my colleagues will take over this JIRA and its sub-tasks. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed for its lifetime. When users want to change the resources of an allocated container, the only way is to release it and allocate a new container of the expected size. Allowing run-time resource changes to an allocated container would give applications better control of their resource usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1609) Add Service Container type to NodeManager in YARN
[ https://issues.apache.org/jira/browse/YARN-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1609: - Assignee: (was: Wangda Tan) Add Service Container type to NodeManager in YARN - Key: YARN-1609 URL: https://issues.apache.org/jira/browse/YARN-1609 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.2.0 Reporter: Wangda Tan Attachments: Add Service Container type to NodeManager in YARN-V1.pdf From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found that it’s important to have a framework-specific daemon process manage the tasks on each node directly. The daemon process, likely similar in other frameworks as well, provides critical services to tasks running on that node (for example “wireup”, spawning user processes in large numbers at once, etc.). In YARN, it’s hard, if not impossible, to have those processes managed by YARN. We propose to extend the container model on the NodeManager side with a “Service Container” type to run and manage such framework daemon/service processes. We believe this would be very useful to other application framework developers as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1609) Add Service Container type to NodeManager in YARN
[ https://issues.apache.org/jira/browse/YARN-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934675#comment-13934675 ] Wangda Tan commented on YARN-1609: -- I'm leaving my current company next week and will no longer be involved in YARN-1609; one of my colleagues will take over this JIRA. Add Service Container type to NodeManager in YARN - Key: YARN-1609 URL: https://issues.apache.org/jira/browse/YARN-1609 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.2.0 Reporter: Wangda Tan Attachments: Add Service Container type to NodeManager in YARN-V1.pdf From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found that it’s important to have a framework-specific daemon process manage the tasks on each node directly. The daemon process, likely similar in other frameworks as well, provides critical services to tasks running on that node (for example “wireup”, spawning user processes in large numbers at once, etc.). In YARN, it’s hard, if not impossible, to have those processes managed by YARN. We propose to extend the container model on the NodeManager side with a “Service Container” type to run and manage such framework daemon/service processes. We believe this would be very useful to other application framework developers as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1685) Bugs around log URL
[ https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1685: -- Attachment: YARN-1685.5.patch Uploaded a new patch with the new approach. Bugs around log URL --- Key: YARN-1685 URL: https://issues.apache.org/jira/browse/YARN-1685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Zhijie Shen Attachments: YARN-1685-1.patch, YARN-1685.2.patch, YARN-1685.3.patch, YARN-1685.4.patch, YARN-1685.5.patch 1. Log URL should be different when the container is running and finished 2. Null case needs to be handled 3. The way of constructing log URL should be corrected -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1685) Bugs around log URL
[ https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934679#comment-13934679 ] Zhijie Shen commented on YARN-1685: --- Vinod, good suggestion. We can make use of the existing info to construct a log URL string when rendering it. However, there's one issue: based on the currently stored container information, we're unable to know whether the container was ever launched. If the container was not launched, we should not show the log URL. Anyway, I agree that not storing the log URL is the right way. How about we fix the container-not-launched case separately, by enhancing the container exit status, state, or something similar to indicate what happened to the container? Bugs around log URL --- Key: YARN-1685 URL: https://issues.apache.org/jira/browse/YARN-1685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Zhijie Shen Attachments: YARN-1685-1.patch, YARN-1685.2.patch, YARN-1685.3.patch, YARN-1685.4.patch, YARN-1685.5.patch 1. Log URL should be different when the container is running and finished 2. Null case needs to be handled 3. The way of constructing log URL should be corrected -- This message was sent by Atlassian JIRA (v6.2#6252)
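For illustration, render-time construction of the URL might look like the following minimal sketch. All type and field names here are hypothetical stand-ins for whatever container information the history store already keeps; this is not the actual patch.
{code}
// Hypothetical sketch: build the log URL from stored fields at render
// time instead of persisting it. A running container links to the
// NodeManager's live log page; a finished one links to the log server.
String logUrl(StoredContainerInfo c, String logServerAddress) {
  if (c.isRunning()) {
    return "http://" + c.nodeHttpAddress() + "/node/containerlogs/"
        + c.containerId() + "/" + c.user();
  }
  return "http://" + logServerAddress + "/logs/" + c.nodeId() + "/"
      + c.containerId() + "/" + c.containerId() + "/" + c.user();
}
{code}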
[jira] [Commented] (YARN-1685) Bugs around log URL
[ https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934707#comment-13934707 ] Hadoop QA commented on YARN-1685: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634661/YARN-1685.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3361//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3361//console This message is automatically generated. Bugs around log URL --- Key: YARN-1685 URL: https://issues.apache.org/jira/browse/YARN-1685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Zhijie Shen Attachments: YARN-1685-1.patch, YARN-1685.2.patch, YARN-1685.3.patch, YARN-1685.4.patch, YARN-1685.5.patch 1. Log URL should be different when the container is running and finished 2. Null case needs to be handled 3. The way of constructing log URL should be corrected -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1609) Add Service Container type to NodeManager in YARN
[ https://issues.apache.org/jira/browse/YARN-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-1609: Assignee: Jeff Zhang Add Service Container type to NodeManager in YARN - Key: YARN-1609 URL: https://issues.apache.org/jira/browse/YARN-1609 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.2.0 Reporter: Wangda Tan Assignee: Jeff Zhang Attachments: Add Service Container type to NodeManager in YARN-V1.pdf From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found that it’s important to have a framework-specific daemon process manage the tasks on each node directly. The daemon process, likely similar in other frameworks as well, provides critical services to tasks running on that node (for example “wireup”, spawning user processes in large numbers at once, etc.). In YARN, it’s hard, if not impossible, to have those processes managed by YARN. We propose to extend the container model on the NodeManager side with a “Service Container” type to run and manage such framework daemon/service processes. We believe this would be very useful to other application framework developers as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1832) wrong MockLocalizerStatus.equals() method implementation
Hong Zhiguo created YARN-1832: - Summary: wrong MockLocalizerStatus.equals() method implementation Key: YARN-1832 URL: https://issues.apache.org/jira/browse/YARN-1832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Zhiguo Priority: Trivial return getLocalizerId().equals(other) ...; should be return getLocalizerId().equals(other.getLocalizerId()) ...; getLocalizerId() returns a String. It's expected to compare this.getLocalizerId() against other.getLocalizerId(). -- This message was sent by Atlassian JIRA (v6.2#6252)
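A minimal sketch of the corrected method is below; the surrounding identity/type checks are illustrative assumptions, not copied from the patch.
{code}
@Override
public boolean equals(Object obj) {
  if (this == obj) {
    return true;
  }
  if (!(obj instanceof MockLocalizerStatus)) {
    return false;
  }
  MockLocalizerStatus other = (MockLocalizerStatus) obj;
  // Compare this status' localizer id against the other status' id,
  // not against the other object itself: a String can never equal a
  // MockLocalizerStatus, so the old check always returned false.
  return getLocalizerId().equals(other.getLocalizerId());
}
{code}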
[jira] [Assigned] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-1197: Assignee: Jeff Zhang Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Jeff Zhang Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed for its lifetime. When users want to change the resources of an allocated container, the only way is to release it and allocate a new container of the expected size. Allowing run-time resource changes to an allocated container would give applications better control of their resource usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [jira] [Assigned] (YARN-1197) Support changing resources of an allocated container
Hi Jeff, how can you assign an issue to yourself when you are not a contributor yet? I want to assign an issue too. Thanks, Zhiguo -- Original -- From: Jeff Zhang (JIRA) <j...@apache.org> Send time: Friday, Mar 14, 2014 4:59 PM To: yarn-issues <yarn-issues@hadoop.apache.org> Subject: [jira] [Assigned] (YARN-1197) Support changing resources of an allocated container [ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-1197: Assignee: Jeff Zhang Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Jeff Zhang Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed for its lifetime. When users want to change the resources of an allocated container, the only way is to release it and allocate a new container of the expected size. Allowing run-time resource changes to an allocated container would give applications better control of their resource usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1832) wrong MockLocalizerStatus.equals() method implementation
[ https://issues.apache.org/jira/browse/YARN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-1832: -- Attachment: YARN-1832.patch wrong MockLocalizerStatus.equals() method implementation Key: YARN-1832 URL: https://issues.apache.org/jira/browse/YARN-1832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Zhiguo Priority: Trivial Attachments: YARN-1832.patch return getLocalizerId().equals(other) ...; should be return getLocalizerId().equals(other.getLocalizerId()) ...; getLocalizerId() returns a String. It's expected to compare this.getLocalizerId() against other.getLocalizerId(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1832) wrong MockLocalizerStatus.equals() method implementation
[ https://issues.apache.org/jira/browse/YARN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934816#comment-13934816 ] Hadoop QA commented on YARN-1832: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634678/YARN-1832.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3362//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3362//console This message is automatically generated. wrong MockLocalizerStatus.equals() method implementation Key: YARN-1832 URL: https://issues.apache.org/jira/browse/YARN-1832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Zhiguo Priority: Trivial Attachments: YARN-1832.patch return getLocalizerId().equals(other) ...; should be return getLocalizerId().equals(other.getLocalizerId()) ...; getLocalizerId() returns a String. It's expected to compare this.getLocalizerId() against other.getLocalizerId(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated YARN-1775: --- Attachment: yarn-1775-2.4.0.patch Computes the RSS by reading /proc/pid/smaps. Tested with branch 2.4.0 on a 20-node cluster. Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Priority: Minor Attachments: yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
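For readers unfamiliar with smaps: each mapping in /proc/<pid>/smaps carries a "Pss:" field (in kB), and summing those fields gives the process's proportional set size. A minimal, self-contained sketch of that computation (illustrative only, not the patch itself):
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SmapsPssReader {
  // Sums the "Pss:" fields (reported in kB) across all mappings in
  // /proc/<pid>/smaps, giving the proportional set size of the process.
  public static long getPssKb(String pid) throws IOException {
    long pssKb = 0;
    try (BufferedReader reader =
        new BufferedReader(new FileReader("/proc/" + pid + "/smaps"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("Pss:")) {
          // Line format: "Pss:        1234 kB"
          String[] parts = line.trim().split("\\s+");
          pssKb += Long.parseLong(parts[1]);
        }
      }
    }
    return pssKb;
  }
}
{code}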
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934829#comment-13934829 ] Rajesh Balamohan commented on YARN-1775: Review request link : https://reviews.apache.org/r/19220/ Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Priority: Minor Attachments: yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated YARN-1775: --- Fix Version/s: 2.5.0 Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Priority: Minor Fix For: 2.5.0 Attachments: yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934873#comment-13934873 ] Hudson commented on YARN-1658: -- FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/]) YARN-1658. Modified web-app framework to let standby RMs redirect web-service calls to the active RM. Contributed by Cindy Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577408) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Dispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMDispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Fix For: 2.4.0 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934867#comment-13934867 ] Hudson commented on YARN-1771: -- FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/]) YARN-1771. Reduce the number of NameNode operations during localization of public resources using a cache. Contributed by Sangjin Lee (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness of a resource that belongs in the public cache during localization. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
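The committed change caches these lookups inside FSDownload; as a rough illustration of the general idea (the class and method names below are hypothetical, not the actual change):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: cache the result of an expensive per-path check so
// repeated localizations of the same resource hit the NameNode once.
public class PublicnessCache {
  private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

  public boolean isPublic(String path) {
    return cache.computeIfAbsent(path, this::checkAncestorPermissions);
  }

  // Placeholder for the real check, which walks the path's ancestors
  // calling getFileStatus() on each to verify world-readability.
  private boolean checkAncestorPermissions(String path) {
    return true;
  }
}
{code}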
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Attachment: YARN-1813.2.patch Included tests. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
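A rough sketch of the kind of distinction being proposed, where dumpAggregatedLogs is a hypothetical placeholder for the CLI's log-fetching call rather than a real method in the patch:
{code}
import java.io.FileNotFoundException;
import org.apache.hadoop.security.AccessControlException;

// Illustrative only: report a permissions failure distinctly instead of
// printing the generic "log aggregation" message for every failure mode.
void printLogs(String appId) throws Exception {
  try {
    dumpAggregatedLogs(appId);   // hypothetical log-fetching helper
  } catch (AccessControlException e) {
    System.err.println("Permission denied: " + e.getMessage());
  } catch (FileNotFoundException e) {
    System.err.println("Logs not available for " + appId
        + ". Log aggregation has not completed or is not enabled.");
  }
}
{code}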
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934937#comment-13934937 ] Hadoop QA commented on YARN-1813: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634698/YARN-1813.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3363//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3363//console This message is automatically generated. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934960#comment-13934960 ] Tsuyoshi OZAWA commented on YARN-1813: -- [~andrew.wang], can you review the latest patch? Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations to handle requests for large containers when there might not be enough space available on a single host. The current algorithm is to reserve as many containers as currently required, and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This can miss nodes that have enough space to fulfill the request. The other place for improvement is that reservations currently count against your queue capacity; if you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above two cases can cause an application requesting a larger container to take a long time to get its resources. We could improve on both by simply continuing to look at incoming nodes to see if we could swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
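In pseudocode, the proposed improvement amounts to something like the sketch below; the helper methods (fits, findReservationFor, unreserve, allocate) are hypothetical, not the actual scheduler code.
{code}
// Illustrative pseudocode: instead of stopping once the reservation
// limit is reached, keep evaluating node heartbeats and trade an
// outstanding reservation for a real allocation when a node with
// enough free space shows up.
void onNodeUpdate(SchedulerNode node, ResourceRequest request) {
  if (fits(request.getCapability(), node.getAvailableResource())) {
    RMContainer reserved = findReservationFor(request);
    if (reserved != null) {
      unreserve(reserved);        // release the reserved capacity
    }
    allocate(node, request);      // satisfy the request immediately
  }
}
{code}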
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935021#comment-13935021 ] Hudson commented on YARN-1771: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/]) YARN-1771. Reduce the number of NameNode operations during localization of public resources using a cache. Contributed by Sangjin Lee (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness of a resource that belongs in the public cache during localization. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935027#comment-13935027 ] Hudson commented on YARN-1658: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/]) YARN-1658. Modified web-app framework to let standby RMs redirect web-service calls to the active RM. Contributed by Cindy Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577408) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Dispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMDispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Fix For: 2.4.0 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations to handle requests for large containers when there might not be enough space available on a single host. The current algorithm is to reserve as many containers as currently required, and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This can miss nodes that have enough space to fulfill the request. The other place for improvement is that reservations currently count against your queue capacity; if you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above two cases can cause an application requesting a larger container to take a long time to get its resources. We could improve on both by simply continuing to look at incoming nodes to see if we could swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935038#comment-13935038 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634711/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3364//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3364//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3364//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations to handle requests for large containers when there might not be enough space available on a single host. The current algorithm is to reserve as many containers as currently required, and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This can miss nodes that have enough space to fulfill the request. The other place for improvement is that reservations currently count against your queue capacity; if you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above two cases can cause an application requesting a larger container to take a long time to get its resources. We could improve on both by simply continuing to look at incoming nodes to see if we could swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1717: - Attachment: YARN-1717.11.patch I renamed the thread EntityDeletionThread and logged a warning when interrupted. I also found a couple of missing checks for null and addressed those. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
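As a rough sketch of the thread's shape described above — the store handle and its method are hypothetical; the real patch operates directly on the leveldb key layout:
{code}
// Illustrative only: a background thread that periodically discards
// entities whose start time falls outside a retention window.
class EntityDeletionThread extends Thread {
  private final TimelineStoreHandle store;   // hypothetical handle
  private final long ttlMillis;
  private final long intervalMillis;

  EntityDeletionThread(TimelineStoreHandle store, long ttlMillis,
      long intervalMillis) {
    this.store = store;
    this.ttlMillis = ttlMillis;
    this.intervalMillis = intervalMillis;
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        store.discardEntitiesOlderThan(System.currentTimeMillis() - ttlMillis);
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Match the behaviour described above: warn and stop.
        System.err.println("EntityDeletionThread interrupted, exiting");
        return;
      }
    }
  }
}
{code}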
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935117#comment-13935117 ] Hadoop QA commented on YARN-1717: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634721/YARN-1717.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3366//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3366//console This message is automatically generated. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935106#comment-13935106 ] Hudson commented on YARN-1658: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) YARN-1658. Modified web-app framework to let standby RMs redirect web-service calls to the active RM. Contributed by Cindy Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577408) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Dispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMDispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Fix For: 2.4.0 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935100#comment-13935100 ] Hudson commented on YARN-1771: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) YARN-1771. Reduce the number of NameNode operations during localization of public resources using a cache. Contributed by Sangjin Lee (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness of a resource that belongs in the public cache during localization. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1833) TestRMAdminService Fails in branch-2
Mit Desai created YARN-1833: --- Summary: TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails whenever groupWithInit and groupBefore happen to have the same size. I do not think we need this assert here. Moreover, we are also checking that groupWithInit does not contain the user groups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
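Illustratively, the check that still matters after dropping the size comparison is the membership one, along these lines (an illustrative sketch, not the actual test code):
{code}
// The environment-dependent size assert is dropped; the refresh is
// still verified because none of the user's pre-refresh groups may
// appear in the initial group list.
for (String group : groupBefore) {
  Assert.assertFalse(groupWithInit.contains(group));
}
{code}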
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935095#comment-13935095 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634719/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3365//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3365//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3365//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations to handle requests for large containers when there might not be enough space available on a single host. The current algorithm is to reserve as many containers as currently required, and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This can miss nodes that have enough space to fulfill the request. The other place for improvement is that reservations currently count against your queue capacity; if you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above two cases can cause an application requesting a larger container to take a long time to get its resources. We could improve on both by simply continuing to look at incoming nodes to see if we could swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935128#comment-13935128 ] Billie Rinaldi commented on YARN-1717: -- bq. We do deletion according to entity's TS and at the entity's granularity, thus, the events that are still alive are likely to be deleted as well. I believe this is the desired behavior. For example, in the case where we have a job entity that starts several shorter-lived task entities, we would not want to remove task entities before the job entity is removed. With the current behavior, the job entity would be removed at the same time or earlier than the task entities. We don't yet have a good understanding of how applications with long-lived entities would want to use the timeline store, so it's hard to design for them. Perhaps an option for the future would be to have a configurable deletion strategy, if some applications have different requirements. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1690) sending ATS events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935148#comment-13935148 ] Mayank Bansal commented on YARN-1690: - Thanks [~zjshen] for the review bq. 1. Catch Exception and merge the duplicate handling. Done bq. 2. Call it timeline client, and similar for the following code. Done bq. 3. Is the following the related change This was done due to a logging issue, so it's good to keep it. bq. 4. Don't create a new config, but use the existing one. It has to be created; there is no existing config bq. 5. Call it DS_CONTAINER? Do not confuse it with the generic information. Done bq. 6. Entity type is different from event type. Call it DS_APPLICATION_ATTEMPT? Done bq. 7. Event type is not set Done bq. 8. Correct STatus Done bq. 9. Can you add user as the primary filter? Done bq. 10. In general, it doesn't make sense to record the information that the generic history service has already captured, such as the other info for container. It's per-framework data, such that it's better to record some DS specific information. Changed the names bq. 11. Need more assertion. For example, test both container and attempt entities. Done bq. 12. Mark it @Private as well Done bq. 13. Correct comment? It seems you choose to set default AHS address, and don't understand why it is related to YARN_MINICLUSTER_FIXED_PORTS. Done sending ATS events from Distributed shell -- Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1690) sending ATS events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1690: Attachment: YARN-1690-4.patch Attaching latest patch Thanks, Mayank sending ATS events from Distributed shell -- Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1690) sending ATS events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935180#comment-13935180 ] Hadoop QA commented on YARN-1690: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634729/YARN-1690-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3367//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3367//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3367//console This message is automatically generated. sending ATS events from Distributed shell -- Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1685) Bugs around log URL
[ https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935195#comment-13935195 ] Tsuyoshi OZAWA commented on YARN-1685: -- TestResourceTrackerService failure is tracked on YARN-1591. TestRMRestart failure is tracked on YARN-1830. Bugs around log URL --- Key: YARN-1685 URL: https://issues.apache.org/jira/browse/YARN-1685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Zhijie Shen Attachments: YARN-1685-1.patch, YARN-1685.2.patch, YARN-1685.3.patch, YARN-1685.4.patch, YARN-1685.5.patch 1. Log URL should be different when the container is running and finished 2. Null case needs to be handled 3. The way of constructing log URL should be corrected -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935205#comment-13935205 ] Tsuyoshi OZAWA commented on YARN-1591: -- The failure of TestRMRestart is unrelated and is tracked in YARN-1830. I ran TestResourceTrackerService hundreds of times last night, and the latest patch appears to work well. [~jianhe], can you take a look? TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1834) YarnClient will not be redirected to the history server when RM is down
Zhijie Shen created YARN-1834: - Summary: YarnClient will not be redirected to the history server when RM is down Key: YARN-1834 URL: https://issues.apache.org/jira/browse/YARN-1834 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen When the RM is not available, the client will keep retrying the RM and will never reach the history server to get the app/attempt/container info. Therefore, during an RM restart, such a request will be blocked. However, it could move on if the history service is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1834) YarnClient will not be redirected to the history server when RM is down
[ https://issues.apache.org/jira/browse/YARN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1834: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-321 YarnClient will not be redirected to the history server when RM is down --- Key: YARN-1834 URL: https://issues.apache.org/jira/browse/YARN-1834 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen When the RM is not available, the client will keep retrying the RM and will never reach the history server to get the app/attempt/container info. Therefore, during an RM restart, such a request will be blocked. However, it could move on if the history service is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935274#comment-13935274 ] Zhijie Shen commented on YARN-1521: --- Just a reminder: ApplicationClientProtocol has four more methods whose idempotency needs to be verified as well: 1. getApplicationAttemptReport 2. getApplicationAttempts 3. getContainerReport 4. getContainers Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-1028, we added automatic failover to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
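If verification concludes these calls are safe to retry, the change would presumably mirror the existing annotations on ApplicationClientProtocol. A hedged sketch for one of the four methods (abbreviated; this is not a committed patch):
{code}
import java.io.IOException;
import org.apache.hadoop.io.retry.Idempotent;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationAttemptReportRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationAttemptReportResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

public interface ApplicationClientProtocol {
  // Read-only query: retrying it cannot change RM state, so @Idempotent is
  // the plausible outcome of the verification. The same reasoning would
  // apply to getApplicationAttempts, getContainerReport, and getContainers.
  @Idempotent
  GetApplicationAttemptReportResponse getApplicationAttemptReport(
      GetApplicationAttemptReportRequest request)
      throws YarnException, IOException;
}
{code}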
[jira] [Created] (YARN-1835) History client service needs to be more robust
Zhijie Shen created YARN-1835: - Summary: History client service needs to be more robust Key: YARN-1835 URL: https://issues.apache.org/jira/browse/YARN-1835 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen While testing, I've found the following issues so far: 1. The history-file-not-found exception is exposed to the user directly; it would be better to catch it and translate it into ApplicationNotFound. 2. An NPE will be exposed as well, since ApplicationHistoryManager doesn't do the necessary null checks. In addition, TestApplicationHistoryManagerImpl fails to test most ApplicationHistoryManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
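To make the two fixes concrete, a rough sketch of the hardening being described (historyManager, getApplication, ApplicationHistoryData, and convertToReport are illustrative names here, not the actual ApplicationHistoryManager API):
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

ApplicationReport getApplication(ApplicationId appId)
    throws YarnException, IOException {
  ApplicationHistoryData data;
  try {
    data = historyManager.getApplication(appId);
  } catch (FileNotFoundException e) {
    // Issue 1: translate the storage-level error into a client-facing
    // ApplicationNotFound instead of leaking it to the user.
    throw new ApplicationNotFoundException("History of " + appId + " is not found");
  }
  if (data == null) {
    // Issue 2: null check so the caller sees ApplicationNotFound, not an NPE.
    throw new ApplicationNotFoundException("History of " + appId + " is not found");
  }
  return convertToReport(data);
}
{code}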
[jira] [Updated] (YARN-1835) History client service needs to be more robust
[ https://issues.apache.org/jira/browse/YARN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1835: -- Issue Type: Sub-task (was: Bug) Parent: YARN-321 History client service needs to be more robust -- Key: YARN-1835 URL: https://issues.apache.org/jira/browse/YARN-1835 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen While testing, I've found the following issues so far: 1. The history-file-not-found exception is exposed to the user directly; it would be better to catch it and translate it into ApplicationNotFound. 2. An NPE will be exposed as well, since ApplicationHistoryManager doesn't do the necessary null checks. In addition, TestApplicationHistoryManagerImpl fails to test most ApplicationHistoryManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1771: Fix Version/s: 2.5.0 many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0, 2.5.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belongs in the public cache. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935310#comment-13935310 ] Chris Douglas commented on YARN-1771: - bq. It would be great if you could commit this to branch-2.4 too... Sure, np. Done many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0, 2.5.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belongs in the public cache. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935313#comment-13935313 ] Akira AJISAKA commented on YARN-1833: - I understand the test will fail if testuser belongs to 3 (={{groupBefore.size()}}) groups. +1 for removing the assertion. In addition, I think the test will also fail in trunk. TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935315#comment-13935315 ] Sangjin Lee commented on YARN-1771: --- Thanks! many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0, 2.5.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belongs in the public cache. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935329#comment-13935329 ] Robert Kanter commented on YARN-1811: - Ok, I'll make it {{@Public}} and put back the old constants and behavior. The AmIpFilter is supposed to be created by the AmFilterInitializer, so I think we can make the new constants package-private so that only the Initializer uses them going forward, and mark the old constants as deprecated. I agree that it would be simpler to just redirect to any of the RMs and assume it auto-redirects to the active RM. However, this won't work if that RM is currently down, so I think we have to check for the active RM, which should be up. I'll look at RMHAUtils though. {{conf.getValByRegex()}} seemed simpler, but I see your point. If they have some invalid config properties that match, it will pick those up. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
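For reference, a rough sketch of the two lookup styles being discussed (conf is assumed to be a Hadoop Configuration; the keys shown are the standard RM HA ones, and the surrounding code is illustrative only):
{code}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();

// Regex lookup: concise, but it also picks up any invalid or leftover
// property whose name happens to match the pattern.
Map<String, String> matched =
    conf.getValByRegex("yarn\\.resourcemanager\\.webapp\\.address.*");

// Iterating the configured RM IDs only touches keys that are actually
// part of the HA configuration.
for (String rmId :
    conf.getTrimmedStringCollection("yarn.resourcemanager.ha.rm-ids")) {
  String addr = conf.get("yarn.resourcemanager.webapp.address." + rmId);
  // probe addr (or ask RMHAUtils for the active RM) before redirecting
}
{code}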
[jira] [Created] (YARN-1836) Add retry cache support in ResourceManager
Tsuyoshi OZAWA created YARN-1836: Summary: Add retry cache support in ResourceManager Key: YARN-1836 URL: https://issues.apache.org/jira/browse/YARN-1836 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA HDFS-4942 supports RetryCache on the NN. This JIRA tracks RetryCache on the ResourceManager. If the RPCs are non-idempotent, we should use RetryCache to avoid returning incorrect failures to the client. YARN-1521 is a related JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
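For reference, the NN-side pattern introduced by HDFS-4942 looks roughly like the sketch below; applying it to an RM RPC is only an illustration here (retryCache is an assumed field, and doSubmitApplication is a hypothetical stand-in for a non-idempotent operation):
{code}
import java.io.IOException;
import org.apache.hadoop.ipc.RetryCache;
import org.apache.hadoop.ipc.RetryCache.CacheEntry;
import org.apache.hadoop.yarn.api.protocolrecords.SubmitApplicationRequest;

void submitWithRetryCache(SubmitApplicationRequest request) throws IOException {
  // If this call id is already cached, a retry arrived for a call that
  // previously completed; report the recorded outcome instead of
  // re-executing the operation.
  CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
  if (cacheEntry != null && cacheEntry.isSuccess()) {
    return;
  }
  boolean success = false;
  try {
    doSubmitApplication(request);  // hypothetical non-idempotent operation
    success = true;
  } finally {
    // Record the outcome so a later retry of the same call id sees it.
    RetryCache.setState(cacheEntry, success);
  }
}
{code}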
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935331#comment-13935331 ] Mit Desai commented on YARN-1833: - I am in the process of generating the patch. I will be uploading it soon. TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935336#comment-13935336 ] Tsuyoshi OZAWA commented on YARN-1521: -- We should introduce RetryCache to avoid returning incorrect errors to the client if non-idempotent RPCs (e.g., submitApplication) are executed. Opened YARN-1836 for this. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-1028, we added automatic failover to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1833: Attachment: YARN-1833.patch Attaching the patch for trunk and branch-2 TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935374#comment-13935374 ] Akira AJISAKA commented on YARN-1833: - +1 TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-1833: Hadoop Flags: Reviewed Summary: TestRMAdminService Fails in trunk and branch-2 (was: TestRMAdminService Fails in branch-2) TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935409#comment-13935409 ] Billie Rinaldi commented on YARN-1717: -- In testing writing and age-off at the same time, the deletion thread did not seem to adversely affect the write rate. With a single writer, I saw about 450 single-entity puts per second, which is comparable to what I had observed previously. I configured the deletion thread to age data off after 90 seconds, and also set the deletion cycle interval to 90 seconds. It was able to age off data at around 4500 entities per second. With these settings, it typically aged off on the order of 36,000 entities per cycle in less than 8 seconds. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935428#comment-13935428 ] Jian He commented on YARN-1795: --- [~rkanter], {code} 2014-03-06 19:01:24,731 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1394161202967_0004_01_04 taskAttempt attempt_1394161202967_0004_m_01_0 2014-03-06 19:01:24,733 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1394161202967_0004_m_00_0 2014-03-06 19:01:24,733 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1394161202967_0004_m_01_0 2014-03-06 19:01:24,734 INFO [ContainerLauncher #0] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: AAA numTokens = 1 NMToken :: 172.16.1.64:52707 :: 172.16.1.64:52707 2014-03-06 19:01:24,734 INFO [ContainerLauncher #0] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : 172.16.1.64:52707 2014-03-06 19:01:24,748 INFO [ContainerLauncher #1] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: AAA numTokens = 1 NMToken :: 172.16.1.64:52707 :: 172.16.1.64:52707 {code} How are you printing this log? Why are two duplicate NMTokens printed even though numTokens == 1? After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens Key: YARN-1795 URL: https://issues.apache.org/jira/browse/YARN-1795 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Robert Kanter Priority: Blocker Attachments: org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up.
For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1799) Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff
[ https://issues.apache.org/jira/browse/YARN-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935431#comment-13935431 ] Karthik Kambatla commented on YARN-1799: bq. Given the disk write speed as a configuration (based on disk type, rpm etc), these factors can be derived. And allotted space for a task can also be considered. Sounds reasonable. Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff - Key: YARN-1799 URL: https://issues.apache.org/jira/browse/YARN-1799 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Sunil G LocalDirAllocator provides paths to all tasks for their local writes. It considers the good list of directories selected by the health-check mechanism in LocalDirsHandlerService. getLocalPathForWrite() considers whether the capacity of the last-accessed directory can meet the requested size. If more tasks ask LocalDirAllocator for paths, the allocation is done based on the disk availability at that given time. But the same path may have been given earlier to other tasks that are still writing sequentially. It is better to check against an upper cutoff for disk utilization. -- This message was sent by Atlassian JIRA (v6.2#6252)
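As a hypothetical illustration of the proposed cutoff (not the actual LocalDirAllocator API; the method name and the cutoff parameter are made up):
{code}
import java.io.File;

// Reject a directory when the write about to be placed there would push
// its utilization past a configured ceiling, instead of only checking
// the free space observed at allocation time.
static boolean hasCapacity(File dir, long requestedBytes, float maxUtilization) {
  long total = dir.getTotalSpace();
  long used = total - dir.getUsableSpace();
  float projected = (float) (used + requestedBytes) / total;
  return projected <= maxUtilization;
}
{code}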
[jira] [Commented] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935437#comment-13935437 ] Robert Kanter commented on YARN-1795: - Sorry, I didn't explain more specifically what I had printed out. Each line is for a token, in this format: {{NMToken :: key :: service}}, where {{key}} is the key from the hash map in NMTokenCache and {{service}} is the service in the token. Those end up being the same, so it's only printing one token in that snippet. After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens Key: YARN-1795 URL: https://issues.apache.org/jira/browse/YARN-1795 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Robert Kanter Priority: Blocker Attachments: org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-1795: -- Assignee: Karthik Kambatla After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens Key: YARN-1795 URL: https://issues.apache.org/jira/browse/YARN-1795 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Robert Kanter Assignee: Karthik Kambatla Priority: Blocker Attachments: org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935439#comment-13935439 ] Karthik Kambatla commented on YARN-1795: Taking this up to investigate. After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens Key: YARN-1795 URL: https://issues.apache.org/jira/browse/YARN-1795 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Robert Kanter Assignee: Karthik Kambatla Priority: Blocker Attachments: org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails
Tsuyoshi OZAWA created YARN-1837: Summary: TestMoveApplication.testMoveRejectedByScheduler randomly fails Key: YARN-1837 URL: https://issues.apache.org/jira/browse/YARN-1837 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Tsuyoshi OZAWA TestMoveApplication#testMoveRejectedByScheduler fails because of a NullPointerException. It looks like it is caused by an unhandled exception on the server side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails
[ https://issues.apache.org/jira/browse/YARN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935448#comment-13935448 ] Tsuyoshi OZAWA commented on YARN-1837: -- Terminal log: {code} $ mvn test ... Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.243 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication testMoveRejectedByScheduler(org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication) Time elapsed: 0.36 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication.testMoveRejectedByScheduler(TestMoveApplication.java:83) {code} TestMoveApplication-output.txt: {code} 2014-03-14 18:43:31,582 ERROR [AsyncDispatcher event handler] rmapp.RMAppImpl (RMAppImpl.java:handle(634)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_ACCEPTED at NEW_SAVING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:632) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:82) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:685) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:669) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} TestMoveApplication.testMoveRejectedByScheduler randomly fails -- Key: YARN-1837 URL: https://issues.apache.org/jira/browse/YARN-1837 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Tsuyoshi OZAWA TestMoveApplication#testMoveRejectedByScheduler fails because of a NullPointerException. It looks like it is caused by an unhandled exception on the server side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935450#comment-13935450 ] Hadoop QA commented on YARN-1833: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634771/YARN-1833.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3368//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3368//console This message is automatically generated. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1474: - Attachment: YARN-1474.7.patch Updated a patch to pass tests. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935502#comment-13935502 ] Mit Desai commented on YARN-1591: - Hey, I have done a little investigation on the test.
{code}
static {
  // Mini-cluster mode makes DefaultMetricsSystem tolerate re-registering
  // an existing metrics source instead of throwing.
  DefaultMetricsSystem.setMiniClusterMode(true);
}
{code}
Setting mini-cluster mode makes the metrics system ignore the metrics source that is already registered in the unit test. This change seems to work on my local machine. What do you guys think? TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935505#comment-13935505 ] Mit Desai commented on YARN-1833: - Thanks Akira. I also verified that it is not related to my patch. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935528#comment-13935528 ] Robert Kanter commented on YARN-1811: - {quote}If we still do the redirection, where you concatenate RM-IDs, you should use RMHAUtils.{quote} Actually, [~vinodkv], what do you mean by this? It's already using RMHAUtils to find the active RM. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935546#comment-13935546 ] Jonathan Eagles commented on YARN-1833: --- [~mdesai], instead of removing the check, could we investigate using UserGroupInformation.createUserForTesting? This would remove the developer's environment as a factor in the correctness of the test. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers, and wheel), it fails when the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
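A minimal sketch of that suggestion (the group names here are made up): UserGroupInformation.createUserForTesting pins the user-to-groups mapping inside the test, so the assertions no longer depend on whatever groups the developer's machine reports:
{code}
import org.apache.hadoop.security.UserGroupInformation;

// Create a test user whose groups are fixed by the test itself.
UserGroupInformation ugi = UserGroupInformation.createUserForTesting(
    "testuser", new String[] {"group_a", "group_b"});

// ugi.getGroupNames() now returns {"group_a", "group_b"} on any machine,
// so size and containment assertions have a stable baseline.
{code}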
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935559#comment-13935559 ] Hadoop QA commented on YARN-1474: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634796/YARN-1474.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3369//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3369//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3369//console This message is automatically generated. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935568#comment-13935568 ] Zhijie Shen commented on YARN-1717: --- Billie, thanks for your metrics. I've done some simple calculations myself. In the long term, if a cluster has x entities written per second, then no matter how long the ttl is, the number of entities to delete per second should be x on average. Therefore, if the throughput of put requests is 100 entities/sec, the number of entities to delete per second will be 100 as well. Given that we do the deletion every 5 minutes, we have 30,000 entities to delete per round. According to your measurement, it will take less than 8 sec to complete the deletion. The deletion will delay put requests, but it only happens for 8 secs out of every 5 mins, i.e., 2.67% of the time. That sounds good to me. +1 for the patch. Will commit it. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
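The arithmetic in the comment above can be checked directly; a small sketch, using the 100 entities/sec put rate and 8-second deletion time quoted in the comment:
{code}
public class DeletionOverhead {
  public static void main(String[] args) {
    long putRatePerSec = 100;        // put throughput quoted above
    long intervalSecs = 5 * 60;      // deletion runs every 5 minutes
    long perRound = putRatePerSec * intervalSecs;  // 30,000 entities/round
    double deletionSecs = 8.0;       // measured upper bound per round
    double overheadPct = 100.0 * deletionSecs / intervalSecs;
    System.out.printf("%d entities per round, deleting %.2f%% of the time%n",
        perRound, overheadPct);      // prints 30000 and 2.67
  }
}
{code}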
[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1717: -- Hadoop Flags: Reviewed Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-1811: Attachment: YARN-1811.patch Updated patch based on Vinod's comments RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1833: Attachment: YARN-1833-v2.patch Thanks [~jeagles] for the suggestion. I had not thought of that solution. Attaching the new patch, which uses a dummy user for the test and no longer removes the assert. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails as the size of both groupWithInit and groupBefore are same. I do not think we need to have this assert here. Moreover we are also checking that the groupInit does not have the userGroups that are in the groupBefore so removing the assert may not be harmful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935594#comment-13935594 ] Alejandro Abdelnur commented on YARN-796: - Scheduler configurations are refreshed dynamically; if the list of valid labels lives there, it could be refreshed as well. I would prefer to detect and reject typos, from a user-experience and troubleshooting point of view. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935618#comment-13935618 ] Hudson commented on YARN-1717: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5331 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5331/]) YARN-1717. Enabled periodically discarding old data in LeveldbTimelineStore. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577693) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.11.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935625#comment-13935625 ] Chen He commented on YARN-1833: --- +1, the YARN-1833-v2.patch works and the unit test passed. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails as the size of both groupWithInit and groupBefore are same. I do not think we need to have this assert here. Moreover we are also checking that the groupInit does not have the userGroups that are in the groupBefore so removing the assert may not be harmful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1551: Target Version/s: 2.4.0 Affects Version/s: (was: 2.4.0) 2.3.0 Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch, YARN-1551.v02.patch, YARN-1551.v03.patch, YARN-1551.v04.patch, YARN-1551.v05.patch, YARN-1551.v06.patch, YARN-1551.v06.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1206) Container logs link is broken on RM web UI after application finished
[ https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935658#comment-13935658 ] Jian He commented on YARN-1206: --- Thanks for the patch! LGTM. Can you add a comment saying why we should not check container == null? Container logs link is broken on RM web UI after application finished - Key: YARN-1206 URL: https://issues.apache.org/jira/browse/YARN-1206 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Priority: Blocker Attachments: YARN-1206.patch With log aggregation disabled, when container is running, its logs link works properly, but after the application is finished, the link shows 'Container does not exist.' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935673#comment-13935673 ] Hadoop QA commented on YARN-1833: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634829/YARN-1833-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3370//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3370//console This message is automatically generated. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails as the size of both groupWithInit and groupBefore are same. I do not think we need to have this assert here. Moreover we are also checking that the groupInit does not have the userGroups that are in the groupBefore so removing the assert may not be harmful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935689#comment-13935689 ] Tsuyoshi OZAWA commented on YARN-1474: -- [~kkambatl], [~sandyr], [~vinodkv], The latest patch now passes tests. The remaining test failure is filed as YARN-1591 and is unrelated to this JIRA. I'd appreciate it if you could take a look at the patch. If you have additional comments or a better approach, please let me know. Thanks! Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
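For readers unfamiliar with the YARN service model the issue summary refers to: services extend org.apache.hadoop.service.AbstractService and override explicit lifecycle hooks. A rough sketch of the shape the patch moves schedulers toward (illustrative only, not code from the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustrative: a scheduler fitted into the service lifecycle gets explicit
// init/start/stop hooks instead of a single reinitialize() entry point.
public class ExampleSchedulerService extends AbstractService {
  public ExampleSchedulerService() {
    super(ExampleSchedulerService.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // read queue/scheduler configuration here
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start background threads (e.g., an update/monitoring thread)
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // stop threads and release resources
    super.serviceStop();
  }
}
{code}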
[jira] [Created] (YARN-1838) ATS entities api should provide ability to get entities from given id
Srimanth Gunturi created YARN-1838: -- Summary: ATS entities api should provide ability to get entities from given id Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Srimanth Gunturi To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1515: Target Version/s: 2.4.0 Ability to dump the container threads and stop the containers in a single RPC - Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: New Feature Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1838: -- Component/s: (was: api) Assignee: Billie Rinaldi Summary: Timeline service getEntities API should provide ability to get entities from given id (was: ATS entities api should provide ability to get entities from given id) Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935704#comment-13935704 ] Jonathan Eagles commented on YARN-1833: --- +1. YARN-1830 causes the TestRMRestart error. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails as the size of both groupWithInit and groupBefore are same. I do not think we need to have this assert here. Moreover we are also checking that the groupInit does not have the userGroups that are in the groupBefore so removing the assert may not be harmful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1838: - Attachment: YARN-1838.1.patch The attached patch implements the fromId parameter. I used camel case for fromId to match the other query parameters (primaryFilter, windowEnd, etc.). Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Attachments: YARN-1838.1.patch To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given. -- This message was sent by Atlassian JIRA (v6.2#6252)
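A hypothetical client-side use of the new parameter, assuming the endpoint from the issue description and the camel-cased fromId name from the patch comment above (the host, entity type, and IDs are illustrative):
{code}
public class TimelinePaging {
  public static void main(String[] args) {
    // Page through timeline entities 10 at a time: request limit=11 and use
    // the 11th entity's id as the (inclusive) fromId of the next page.
    String base = "http://server:8188/ws/v1/timeline/HIVE_QUERY_ID"
        + "?fields=events,primaryfilters,otherinfo";
    String firstPage = base + "&limit=11";
    // suppose the 11th entity returned on the first page has id JID11
    String nextPage = base + "&fromId=JID11&limit=11"; // fromId is inclusive
    System.out.println(firstPage);
    System.out.println(nextPage);
  }
}
{code}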
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935716#comment-13935716 ] Akira AJISAKA commented on YARN-1833: - Thanks [~jeagles] and [~mitdesai] for the improvement. +1 for the v2 patch. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails as the size of both groupWithInit and groupBefore are same. I do not think we need to have this assert here. Moreover we are also checking that the groupInit does not have the userGroups that are in the groupBefore so removing the assert may not be harmful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935729#comment-13935729 ] Hudson commented on YARN-1833: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5333 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5333/]) YARN-1833. TestRMAdminService Fails in trunk and branch-2 (Mit Desais via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577737) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails as the size of both groupWithInit and groupBefore are same. I do not think we need to have this assert here. Moreover we are also checking that the groupInit does not have the userGroups that are in the groupBefore so removing the assert may not be harmful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1690) sending ATS events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1690: Attachment: YARN-1690-5.patch Fixing the findbugs warning. Thanks, Mayank sending ATS events from Distributed shell -- Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch, YARN-1690-5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1599) webUI rm.webapp.AppBlock should redirect to a history App page if and when available
[ https://issues.apache.org/jira/browse/YARN-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935742#comment-13935742 ] Gera Shegalov commented on YARN-1599: - [~jlowe] thanks for pointing in the right direction. E.g., setting yarn.log.server.url to {code}http://${mapreduce.jobhistory.webapp.address}/jobhistory/logs{code} solves the problem on the pseudo-distributed cluster. webUI rm.webapp.AppBlock should redirect to a history App page if and when available Key: YARN-1599 URL: https://issues.apache.org/jira/browse/YARN-1599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha, 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: Screen Shot 2014-01-16 at 6.52.17 PM.png, Screen Shot 2014-01-16 at 7.30.32 PM.png, YARN-1599.v01.patch, YARN-1599.v02.patch, YARN-1599.v03.patch When the log aggregation is enabled, and the application finishes, our users think that the AppMaster logs were lost because the link to the AM attempt logs are not updated and result in HTTP 404. Only tracking URL is updated. In order to have a smoother user experience, we propose to simply redirect to the new tracking URL when the page with invalid log links is accessed. -- This message was sent by Atlassian JIRA (v6.2#6252)
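As a concrete illustration of that setting: Hadoop's Configuration expands ${...} references against other properties, so the history server address is resolved when the value is read. A sketch (the host and port are illustrative; 19888 is the default job history web port):
{code}
import org.apache.hadoop.conf.Configuration;

public class LogServerUrlExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("mapreduce.jobhistory.webapp.address", "localhost:19888");
    conf.set("yarn.log.server.url",
        "http://${mapreduce.jobhistory.webapp.address}/jobhistory/logs");
    // get() performs variable expansion and prints
    // http://localhost:19888/jobhistory/logs
    System.out.println(conf.get("yarn.log.server.url"));
  }
}
{code}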
[jira] [Commented] (YARN-1690) sending ATS events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935768#comment-13935768 ] Hadoop QA commented on YARN-1690: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634854/YARN-1690-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3371//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3371//console This message is automatically generated. sending ATS events from Distributed shell -- Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch, YARN-1690-5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935794#comment-13935794 ] Tsuyoshi OZAWA commented on YARN-1591: -- Hi [~mitdesai], thank you for joining this JIRA! In fact, the approach you suggested is essentially the same as YARN-1591.1.patch. It's insufficient to deal with the intermittent test failure, because sometimes another problem can occur: an unhandled YarnRuntimeException from AsyncDispatcher. The log at the time is as follows: {code} $ for i in `seq 1 100`; do mvn test -Dtest=TestResourceTrackerService | grep FAILURE; done ... sometimes occurs failure and output file is as follows... 2014-03-14 22:59:31,468 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(180)) - Error in dispatcher thread org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher.handle(ResourceManager.java:633) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher.handle(ResourceManager.java:539) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher.handle(ResourceManager.java:631) ... 4 more {code} The first patch is available at: https://issues.apache.org/jira/secure/attachment/12633362/YARN-1591.1.patch TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
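The failure mode in the log above is a dispatcher thread, interrupted while blocked on its event queue during test teardown, wrapping the InterruptedException in a YarnRuntimeException and dying. One possible shape of a fix, sketched here with illustrative names rather than the actual AsyncDispatcher code, is to treat the interrupt as expected once the service is stopping:
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch, not the committed patch: only treat an interrupt as
// fatal while the dispatcher is still supposed to be running.
public class StopAwareHandler {
  private final BlockingQueue<Object> eventQueue =
      new LinkedBlockingQueue<Object>();
  private volatile boolean stopped = false;

  public void handle(Object event) {
    try {
      eventQueue.put(event);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status
      if (!stopped) {
        throw new RuntimeException("Interrupted while dispatching", e);
      }
      // during shutdown the interrupt is expected; drop the event quietly
    }
  }

  public void stop() {
    stopped = true;
  }
}
{code}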
[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935803#comment-13935803 ] Anubhav Dhoot commented on YARN-1536: - The test failures are unrelated. The change only replaced calls to each function with its inline expansion and then removed the function. Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
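The pattern of the cleanup, sketched for one of the secret managers (an illustrative call site; the actual patch touches several accessors):
{code}
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager;

// Illustrative shape of the change: a call site that previously asked the
// ResourceManager for a secret manager now asks the RMContext directly,
// which is what the removed ResourceManager accessors delegated to anyway.
public class CleanupExample {
  static RMContainerTokenSecretManager tokenManager(RMContext rmContext) {
    // before: rm.getRMContainerTokenSecretManager()
    return rmContext.getContainerTokenSecretManager();
  }
}
{code}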
[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935811#comment-13935811 ] Tsuyoshi OZAWA commented on YARN-1536: -- +1, LGTM. Confirmed that the tests pass locally. TestRMRestart's failure has already been filed as YARN-1830. [~kkambatl], can you also check it? Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1536: - Hadoop Flags: Reviewed Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935821#comment-13935821 ] Xuan Gong commented on YARN-1521: - First of all, we need to find which APIs can be marked as idempotent. Here is the list of APIs that I think we can mark as idempotent: * ResourceTracker ** registerNodeManager ** nodeHeartbeat * ResourceManagerAdministrationProtocol ** refreshQueues ** refreshNodes ** refreshSuperUserGroupsConfiguration ** refreshUserToGroupsMappings ** refreshAdminAcls ** refreshServiceAcls * ApplicationClientProtocol ** forceKillApplication ** getApplicationReport (already marked) ** getClusterMetrics ** getApplications ** getClusterNodes ** getQueueInfo ** getQueueUserAcls ** getApplicationAttemptReport ** getApplicationAttempts ** getContainerReport ** getContainers Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1839) Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent
Tassapol Athiapinya created YARN-1839: - Summary: Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent Key: YARN-1839 URL: https://issues.apache.org/jira/browse/YARN-1839 Project: Hadoop YARN Issue Type: Bug Components: applications, capacityscheduler Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Priority: Critical Use single-node cluster. Turn on capacity scheduler preemption. Run MR sleep job as app 1. Take entire cluster. Run MR sleep job as app 2. Preempt app1 out. Wait till app 2 finishes. App 1 AM attempt 2 will start. It won't be able to launch a task container with this error stack trace in AM logs: {code} 2014-03-13 20:13:50,254 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394741557066_0001_m_00_1009: Container launch failed for container_1394741557066_0001_02_21 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for host:45454 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1839) Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent
[ https://issues.apache.org/jira/browse/YARN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-1839: - Assignee: Jian He Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent --- Key: YARN-1839 URL: https://issues.apache.org/jira/browse/YARN-1839 Project: Hadoop YARN Issue Type: Bug Components: applications, capacityscheduler Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jian He Priority: Critical Use single-node cluster. Turn on capacity scheduler preemption. Run MR sleep job as app 1. Take entire cluster. Run MR sleep job as app 2. Preempt app1 out. Wait till app 2 finishes. App 1 AM attempt 2 will start. It won't be able to launch a task container with this error stack trace in AM logs: {code} 2014-03-13 20:13:50,254 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394741557066_0001_m_00_1009: Container launch failed for container_1394741557066_0001_02_21 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for host:45454 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1839) Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent
[ https://issues.apache.org/jira/browse/YARN-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935849#comment-13935849 ] Jian He commented on YARN-1839: --- {code} // The node set here is used for differentiating whether the NMToken // has been issued for this node from the client's perspective. If // this is an AM container, the NMToken is issued only for RM and so // we should not update the node set. if (container.getId().getId() != 1) { nodeSet.add(container.getNodeId()); {code} This piece of code is flawed. We cannot assume the AM container id is always equal to 1. If the AM container id doesn't equal one and it is added into the node set, the RM will think the NMToken has already been sent and won't send it for the other, normal containers that the AM asks for. Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent --- Key: YARN-1839 URL: https://issues.apache.org/jira/browse/YARN-1839 Project: Hadoop YARN Issue Type: Bug Components: applications, capacityscheduler Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Jian He Priority: Critical Use single-node cluster. Turn on capacity scheduler preemption. Run MR sleep job as app 1. Take entire cluster. Run MR sleep job as app 2. Preempt app1 out. Wait till app 2 finishes. App 1 AM attempt 2 will start. It won't be able to launch a task container with this error stack trace in AM logs: {code} 2014-03-13 20:13:50,254 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394741557066_0001_m_00_1009: Container launch failed for container_1394741557066_0001_02_21 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for host:45454 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
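A sketch of the more robust check implied by the comment above: compare against the attempt's recorded master container rather than assuming the AM is always container 1. This is illustrative only, not taken from a patch; the method wrapper and null handling are assumptions:
{code}
import java.util.Set;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;

// Illustrative sketch of the fix direction: look up the attempt's actual
// AM container id instead of testing container.getId().getId() != 1.
public class NMTokenNodeSetExample {
  static void recordNode(RMContext rmContext, Container container,
      Set<NodeId> nodeSet) {
    ContainerId amId = rmContext.getRMApps()
        .get(container.getId().getApplicationAttemptId().getApplicationId())
        .getCurrentAppAttempt()
        .getMasterContainer()
        .getId();
    if (!container.getId().equals(amId)) {
      // Only non-AM containers mark the node as having been sent an NMToken.
      nodeSet.add(container.getNodeId());
    }
  }
}
{code}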
[jira] [Commented] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935850#comment-13935850 ] Jian He commented on YARN-1795: --- Hi Karthik, thanks for taking it up. YARN-1839 filed, but I'm not sure whether this jira is related to that. After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens Key: YARN-1795 URL: https://issues.apache.org/jira/browse/YARN-1795 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Robert Kanter Assignee: Karthik Kambatla Priority: Blocker Attachments: org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935873#comment-13935873 ] Xuan Gong commented on YARN-1521: - APIs that are not in the previous list: * ApplicationMasterProtocol ** registerApplicationMaster ** finishApplicationMaster ** allocate * ResourceManagerAdministrationProtocol ** updateNodeResource * ApplicationClientProtocol ** getNewApplication ** submitApplication ** getDelegationToken ** renewDelegationToken ** cancelDelegationToken ** moveApplicationAcrossQueues Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
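For reference, the marking itself is done with Hadoop's retry annotations from org.apache.hadoop.io.retry. A sketch on a made-up interface (the real annotations go on the protocol interfaces listed in the two comments above, whose methods take request/response objects rather than these simplified signatures):
{code}
import java.io.IOException;
import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;

// Illustrative interface: @Idempotent methods may be safely retried by a
// failover-aware proxy; @AtMostOnce methods must not be blindly retried.
public interface ExampleProtocol {
  @Idempotent
  String getQueueInfo(String queueName) throws IOException;

  @AtMostOnce
  void submitApplication(String applicationId) throws IOException;
}
{code}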