[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917826#comment-13917826
 ] 

Hadoop QA commented on YARN-1389:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632192/YARN-1389-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3228//console

This message is automatically generated.

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog 
 APIs
 --

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct queries about running instances 
 to ApplicationClientProtocol and those about finished instances to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1759) Configuration settings can potentially disappear post YARN-1666

2014-03-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917895#comment-13917895
 ] 

Steve Loughran commented on YARN-1759:
--

I think Hitesh's concern is about the workflow:

# load YARN config
# subclass overrides values in its serviceInit()
# new YarnConfig overwrites this.

I'm not sure this happens; certainly {{new YarnConfig(Configuration)}} doesn't. 
It pops up in a few places, hence some logic in 
{{AbstractService.init(Configuration)}} to recognise and handle this situation 
by updating its own {{config}} field.

A small unit test should be able to replicate the problem if it does exist.
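
A minimal JUnit-style sketch of such a test, under stated assumptions: the 
service and key names below are made up, and the config re-wrap in 
ParentService merely stands in for the site-file reload the RM does post 
YARN-1666.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Assert;
import org.junit.Test;

public class TestConfigOverrideSurvivesReload {

  /** Stand-in for the RM: re-wraps its config during serviceInit(). */
  static class ParentService extends AbstractService {
    ParentService(String name) { super(name); }

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      // Stand-in for the RM reloading core-site/yarn-site post YARN-1666.
      super.serviceInit(new YarnConfiguration(conf));
    }
  }

  /** Subclass that overrides a value before delegating, as in the workflow. */
  static class ChildService extends ParentService {
    ChildService() { super("ChildService"); }

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      conf.set("test.override.key", "from-subclass");  // subclass override
      super.serviceInit(conf);
    }
  }

  @Test
  public void overrideSurvivesReWrap() {
    ChildService svc = new ChildService();
    svc.init(new Configuration(false));
    // If re-wrapping the config dropped programmatic overrides, this fails.
    Assert.assertEquals("from-subclass",
        svc.getConfig().get("test.override.key"));
  }
}
{code}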

 Configuration settings can potentially disappear post YARN-1666
 ---

 Key: YARN-1759
 URL: https://issues.apache.org/jira/browse/YARN-1759
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 By implicitly loading core-site and yarn-site again in the RM::serviceInit(), 
 some configs may be unintentionally overridden.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-03-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918064#comment-13918064
 ] 

Rohith commented on YARN-1206:
--

I am able to reproduce this issue on today's trunk with log aggregation 
disabled. I verified hadoop-2.1; this issue does not occur there.

Going through the fix for YARN-649, I found that there is a null check for the 
container in ContainerLogsUtils.getContainerLogDirs():
{noformat}
if (container == null) {
  throw new YarnException("Container does not exist.");
}
{noformat}
In hadoop-2.1, this piece of code is not there. I am not sure why it was added.

Basically, if a container is COMPLETED then it will be removed from NMContext 
(NodeStatusUpdaterImpl.updateAndGetContainerStatuses()). The NM does not have 
any information regarding this container.

Is it really required to have this check?

 Container logs link is broken on RM web UI after application finished
 -

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Priority: Blocker

 With log aggregation disabled, when container is running, its logs link works 
 properly, but after the application is finished, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918112#comment-13918112
 ] 

Jason Lowe commented on YARN-1771:
--

Here's a thought to possibly avoid checking each directory level individually: 
what if the NM simply tried to read the file as the user requesting it to be 
public?  The NM should already have the necessary tokens to access the 
resource, so it should be able to use doAs to read the file as the requesting 
user.  The rationale for this approach is that if the user can read the 
resource and is asking for it to be public, then they can trivially make the 
data public themselves by copying it to /tmp and making the copy publicly 
accessible.
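
A hedged sketch of that idea (the class and method names are illustrative, and 
how the UGI would carry the localization request's tokens in the real NM is 
glossed over here):
{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsReadCheck {

  /**
   * Can the requesting user actually read the resource? Try to open (and
   * immediately close) it as that user, instead of stat-ing every parent
   * directory with getFileStatus.
   */
  static boolean canUserReadResource(String user, final Path resource,
      final Configuration conf) throws IOException, InterruptedException {
    // In the real NM the tokens from the localization request would be added
    // to this UGI; that step is omitted in this sketch.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
    return ugi.doAs(new PrivilegedExceptionAction<Boolean>() {
      @Override
      public Boolean run() {
        try {
          FileSystem fs = resource.getFileSystem(conf);
          fs.open(resource).close();  // succeeds only if this user can read it
          return true;
        } catch (IOException e) {
          return false;
        }
      }
    });
  }
}
{code}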

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-03 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-1769:


Attachment: YARN-1769.patch

In this patch I tried to minimize the code changes.  I chose to keep the 
accounting/bookkeeping of reservations the same, to hopefully keep the impact 
of this change small.  I made the change configurable (and refreshable via 
yarn rmadmin -refreshQueues). 

At a high level, what it does is:

- for the limit checks, it does the normal checks, but if it has hit a limit 
and the feature is configured on, it does the check again subtracting out the 
amount reserved.  If that is under the limit, it is allowed to go on to see 
whether it could unreserve a spot and use the current node (sketched below).

- for the number-of-reservations limit, we simply delay that check, and if we 
could allocate on the current node by unreserving then we do.
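
For the first bullet, a rough sketch of the two-pass check using plain memory 
numbers and made-up names, not the actual CapacityScheduler code:
{code}
public class ReservationAwareLimitCheck {

  /**
   * Illustrative two-pass limit check (memory only, in MB, for brevity):
   * if the normal check fails and the feature is enabled, re-check with the
   * already-reserved amount subtracted; passing the second check means the
   * scheduler may unreserve elsewhere and allocate on the current node.
   */
  static boolean canAssign(long usedMb, long reservedMb, long limitMb,
      long askedMb, boolean reservationsContinueLooking) {
    long wouldUse = usedMb + askedMb;
    if (wouldUse <= limitMb) {
      return true;                    // normal path: under the limit
    }
    if (!reservationsContinueLooking) {
      return false;                   // feature off: behave as before
    }
    // Second pass: don't count reserved space against the limit, since a
    // reservation could be swapped out for this allocation.
    return wouldUse - reservedMb <= limitMb;
  }
}
{code}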


 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit on the number reserved, it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918192#comment-13918192
 ] 

Hadoop QA commented on YARN-1769:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632271/YARN-1769.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3229//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3229//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3229//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3229//console

This message is automatically generated.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit on the number reserved, it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fulfill the request.
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918247#comment-13918247
 ] 

Sangjin Lee commented on YARN-1771:
---

Would that be a slightly weaker condition than the current public check? The 
current check requires READ permission for others.

One possible case here is if the user has group READ permission on the file 
(but others' READ permission is off). Then the user's doAs would succeed even 
though others do not have READ permission.
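
For reference, the shape of the check being compared here, in illustrative 
form (not the actual FSDownload code; names are mine): the current public 
check requires the file to be readable by others, whereas the doAs approach 
above only requires that the requesting user can read it, e.g. via a group 
permission.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class PublicnessCheckSketch {

  /**
   * Shape of the current check: the resource must carry READ for "others"
   * (the real code also walks the ancestor directories, which is where the
   * extra getFileStatus calls come from). A file readable only by the
   * requester's group fails this check but would pass a doAs-based read.
   */
  static boolean isReadableByOthers(FileSystem fs, Path p) throws IOException {
    FileStatus status = fs.getFileStatus(p);
    return status.getPermission().getOtherAction().implies(FsAction.READ);
  }
}
{code}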

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1776) renewDelegationToken should survive RM failover

2014-03-03 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1776:
-

 Summary: renewDelegationToken should survive RM failover
 Key: YARN-1776
 URL: https://issues.apache.org/jira/browse/YARN-1776
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


When a delegation token is renewed, two RMStateStore operations happen: 1) 
removing the old DT, and 2) storing the new DT. If the RM fails in between, 
there could be a problem.
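
To make the window explicit, a purely illustrative sketch; the store interface 
and method names below are placeholders, not the actual RMStateStore API.
{code}
/** Placeholder for the relevant slice of the RM state store. */
interface DelegationTokenStateStore {
  void removeToken(String tokenId) throws Exception;
  void storeToken(String tokenId, long renewDate) throws Exception;
}

class RenewSketch {
  /**
   * Renewal is two separate store operations; an RM failure between them
   * leaves no persisted entry for the token, which is the problem described
   * above.
   */
  static void renew(DelegationTokenStateStore store, String tokenId,
      long newRenewDate) throws Exception {
    store.removeToken(tokenId);               // 1) remove the old DT
    // <-- an RM failover here loses the token entirely
    store.storeToken(tokenId, newRenewDate);  // 2) store the new DT
  }
}
{code}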



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918301#comment-13918301
 ] 

Jason Lowe commented on YARN-1771:
--

Yes, it would be a weaker condition check, but I'm wondering if the weaker 
check still meets the security needs of the dist cache.

A user is requesting a resource to be publicly localized.  If they have read 
permission on it, then even if others lack access, the original user can 
trivially work around that obstacle by copying it to a publicly accessible 
location (e.g. /tmp).  So in that sense the user has a legitimate way to make 
the resource data public even if it isn't right now.

A subsequent request for the same resource would check the timestamp using the 
same doAs logic, so if another user doesn't have access then they won't 
localize.  It's true that the other user's container can still access the 
resource by avoiding explicit localization and instead scanning/scraping the 
local public distcache area directly once it runs.  However, the original user 
who requested the resource asked for it to be public and has the means to make 
it public, so they probably aren't concerned that the public can access it.

This approach would also be useful for the shared cache design in YARN-1492, 
which calls for the ability to make something a public resource directly from 
a user's staging area.

There may be some security concerns that I've missed, but if this ends up being 
a possibility then it would eliminate all of the parent directory stat calls on 
public localization.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-03-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918307#comment-13918307
 ] 

Xuan Gong commented on YARN-1734:
-

bq. It appears the AdminService#refreshAll is called on transition to active. 
However, calling any of the refresh commands on the Standby throws 
StandbyException. This can lead to confusion - we throw an exception even 
though the refresh command takes affect when the RM transitions to Active.

After rm.transitionToActive() is successfully executed, the RM is in the 
Active state, so it will not throw a StandbyException.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1759) Configuration settings can potentially disappear post YARN-1666

2014-03-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1759:
---

Assignee: Xuan Gong

 Configuration settings can potentially disappear post YARN-1666
 ---

 Key: YARN-1759
 URL: https://issues.apache.org/jira/browse/YARN-1759
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Xuan Gong

 By implicitly loading core-site and yarn-site again in the RM::serviceInit(), 
 some configs may be unintentionally overridden.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918365#comment-13918365
 ] 

Karthik Kambatla commented on YARN-1734:


In our case, we plan to use the LocalConfiguration and not the FileSystemBased 
one. So, in the HA case, we would update the local configs on both RMs and call 
the appropriate refresh command on both RMs - this is what we do for HDFS as 
well. The expectation is that the Active picks these up immediately, and the 
Standby picks them up eventually when it becomes Active. In other words, the 
expectation is that these updates are not lost. 

With the current code, the Standby would throw a StandbyException, thereby 
telling the user that the config refresh has failed. This is not exactly true, 
because the Standby would actually pick the latest configs when transitioning 
to Active. No? 

Let me think more on this, but thought I should raise this concern. 
 

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1768) yarn kill non-existent application is too verbose

2014-03-03 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918381#comment-13918381
 ] 

Ravi Prakash commented on YARN-1768:


With this patch the return code is wrong (0). Earlier it was returning a 
non-zero error code. Please also consider adding that check to the test.

 yarn kill non-existent application is too verbose
 -

 Key: YARN-1768
 URL: https://issues.apache.org/jira/browse/YARN-1768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: YARN-1768.1.patch, YARN-1768.2.patch


 Instead of catching ApplicationNotFound and logging a simple app not found 
 message, the whole stack trace is logged.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs

2014-03-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918401#comment-13918401
 ] 

Zhijie Shen commented on YARN-1389:
---

[~mayank_bansal], the patch still doesn't compile. Would you please check it 
again?

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog 
 APIs
 --

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct queries about running instances 
 to ApplicationClientProtocol and those about finished instances to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-03-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918405#comment-13918405
 ] 

Varun Vasudev commented on YARN-90:
---

Ravi, are you still working on this ticket? Do you mind if I take over?

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
 reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1777) Nodemanager fails to detect Full disk and try to launch container

2014-03-03 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-1777:


 Summary: Nodemanager fails to detect Full disk and try to launch 
container
 Key: YARN-1777
 URL: https://issues.apache.org/jira/browse/YARN-1777
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Nodemanager is not able to recognize that the disk is full. It keeps retrying 
to launch a container on the full disk. 

--
2013-06-06 17:45:25,319 INFO  container.Container 
(ContainerImpl.java:handle(852)) - Container 
container_1370473246485_0136_01_18 transitioned from LOCALIZING to LOCALIZED
2013-06-06 17:45:25,328 INFO  container.Container 
(ContainerImpl.java:handle(852)) - Container 
container_1370473246485_0136_01_19 transitioned from LOCALIZED to RUNNING
2013-06-06 17:45:25,329 WARN  launcher.ContainerLaunch 
(ContainerLaunch.java:call(255)) - Failed to launch container.
java.io.IOException: mkdir of 
/tmp/1/hdp/yarn/local/usercache/hrt_qa/appcache/application_1370473246485_0136/container_1370473246485_0136_01_19
 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1044)
at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
at 
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:412)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:130)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:250)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:73)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)   
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2013-06-06 17:45:25,330 INFO  container.Container 
(ContainerImpl.java:handle(852)) - Container 
container_1370473246485_0136_01_19 transitioned from RUNNING to 
EXITED_WITH_FAILURE
2013-06-06 17:45:25,330 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(307)) - Cleaning up container 
container_1370473246485_0136_01_19

2013-06-06 17:45:25,333 WARN  launcher.ContainerLaunch 
(ContainerLaunch.java:call(255)) - Failed to launch container.
java.io.IOException: mkdir of 
/tmp/1/hdp/yarn/local/usercache/hrt_qa/appcache/application_1370473246485_0136/container_1370473246485_0136_01_18
 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1044)
at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
at 
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:412)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:130)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:250)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:73)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)   
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
   at java.lang.Thread.run(Thread.java:662)
--



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-03-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918433#comment-13918433
 ] 

Xuan Gong commented on YARN-1734:
-

So, if the Standby RM transitions to Active, it will pick up the latest 
configuration. 
When calling refresh* on the standby RM, it will throw a StandbyException and 
trigger the retry. In that case, even if we call refresh* on the Standby RM, it 
actually does the refresh* on the active RM. 

bq. With the current code, the Standby would throw a StandbyException, thereby 
telling the user that the config refresh has failed. This is not exactly true, 
because the Standby would actually pick the latest configs when transitioning 
to Active. No?

When the RM is in the Standby state, all of the active services have already 
been stopped. I think "pick the latest configs" should mean that all the 
related services pick up the latest configs, such as CapacityScheduler, 
NodesListManager, ClientRMService, ResourceTrackerService, etc. But since most 
of these services are stopped in standby mode, they cannot get the latest 
configurations.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918437#comment-13918437
 ] 

Karthik Kambatla commented on YARN-1734:


I guess the ambiguity stems from the definition of success for {{rmadmin 
-refresh*}} commands.

I propose adding a config - yarn.resourcemanager.ha.refresh-all-rms. When set, 
the refresh commands should attempt to refresh all RMs and fail if they can't 
- i.e., this should fail when called on the Standby RM? When cleared, the 
refresh command should attempt to refresh only on this RM and should succeed as 
long as the configs are refreshed as early as they are required - i.e., it 
should be okay to refresh on transition to active and the Standby RM should 
also succeed? [~xgong], [~vinodkv] - do you think this captures the behavior 
well enough and is reasonable? 
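
For concreteness, the proposed knob might look like this in yarn-site.xml; the 
property name comes from the paragraph above and does not exist in YARN today, 
and the description only paraphrases the two behaviours described:
{code}
<!-- Proposed above; not an existing YARN property. -->
<property>
  <name>yarn.resourcemanager.ha.refresh-all-rms</name>
  <value>true</value>
  <description>
    When true, rmadmin -refresh* commands attempt to refresh all RMs and fail
    if any RM (e.g. the standby) cannot be refreshed. When false, only the RM
    that receives the command is refreshed, and a standby picks up the new
    configuration when it transitions to active.
  </description>
</property>
{code}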

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-03-03 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918439#comment-13918439
 ] 

Ravi Prakash commented on YARN-90:
--

I'm not working on it. Please feel free to take it over. Thanks, Varun.

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
 reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918459#comment-13918459
 ] 

Jian He commented on YARN-1752:
---

Patch looks good overall, some minor comments:
- styling: this exceeds the 80-column limit:
{code}
public void unregisterAppAttempt(final FinishApplicationMasterRequest 
req,boolean waitForStateRunning)
{code}

- can we consolidate the exception comments like this?
{code}
* This exception is thrown when an ApplicationMaster asks for resources by
 * calling {@link ApplicationMasterProtocol#allocate(AllocateRequest)} or tries
 * to unregister by calling
 * {@link 
ApplicationMasterProtocol#finishApplicationMaster(FinishApplicationMasterRequest)}
 * without first registering with ResourceManager by calling
 * {@link 
ApplicationMasterProtocol#registerApplicationMaster(RegisterApplicationMasterRequest)}
 * or if it tries to register more than once.
{code}
- Test: we can check that the attempt state is LAUNCHED after this call. Or 
simply, we can just use MockRM.launchAM:
 {code}
  MockAM am1 = rm.sendAMLaunched(attempt1.getAppAttemptId());
{code}

 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918441#comment-13918441
 ] 

Karthik Kambatla commented on YARN-1734:


bq. When calling refresh* on the standby RM, it will throw a StandbyException 
and trigger the retry. In that case, even if we call refresh* on the Standby 
RM, it actually does the refresh* on the active RM.
Sorry, I missed this while browsing through the code. Let me try this on a 
cluster and report.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider which can support 
 LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and 
 FileSystemBasedConfiguration is enabled, RM can not get the updated 
 Configurations when it transits from Standby to Active



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918462#comment-13918462
 ] 

Vinod Kumar Vavilapalli commented on YARN-1758:
---

This looks fine enough for me for now. In the interest of progress, let's track 
YARN-1759 separately.

+1. Checking this in now.

 MiniYARNCluster broken post YARN-1666
 -

 Key: YARN-1758
 URL: https://issues.apache.org/jira/browse/YARN-1758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1758.1.patch, YARN-1758.2.patch


 NPE seen when trying to use MiniYARNCluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1759) Configuration settings can potentially disappear post YARN-1666

2014-03-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918465#comment-13918465
 ] 

Hitesh Shah commented on YARN-1759:
---

[~ste...@apache.org] A common case will be with mini clusters where the code 
itself updates the config based on what ports the daemon binds to. 




 Configuration settings can potentially disappear post YARN-1666
 ---

 Key: YARN-1759
 URL: https://issues.apache.org/jira/browse/YARN-1759
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Xuan Gong

 By implicitly loading core-site and yarn-site again in the RM::serviceInit(), 
 some configs may be unintentionally overridden.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1777) Nodemanager fails to detect Full disk and try to launch container

2014-03-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-1777.
--

Resolution: Duplicate

This is a duplicate of YARN-257.

 Nodemanager fails to detect Full disk and try to launch container
 -

 Key: YARN-1777
 URL: https://issues.apache.org/jira/browse/YARN-1777
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora

 Nodemanager is not able to recognize that the disk is full. It keeps retrying 
 to launch a container on the full disk. 
 --
 2013-06-06 17:45:25,319 INFO  container.Container 
 (ContainerImpl.java:handle(852)) - Container 
 container_1370473246485_0136_01_18 transitioned from LOCALIZING to 
 LOCALIZED
 2013-06-06 17:45:25,328 INFO  container.Container 
 (ContainerImpl.java:handle(852)) - Container 
 container_1370473246485_0136_01_19 transitioned from LOCALIZED to RUNNING
 2013-06-06 17:45:25,329 WARN  launcher.ContainerLaunch 
 (ContainerLaunch.java:call(255)) - Failed to launch container.
 java.io.IOException: mkdir of 
 /tmp/1/hdp/yarn/local/usercache/hrt_qa/appcache/application_1370473246485_0136/container_1370473246485_0136_01_19
  failed
 at 
 org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1044)
 at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
 at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
 at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
 at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
 at 
 org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
 at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:412)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:130)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:250)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:73)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-06 17:45:25,330 INFO  container.Container 
 (ContainerImpl.java:handle(852)) - Container 
 container_1370473246485_0136_01_19 transitioned from RUNNING to 
 EXITED_WITH_FAILURE
 2013-06-06 17:45:25,330 INFO  launcher.ContainerLaunch 
 (ContainerLaunch.java:cleanupContainer(307)) - Cleaning up container 
 container_1370473246485_0136_01_19
 2013-06-06 17:45:25,333 WARN  launcher.ContainerLaunch 
 (ContainerLaunch.java:call(255)) - Failed to launch container.
 java.io.IOException: mkdir of 
 /tmp/1/hdp/yarn/local/usercache/hrt_qa/appcache/application_1370473246485_0136/container_1370473246485_0136_01_18
  failed
 at 
 org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1044)
 at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
 at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
 at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
 at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
 at 
 org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
 at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)   
  at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:412)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:130)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:250)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:73)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 --



--

[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.

2014-03-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918499#comment-13918499
 ] 

Xuan Gong commented on YARN-1764:
-

Let us continue our discussion on case 3: handling RM failovers after the 
submitApplication call. 

Reply to [~kkambatl]'s comment:
"I don't see 3 to be as straight-forward, and suspect would require revisiting 
the state machine."

We will only consider the case where the failover happens after the 
submitApplication call. That means that when the failover happens, we have 
already received the SubmitApplicationResponse.

When the failover happens, we will *not re-enter* 
clientRMService#submitApplication() again. What happens next is that 
getApplicationReport() will start to execute, and YarnClient will retry until 
it finds the next active RM, then continue executing getApplicationReport().

Now we have two cases to handle:
* RMStateStore already saved the ApplicationState when the failover happens.
* RMStateStore did not save the ApplicationState when the failover happens.

For case 1, we do not need to make any changes.
For case 2, if the failover happens, then when we try to execute 
getApplicationReport() we will get an ApplicationNotFoundException. I think 
this is the only case we need to handle here.
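
To make the case split concrete, a hedged sketch of where case 2 surfaces on 
the client side; the class and method names here are illustrative and not part 
of any patch.
{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class PostSubmitFailoverSketch {

  /**
   * After a submitApplication() response has been received, a failover may
   * leave the new active RM without the ApplicationState (case 2 above), in
   * which case getApplicationReport() throws ApplicationNotFoundException.
   * What the client should do then is the open question of this JIRA; the
   * catch block only marks where that handling would go.
   */
  static ApplicationReport reportAfterSubmit(YarnClient client,
      ApplicationId appId) throws YarnException, IOException {
    try {
      // The client's RM proxy retries internally until it reaches the new
      // active RM (case 1: state was saved, report is returned normally).
      return client.getApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      // Case 2: application was accepted before failover but never persisted.
      throw e;  // placeholder for whatever recovery policy is chosen
    }
  }
}
{code}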


 Handle RM fail overs after the submitApplication call.
 --

 Key: YARN-1764
 URL: https://issues.apache.org/jira/browse/YARN-1764
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918507#comment-13918507
 ] 

Zhijie Shen commented on YARN-1729:
---

1. mapper is not necessary; objectReader and objectWriter should be final, and 
both constants can be initialized in a static block. Please also follow the 
naming convention for static final constants (see the sketch below).
{code}
+  private static ObjectMapper mapper = new ObjectMapper();
+  private static ObjectReader objectReader = mapper.reader(Object.class);
+  private static ObjectWriter objectWriter = mapper.writer();
{code}

2. Similar problem here.
{code}
+  private static ObjectReader objectReader =
+  new ObjectMapper().reader(Object.class);
{code}

3. In the test case, would you mind adding one more test case of other:123abc 
to show the difference?
{code}
+ClientResponse response = r.path("ws").path("v1").path("timeline")
+    .path("type_1").queryParam("primaryFilter", "other:\"123abc\"")
{code}

Other than that, the patch looks good to me.
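
For point 1, a minimal sketch of the suggested shape; the class name is made 
up and the Jackson 2 package is assumed here, so only the final fields, static 
block, and naming convention are the point:
{code}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;

public class TimelineJsonUtil {

  // Final constants with UPPER_CASE names, initialized once in a static
  // block; the mapper itself does not need to be kept as a field.
  private static final ObjectReader OBJECT_READER;
  private static final ObjectWriter OBJECT_WRITER;

  static {
    ObjectMapper mapper = new ObjectMapper();
    OBJECT_READER = mapper.reader(Object.class);
    OBJECT_WRITER = mapper.writer();
  }
}
{code}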

In addition, I'm aware of another issue with the leveldb implementation: it 
assumes JSON input. This means that if our RESTful APIs ever allow XML input, 
the current implementation may not work correctly. IMHO, ideally the store 
should be isolated from the RESTful interface's input types. Anyway, let's 
track that issue separately so as not to block this patch, as it exists before 
this patch as well.

 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1704) Review LICENSE and NOTICE to reflect new levelDB releated libraries being used

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1704.
---

   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed

Committed this to trunk, branch-2 and branch-2.4. Thanks Billie!

 Review LICENSE and NOTICE to reflect new levelDB releated libraries being used
 --

 Key: YARN-1704
 URL: https://issues.apache.org/jira/browse/YARN-1704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1704.1.patch, YARN-1704.2.patch, YARN-1704.3.patch


 Make any changes necessary in LICENSE and NOTICE related to dependencies 
 introduced by the application timeline store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1704) Review LICENSE and NOTICE to reflect new levelDB releated libraries being used

2014-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918526#comment-13918526
 ] 

Hudson commented on YARN-1704:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5254 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5254/])
YARN-1704. Modified LICENSE and NOTICE files to reflect newly used levelDB 
related libraries. Contributed by Billie Rinaldi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573702)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/LICENSE.txt
* /hadoop/common/trunk/hadoop-yarn-project/NOTICE.txt


 Review LICENSE and NOTICE to reflect new levelDB releated libraries being used
 --

 Key: YARN-1704
 URL: https://issues.apache.org/jira/browse/YARN-1704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1704.1.patch, YARN-1704.2.patch, YARN-1704.3.patch


 Make any changes necessary in LICENSE and NOTICE related to dependencies 
 introduced by the application timeline store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666

2014-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918525#comment-13918525
 ] 

Hudson commented on YARN-1758:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5254 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5254/])
YARN-1758. Fixed ResourceManager to not mandate the presence of site specific 
configuration files and thus fix failures in downstream tests. Contributed by 
Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573695)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/FileSystemBasedConfigurationProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 MiniYARNCluster broken post YARN-1666
 -

 Key: YARN-1758
 URL: https://issues.apache.org/jira/browse/YARN-1758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1758.1.patch, YARN-1758.2.patch


 NPE seen when trying to use MiniYARNCluster



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918545#comment-13918545
 ] 

Vinod Kumar Vavilapalli commented on YARN-1765:
---

Looks good. +1. Checking this in.

 Write test cases to verify that killApplication API works in RM HA
 --

 Key: YARN-1765
 URL: https://issues.apache.org/jira/browse/YARN-1765
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, 
 YARN-1765.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918548#comment-13918548
 ] 

Chris Douglas commented on YARN-1771:
-

The simpler check doesn't seem to have any practical issues. Since the cache is 
keyed on Paths, the case where a user can refer to an object without access to 
it seems pretty esoteric. As long as the public cache runs with lowered 
privileges, the check isn't necessary to verify that the public resource 
isn't private to YARN. Copying with the user's HDFS credentials avoids that, 
though that seems like a heavyweight solution if reducing getFileStatus calls 
is the only motivation.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA

2014-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918557#comment-13918557
 ] 

Hudson commented on YARN-1765:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5255 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5255/])
YARN-1765. Added test cases to verify that killApplication API works across 
ResourceManager failover. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573735)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Write test cases to verify that killApplication API works in RM HA
 --

 Key: YARN-1765
 URL: https://issues.apache.org/jira/browse/YARN-1765
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, 
 YARN-1765.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1675) Application does not change to RUNNING after being scheduled

2014-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918558#comment-13918558
 ] 

Hudson commented on YARN-1675:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5255 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5255/])
YARN-1675. Added the previously missed new file. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573736)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestKillApplicationWithRMHA.java


 Application does not change to RUNNING after being scheduled
 

 Key: YARN-1675
 URL: https://issues.apache.org/jira/browse/YARN-1675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Trupti Dhavle

 I don't see any stack traces in the logs, but the debug logs show negative vcores:
 {noformat}
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(808)) - assignContainers: 
 node=hor11n39.gq1.ygridcore.net #applications=5
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0272
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0272
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0273
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0273 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application 

[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918565#comment-13918565
 ] 

Jason Lowe commented on YARN-1771:
--

Today the public cache localizes as the NM user, so the public-ness checking is 
important to avoid a security problem where the user could convince the NM to 
localize a file for which the user does not have privileges but the NM user 
does (e.g.: please localize that other job's .jhist file, aggregated logs, 
etc.).  So I think we need some kind of access check, either localizing as the 
requesting user or doing explicit access checks as it does today, to avoid a 
malicious client obtaining access to private files via the NM.
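
For readers following along, the check being discussed is essentially a permission 
walk over the file and each of its ancestor directories, with one getFileStatus 
round trip per step, which is where the several-calls-per-resource NameNode load 
comes from. A minimal sketch of that kind of walk (illustrative names only, not 
the actual FSDownload code):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

/** Illustrative sketch of the per-ancestor public-ness check being discussed. */
public class PublicnessCheckSketch {
  public static boolean isPublic(FileSystem fs, Path file) throws IOException {
    // The file itself must be world-readable: one getFileStatus RPC.
    FileStatus status = fs.getFileStatus(file);
    if (!status.getPermission().getOtherAction().implies(FsAction.READ)) {
      return false;
    }
    // Every ancestor directory must be world-executable: one RPC per ancestor.
    for (Path dir = file.getParent(); dir != null; dir = dir.getParent()) {
      FileStatus dirStatus = fs.getFileStatus(dir);
      if (!dirStatus.getPermission().getOtherAction().implies(FsAction.EXECUTE)) {
        return false;
      }
    }
    return true;
  }
}
{code}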

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1670) aggregated log writer can write more log data than it says is the log length

2014-03-03 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned YARN-1670:
---

Assignee: Mit Desai

 aggregated log writer can write more log data than it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical

 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file, 
 but where it reads is still log data from the previous file. What happened 
 was that the Log Length was written as a certain size, but the log data was 
 actually longer than that.
 Inside the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file. There is a race condition here: if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop once it has written whatever it said was 
 the length. It would be nice if we could somehow tell the user it might be 
 truncated, but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long: 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.
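
A minimal sketch of the bounded copy being proposed for the write() routine 
(illustrative names, assuming the length has already been recorded in the header; 
not the actual AggregatedLogFormat code):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Illustrative sketch: copy at most the length that was recorded for this log. */
public final class BoundedLogCopySketch {
  public static void copyBounded(InputStream logFile, OutputStream out,
      long fileLength) throws IOException {
    byte[] buf = new byte[64 * 1024];
    long curRead = 0;  // a long, matching the reader-side fix mentioned above
    while (curRead < fileLength) {
      int toRead = (int) Math.min(buf.length, fileLength - curRead);
      int len = logFile.read(buf, 0, toRead);
      if (len == -1) {
        break;  // the file shrank after the length was captured; stop here
      }
      out.write(buf, 0, len);
      curRead += len;
    }
    // Bytes appended to the log after fileLength was captured are deliberately
    // not copied, so the data can never exceed the declared length.
  }
}
{code}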



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-03 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1729:
-

Attachment: YARN-1729.6.patch

Thanks for the additional review.  I've attached a new patch addressing your 
comments.

 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918605#comment-13918605
 ] 

Gera Shegalov commented on YARN-1771:
-

Orthogonal to this, we have been discussing adding a FileStatus[] 
getFileStatus(Path f) API that returns a FileStatus for each path component of f 
in a single RPC. Interested in comments on this idea.
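
To make the proposal concrete, here is a hypothetical helper emulating the 
suggested call with today's API; the point of the proposal is that the 
per-component loop below would collapse into a single NameNode RPC (all names 
here are illustrative):

{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical sketch of the proposed "status for every path component" call. */
public class BatchGetFileStatusSketch {
  public static FileStatus[] getFileStatusAlongPath(FileSystem fs, Path f)
      throws IOException {
    Deque<FileStatus> statuses = new ArrayDeque<FileStatus>();
    // Today: one NameNode RPC per component. The proposal is that the file
    // system would return this same array from a single RPC.
    for (Path p = f; p != null; p = p.getParent()) {
      statuses.addFirst(fs.getFileStatus(p));
    }
    return statuses.toArray(new FileStatus[0]);
  }
}
{code}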

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918611#comment-13918611
 ] 

Hadoop QA commented on YARN-1729:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632356/YARN-1729.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3230//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3230//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3230//console

This message is automatically generated.

 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918625#comment-13918625
 ] 

Chris Douglas commented on YARN-1771:
-

bq. Orthogonal to this we have been discussing adding a FileStatus[] 
getFileStatus(Path f) API that returns FileStatus for each path component of f 
in a single RPC.

Symlinks might be awkward to support, but that discussion is for a separate 
ticket. Do you have a JIRA ref?

bq. So I think we need some kind of access check, either as the requesting user 
or explicit access checks like it does today, to avoid a malicious client 
obtaining access to private files via the NM.

An HDFS nobody account?

A cache would probably be correct in almost all cases, though. Since the check 
is only performed when the resource is localized, there could be cases where 
the filesystem is never in the cached state, but those are rare (and as Sandy 
points out, already in the current design). To attack the cache, the writer 
would need to take an unprotected directory, change its permissions, then 
populate it with private data (whose attributes are guessable). Expiring entries 
after short intervals and not populating the cache with failed localization 
attempts could help mitigate the attack's effectiveness.
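
A rough sketch of the kind of short-lived, positive-only cache being described 
(Guava's cache is used for brevity; the class name, expiry, and size below are 
placeholders, not a proposed implementation):

{code}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import org.apache.hadoop.fs.Path;

/** Illustrative sketch: cache only successful public-ness checks, briefly. */
public class PublicnessCacheSketch {
  /** Placeholder for the existing getFileStatus-based permission walk. */
  public interface PublicnessCheck {
    boolean isPublic(Path file) throws Exception;
  }

  private final Cache<Path, Boolean> publicPaths = CacheBuilder.newBuilder()
      .expireAfterWrite(30, TimeUnit.SECONDS)  // short expiry limits the attack window
      .maximumSize(10000)
      .build();

  public boolean isPublicCached(Path file, PublicnessCheck check) throws Exception {
    Boolean cached = publicPaths.getIfPresent(file);
    if (cached != null) {
      return cached;
    }
    boolean result = check.isPublic(file);
    if (result) {
      publicPaths.put(file, Boolean.TRUE);  // failed checks are not cached
    }
    return result;
  }
}
{code}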

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1751) Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing

2014-03-03 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1751:
--

Attachment: YARN-1751.patch

Here is the patch.

 Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing
 -

 Key: YARN-1751
 URL: https://issues.apache.org/jira/browse/YARN-1751
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ming Ma
 Attachments: YARN-1751.patch


 MiniYarnCluster specifies an individual remote log aggregation root dir for each 
 NM. Test code that uses MiniYarnCluster won't be able to get the value of the 
 log aggregation root dir. The following code isn't necessary in MiniYarnCluster:
   File remoteLogDir =
       new File(testWorkDir, MiniYARNCluster.this.getName()
           + "-remoteLogDir-nm-" + index);
   remoteLogDir.mkdir();
   config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
       remoteLogDir.getAbsolutePath());
 In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the 
 FileContext.getFileContext() call.
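
For the LogCLIHelpers point, the requested change is roughly the following (a 
sketch under the assumption that the helper already has the remote log dir URI 
and its own Configuration in hand; not the actual patch):

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.UnsupportedFileSystemException;

/** Illustrative sketch: resolve the remote log dir with the caller's conf. */
public class LogDirFileContextSketch {
  public static FileContext forRemoteLogDir(URI remoteRootLogDir, Configuration conf)
      throws UnsupportedFileSystemException {
    // Passing conf here, instead of calling FileContext.getFileContext() with no
    // arguments, makes the helper honor the configuration the CLI was given.
    return FileContext.getFileContext(remoteRootLogDir, conf);
  }
}
{code}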



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1751) Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing

2014-03-03 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1751:
--

Attachment: (was: YARN-1751.patch)

 Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing
 -

 Key: YARN-1751
 URL: https://issues.apache.org/jira/browse/YARN-1751
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ming Ma
 Attachments: YARN-1751-trunk.patch


 MiniYarnCluster specifies an individual remote log aggregation root dir for each 
 NM. Test code that uses MiniYarnCluster won't be able to get the value of the 
 log aggregation root dir. The following code isn't necessary in MiniYarnCluster:
   File remoteLogDir =
       new File(testWorkDir, MiniYARNCluster.this.getName()
           + "-remoteLogDir-nm-" + index);
   remoteLogDir.mkdir();
   config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
       remoteLogDir.getAbsolutePath());
 In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the 
 FileContext.getFileContext() call.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918649#comment-13918649
 ] 

Jason Lowe commented on YARN-1771:
--

Agreed, a nobody account would make the check similarly cheap.

I also like the idea of caching these a bit more rather than pinging the 
namenode each time a new container arrives with an existing resource requested. 
 That latter idea is similar to what Koji was asking for way back in 
MAPREDUCE-2011.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918661#comment-13918661
 ] 

Gera Shegalov commented on YARN-1771:
-

bq. Symlinks might be awkward to support, but that discussion is for a separate 
ticket. Do you have a JIRA ref?

Now I do: HDFS-6045

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1751) Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918674#comment-13918674
 ] 

Hadoop QA commented on YARN-1751:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632365/YARN-1751-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3231//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3231//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3231//console

This message is automatically generated.

 Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing
 -

 Key: YARN-1751
 URL: https://issues.apache.org/jira/browse/YARN-1751
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ming Ma
 Attachments: YARN-1751-trunk.patch


 MiniYarnCluster specifies an individual remote log aggregation root dir for each 
 NM. Test code that uses MiniYarnCluster won't be able to get the value of the 
 log aggregation root dir. The following code isn't necessary in MiniYarnCluster:
   File remoteLogDir =
       new File(testWorkDir, MiniYARNCluster.this.getName()
           + "-remoteLogDir-nm-" + index);
   remoteLogDir.mkdir();
   config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
       remoteLogDir.getAbsolutePath());
 In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the 
 FileContext.getFileContext() call.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1748:
--

Priority: Blocker  (was: Major)

 hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
 -

 Key: YARN-1748
 URL: https://issues.apache.org/jira/browse/YARN-1748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sravya Tirukkovalur
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: YARN-1748-1.patch, YARN-1748-1.patch


 Jars should not package config files, as these can end up on clients' 
 classpaths and break the clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918691#comment-13918691
 ] 

Xuan Gong commented on YARN-1766:
-

Created the patch based on the latest trunk code.

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services, it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1766:


Attachment: YARN-1766.2.patch

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services, it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918700#comment-13918700
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

Looking at it now.

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to 
 select tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

2014-03-03 Thread Sravya Tirukkovalur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918702#comment-13918702
 ] 

Sravya Tirukkovalur commented on YARN-1748:
---

Great, thanks Vinod!

 hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
 -

 Key: YARN-1748
 URL: https://issues.apache.org/jira/browse/YARN-1748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sravya Tirukkovalur
Assignee: Sravya Tirukkovalur
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1748-1.patch, YARN-1748-1.patch


 Jars should not package config files, as these can end up on clients' 
 classpaths and break the clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918706#comment-13918706
 ] 

Hadoop QA commented on YARN-1729:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632368/YARN-1729.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3232//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3232//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3232//console

This message is automatically generated.

 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1747) Better physical memory monitoring for containers

2014-03-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918729#comment-13918729
 ] 

Colin Patrick McCabe commented on YARN-1747:


My first thought here is to read /proc/pid/maps and look for the [stack] and 
[heap] sections, and just count those.  There might be something I'm not 
considering, though.  I wonder if there is ever a case where we'd want to 
charge an application for the page cache its use of a file takes up?
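
As a rough illustration of the /proc idea (Linux /proc/<pid>/maps layout assumed; 
this is only a sketch, not what the NM does today), the heap and stack mappings 
can be summed like this; note that it counts the mapped size, and per-mapping 
resident figures would need /proc/<pid>/smaps instead:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative sketch: sum the sizes of the [heap] and [stack] mappings. */
public class ProcMapsSketch {
  public static long heapAndStackBytes(int pid) throws IOException {
    long total = 0;
    for (String line : Files.readAllLines(
        Paths.get("/proc/" + pid + "/maps"), StandardCharsets.UTF_8)) {
      // A maps line looks like: "00601000-00622000 rw-p 00000000 00:00 0 [heap]"
      if (line.endsWith("[heap]") || line.endsWith("[stack]")) {
        String[] range = line.split("\\s+")[0].split("-");
        total += Long.parseLong(range[1], 16) - Long.parseLong(range[0], 16);
      }
    }
    return total;  // mapped size; resident pages would need /proc/<pid>/smaps
  }
}
{code}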

 Better physical memory monitoring for containers
 

 Key: YARN-1747
 URL: https://issues.apache.org/jira/browse/YARN-1747
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla

 YARN currently uses RSS to compute the physical memory being used by a 
 container. This can lead to issues, as noticed in HDFS-5957.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

2014-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918745#comment-13918745
 ] 

Hudson commented on YARN-1748:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5257 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5257/])
YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar 
and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573795)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml


 hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
 -

 Key: YARN-1748
 URL: https://issues.apache.org/jira/browse/YARN-1748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Sravya Tirukkovalur
Assignee: Sravya Tirukkovalur
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1748-1.patch, YARN-1748-1.patch


 Jars should not package config files, as these can end up on clients' 
 classpaths and break the clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918755#comment-13918755
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

Some more comments:
 - Let's mark YarnConfiguration.getClusterId() as Private
 - Can we move the getRMDelegationTokenService() API to ClientRMProxy? (The 
latter, BTW, is missing the visibility annotations.) That seems like a better 
place.
 - There are some related TODOs in ClientRMProxy.setupTokens() that we put 
before. Search for YARN-986. We can fix them here or separately.
 - getRMDelegationTokenService() API: Not sure why we are doing 
{{yarnConf.set(YarnConfiguration.RM_HA_ID, rmId);}}. And like I mentioned 
before,
{code}
+services.add(SecurityUtil.buildTokenService(
+yarnConf.getSocketAddr(YarnConfiguration.RM_ADDRESS,
+YarnConfiguration.DEFAULT_RM_ADDRESS,
+YarnConfiguration.DEFAULT_RM_PORT)).toString());
{code}
is looking at RM_ADDRESS instead of 
HAUtil.addSuffix(YarnConfiguration.RM_ADDRESS, rmId). It should do the latter, 
no?

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to 
 select tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918757#comment-13918757
 ] 

Vinod Kumar Vavilapalli commented on YARN-986:
--

In my earlier review comment, I thought that the MR changes implied other apps 
would need to change too; I was wrong. MR wraps our delegation-token APIs, so it 
needed to change.

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to 
 select tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1774) FS: Submitting to non-leaf queue throws NPE

2014-03-03 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918786#comment-13918786
 ] 

Anubhav Dhoot commented on YARN-1774:
-

The manual test consisted of:
a) Configure YARN to use the fair scheduler.
b) Create a hierarchical queue in fair-scheduler.xml.
c) Try to run a job assigned to a parent queue.

Without the fix, the ResourceManager would terminate with the exception in the 
fair scheduler.
With the fix, the job submission is rejected with an error and the ResourceManager 
continues running.

 FS: Submitting to non-leaf queue throws NPE
 ---

 Key: YARN-1774
 URL: https://issues.apache.org/jira/browse/YARN-1774
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-1774.patch


 If you create a hierarchy of queues and assign a job to a parent queue, 
 FairScheduler quits with an NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918800#comment-13918800
 ] 

Hadoop QA commented on YARN-1766:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632378/YARN-1766.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3233//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3233//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3233//console

This message is automatically generated.

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services, it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1747) Better physical memory monitoring for containers

2014-03-03 Thread Adar Dembo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918802#comment-13918802
 ] 

Adar Dembo commented on YARN-1747:
--

If you're willing to use the memory cgroup subsystem, you can get more accurate 
RSS (i.e. w/o pages from mapped files) in memory.stat. Is that an option?
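
For reference, a sketch of reading that counter (cgroup v1 layout assumed, and 
the path below is a placeholder); the "rss" line in memory.stat counts anonymous 
pages only, so page cache from mapped files is excluded:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative sketch: read anonymous RSS from a container's memory cgroup. */
public class CgroupMemoryStatSketch {
  public static long anonRssBytes(String cgroupDir) throws IOException {
    // cgroupDir is a placeholder, e.g. "/sys/fs/cgroup/memory/hadoop-yarn/<container>"
    for (String line : Files.readAllLines(
        Paths.get(cgroupDir, "memory.stat"), StandardCharsets.UTF_8)) {
      // memory.stat lines look like "rss 123456789"; "cache" holds page-cache bytes.
      if (line.startsWith("rss ")) {
        return Long.parseLong(line.substring("rss ".length()).trim());
      }
    }
    return -1;  // counter not found
  }
}
{code}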

 Better physical memory monitoring for containers
 

 Key: YARN-1747
 URL: https://issues.apache.org/jira/browse/YARN-1747
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla

 YARN currently uses RSS to compute the physical memory being used by a 
 container. This can lead to issues, as noticed in HDFS-5957.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1774) FS: Submitting to non-leaf queue throws NPE

2014-03-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918815#comment-13918815
 ] 

Tsuyoshi OZAWA commented on YARN-1774:
--

+1. Confirmed that the problem reproduces, and the patch fixes the NPE.

The test failure is obviously unrelated - it says 
java.lang.UnsupportedOperationException: libhadoop cannot be loaded. We 
should discuss it on another JIRA. [~sandyr], can you take a look?

 FS: Submitting to non-leaf queue throws NPE
 ---

 Key: YARN-1774
 URL: https://issues.apache.org/jira/browse/YARN-1774
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-1774.patch


 If you create a hierarchy of queues and assign a job to a parent queue, 
 FairScheduler quits with an NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1778) TestFSRMStateStore fails on trunk

2014-03-03 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1778:
---

 Summary: TestFSRMStateStore fails on trunk
 Key: YARN-1778
 URL: https://issues.apache.org/jira/browse/YARN-1778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk

2014-03-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918845#comment-13918845
 ] 

Tsuyoshi OZAWA commented on YARN-1778:
--

A log of the test failure is available here: 
https://builds.apache.org/job/PreCommit-YARN-Build/3234//testReport/

 TestFSRMStateStore fails on trunk
 -

 Key: YARN-1778
 URL: https://issues.apache.org/jira/browse/YARN-1778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918862#comment-13918862
 ] 

Vinod Kumar Vavilapalli commented on YARN-1766:
---

The patch looks fine to me, but I wonder how we missed this before. This seems 
like a basic thing that our tests should have caught earlier.

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations to a remote file system and let different RMs share the 
 same configurations. During initialization, the RM will load the configurations 
 from the remote file system, so when the RM initializes its services, it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1779) Handle AMRMTokens across RM failover

2014-03-03 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1779:
--

 Summary: Handle AMRMTokens across RM failover
 Key: YARN-1779
 URL: https://issues.apache.org/jira/browse/YARN-1779
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker


Verify whether AMRMTokens continue to work across RM failover. If not, we will 
have to do something along the lines of YARN-986. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918879#comment-13918879
 ] 

Vinod Kumar Vavilapalli commented on YARN-1761:
---

The remote-configuration-provider on the RM is a server-side property. We will not use 
it to specify client-side configuration. Given that, why do we need to use the 
config-provider on the client side?

 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1761.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918890#comment-13918890
 ] 

Hudson commented on YARN-1729:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5258 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5258/])
YARN-1729. Made TimelineWebServices deserialize the string primary- and 
secondary-filters param into the JSON-compatible object. Contributed by Billie 
Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573825)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java


 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk

2014-03-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918899#comment-13918899
 ] 

Tsuyoshi OZAWA commented on YARN-1778:
--

The error message reported on HDFS-6048 is exactly the same.

 TestFSRMStateStore fails on trunk
 -

 Key: YARN-1778
 URL: https://issues.apache.org/jira/browse/YARN-1778
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-03 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-986:
--

Attachment: yarn-986-3.patch

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918951#comment-13918951
 ] 

Karthik Kambatla commented on YARN-986:
---

bq. There are some related TODOs in ClientRMProxy.setupTokens() that we put 
before. Search for YARN-986. We can fix them here or separately.
Created YARN-1779 to address AMRMTokens. This JIRA is only for RMDTTokens. 

bq. getRMDelegationTokenService() API: Not sure why we are doing 
yarnConf.set(YarnConfiguration.RM_HA_ID, rmId);
bq. you are only building the service against one address RM_ADDRESS.
Discussed with Vinod offline. YarnConfiguration#getSocketAddr already handles 
the HA case. Updated its javadoc to reflect that. 

Addressed the other comments. Again, verified manually by running Oozie jobs. 
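
A minimal sketch of the overall idea, under assumed property keys and method placement (this is not the committed ClientRMProxy code): when HA is enabled, build the RM delegation-token service from the addresses of all configured RM ids, so token selection works against whichever RM is active.

{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.SecurityUtil;

public class RmDtServiceSketch {

  // Joins the token service of every configured RM, e.g. "rm1-host:8032,rm2-host:8032".
  static Text getRMDelegationTokenService(Configuration conf) {
    StringBuilder services = new StringBuilder();
    for (String rmId : conf.getTrimmedStrings("yarn.resourcemanager.ha.rm-ids")) {
      InetSocketAddress addr = conf.getSocketAddr(
          "yarn.resourcemanager.address." + rmId, "0.0.0.0:8032", 8032);
      if (services.length() > 0) {
        services.append(",");
      }
      services.append(SecurityUtil.buildTokenService(addr));
    }
    return new Text(services.toString());
  }
}
{code}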

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1768) yarn kill non-existent application is too verbose

2014-03-03 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1768:
-

Attachment: YARN-1768.3.patch

Fixed the exit code to return a non-zero value (-1) when the application doesn't exist.
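
A minimal sketch of the intended behavior, with illustrative method and message wording (not necessarily the attached patch): catch ApplicationNotFoundException, print a one-line message instead of the stack trace, and return -1.

{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class KillAppSketch {

  // Shell-style exit code: 0 on success, -1 if the application id is unknown to the RM.
  static int killApplication(YarnClient client, ApplicationId appId)
      throws YarnException, IOException {
    try {
      client.killApplication(appId);
      return 0;
    } catch (ApplicationNotFoundException e) {
      // One-line message instead of the full stack trace.
      System.out.println("Application with id '" + appId + "' doesn't exist in RM.");
      return -1;
    }
  }
}
{code}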

 yarn kill non-existent application is too verbose
 -

 Key: YARN-1768
 URL: https://issues.apache.org/jira/browse/YARN-1768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: YARN-1768.1.patch, YARN-1768.2.patch, YARN-1768.3.patch


 Instead of catching ApplicationNotFound and logging a simple app not found 
 message, the whole stack trace is logged.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState

2014-03-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918984#comment-13918984
 ] 

Zhijie Shen commented on YARN-1445:
---

I've thought about postponing the unregistration success notification until the 
application is at FINISHED. It seems impossible, because it would result in the 
deadlock below:

1. AM container is waiting for finishing unregistration to move on and exit;
2. Unregistration is waiting for RM notifying success;
3. RM is waiting for RMApp moving from FINISHING to FINISHED to return success;
4. RMApp is waiting for RMAppAttempt moving from FINISHING to FINISHED;
5. RMAppAttempt is waiting for AM container being finished.

Then, if we return a prior state to the client for the internal FINISHING state, 
but still return unregistration success when RMApp reaches FINISHING, the client 
will see, for example, RUNNING while the unregistration has already succeeded. 
This inconsistency may result in a race condition for any process that relies on 
checking the final state.

For example, the MR client will direct the user to the AM if the application is 
reported not to be in a final state. Then it is possible that the AM has 
unregistered while the RM tells the client that the application is still 
running. When the client moves on to contact the AM, the AM has proceeded and 
exited before being able to respond to the client's request.

It seems that we cannot avoid splitting the user-facing state: FINISHING would 
map to the period of an application's life cycle from unregistration to process 
exit. [~jlowe] and [~jianhe], what do you think?
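
For what it's worth, a self-contained sketch of the split being discussed; the enums here are local stand-ins for RMAppState and YarnApplicationState, not the real classes, and a separate client-visible FINISHING value is the hypothetical change:

{code}
public class StateMappingSketch {

  enum InternalState { RUNNING, FINISHING, FINISHED }   // stand-in for RMAppState
  enum ClientState { RUNNING, FINISHING, FINISHED }     // stand-in for YarnApplicationState

  // Today both FINISHING and FINISHED are reported to clients as FINISHED; the proposal
  // is to surface FINISHING as its own state covering unregistration-to-exit.
  static ClientState toClientState(InternalState state) {
    switch (state) {
      case FINISHING:
        return ClientState.FINISHING;
      case FINISHED:
        return ClientState.FINISHED;
      default:
        return ClientState.RUNNING;
    }
  }
}
{code}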

 Separate FINISHING and FINISHED state in YarnApplicationState
 -

 Key: YARN-1445
 URL: https://issues.apache.org/jira/browse/YARN-1445
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch, 
 YARN-1445.4.patch, YARN-1445.5.patch, YARN-1445.5.patch, YARN-1445.6.patch


 Today, we will transmit both RMAppState.FINISHING and RMAppState.FINISHED to 
 YarnApplicationState.FINISHED.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1780) Improve logging in timeline service

2014-03-03 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-1780:
-

 Summary: Improve logging in timeline service
 Key: YARN-1780
 URL: https://issues.apache.org/jira/browse/YARN-1780
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


The server side of the timeline service is lacking logging information, which makes 
debugging difficult.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-03 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1752:
-

Attachment: YARN-1752.4.patch

Attaching a patch that addresses the review comments. Please review.

 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919019#comment-13919019
 ] 

Hadoop QA commented on YARN-1752:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632436/YARN-1752.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3236//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3236//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3236//console

This message is automatically generated.

 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

2014-03-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919029#comment-13919029
 ] 

Xuan Gong commented on YARN-1766:
-

Our previous tests missed this. They covered: starting the RM without 
LocalConfigurationProvider/FSBasedConfigurationProvider, doing refresh* with 
LocalConfigurationProvider/FSBasedConfigurationProvider, and RM HA with 
FSBasedConfigurationProvider. But we did not verify whether all RM services get 
the correct configuration when the RM initializes with FSBasedConfigurationProvider.
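
A minimal sketch of the initialization ordering in question, using assumed stand-in names (ConfigurationLoader, initChildServices) rather than the real RM/ConfigurationProvider APIs: resolve the effective configuration first, then initialize child services with that loaded Configuration instead of the bootstrap one.

{code}
import org.apache.hadoop.conf.Configuration;

public class RmInitSketch {

  // Stand-in for the provider abstraction (local or filesystem-based).
  interface ConfigurationLoader {
    Configuration load(Configuration bootstrapConf);
  }

  private Configuration conf;

  void serviceInit(Configuration bootstrapConf, ConfigurationLoader provider) {
    // Resolve the effective configuration first; it may come from a remote FS.
    this.conf = provider.load(bootstrapConf);
    // Then initialize every child service with the loaded configuration,
    // not with the bootstrap one the RM object was constructed with.
    initChildServices(this.conf);
  }

  void initChildServices(Configuration loadedConf) {
    // Scheduler, ClientRMService, ApplicationMasterService, etc. would all
    // be handed loadedConf here.
  }
}
{code}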

 When RM does the initiation, it should use loaded Configuration instead of 
 bootstrap configuration.
 ---

 Key: YARN-1766
 URL: https://issues.apache.org/jira/browse/YARN-1766
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1766.1.patch, YARN-1766.2.patch


 Right now, we have FileSystemBasedConfigurationProvider to let users upload 
 the configurations into a remote file system and let different RMs share the 
 same configurations. During initialization, the RM loads the configurations 
 from the remote file system. So when the RM initializes its services, it should 
 use the loaded configurations instead of the bootstrap configurations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active

2014-03-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919031#comment-13919031
 ] 

Karthik Kambatla commented on YARN-1734:


Sorry for all the confusion caused here - I forgot that the rmadmin command also 
uses ConfiguredRMFailoverProxyProvider.

Played with a cluster using local configurations, and it behaves as expected: 
refresh* refreshes the Active, and the Standby refreshes everything on transition 
to active. Thanks [~xgong] for fixing the refresh commands, and for being 
patient with my questions/concerns.

 RM should get the updated Configurations when it transits from Standby to 
 Active
 

 Key: YARN-1734
 URL: https://issues.apache.org/jira/browse/YARN-1734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, 
 YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch


 Currently, we have ConfigurationProvider, which can support 
 LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and 
 FileSystemBasedConfiguration is enabled, the RM cannot get the updated 
 configurations when it transitions from Standby to Active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919038#comment-13919038
 ] 

Hadoop QA commented on YARN-986:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632426/yarn-986-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapreduce.v2.TestNonExistentJob
  org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
  org.apache.hadoop.mapred.TestClusterMapReduceTestCase
  org.apache.hadoop.mapred.TestJobName
  org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
  org.apache.hadoop.fs.TestDFSIO
  org.apache.hadoop.mapreduce.v2.TestUberAM
  org.apache.hadoop.mapreduce.TestMRJobClient
  org.apache.hadoop.mapred.TestMerge
  org.apache.hadoop.mapred.TestReduceFetch
  org.apache.hadoop.mapred.TestLazyOutput
  org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
  org.apache.hadoop.mapreduce.v2.TestMRJobs
  org.apache.hadoop.mapred.TestMRCJCFileInputFormat
  org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
  org.apache.hadoop.mapred.TestJobSysDirWithDFS
  org.apache.hadoop.mapreduce.security.TestMRCredentials
  org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
  org.apache.hadoop.mapreduce.lib.join.TestJoinProperties
  org.apache.hadoop.ipc.TestMRCJCSocketFactory
  org.apache.hadoop.mapred.TestMiniMRClasspath
  org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle
  org.apache.hadoop.conf.TestNoDefaultsJobConf
  org.apache.hadoop.mapred.TestMiniMRChildTask
  
org.apache.hadoop.mapreduce.lib.input.TestDelegatingInputFormat
  org.apache.hadoop.mapred.join.TestDatamerge
  org.apache.hadoop.mapred.lib.TestDelegatingInputFormat
  org.apache.hadoop.fs.TestFileSystem
  org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge
  org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
  
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3237//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3237//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3237//console

This message is automatically generated.

 RM DT token service should have service addresses of both RMs
 -

 Key: YARN-986
 URL: https://issues.apache.org/jira/browse/YARN-986
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, 
 yarn-986-prelim-0.patch


 Previously: YARN should use cluster-id as token service address
 This needs to be done to support non-IP-based failover of the RM. Once the 
 server sets the token service address to be this generic ClusterId/ServiceId, 
 clients can translate it to the appropriate final IP and then be able to select 
 tokens via TokenSelectors.
 Some workarounds for other related issues were put in place at YARN-945.

[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919086#comment-13919086
 ] 

Hadoop QA commented on YARN-1408:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12629000/Yarn-1408.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3238//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3238//console

This message is automatically generated.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
 Fix For: 2.4.0

 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster 
 capacity.
 A jobA task that uses queue b's capacity is preempted and killed.
 This caused the problem below:
 1. A new container was allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was immediately preempted.
 The ACQUIRED at KILLED invalid-state exception occurred when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 had already been killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs
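
One plausible direction for handling this, sketched with a self-contained stand-in state machine rather than the real RMContainerImpl transitions (the attached patches may take a different approach): treat a late ACQUIRED event arriving at KILLED as ignorable instead of an invalid transition, so a container preempted right after allocation does not trigger the error or leave the task waiting.

{code}
import java.util.EnumSet;

public class LateEventSketch {

  enum ContainerState { ALLOCATED, ACQUIRED, RUNNING, KILLED }
  enum ContainerEvent { ACQUIRED, LAUNCHED, KILL, FINISHED }

  // Events that can legitimately race with a preemption kill and should be ignored
  // rather than treated as invalid transitions.
  static final EnumSet<ContainerEvent> IGNORABLE_AT_KILLED =
      EnumSet.of(ContainerEvent.ACQUIRED, ContainerEvent.LAUNCHED, ContainerEvent.FINISHED);

  static ContainerState handle(ContainerState current, ContainerEvent event) {
    if (current == ContainerState.KILLED) {
      if (IGNORABLE_AT_KILLED.contains(event)) {
        return current;  // stale event after preemption: ignore it
      }
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    // ... normal transitions elided ...
    return current;
  }
}
{code}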



--
This message was sent by Atlassian JIRA
(v6.2#6252)