[jira] [Assigned] (YARN-1801) NPE in public localizer

2014-05-22 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo reassigned YARN-1801:
-

Assignee: Hong Zhiguo

> NPE in public localizer
> ---
>
> Key: YARN-1801
> URL: https://issues.apache.org/jira/browse/YARN-1801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Hong Zhiguo
>Priority: Critical
>
> While investigating YARN-1800 found this in the NM logs that caused the 
> public localizer to shutdown:
> {noformat}
> 2014-01-23 01:26:38,655 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:addResource(651)) - Downloading public 
> rsrc:{ 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
>  1390440382009, FILE, null }
> 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(726)) - Error: Shutting down
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
> 2014-01-23 01:26:38,656 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(728)) - Public cache exiting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1801) NPE in public localizer

2014-05-22 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-1801:
--

Attachment: YARN-1801.patch

{code}
Path local = completed.get();
{code}
may throw ExecutionException, and assoc may be null.

When both happen, we get an NPE in
{code}
LOG.info("Failed to download rsrc " + assoc.getResource(),
  e.getCause());
{code}

And this is exactly line "ResourceLocalizationService.java:712" as of commit 
dd9c059 (2013-10-05, YARN-1254).
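
A null check along these lines would avoid killing the public localizer thread 
(a minimal sketch of the idea, assuming assoc was obtained earlier via 
pending.remove(completed); not necessarily what the attached patch does):
{code}
// Hypothetical sketch, not the attached patch: assoc may legitimately be
// null here, so guard the dereference inside the catch block instead of
// letting the NPE shut the public cache down.
} catch (ExecutionException e) {
  if (assoc != null) {
    LOG.info("Failed to download rsrc " + assoc.getResource(), e.getCause());
  } else {
    LOG.error("Localization failed, but no pending request was found for "
        + completed, e.getCause());
  }
}
{code}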

> NPE in public localizer
> ---
>
> Key: YARN-1801
> URL: https://issues.apache.org/jira/browse/YARN-1801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: YARN-1801.patch
>
>
> While investigating YARN-1800 found this in the NM logs that caused the 
> public localizer to shutdown:
> {noformat}
> 2014-01-23 01:26:38,655 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:addResource(651)) - Downloading public 
> rsrc:{ 
> hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
>  1390440382009, FILE, null }
> 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(726)) - Error: Shutting down
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
> 2014-01-23 01:26:38,656 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(728)) - Public cache exiting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005700#comment-14005700
 ] 

Hadoop QA commented on YARN-2049:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646158/YARN-2049.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3788//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3788//console

This message is automatically generated.

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005718#comment-14005718
 ] 

Steve Loughran commented on YARN-2092:
--

This seems to come from the HADOOP-10104 patch, which went in because the 2.2+ 
version of Jackson was so out of date it was breaking other things.

I'm not sure it's so much incompatible as that Tez is trying to push in its own 
version of Jackson, which then leads to classpath mixing problems. Even if 
you try to push in one set of the JARs ahead of the other, things are going to 
break. I know, I've tried.

Jackson 1.x should be compatible at run time with code built for previous 
versions. If there's a link problem there, then it's something we can take up 
with the Jackson team. 



> Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
> 2.5.0-SNAPSHOT
> 
>
> Key: YARN-2092
> URL: https://issues.apache.org/jira/browse/YARN-2092
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Came across this when trying to integrate with the timeline server. Using a 
> 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
> 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
> jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-22 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005722#comment-14005722
 ] 

Binglin Chang commented on YARN-2088:
-

Hi Zhiguo, thanks for the comments, nice catch.
Those two lines are used in every record class... so deleting them in a single 
place would actually break the code convention, and it's not related to this 
JIRA. We may discuss whether to delete them all in another JIRA.


> Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
> 
>
> Key: YARN-2088
> URL: https://issues.apache.org/jira/browse/YARN-2088
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2088.v1.patch
>
>
> Some fields (set, list) are added to proto builders multiple times; we need 
> to clear those fields before adding, otherwise the resulting proto contains 
> extra contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-22 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005753#comment-14005753
 ] 

Binglin Chang commented on YARN-2030:
-

Hi Jian He,
Thanks for the comments. It looks like the PBImpl already has ProtoBase as its 
superclass, so we can't change the interface to an abstract class:

{code}
public class ApplicationAttemptStateDataPBImpl
extends ProtoBase<ApplicationAttemptStateDataProto>
implements ApplicationAttemptStateData {
{code}
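
Separately, for illustration of what the simplification could look like: the 
chained event.getType().equals(...) checks in handleStoreEvent() could be 
collapsed into a table-driven dispatch even without a full StateMachine; a 
hypothetical sketch (handler table and names assumed, not taken from the 
attached patch):
{code}
// Hypothetical sketch: one handler per event type in an EnumMap instead of
// the nested if/else chain; handlers are registered once at init time.
private final Map<RMStateStoreEventType, EventHandler<RMStateStoreEvent>>
    dispatch =
        new EnumMap<RMStateStoreEventType, EventHandler<RMStateStoreEvent>>(
            RMStateStoreEventType.class);

private void handleStoreEvent(RMStateStoreEvent event) {
  EventHandler<RMStateStoreEvent> handler = dispatch.get(event.getType());
  if (handler == null) {
    LOG.error("Unknown RMStateStoreEvent type: " + event.getType());
    return;
  }
  handler.handle(event);
}
{code}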


> Use StateMachine to simplify handleStoreEvent() in RMStateStore
> ---
>
> Key: YARN-2030
> URL: https://issues.apache.org/jira/browse/YARN-2030
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Junping Du
>Assignee: Binglin Chang
> Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch
>
>
> Now the logic to handle different store events in handleStoreEvent() is as 
> following:
> {code}
> if (event.getType().equals(RMStateStoreEventType.STORE_APP)
> || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
>   ...
>   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
> ...
>   } else {
> ...
>   }
>   ...
>   try {
> if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
>   ...
> } else {
>   ...
> }
>   } 
>   ...
> } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
> || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
>   ...
>   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
> ...
>   } else {
> ...
>   }
> ...
> if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
>   ...
> } else {
>   ...
> }
>   }
>   ...
> } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
> ...
> } else {
>   ...
> }
> }
> {code}
> This not only confuses people but also easily leads to mistakes. We may 
> leverage a state machine to simplify this, even though there are no state 
> transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT

2014-05-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005804#comment-14005804
 ] 

Steve Loughran commented on YARN-2092:
--

I should add that the underlying issue is that the AM gets the entire CP from 
{{yarn.lib.classpath}}. That's mandatory to pick up a version of the Hadoop 
binaries (and -site.xml files) compatible with the rest of the cluster. But it 
brings in all the other dependencies which Hadoop itself relies on. As Hadoop 
evolves, this problem will only continue.

The only viable long-term solution is to somehow support OSGi-launched AMs, so 
the AM only gets the org.apache.hadoop classes from the Hadoop JARs, and has to 
explicitly add everything else itself. See HADOOP-7977 for this - maybe it's 
something we could target for Hadoop 3.0, driven by the needs of AMs.

> Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 
> 2.5.0-SNAPSHOT
> 
>
> Key: YARN-2092
> URL: https://issues.apache.org/jira/browse/YARN-2092
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Came across this when trying to integrate with the timeline server. Using a 
> 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 
> 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user 
> jars are first in the classpath.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005863#comment-14005863
 ] 

Rohith commented on YARN-1366:
--

bq. I mean what will go wrong is we allow unregister without register? Is it 
fundamentally wrong?
Allowing unregister without register moves the application to the FINISHED 
state (after handling the unregistered event at LAUNCHED), when it is supposed 
to be the FAILED state. If that is acceptable, then it's fine to go ahead.

> ApplicationMasterService should Resync with the AM upon allocate call after 
> restart
> ---
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0, and the AM should send its entire outstanding request 
> to the RM. Note that if the AM is making its first allocate call to the RM, 
> then things should proceed as normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-05-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006066#comment-14006066
 ] 

Eric Payne commented on YARN-415:
-

The Generic Application History Server stores all of the information about 
containers that is needed to calculate memory seconds and vcore seconds. Right 
now, since the Generic Application History Server is tied closely to the 
Timeline Server, this does not work on a secured cluster. Also, the information 
is only available via the REST API right now, and there would need to be some 
scripting and parsing of the REST APIs to roll up metrics for each app. So, I 
think this JIRA would still be very helpful and useful.

FYI, On an unsecured cluster with the Generic Application History Server and 
the Timeline Server configured and running, the following REST APIs will give 
enough information about an app to calculate memory seconds and vcore seconds:

{panel:title=Get list of app attempts for a specified 
appID|titleBGColor=#F7D6C1}
curl --compressed -H "Accept: application/json" -X GET 
"http://<host>:<port>/ws/v1/applicationhistory/apps/<appid>/appattempts"
{panel}
{panel:title=For each app attempt, get all container info|titleBGColor=#F7D6C1}
curl --compressed -H "Accept: application/json" -X GET 
"http://<host>:<port>/ws/v1/applicationhistory/apps/<appid>/appattempts/<appattemptid>/containers"
{panel}
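
For example, the rollup itself would then be something like this (a rough 
sketch; the ContainerRecord holder and the field names allocatedMB, 
allocatedVCores, startedTime and finishedTime are assumptions about the JSON 
returned above):
{code}
// Rough sketch: sum memory-seconds and vcore-seconds over the containers of
// all app attempts, per the (reserved resource * container lifetime) formula.
long memorySeconds = 0;
long vcoreSeconds = 0;
for (ContainerRecord c : containersOfAllAttempts) {  // hypothetical POJO
  long lifetimeSecs = (c.getFinishedTime() - c.getStartedTime()) / 1000;
  memorySeconds += c.getAllocatedMB() * lifetimeSecs;
  vcoreSeconds += c.getAllocatedVCores() * lifetimeSecs;
}
{code}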

> Capture memory utilization at the app-level for chargeback
> --
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 0.23.6
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1474) Make schedulers services

2014-05-22 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1474:
-

Attachment: YARN-1474.16.patch

Rebased on trunk. 

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
> YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, 
> YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006138#comment-14006138
 ] 

Varun Vasudev commented on YARN-2049:
-

A couple of things -
1. In the function managementOperation, should there be a null check for token?
2. In the function managementOperation, you call secretManager.cancelToken(dt, 
UserGroupInformation.getCurrentUser().getUserName()) - should you use 
getCurrentUser().getUserName() or ownerUGI.getUserName()? The reason I ask is 
that when creating the token, you're using ownerUGI.

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1365:


Attachment: YARN-1365.003.patch

Fixed race conditions in the test that was failing. The failing test would only 
repro on Hudson after uploading the patch. 

> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---
>
> Key: YARN-1365
> URL: https://issues.apache.org/jira/browse/YARN-1365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.003.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2095) Large MapReduce Job stops responding

2014-05-22 Thread Clay McDonald (JIRA)
Clay McDonald created YARN-2095:
---

 Summary: Large MapReduce Job stops responding
 Key: YARN-2095
 URL: https://issues.apache.org/jira/browse/YARN-2095
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6
Reporter: Clay McDonald
Priority: Blocker


Very large jobs (7,455 mappers and 999 reducers) hang. Jobs run well, but 
logging to the container logs stops after running for 33 hours. The job appears 
to be hung. The status of the job is "RUNNING". No error messages were found in 
the logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-05-22 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006222#comment-14006222
 ] 

Abin Shahab commented on YARN-1964:
---

Do others have comments on it, [~acmurthy]?


> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006248#comment-14006248
 ] 

Wei Yan commented on YARN-596:
--

Hey, [~sandyr], sorry for the late reply. Still confused here.
So as you said, a queue is safe and doesn't allow preemption only if it 
satisfies the condition "(usage.memory <= fairshare.memory) && (usage.vcores <= 
fairshare.vcores)". This condition works fine for DRF. But for FairSharePolicy, 
since fairshare.vcores is always 0 (except for root), this condition can never 
be satisfied, and preemption is always allowed for all queues.
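
In code terms, the precheck under discussion is roughly (hypothetical sketch, 
not the final patch):
{code}
// Sketch: a queue is skipped for preemption when its usage fits within its
// fair share. Under FairSharePolicy fairShare.vcores is 0, so fitsIn() can
// never hold and no queue is ever skipped.
boolean preemptContainerPreCheck(FSQueue queue) {
  return !Resources.fitsIn(queue.getResourceUsage(), queue.getFairShare());
}
{code}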

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006453#comment-14006453
 ] 

Vinod Kumar Vavilapalli commented on YARN-2049:
---

Thanks for working on this, Zhijie!

Some comments on the patch

TimelineKerberosAuthenticator
 - Not clear what TimelineDelegationTokenResponse validateAndParseResponse() is 
doing with class loading, construction, etc. Can you explain and maybe also add 
code comments?

TimelineAuthenticationFilter
 - Explain what getConfiguration() overrides and add a code comment?

TimelineKerberosAuthenticationHandler
 - This borrows a lot of code from HttpFSKerberosAuthenticationHandler.java. We 
should refactor either here or in a separate JIRA.

Nits
 - TestDistributedShell change is unnecessary
 - TimelineDelegationTokenSelector: Wrap the debug logging in debugEnabled 
checks.
 - ApplicationHistoryServer.java
-- Forced config setting of the filter: What happens if the cluster has 
another authentication filter? Is the guideline to override it (which is what 
the patch is doing)?

h4. Source code refactor

TimelineKerberosAuthenticationHandler
 - Rename to TimelineClientAuthenticationService?

TimelineKerberosAuthenticator
 - It seems like TimelineKerberosAuthenticator is completely client side code 
and so should be moved to the client module
 - To do that we will extract some of the constants and the 
DelegationTokenOperation enum as top level entities into the common module.

TimelineAuthenticationFilterInitializer
 - This is almost the same as the common AuthenticationFilterInitializer.java. 
Let's just refactor AuthenticationFilterInitializer.java and extend it to only 
change class names, similar to how TimelineAuthenticationFilter extends 
AuthenticationFilter.

TimelineDelegationTokenSecretManagerService:
 - We are sharing the configs for update/renewal etc. with the ResourceManager. 
That seems fine for now - logically you want both tokens to follow a similar 
expiry and life-cycle.
 - This also shares a bunch of code with 
org/apache/hadoop/lib/service/security/DelegationTokenManagerService. We may or 
may not want to reuse some code - just throwing it out there.

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-22 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006459#comment-14006459
 ] 

Mayank Bansal commented on YARN-2074:
-

Thanks [~jianhe] for the patch. Overall looks good.
Some nits:

{code}
  maxAppAttempts <= attempts.size()
{code}
Can we use this?
{code}
maxAppAttempts == getAttemptFailureCount()
{code}

{code}
public boolean isPreempted() {
  return getDiagnostics().contains(SchedulerUtils.PREEMPTED_CONTAINER);
}
{code}

I think we need to compare the exit status (-102) instead of relying on the 
string message.
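
Something along these lines, for instance (a sketch; 
ContainerExitStatus.PREEMPTED is the -102 constant in the YARN API, and 
finishedStatus is an assumed field holding the final ContainerStatus):
{code}
// Sketch: rely on the container exit status rather than the diagnostics
// string; ContainerExitStatus.PREEMPTED (-102) marks preempted containers.
public boolean isPreempted() {
  return finishedStatus.getExitStatus() == ContainerExitStatus.PREEMPTED;
}
{code}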


> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006465#comment-14006465
 ] 

Jian He commented on YARN-1408:
---

Hi [~sunilg], agreed that we should remove the container from 
newlyAllocatedContainers when preemption happens. As for the race condition you 
mentioned, we may also preempt an ACQUIRED container?
In fact, I think the best container to preempt is an ALLOCATED container, as 
these containers are not yet alive from the user's perspective. As for the race 
condition where the RM loses the resource request: today the resource request 
is decremented when the container is allocated. We may change it to decrement 
the resource request only when the container is pulled by the AM? We can do 
this separately if it makes sense.
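
A sketch of the first point (hypothetical, assuming the usual scheduler fields 
such as newlyAllocatedContainers on the scheduler app):
{code}
// Hypothetical sketch: on preemption, drop a still-ALLOCATED container from
// newlyAllocatedContainers so the AM never acquires an already-killed one.
if (rmContainer.getState() == RMContainerState.ALLOCATED) {
  newlyAllocatedContainers.remove(rmContainer);
}
{code}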

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses full cluster capacity
> Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
> capacity
> JobA's task which uses queue b's capacity has been preempted and killed.
> This caused below problem:
> 1. New Container has got allocated for jobA in Queue A as per node update 
> from an NM.
> 2. This container has been preempted immediately as per preemption.
> Here ACQUIRED at KILLED Invalid State exception came when the next AM 
> heartbeat reached RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the Task to go for a timeout for 30minutes as this Container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006486#comment-14006486
 ] 

Tsuyoshi OZAWA commented on YARN-1365:
--

[~adhoot], thank you for updating the patch. Looks good to me overall.
Minor nits:

We can remove the following unused values:
{code}
// TestApplicationMasterLauncher.java
boolean thrown = false;
{code}

{code}
// TestRMRestart.java
Map<ApplicationId, ApplicationState> rmAppState =
    rmState.getApplicationState();
{code}

> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---
>
> Key: YARN-1365
> URL: https://issues.apache.org/jira/browse/YARN-1365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.003.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.3.patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006525#comment-14006525
 ] 

Jian He commented on YARN-2074:
---

Thanks, Xuan and Mayank, for the review!
bq. maxAppAttempts == getAttemptFailureCount()
Good point.
Fixed the attempt to compare against the exit status to determine whether it 
was preempted or not.

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2049:
--

Attachment: YARN-2049.6.patch

Updated the patch accordingly.

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006542#comment-14006542
 ] 

Zhijie Shen commented on YARN-2049:
---

Thanks for the review, Vinod and Varun! Please see the responses below:

bq. 1. In the function managementOperation, should there be a null check for 
token?

There's the following code before processing each dtOp:
{code}
if (dtOp.requiresKerberosCredentials() && token == null) {
  response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
      MessageFormat.format(
          "Operation [{0}] requires SPNEGO authentication established",
          dtOp));
  requestContinues = false;
}
{code}
Get and renew both require Kerberos credentials, so if token == null, the code 
will fall into this part. Cancel didn't require credentials before, following 
HttpFS's code. However, I think we should enforce Kerberos credentials for 
cancel as well. After that, the NPE risk is gone.

bq. In the function managementOperation, you call secretManager.cancelToken(dt, 
UserGroupInformation.getCurrentUser().getUserName()) - should you use 
getCurrentuser().getUserName? or ownerUGI.getUserName()? 

Good catch, we should use token.getUserName here as well.

bq. TimelineKerberosAuthenticator

Some errors may cause TimelineAuthenticator not to get the correct response. 
If the status code is not 200, the JSON content may contain the exception 
information from the server; we can use that information to recover the 
exception object. This is inspired by HttpFSUtils.validateResponse, but I 
changed it to use Jackson to parse the JSON content here.

bq. TimelineAuthenticationFilter

In the configuration we can simply set the authentication type to "kerberos", 
but in the timeline server, we want to replace it with the class name of the 
customized authentication service. Otherwise, the standard authentication 
handler will be used instead. I added the code comments there.
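
For reference, the override amounts to roughly this (a sketch, assuming the 
handler is renamed TimelineClientAuthenticationService as suggested above):
{code}
// Sketch of the described getConfiguration() override: map the generic
// "kerberos" auth type to the timeline-specific handler class so that
// AuthenticationFilter instantiates it.
@Override
protected Properties getConfiguration(String configPrefix,
    FilterConfig filterConfig) throws ServletException {
  Properties properties = super.getConfiguration(configPrefix, filterConfig);
  String authType = properties.getProperty(AUTH_TYPE);
  if (KerberosAuthenticationHandler.TYPE.equals(authType)) {
    properties.setProperty(AUTH_TYPE,
        TimelineClientAuthenticationService.class.getName());
  }
  return properties;
}
{code}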

bq. TimelineKerberosAuthenticationHandler
bq. TimelineDelegationTokenSecretManagerService.

Yeah, we need to look into how to reuse the existing code, but how about 
postponing it? I'm going to file a separate JIRA for the code refactoring.

bq. TestDistributedShell change is unnecessary

Removed.

bq. TimelineDelegationTokenSelector: Wrap the debug logging in debugEnabled 
checks.

Added the debugEnabled checks.

bq. ApplicationHistoryServer.java

Actually, it will not override the other initializers. Instead, I just append a 
TimelineAuthenticationFilterInitializer. Anyway, I enhanced the condition here: 
not only must security be enabled, but "kerberos" authentication must also be 
desired.

bq. TimelineKerberosAuthenticationHandler

Done.

bq. TimelineKerberosAuthenticator.

Good suggestion. I split the code accordingly.

bq. TimelineAuthenticationFilterInitializer

AuthenticationFilterInitializer has a single method that does everything, and 
the prefix is a static variable, which makes it a bit difficult for me to 
override part of the code without changing AuthenticationFilterInitializer. 
Another issue is that AuthenticationFilterInitializer requires the user to 
supply a secret file, which is not actually required by AuthenticationFilter 
(HADOOP-10600).





> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-22 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006548#comment-14006548
 ] 

Mayank Bansal commented on YARN-1408:
-

I agree with [~jianhe] and [~devaraj.k].
We should be able to preempt a container in the ALLOCATED state. 

bq. today the resource request is decremented when container is allocated. we 
may change it to decrement the resource request only when the container is 
pulled by the AM?
I am not sure if that's the right thing, as you don't want to run into other 
race conditions where a container has been allocated but the capacity has been 
given to some other AMs.



> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses full cluster capacity
> Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
> capacity
> JobA's task which uses queue b's capacity has been preempted and killed.
> This caused below problem:
> 1. New Container has got allocated for jobA in Queue A as per node update 
> from an NM.
> 2. This container has been preempted immediately as per preemption.
> Here ACQUIRED at KILLED Invalid State exception came when the next AM 
> heartbeat reached RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the Task to go for a timeout for 30minutes as this Container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> --
>
> Key: YARN-1913
> URL: https://issues.apache.org/jira/browse/YARN-1913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Wei Yan
> Attachments: YARN-1913.patch, YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once, 
> and have all cluster resources taken up by AMs.
> One solution is for the scheduler to limit resources taken up by AMs, as a 
> percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006609#comment-14006609
 ] 

Sandy Ryza commented on YARN-596:
-

Ah, I see what you're saying.  Good point.  In that case we'll probably need to 
push that check into the SchedulingPolicy and call it inside the loop in 
preemptContainer().

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006613#comment-14006613
 ] 

Jian He commented on YARN-1408:
---

There seem to be more problems with the approach I mentioned: if the request is 
not updated at the time the container is allocated, and the AM doesn't make the 
following allocate call, more containers will be allocated from the same 
request when the NM heartbeats.

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses full cluster capacity
> Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
> capacity
> JobA's task which uses queue b's capacity has been preempted and killed.
> This caused below problem:
> 1. New Container has got allocated for jobA in Queue A as per node update 
> from an NM.
> 2. This container has been preempted immediately as per preemption.
> Here ACQUIRED at KILLED Invalid State exception came when the next AM 
> heartbeat reached RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the Task to go for a timeout for 30minutes as this Container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006615#comment-14006615
 ] 

Wei Yan commented on YARN-596:
--

Yes, we can check the queue's policy in the preCheck function. If DRF, we use 
Resources.fitsIn(); if Fair, we use the DEFAULT_CALCULATOR. Sounds good?

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute

2014-05-22 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2012:
-

Description: 
Currently the 'default' rule in the queue placement policy, if applied, puts 
the app in the root.default queue. It would be great if we could make the 
'default' rule optionally point to a different queue as the default queue.
This default queue can be a leaf queue, or it can also be a parent queue if the 
'default' rule is nested inside the nestedUserQueue rule (YARN-1864).

  was:
Currently the 'default' rule in the queue placement policy, if applied, puts 
the app in the root.default queue. It would be great if we could make the 
'default' rule optionally point to a different queue as the default queue. This 
queue should be an existing queue; if not, we fall back to the root.default 
queue, hence keeping this rule terminal.
This default queue can be a leaf queue, or it can also be a parent queue if the 
'default' rule is nested inside the nestedUserQueue rule (YARN-1864).


> Fair Scheduler : Default rule in queue placement policy can take a queue as 
> an optional attribute
> -
>
> Key: YARN-2012
> URL: https://issues.apache.org/jira/browse/YARN-2012
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt
>
>
> Currently the 'default' rule in the queue placement policy, if applied, puts 
> the app in the root.default queue. It would be great if we could make the 
> 'default' rule optionally point to a different queue as the default queue.
> This default queue can be a leaf queue, or it can also be a parent queue if 
> the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006641#comment-14006641
 ] 

Sandy Ryza commented on YARN-596:
-

Sounds good

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006645#comment-14006645
 ] 

Hadoop QA commented on YARN-2049:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646401/YARN-2049.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/3789//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.client.TestRMAdminCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3789//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3789//console

This message is automatically generated.

> Delegation token stuff for the timeline sever
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006646#comment-14006646
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe],
Thanks for addressing my comments. I've looked at your latest patch; only some 
minor comments:
1) yarn_server_common_service_protos.proto:
{code}
+  repeated ContainerRecoveryReportProto container_report = 6;
{code}
should be container_reports

2) AppSchedulingInfo.java:
{code}
+if (containerId >= containerIdCounter.get()) {
+  containerIdCounter.set(containerId);
+}
{code}
Better to use compareAndSet in a while loop in case of a race condition (see 
the sketch after this list)

3) It's better to add a test for ContainerRecoveryReport
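
For point 2, the CAS loop would look roughly like this (a sketch, assuming 
containerIdCounter is an AtomicInteger):
{code}
// Sketch: retry compareAndSet until the counter is at least containerId, so
// a racing writer can never move it backwards.
while (true) {
  int current = containerIdCounter.get();
  if (containerId < current
      || containerIdCounter.compareAndSet(current, containerId)) {
    break;
  }
}
{code}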

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.combined.001.patch, YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-596:
-

Attachment: YARN-596.patch

Updated the patch to address Sandy's comments.

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2073:
---

Attachment: yarn-2073-3.patch

Spoke to Sandy offline. We think there should be a utilization threshold after 
which preemption kicks in. The new patch is along those lines. 
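
For illustration, a minimal sketch of the intended check (hypothetical field 
names such as preemptionUtilizationThreshold; the real property name and 
wiring are in the patch):
{code}
// Only attempt preemption when memory or vcore utilization of the
// cluster exceeds the configured threshold.
private boolean shouldAttemptPreemption() {
  if (!preemptionEnabled) {
    return false;
  }
  float memUtil = (float) rootMetrics.getAllocatedMB()
      / clusterResource.getMemory();
  float coreUtil = (float) rootMetrics.getAllocatedVirtualCores()
      / clusterResource.getVirtualCores();
  return Math.max(memUtil, coreUtil) > preemptionUtilizationThreshold;
}
{code}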

> FairScheduler starts preempting resources even with free resources on the 
> cluster
> -
>
> Key: YARN-2073
> URL: https://issues.apache.org/jira/browse/YARN-2073
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
> yarn-2073-3.patch
>
>
> Preemption should kick in only when the currently available slots don't match 
> the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-05-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006676#comment-14006676
 ] 

Sandy Ryza commented on YARN-596:
-

The current patch uses the queue's own policy in preemptContainerPreCheck. We 
should use the parent's policy. (Consider the case of a leaf queue with FIFO 
under a parent queue with DRF - we should use DRF to decide whether to skip 
the leaf queue.)

Also, we should add a new method to SchedulingPolicy instead of checking with 
instanceof.

{code}
+  if (Resources.fitsIn(getResourceUsage(), getFairShare())) {
+return false;
+  } else {
+return true;
+  }
{code}
Can just use "return !Resources.fitsIn(getResourceUsage(), getFairShare())".


> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006682#comment-14006682
 ] 

Sandy Ryza commented on YARN-2073:
--

{code}
+  /** Preemption related variables */
{code}
Nit: use "//" like the other comments.

Can you add the new property to the Fair Scheduler doc?

{code}
+  updateRootQueueMetrics();
{code}
My understanding is that this shouldn't be needed in shouldAttemptPreemption.  
Have you observed otherwise?

Would it be possible to move the TestFairScheduler refactoring to a separate 
JIRA?  If it's too difficult to disentangle at this point, I'm ok with it.

> FairScheduler starts preempting resources even with free resources on the 
> cluster
> -
>
> Key: YARN-2073
> URL: https://issues.apache.org/jira/browse/YARN-2073
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
> yarn-2073-3.patch
>
>
> Preemption should kick in only when the currently available slots don't match 
> the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2095) Large MapReduce Job stops responding

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-2095.
---

Resolution: Invalid

[~sunliners81], we have run much bigger jobs (100K maps) and jobs that run for 
a long time without any issues. There is only one limitation that I know of - 
in secure clusters, tokens expire after 7 days.

In any case, please pursue this on the user mailing lists and file a bug when 
you are sure there is one. Closing this as invalid for now; please reopen if 
you disagree.

> Large MapReduce Job stops responding
> 
>
> Key: YARN-2095
> URL: https://issues.apache.org/jira/browse/YARN-2095
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6
>Reporter: Clay McDonald
>Priority: Blocker
>
> Very large jobs (7,455 mappers and 999 reducers) hang. Jobs run well, but 
> logging to container logs stops after running 33 hours. The job appears to 
> be hung. The status of the job is "RUNNING". No error messages found in logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline server

2014-05-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2049:
--

Attachment: YARN-2049.7.patch

Fixed the javadoc warnings; the test failure is not related. See YARN-2075.

> Delegation token stuff for the timeline server
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006705#comment-14006705
 ] 

Hadoop QA commented on YARN-2073:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646425/yarn-2073-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3790//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3790//console

This message is automatically generated.

> FairScheduler starts preempting resources even with free resources on the 
> cluster
> -
>
> Key: YARN-2073
> URL: https://issues.apache.org/jira/browse/YARN-2073
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
> yarn-2073-3.patch
>
>
> Preemption should kick in only when the currently available slots don't match 
> the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-22 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006735#comment-14006735
 ] 

Mayank Bansal commented on YARN-2074:
-

+1 LGTM

Thanks,
Mayank

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006743#comment-14006743
 ] 

Jian He commented on YARN-2074:
---

Talked with Vinod offline. The big problem with this is that even if we don't 
count AM preemption towards AM failures on the RM side, the MR AM itself checks 
the attempt id against the max-attempt count for recovery. A workaround is to 
reset the MAX-ATTEMPT env each time the AM is launched, which sounds a bit 
hacky though.
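
For context, the kind of check involved on the MR side looks roughly like this 
(a sketch, not the actual MRAppMaster code):
{code}
// The AM decides whether this is its last retry by comparing its own
// attempt id with the configured max attempts, so RM-side accounting
// alone doesn't help when a preempted attempt still bumps the id.
boolean isLastAMRetry = appAttemptId.getAttemptId() >= maxAppAttempts;
{code}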

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2024:
--

Priority: Critical  (was: Major)
Target Version/s: 2.4.1

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Priority: Critical
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2024:
--

Component/s: log-aggregation

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Priority: Critical
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2024:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-431

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Priority: Critical
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006753#comment-14006753
 ] 

Vinod Kumar Vavilapalli commented on YARN-2082:
---

We should also consider some scalable solutions on HDFS itself - 
post-processing the logs automatically to reduce the file count, and maybe NMs 
forming a tree of aggregation (with network copies of logs) before hitting 
HDFS.

In any case, isn't the pluggability sort of a dup of the proposal at YARN-1440 
(albeit for a different reason)?
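
For reference, a hypothetical sketch of the row-key construction proposed in 
the description below (assuming HBase's Bytes utility; names illustrative 
only):
{code}
import org.apache.hadoop.hbase.util.Bytes;

// A hash prefix spreads heavy users/applications across regions to avoid
// hotspotting; the container id keeps per-container reads efficient.
byte[] makeRowKey(String user, String applicationId, String containerId) {
  int prefix = (user + applicationId).hashCode();
  return Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(containerId));
}
{code}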

> Support for alternative log aggregation mechanism
> -
>
> Key: YARN-2082
> URL: https://issues.apache.org/jira/browse/YARN-2082
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Ming Ma
>
> I will post a more detailed design later. Here is the brief summary; I would 
> like to get early feedback.
> Problem Statement:
> The current implementation of log aggregation creates one HDFS file for each 
> {application, nodemanager} pair. These files are relatively small, in the 
> range of 1-2 MB. In a large cluster with lots of applications and many 
> nodemanagers, it ends up creating lots of small files in HDFS. This puts 
> pressure on the HDFS NN in the following ways.
> 1. It increases NN memory usage. This is mitigated by having the history 
> server delete old log files in HDFS.
> 2. Runtime RPC load on HDFS. Each log aggregation file introduces several NN 
> RPCs such as create, getAdditionalBlock, complete, rename. When the cluster 
> is busy, this RPC load has an impact on NN performance.
> In addition, to support non-MR applications on YARN, we might need to support 
> aggregation for long running applications.
> Design choices:
> 1. Don't aggregate all the logs, as in YARN-221.
> 2. Create a dedicated HDFS namespace used only for log aggregation.
> 3. Write logs to some key-value store like HBase. HBase's RPC load on the NN 
> will be much less.
> 4. Decentralize the application-level log aggregation to NMs. All logs for a 
> given application are aggregated first by a dedicated NM before being pushed 
> to HDFS.
> 5. Have NMs aggregate logs on a regular basis; each of these log files will 
> have data from different applications, and there needs to be some index for 
> quick lookup.
> Proposal:
> 1. Make YARN log aggregation pluggable for both the read and write paths. 
> Note that Hadoop FileSystem provides an abstraction and we could ask 
> alternative log aggregators to implement a compatible FileSystem, but that 
> seems to be overkill.
> 2. Provide a log aggregation plugin that writes to HBase. The schema design 
> needs to support efficient reads on a per-application as well as per 
> application+container basis; in addition, it shouldn't create hotspots in a 
> cluster where certain users might create more jobs than others. For example, 
> we can use hash($user + $applicationId) + $containerId as the row key.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1545) [Umbrella] Prevent DoS of YARN components by putting in limits

2014-05-22 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006754#comment-14006754
 ] 

Hong Zhiguo commented on YARN-1545:
---

You mean we should define upper bounds on the number or length of fields 
inside the messages. Should these bounds be configurable, or pre-defined as 
constants?

How about the rate of messages? For example, a bad client could perform 
getApplications queries at full speed.

> [Umbrella] Prevent DoS of YARN components by putting in limits
> --
>
> Key: YARN-1545
> URL: https://issues.apache.org/jira/browse/YARN-1545
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>
> I did a pass and found many places that can cause DoS on various YARN 
> services. Need to fix them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2073:
---

Attachment: yarn-2073-4.patch

Thanks for the review, Sandy. Updated the patch to reflect your suggestions, 
except for the test refactoring. For the tests, it was easier to split them 
out, and I think it is the right direction going forward. If you don't mind, I 
would like to leave the patch as is. 

> FairScheduler starts preempting resources even with free resources on the 
> cluster
> -
>
> Key: YARN-2073
> URL: https://issues.apache.org/jira/browse/YARN-2073
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
> yarn-2073-3.patch, yarn-2073-4.patch
>
>
> Preemption should kick in only when the currently available slots don't match 
> the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline server

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006774#comment-14006774
 ] 

Hadoop QA commented on YARN-2049:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646431/YARN-2049.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.client.TestRMAdminCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3791//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3791//console

This message is automatically generated.

> Delegation token stuff for the timeline server
> -
>
> Key: YARN-2049
> URL: https://issues.apache.org/jira/browse/YARN-2049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, 
> YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1936) Secured timeline client

2014-05-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006776#comment-14006776
 ] 

Zhijie Shen commented on YARN-1936:
---

Vinod, thanks for the review. See my responses below:

bq. Make the event-put as one of the options "-put"

Good point. I made use of CommandLine to build a simple CLI.

bq. Add delegation token only if timeline-service is enabled.

Added the check.

bq. Also move this main to TimelineClientImpl

Moved.

bq. selectToken() can use a TimelineDelegationTokenSelector to find the token?

Used the selector instead, and did some required refactoring.
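
A minimal sketch of how such a TokenSelector-based lookup works in general 
(assumed names, not the exact patch code):
{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier;
import org.apache.hadoop.yarn.security.client.TimelineDelegationTokenSelector;

// Pick the timeline delegation token for the given service address out
// of the current user's credentials.
Token<TimelineDelegationTokenIdentifier> selectToken(
    UserGroupInformation ugi, Text service) {
  return new TimelineDelegationTokenSelector()
      .selectToken(service, ugi.getTokens());
}
{code}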

bq. Can we add a simple test to validate the addition of the Delegation Token 
to the client credentials?

Added a test case.


> Secured timeline client
> ---
>
> Key: YARN-1936
> URL: https://issues.apache.org/jira/browse/YARN-1936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch
>
>
> TimelineClient should be able to talk to the timeline server with kerberos 
> authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1936) Secured timeline client

2014-05-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1936:
--

Attachment: YARN-1936.3.patch

> Secured timeline client
> ---
>
> Key: YARN-1936
> URL: https://issues.apache.org/jira/browse/YARN-1936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch
>
>
> TimelineClient should be able to talk to the timeline server with kerberos 
> authentication or delegation token



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1545) [Umbrella] Prevent DoS of YARN components by putting in limits

2014-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006777#comment-14006777
 ] 

Vinod Kumar Vavilapalli commented on YARN-1545:
---

I covered the details on the individual tickets - it's mostly about bounding 
buffers, lists, etc.

When I filed this I was only focusing on application-level stuff. A bad client 
firing off RPCs in rapid succession can and should be addressed in the RPC 
layer itself, IMO.

> [Umbrella] Prevent DoS of YARN components by putting in limits
> --
>
> Key: YARN-1545
> URL: https://issues.apache.org/jira/browse/YARN-1545
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>
> I did a pass and found many places that can cause DoS on various YARN 
> services. Need to fix them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006778#comment-14006778
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

[~kkambatl], could you kick Jenkins and check the latest patch?

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
> YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, 
> YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006781#comment-14006781
 ] 

Hadoop QA commented on YARN-2073:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646441/yarn-2073-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3792//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3792//console

This message is automatically generated.

> FairScheduler starts preempting resources even with free resources on the 
> cluster
> -
>
> Key: YARN-2073
> URL: https://issues.apache.org/jira/browse/YARN-2073
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, 
> yarn-2073-3.patch, yarn-2073-4.patch
>
>
> Preemption should kick in only when the currently available slots don't match 
> the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006793#comment-14006793
 ] 

Hudson commented on YARN-2081:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> TestDistributedShell fails after YARN-1962
> --
>
> Key: YARN-2081
> URL: https://issues.apache.org/jira/browse/YARN-2081
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Fix For: 2.4.1
>
> Attachments: YARN-2081.patch
>
>
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006801#comment-14006801
 ] 

Hudson commented on YARN-1938:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


> Kerberos authentication for the timeline server
> ---
>
> Key: YARN-1938
> URL: https://issues.apache.org/jira/browse/YARN-1938
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.5.0
>
> Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006796#comment-14006796
 ] 

Hudson commented on YARN-2089:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are 
missing audience annotations. (Zhihai Xu via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Fix For: 2.5.0
>
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with audience 
> annotations @Private  @Unstable



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006797#comment-14006797
 ] 

Hudson commented on YARN-2017:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop

[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006798#comment-14006798
 ] 

Hudson commented on YARN-1962:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> Timeline server is enabled by default
> -
>
> Key: YARN-1962
> URL: https://issues.apache.org/jira/browse/YARN-1962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Fix For: 2.4.1
>
> Attachments: YARN-1962.1.patch, YARN-1962.2.patch
>
>
> Since the Timeline server is not mature and secured yet, enabling it by 
> default might create some confusion.
> We were playing with 2.4.0 and found a lot of exceptions for the distributed 
> shell example related to connection refused errors. Btw, we didn't run the 
> TS because it is not secured yet.
> Although it is possible to explicitly turn it off through the yarn-site 
> config, in my opinion this extra change for this new service is not 
> worthwhile at this point.
> This JIRA is to turn it off by default.
> If there is agreement, I can put up a simple patch for this.
> {noformat}
> 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
> from the timeline server.
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
> Connection refused
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
> Caused by: java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>   at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
>   at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:579)
>   at java.net.Socket.connect(Socket.java:528)
>   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>   at sun.net.www.http.HttpClient.<init>(...)
> ... ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
> Connection refused
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMa

[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006794#comment-14006794
 ] 

Hudson commented on YARN-2050:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/563/])
YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


> Fix LogCLIHelpers to create the correct FileContext
> ---
>
> Key: YARN-2050
> URL: https://issues.apache.org/jira/browse/YARN-2050
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2050-2.patch, YARN-2050.patch
>
>
> LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus 
> the FileContext created isn't necessarily the FileContext for remote log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006845#comment-14006845
 ] 

Hadoop QA commented on YARN-2088:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3794//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3794//console

This message is automatically generated.

> Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
> 
>
> Key: YARN-2088
> URL: https://issues.apache.org/jira/browse/YARN-2088
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2088.v1.patch
>
>
> Some fields (set, list) are added to proto builders multiple times; we need 
> to clear those fields before adding, otherwise the resulting proto contains 
> duplicate contents.
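
The usual PBImpl fix looks like this (a sketch, assuming a repeated field such 
as applicationTypes on the builder):
{code}
private void addApplicationTypesToProto() {
  maybeInitBuilder();
  // Without clearing first, a second merge appends the same entries again.
  builder.clearApplicationTypes();
  if (applicationTypes == null) {
    return;
  }
  builder.addAllApplicationTypes(applicationTypes);
}
{code}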



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006858#comment-14006858
 ] 

Hadoop QA commented on YARN-2030:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645932/YARN-2030.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3793//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3793//console

This message is automatically generated.

> Use StateMachine to simplify handleStoreEvent() in RMStateStore
> ---
>
> Key: YARN-2030
> URL: https://issues.apache.org/jira/browse/YARN-2030
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Junping Du
>Assignee: Binglin Chang
> Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch
>
>
> Now the logic to handle different store events in handleStoreEvent() is as 
> follows:
> {code}
> if (event.getType().equals(RMStateStoreEventType.STORE_APP)
> || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
>   ...
>   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
> ...
>   } else {
> ...
>   }
>   ...
>   try {
> if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
>   ...
> } else {
>   ...
> }
>   } 
>   ...
> } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
> || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
>   ...
>   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
> ...
>   } else {
> ...
>   }
> ...
> if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
>   ...
> } else {
>   ...
> }
>   }
>   ...
> } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
> ...
> } else {
>   ...
> }
> }
> {code}
> This not only confuses people but also easily leads to mistakes. We may 
> leverage a state machine to simplify this, even with no real state 
> transitions.
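
As a sketch of the direction (hypothetical state and transition names, using 
YARN's StateMachineFactory):
{code}
// One self-transition per event type replaces the nested if/else chains;
// each transition class holds the logic for exactly one event.
private static final StateMachineFactory<RMStateStore, StoreState,
    RMStateStoreEventType, RMStateStoreEvent> stateMachineFactory =
      new StateMachineFactory<RMStateStore, StoreState,
          RMStateStoreEventType, RMStateStoreEvent>(StoreState.ACTIVE)
        .addTransition(StoreState.ACTIVE, StoreState.ACTIVE,
            RMStateStoreEventType.STORE_APP, new StoreAppTransition())
        .addTransition(StoreState.ACTIVE, StoreState.ACTIVE,
            RMStateStoreEventType.UPDATE_APP, new UpdateAppTransition())
        .installTopology();
{code}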



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006871#comment-14006871
 ] 

Hudson commented on YARN-1962:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> Timeline server is enabled by default
> -
>
> Key: YARN-1962
> URL: https://issues.apache.org/jira/browse/YARN-1962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Fix For: 2.4.1
>
> Attachments: YARN-1962.1.patch, YARN-1962.2.patch
>
>
> Since the Timeline server is not yet mature or secured, enabling it by default 
> might create some confusion.
> We were playing with 2.4.0 and found a lot of connection-refused exceptions in 
> the distributed shell example. Btw, we didn't run the TS because it is not 
> secured yet.
> Although it is possible to explicitly turn it off through the yarn-site 
> config, in my opinion this extra change for this new service is not worthwhile 
> at this point.
> This JIRA is to turn it off by default.
> If there is an agreement, I can put up a simple patch for this.
> {noformat}
> 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
> from the timeline server.
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
> Connection refused
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
> Caused by: java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>   at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
>   at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:579)
>   at java.net.Socket.connect(Socket.java:528)
>   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>   at sun.net.www.http.HttpClient.<init>(HttpClient.java)
> ...
> ERROR impl.TimelineClientImpl: Failed to get the response from the timeline 
> server.
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
> Connection refused
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
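
Until the default changes, the service can be disabled explicitly with one property; a minimal yarn-site.xml sketch, using the existing yarn.timeline-service.enabled switch:

{code}
<!-- yarn-site.xml: explicitly disable the timeline service integration;
     this is the value YARN-1962 proposes as the default. -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
{code}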

[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006870#comment-14006870
 ] 

Hudson commented on YARN-2017:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/

[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006866#comment-14006866
 ] 

Hudson commented on YARN-2081:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> TestDistributedShell fails after YARN-1962
> --
>
> Key: YARN-2081
> URL: https://issues.apache.org/jira/browse/YARN-2081
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Fix For: 2.4.1
>
> Attachments: YARN-2081.patch
>
>
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006874#comment-14006874
 ] 

Hudson commented on YARN-1938:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


> Kerberos authentication for the timeline server
> ---
>
> Key: YARN-1938
> URL: https://issues.apache.org/jira/browse/YARN-1938
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.5.0
>
> Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006867#comment-14006867
 ] 

Hudson commented on YARN-2050:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


> Fix LogCLIHelpers to create the correct FileContext
> ---
>
> Key: YARN-2050
> URL: https://issues.apache.org/jira/browse/YARN-2050
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2050-2.patch, YARN-2050.patch
>
>
> LogCLIHelpers calls FileContext.getFileContext() without any parameters, so 
> the FileContext created isn't necessarily the FileContext for the remote log.
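
A sketch of the direction of the fix, assuming the remote root log dir comes from the standard NM configuration; the variable names are illustrative, not the committed patch:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Bind the FileContext to the remote log dir's filesystem instead of
// relying on whatever the default filesystem happens to be.
Configuration conf = new YarnConfiguration();
Path remoteRootLogDir = new Path(conf.get(
    YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
    YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
FileContext fc = FileContext.getFileContext(
    remoteRootLogDir.toUri(), conf);
{code}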



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006869#comment-14006869
 ] 

Hudson commented on YARN-2089:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are 
missing audience annotations. (Zhihai Xu via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Fix For: 2.5.0
>
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with the audience 
> annotations @Private and @Unstable.
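
The change amounts to two annotations per class; a sketch (class body elided):

{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

@Private
@Unstable
public abstract class QueuePlacementRule {
  // ... existing body unchanged ...
}
{code}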



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006893#comment-14006893
 ] 

Hudson commented on YARN-1962:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> Timeline server is enabled by default
> -
>
> Key: YARN-1962
> URL: https://issues.apache.org/jira/browse/YARN-1962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Fix For: 2.4.1
>
> Attachments: YARN-1962.1.patch, YARN-1962.2.patch
>
>
> Since the Timeline server is not yet mature or secured, enabling it by default 
> might create some confusion.
> We were playing with 2.4.0 and found a lot of connection-refused exceptions in 
> the distributed shell example. Btw, we didn't run the TS because it is not 
> secured yet.
> Although it is possible to explicitly turn it off through the yarn-site 
> config, in my opinion this extra change for this new service is not worthwhile 
> at this point.
> This JIRA is to turn it off by default.
> If there is an agreement, I can put up a simple patch for this.
> {noformat}
> 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
> from the timeline server.
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
> Connection refused
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
> Caused by: java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>   at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
>   at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:579)
>   at java.net.Socket.connect(Socket.java:528)
>   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>   at sun.net.www.http.HttpClient.<init>(HttpClient.java)
> ...
> ERROR impl.TimelineClientImpl: Failed to get the response from the timeline 
> server.
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
> Connection refused
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)

[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006891#comment-14006891
 ] 

Hudson commented on YARN-2089:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are 
missing audience annotations. (Zhihai Xu via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


> FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing 
> audience annotations
> ---
>
> Key: YARN-2089
> URL: https://issues.apache.org/jira/browse/YARN-2089
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Anubhav Dhoot
>Assignee: zhihai xu
>  Labels: newbie
> Fix For: 2.5.0
>
> Attachments: yarn-2089.patch
>
>
> We should mark QueuePlacementPolicy and QueuePlacementRule with the audience 
> annotations @Private and @Unstable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006888#comment-14006888
 ] 

Hudson commented on YARN-2081:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by 
Zhiguo Hong. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> TestDistributedShell fails after YARN-1962
> --
>
> Key: YARN-2081
> URL: https://issues.apache.org/jira/browse/YARN-2081
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Fix For: 2.4.1
>
> Attachments: YARN-2081.patch
>
>
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006889#comment-14006889
 ] 

Hudson commented on YARN-2050:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


> Fix LogCLIHelpers to create the correct FileContext
> ---
>
> Key: YARN-2050
> URL: https://issues.apache.org/jira/browse/YARN-2050
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-2050-2.patch, YARN-2050.patch
>
>
> LogCLIHelpers calls FileContext.getFileContext() without any parameters, so 
> the FileContext created isn't necessarily the FileContext for the remote log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006896#comment-14006896
 ] 

Hudson commented on YARN-1938:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


> Kerberos authentication for the timeline server
> ---
>
> Key: YARN-1938
> URL: https://issues.apache.org/jira/browse/YARN-1938
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.5.0
>
> Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006892#comment-14006892
 ] 

Hudson commented on YARN-2017:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hado

[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006908#comment-14006908
 ] 

Sunil G commented on YARN-1408:
---

bq. we may change it to decrement the resource request only when the container 
is pulled by the AM ?

As [~jianhe] mentioned, this can create problems with subsequent NM heartbeats. 
I also agree that a container in the ALLOCATED state is the best place to do 
preemption, but this race condition can still occur there.
CapacityScheduler raises a KILL event for the RMContainer (for preemption), so 
a solution may be to recreate the resource request if the RMContainer state is 
ALLOCATED/ACQUIRED at that point.
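
A rough sketch of that idea (the killedForPreemption flag and the recoverResourceRequests helper are hypothetical names illustrating the shape of the fix, not an actual patch):

{code}
// In the scheduler's completed-container path: if the container was
// killed for preemption before the AM ever launched it, hand the
// request back to the application instead of losing it.
RMContainerState state = rmContainer.getState();
if (killedForPreemption
    && (state == RMContainerState.ALLOCATED
        || state == RMContainerState.ACQUIRED)) {
  // Hypothetical helper: re-adds the ResourceRequests that were
  // decremented when this container was allocated.
  application.recoverResourceRequests(rmContainer);
}
{code}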

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> JobA's tasks which use queue b's capacity get preempted and killed.
> This caused the below problem:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was preempted immediately by the preemption policy.
> Here the 'ACQUIRED at KILLED' invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container was 
> already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2096) testQueueMetricsOnRMRestart has race condition

2014-05-22 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2096:
---

 Summary: testQueueMetricsOnRMRestart has race condition
 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
 fails randomly because of a race condition.
The test validates that metrics are incremented, but does not wait for all 
transitions to finish before checking the values.
It also resets metrics after kicking off recovery of the second RM; the metrics 
that need to be incremented race with this reset, causing the test to fail 
randomly.
We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2096) testQueueMetricsOnRMRestart has race condition

2014-05-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-2096:
---

Assignee: Anubhav Dhoot

> testQueueMetricsOnRMRestart has race condition
> --
>
> Key: YARN-2096
> URL: https://issues.apache.org/jira/browse/YARN-2096
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
>  fails randomly because of a race condition.
> The test validates that metrics are incremented, but does not wait for all 
> transitions to finish before checking the values.
> It also resets metrics after kicking off recovery of the second RM; the 
> metrics that need to be incremented race with this reset, causing the test to 
> fail randomly.
> We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2096) testQueueMetricsOnRMRestart has race condition

2014-05-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2096:


Attachment: YARN-2096.patch

Fixed two race conditions by:
1) waiting for the appropriate transitions before checking metrics, and
2) resetting metrics before the events are triggered.
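
For the first race, a generic wait helper along these lines makes the assertion deterministic (a test-local sketch, not necessarily what the attached patch does):

{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.junit.Assert;

// Poll until the app reaches the expected state (or the timeout expires)
// before asserting on QueueMetrics, instead of asserting immediately.
private void waitForAppState(RMApp app, RMAppState expected, long timeoutMs)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (app.getState() != expected
      && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);
  }
  Assert.assertEquals(expected, app.getState());
}
{code}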

> testQueueMetricsOnRMRestart has race condition
> --
>
> Key: YARN-2096
> URL: https://issues.apache.org/jira/browse/YARN-2096
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2096.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
>  fails randomly because of a race condition.
> The test validates that metrics are incremented, but does not wait for all 
> transitions to finish before checking the values.
> It also resets metrics after kicking off recovery of the second RM; the 
> metrics that need to be incremented race with this reset, causing the test to 
> fail randomly.
> We need to wait for the right transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)