[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-10-01 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-1964:
--
Attachment: YARN-1964.patch

Harmonized changes between yarn-default.xml and YarnConfiguration. Updated docs.
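For readers following the config change, here is a minimal sketch of the harmonization pattern being described, using hypothetical Docker-executor key names (the actual keys and defaults in the patch may differ): the constant in YarnConfiguration and the corresponding entry in yarn-default.xml are expected to carry the same key string and default value.

{code}
// Sketch only: hypothetical key names chosen for illustration. The constant in
// YarnConfiguration and the <property> entry in yarn-default.xml must agree on
// both the key string and the default value.
public class YarnConfigurationSketch {
  public static final String NM_PREFIX = "yarn.nodemanager.";

  // Hypothetical Docker executor key and default.
  public static final String NM_DOCKER_EXEC_NAME =
      NM_PREFIX + "docker-container-executor.exec-name";
  public static final String DEFAULT_NM_DOCKER_EXEC_NAME = "/usr/bin/docker";
}
{code}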

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is an increasingly popular container 
> technology.
> In the context of YARN, support for Docker will provide an elegant 
> solution that allows applications to *package* their software into a Docker 
> container (an entire Linux file system, including custom versions of perl, 
> python, etc.) and use it as a blueprint to launch all their YARN containers 
> with the requisite software environment. This provides both consistency (all 
> YARN containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154522#comment-14154522
 ] 

Hadoop QA commented on YARN-1964:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672275/YARN-1964.patch
  against trunk revision 17d1202.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5194//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5194//console

This message is automatically generated.

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is an increasingly popular container 
> technology.
> In the context of YARN, support for Docker will provide an elegant 
> solution that allows applications to *package* their software into a Docker 
> container (an entire Linux file system, including custom versions of perl, 
> python, etc.) and use it as a blueprint to launch all their YARN containers 
> with the requisite software environment. This provides both consistency (all 
> YARN containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154687#comment-14154687
 ] 

Hudson commented on YARN-2387:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2387. Resource Manager crashes with NPE due to lack of synchronization. 
Contributed by Mit Desai (jlowe: rev feaf139b4f327d33011e5a4424c06fb44c630955)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerStatusPBImpl.java


> Resource Manager crashes with NPE due to lack of synchronization
> 
>
> Key: YARN-2387
> URL: https://issues.apache.org/jira/browse/YARN-2387
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2387.patch, YARN-2387.patch, YARN-2387.patch
>
>
> We recently came across a 0.23 RM crashing with an NPE. Here is the 
> stacktrace for it.
> {noformat}
> 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
> at
> org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
> at java.lang.String.valueOf(String.java:2854)
> at java.lang.StringBuilder.append(StringBuilder.java:128)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
> at java.lang.String.valueOf(String.java:2854)
> at java.lang.StringBuilder.append(StringBuilder.java:128)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
> at java.lang.Thread.run(Thread.java:722)
> 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}
> On investigating the issue, we found that ContainerStatusPBImpl has 
> methods that are called by different threads and are not synchronized. The 
> 2.x code looks the same.
> We need to make these methods synchronized so that we do not encounter this 
> problem in the future.
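A minimal, illustrative sketch of the synchronization pattern described above (not the actual patch; the field types and merge logic are simplified stand-ins):

{code}
// Illustrative sketch: making the proto merge path synchronized so that the
// scheduler event thread and a concurrent toString() (via logging) cannot race
// on the builder.
public class ContainerStatusSketch {
  private StringBuilder builder = new StringBuilder(); // stand-in for the protobuf builder
  private String proto;                                // stand-in for the built proto

  public synchronized String getProto() {
    mergeLocalToProto();
    return proto;
  }

  private synchronized void mergeLocalToProto() {
    builder.append("state=RUNNING;");  // merge locally-set fields
    proto = builder.toString();        // then "build" the proto
  }
}
{code}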



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2610) Hamlet should close table tags

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154684#comment-14154684
 ] 

Hudson commented on YARN-2610:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2610. Hamlet should close table tags. (Ray Chiang via kasha) (kasha: rev 
f7743dd07dfbe0dde9be71acfaba16ded52adba7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/hamlet/Hamlet.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/hamlet/TestHamlet.java


> Hamlet should close table tags
> --
>
> Key: YARN-2610
> URL: https://issues.apache.org/jira/browse/YARN-2610
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Fix For: 2.6.0
>
> Attachments: YARN-2610-01.patch, YARN-2610-02.patch
>
>
> Revisiting a subset of MAPREDUCE-2993.
> The table-related tags are not configured to close properly in Hamlet.  While 
> this is allowed in HTML 4.01, missing closing table tags tend to wreak havoc 
> with a lot of HTML processors (although not usually browsers).
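For illustration only, a sketch of what closing the table tags means for the emitted markup (the actual Hamlet output differs in detail):

{code}
// Before: row and cell tags left open (legal in HTML 4.01, but brittle for
// strict HTML processors). After: every TR/TD is explicitly closed.
public class HamletMarkupSketch {
  public static final String OPEN_TAGS =
      "<table><tr><td>a<tr><td>b</table>";
  public static final String CLOSED_TAGS =
      "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>";
}
{code}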



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154679#comment-14154679
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris 
Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently, the distributed cache enables you to cache jars and files so that 
> attempts from the same job can reuse them. However, sharing with the 
> distributed cache is limited because it normally operates on a per-job basis. 
> On a large cluster, copying of jobjars and libjars sometimes becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> mention defeating the purpose of "bringing compute to where the data is". This 
> is wasteful because in most cases the code doesn't change much across jobs.
> I'd like to propose and discuss the feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154690#comment-14154690
 ] 

Hudson commented on YARN-2594:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2594. Potential deadlock in RM when querying 
ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 
14d60dadc25b044a2887bf912ba5872367f2dffb)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Potential deadlock in RM when querying ApplicationResourceUsageReport
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch
>
>
> The ResourceManager sometimes becomes unresponsive.
> There was no exception in the ResourceManager log; it contained only the 
> following type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}
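As a generic illustration of the hazard the title describes (not the actual RM code; the class and method names below are hypothetical), the usual shape of the fix is to snapshot state under the app's own lock and release it before calling into the scheduler:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: avoid holding the app lock while calling into another
// component that takes its own lock, which is the classic recipe for deadlock.
class UsageReportSketch {
  interface SchedulerFacade { long liveMemorySeconds(); }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long finishedMemorySeconds;

  long usageReport(SchedulerFacade scheduler) {
    long snapshot;
    lock.readLock().lock();
    try {
      snapshot = finishedMemorySeconds;   // copy state under our own lock only
    } finally {
      lock.readLock().unlock();           // release before crossing components
    }
    return snapshot + scheduler.liveMemorySeconds();
  }
}
{code}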



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154680#comment-14154680
 ] 

Hudson commented on YARN-2179:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris 
Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, 
> YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, 
> YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, 
> YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (e.g. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an SCM that is 
> backed by an in-memory store).
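A rough, hypothetical sketch of the relationships described above (only the SCMContext and AppChecker names come from the issue; everything else is illustrative):

{code}
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch of the described structure.
abstract class AppChecker {
  // Queried on SCM startup to learn which applications are still running.
  abstract Collection<String> getActiveApplications();
}

class SCMContext {
  private final Collection<String> runningApps;

  SCMContext(AppChecker checker) {
    // Needed when the backing store is in-memory: rebuild the view of active
    // apps on startup so the cleaner does not evict entries still in use.
    this.runningApps =
        Collections.unmodifiableCollection(checker.getActiveApplications());
  }

  Collection<String> getRunningApps() {
    return runningApps;
  }
}
{code}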



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154686#comment-14154686
 ] 

Hudson commented on YARN-2602:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2602. Fixed possible NPE in ApplicationHistoryManagerOnTimelineStore. 
Contributed by Zhijie Shen (jianhe: rev 
bbff96be48119774688981d04baf444639135977)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java


> Generic History Service of TimelineServer sometimes not able to handle NPE
> --
>
> Key: YARN-2602
> URL: https://issues.apache.org/jira/browse/YARN-2602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
> Environment: ATS is running with AHS/GHS enabled to use TimelineStore.
> Running for 4-5 days, with many random example jobs running
>Reporter: Karam Singh
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2602.1.patch
>
>
> ATS is running with AHS/GHS enabled to use the TimelineStore.
> It has been running for 4-5 days, with many random example jobs running.
> When I ran the WS API for AHS/GHS:
> {code}
> curl --negotiate -u : 
> 'http:///v1/applicationhistory/apps/application_1411579118376_0001'
> {code}
> it ran successfully.
> However, the following failed with an Internal Server Error (500):
> {code}
> curl --negotiate -u : 
> 'http:///ws/v1/applicationhistory/apps'
> {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}
> {code}
> After looking at the TimelineServer logs, we found that there was an NPE:
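As a generic illustration of the kind of null-guard the commit message implies (a hypothetical helper, not the actual ApplicationHistoryManagerOnTimelineStore code):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: skip incomplete timeline entities instead of letting a
// NullPointerException surface as an HTTP 500 on /ws/v1/applicationhistory/apps.
class HistoryListingSketch {
  static List<String> appIds(List<Map<String, Object>> entities) {
    List<String> ids = new ArrayList<String>();
    if (entities == null) {
      return ids;
    }
    for (Map<String, Object> entity : entities) {
      Object id = (entity == null) ? null : entity.get("appId");
      if (id != null) {        // guard: entity missing its id, skip it
        ids.add(id.toString());
      }
    }
    return ids;
  }
}
{code}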



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2627) Add logs when attemptFailuresValidityInterval is enabled

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154691#comment-14154691
 ] 

Hudson commented on YARN-2627:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2627. Added the info logs of attemptFailuresValidityInterval and number of 
previous failed attempts. Contributed by Xuan Gong. (zjshen: rev 
9582a50176800433ad3fa8829a50c28b859812a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Add logs when attemptFailuresValidityInterval is enabled
> 
>
> Key: YARN-2627
> URL: https://issues.apache.org/jira/browse/YARN-2627
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2627.1.patch, YARN-2627.2.patch
>
>
> After YARN-611, users can specify attemptFailuresValidityInterval for their 
> applications. This is for testing/debugging purposes.
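For context, a sketch of how an application opts into this interval, assuming the 2.6-line API introduced by YARN-611 (verify the exact method name against your Hadoop version):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

// Sketch: attempt failures older than the validity window stop counting toward
// the application's max-attempts limit.
public class ValidityIntervalSketch {
  public static ApplicationSubmissionContext newContext() {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    ctx.setAttemptFailuresValidityInterval(10 * 60 * 1000L); // 10 minutes, in ms
    return ctx;
  }
}
{code}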



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2610) Hamlet should close table tags

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154839#comment-14154839
 ] 

Hudson commented on YARN-2610:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2610. Hamlet should close table tags. (Ray Chiang via kasha) (kasha: rev 
f7743dd07dfbe0dde9be71acfaba16ded52adba7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/hamlet/Hamlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/hamlet/TestHamlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java
* hadoop-yarn-project/CHANGES.txt


> Hamlet should close table tags
> --
>
> Key: YARN-2610
> URL: https://issues.apache.org/jira/browse/YARN-2610
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Fix For: 2.6.0
>
> Attachments: YARN-2610-01.patch, YARN-2610-02.patch
>
>
> Revisiting a subset of MAPREDUCE-2993.
> The table-related tags are not configured to close properly in Hamlet.  While 
> this is allowed in HTML 4.01, missing closing table tags tend to wreak havoc 
> with a lot of HTML processors (although not usually browsers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154833#comment-14154833
 ] 

Hudson commented on YARN-1492:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris 
Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml
* hadoop-yarn-project/CHANGES.txt


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently, the distributed cache enables you to cache jars and files so that 
> attempts from the same job can reuse them. However, sharing with the 
> distributed cache is limited because it normally operates on a per-job basis. 
> On a large cluster, copying of jobjars and libjars sometimes becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> mention defeating the purpose of "bringing compute to where the data is". This 
> is wasteful because in most cases the code doesn't change much across jobs.
> I'd like to propose and discuss the feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154841#comment-14154841
 ] 

Hudson commented on YARN-2602:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2602. Fixed possible NPE in ApplicationHistoryManagerOnTimelineStore. 
Contributed by Zhijie Shen (jianhe: rev 
bbff96be48119774688981d04baf444639135977)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt


> Generic History Service of TimelineServer sometimes not able to handle NPE
> --
>
> Key: YARN-2602
> URL: https://issues.apache.org/jira/browse/YARN-2602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
> Environment: ATS is running with AHS/GHS enabled to use TimelineStore.
> Running for 4-5 days, with many random example jobs running
>Reporter: Karam Singh
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2602.1.patch
>
>
> ATS is running with AHS/GHS enabled to use the TimelineStore.
> It has been running for 4-5 days, with many random example jobs running.
> When I ran the WS API for AHS/GHS:
> {code}
> curl --negotiate -u : 
> 'http:///v1/applicationhistory/apps/application_1411579118376_0001'
> {code}
> it ran successfully.
> However, the following failed with an Internal Server Error (500):
> {code}
> curl --negotiate -u : 
> 'http:///ws/v1/applicationhistory/apps'
> {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}
> {code}
> After looking at the TimelineServer logs, we found that there was an NPE:



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154845#comment-14154845
 ] 

Hudson commented on YARN-2594:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2594. Potential deadlock in RM when querying 
ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 
14d60dadc25b044a2887bf912ba5872367f2dffb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Potential deadlock in RM when querying ApplicationResourceUsageReport
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch
>
>
> The ResourceManager sometimes becomes unresponsive.
> There was no exception in the ResourceManager log; it contained only the 
> following type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154834#comment-14154834
 ] 

Hudson commented on YARN-2179:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris 
Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml
* hadoop-yarn-project/CHANGES.txt


> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, 
> YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, 
> YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, 
> YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (e.g. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an SCM that is 
> backed by an in-memory store).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2627) Add logs when attemptFailuresValidityInterval is enabled

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154846#comment-14154846
 ] 

Hudson commented on YARN-2627:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2627. Added the info logs of attemptFailuresValidityInterval and number of 
previous failed attempts. Contributed by Xuan Gong. (zjshen: rev 
9582a50176800433ad3fa8829a50c28b859812a3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Add logs when attemptFailuresValidityInterval is enabled
> 
>
> Key: YARN-2627
> URL: https://issues.apache.org/jira/browse/YARN-2627
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2627.1.patch, YARN-2627.2.patch
>
>
> After YARN-611, users can specify attemptFailuresValidityInterval for their 
> applications. This is for testing/debugging purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154842#comment-14154842
 ] 

Hudson commented on YARN-2387:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/])
YARN-2387. Resource Manager crashes with NPE due to lack of synchronization. 
Contributed by Mit Desai (jlowe: rev feaf139b4f327d33011e5a4424c06fb44c630955)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerStatusPBImpl.java


> Resource Manager crashes with NPE due to lack of synchronization
> 
>
> Key: YARN-2387
> URL: https://issues.apache.org/jira/browse/YARN-2387
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2387.patch, YARN-2387.patch, YARN-2387.patch
>
>
> We recently came across a 0.23 RM crashing with an NPE. Here is the 
> stacktrace for it.
> {noformat}
> 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
> at
> org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
> at java.lang.String.valueOf(String.java:2854)
> at java.lang.StringBuilder.append(StringBuilder.java:128)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
> at java.lang.String.valueOf(String.java:2854)
> at java.lang.StringBuilder.append(StringBuilder.java:128)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
> at java.lang.Thread.run(Thread.java:722)
> 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}
> On investigating the issue, we found that ContainerStatusPBImpl has 
> methods that are called by different threads and are not synchronized. The 
> 2.x code looks the same.
> We need to make these methods synchronized so that we do not encounter this 
> problem in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-10-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154851#comment-14154851
 ] 

Jason Lowe commented on YARN-2179:
--

The pom versions are incorrect in branch-2 from the cherry-pick.  The pom says 
3.0.0-SNAPSHOT, but it needs to be 2.6.0-SNAPSHOT in branch-2.

> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, 
> YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, 
> YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, 
> YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (e.g. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an SCM that is 
> backed by an in-memory store).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-01 Thread Mit Desai (JIRA)
Mit Desai created YARN-2633:
---

 Summary: TestContainerLauncherImpl sometimes fails
 Key: YARN-2633
 URL: https://issues.apache.org/jira/browse/YARN-2633
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai


{noformat}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.NoSuchMethodException: 
org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
at java.lang.Class.getMethod(Class.java:1665)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
{noformat}
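For what it's worth, one common way to avoid this class of failure when stopping an RPC proxy that is actually a mock is to give the mock a close() method via an extra interface. A hedged sketch using Mockito follows (illustrative only; not necessarily the fix this JIRA will take):

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.withSettings;

import java.io.Closeable;
import org.apache.hadoop.yarn.api.ContainerManagementProtocol;

// Sketch: the mock also implements Closeable, so reflective lookups of close()
// during proxy shutdown succeed instead of throwing NoSuchMethodException.
public class MockProxySketch {
  public static ContainerManagementProtocol newMockProxy() {
    return mock(ContainerManagementProtocol.class,
        withSettings().extraInterfaces(Closeable.class));
  }
}
{code}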



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-10-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154872#comment-14154872
 ] 

Karthik Kambatla commented on YARN-2179:


Thanks for catching it, Jason. Just pushed another commit fixing the pom 
version in sharedcachemanager. 

> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, 
> YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, 
> YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, 
> YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (e.g. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an SCM that is 
> backed by an in-memory store).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2627) Add logs when attemptFailuresValidityInterval is enabled

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154914#comment-14154914
 ] 

Hudson commented on YARN-2627:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2627. Added the info logs of attemptFailuresValidityInterval and number of 
previous failed attempts. Contributed by Xuan Gong. (zjshen: rev 
9582a50176800433ad3fa8829a50c28b859812a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Add logs when attemptFailuresValidityInterval is enabled
> 
>
> Key: YARN-2627
> URL: https://issues.apache.org/jira/browse/YARN-2627
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2627.1.patch, YARN-2627.2.patch
>
>
> After YARN-611, users can specify attemptFailuresValidityInterval for their 
> applications. This is for testing/debugging purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154903#comment-14154903
 ] 

Hudson commented on YARN-2179:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris 
Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, 
> YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, 
> YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, 
> YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (e.g. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an SCM that is 
> backed by an in-memory store).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154910#comment-14154910
 ] 

Hudson commented on YARN-2387:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2387. Resource Manager crashes with NPE due to lack of synchronization. 
Contributed by Mit Desai (jlowe: rev feaf139b4f327d33011e5a4424c06fb44c630955)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerStatusPBImpl.java


> Resource Manager crashes with NPE due to lack of synchronization
> 
>
> Key: YARN-2387
> URL: https://issues.apache.org/jira/browse/YARN-2387
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2387.patch, YARN-2387.patch, YARN-2387.patch
>
>
> We recently came across a 0.23 RM crashing with an NPE. Here is the 
> stacktrace for it.
> {noformat}
> 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
> at
> org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
> at java.lang.String.valueOf(String.java:2854)
> at java.lang.StringBuilder.append(StringBuilder.java:128)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
> at java.lang.String.valueOf(String.java:2854)
> at java.lang.StringBuilder.append(StringBuilder.java:128)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
> at java.lang.Thread.run(Thread.java:722)
> 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}
> On investigating the issue, we found that ContainerStatusPBImpl has 
> methods that are called by different threads and are not synchronized. The 
> 2.x code looks the same.
> We need to make these methods synchronized so that we do not encounter this 
> problem in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154902#comment-14154902
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris 
Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently, the distributed cache enables you to cache jars and files so that 
> attempts from the same job can reuse them. However, sharing with the 
> distributed cache is limited because it normally operates on a per-job basis. 
> On a large cluster, copying of jobjars and libjars sometimes becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> mention defeating the purpose of "bringing compute to where the data is". This 
> is wasteful because in most cases the code doesn't change much across jobs.
> I'd like to propose and discuss the feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2610) Hamlet should close table tags

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154907#comment-14154907
 ] 

Hudson commented on YARN-2610:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2610. Hamlet should close table tags. (Ray Chiang via kasha) (kasha: rev 
f7743dd07dfbe0dde9be71acfaba16ded52adba7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/hamlet/TestHamlet.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/hamlet/Hamlet.java


> Hamlet should close table tags
> --
>
> Key: YARN-2610
> URL: https://issues.apache.org/jira/browse/YARN-2610
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Fix For: 2.6.0
>
> Attachments: YARN-2610-01.patch, YARN-2610-02.patch
>
>
> Revisiting a subset of MAPREDUCE-2993.
> The table-related tags are not configured to close properly in Hamlet.  While 
> this is allowed in HTML 4.01, missing closing table tags tend to wreak havoc 
> with a lot of HTML processors (although not usually browsers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154909#comment-14154909
 ] 

Hudson commented on YARN-2602:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2602. Fixed possible NPE in ApplicationHistoryManagerOnTimelineStore. 
Contributed by Zhijie Shen (jianhe: rev 
bbff96be48119774688981d04baf444639135977)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java


> Generic History Service of TimelineServer sometimes not able to handle NPE
> --
>
> Key: YARN-2602
> URL: https://issues.apache.org/jira/browse/YARN-2602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
> Environment: ATS is running with AHS/GHS enabled to use TimelineStore.
> Running for 4-5 days, with many random example jobs running
>Reporter: Karam Singh
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2602.1.patch
>
>
> ATS is running with AHS/GHS enabled to use TimelineStore.
> Running for 4-5 days, with many random example jobs running.
> When I ran WS API for AHS/GHS:
> {code}
> curl --negotiate -u : 
> 'http:///v1/applicationhistory/apps/application_1411579118376_0001'
> {code}
> It ran successfully.
> However
> {code}
> curl --negotiate -u : 
> 'http:///ws/v1/applicationhistory/apps'
> {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}
> {code}
> It failed with an internal server error (HTTP 500).
> After looking at the TimelineServer logs, I found that there was an NPE:
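The stack trace itself is not reproduced above. As a purely illustrative sketch (hypothetical names, not the actual ApplicationHistoryManagerOnTimelineStore code), the class of fix is to null-check optional timeline fields before dereferencing them while building the report:

{code}
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: tolerate missing entity fields instead of throwing an NPE.
public class NullSafeReport {
  static String appStateOf(Map<String, Object> otherInfo) {
    if (otherInfo == null) {
      return "UNKNOWN";                      // entity carried no otherInfo at all
    }
    Object state = otherInfo.get("YARN_APPLICATION_STATE"); // illustrative key
    return state == null ? "UNKNOWN" : state.toString();
  }

  public static void main(String[] args) {
    System.out.println(appStateOf(null));                    // UNKNOWN, not NPE
    System.out.println(appStateOf(Collections.emptyMap()));  // UNKNOWN, not NPE
  }
}
{code}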



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154913#comment-14154913
 ] 

Hudson commented on YARN-2594:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2594. Potential deadlock in RM when querying 
ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 
14d60dadc25b044a2887bf912ba5872367f2dffb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Potential deadlock in RM when querying ApplicationResourceUsageReport
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch
>
>
> ResourceManager sometimes becomes unresponsive:
> There was no exception in the ResourceManager log; it contains only the following 
> type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}
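As a generic illustration (not the exact YARN-2594 code path), the classic way an RM thread gets stuck like this is two locks acquired in opposite orders; once the event-handling thread blocks, the AsyncDispatcher queue can only grow, which matches the log above. A minimal sketch with hypothetical names (it hangs by design):

{code}
// Hypothetical lock-ordering deadlock: thread A holds appLock and wants
// schedulerLock; thread B holds schedulerLock and wants appLock.
public class DeadlockSketch {
  private static final Object appLock = new Object();
  private static final Object schedulerLock = new Object();

  public static void main(String[] args) {
    Thread eventHandler = new Thread(() -> {
      synchronized (appLock) {
        sleep(100);
        synchronized (schedulerLock) { }   // blocks forever
      }
    });
    Thread reportQuery = new Thread(() -> {
      synchronized (schedulerLock) {
        sleep(100);
        synchronized (appLock) { }         // blocks forever
      }
    });
    eventHandler.start();
    reportQuery.start();
    // Meanwhile, new events keep queueing behind the stuck handler thread.
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}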



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154931#comment-14154931
 ] 

Junping Du commented on YARN-2613:
--

+1. Patch looks good to me. Will commit it shortly.

> NMClient doesn't have retries for supporting rolling-upgrades
> -
>
> Key: YARN-2613
> URL: https://issues.apache.org/jira/browse/YARN-2613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch
>
>
> While the NM is undergoing a rolling upgrade, the client should retry the NM until it 
> comes back up. This jira is to add an NMProxy (similar to RMProxy) with a retry 
> implementation to support rolling upgrades.
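A minimal sketch of the retry idea, assuming nothing about the actual NMProxy implementation: keep retrying the NM call with a bounded sleep while the NM is down for the upgrade. A real NMProxy would plug a policy like this into the RPC layer, the way RMProxy does for RM connections.

{code}
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical retry wrapper: retry a NodeManager call while the NM restarts.
public class RetrySketch {
  static <T> T callWithRetries(Callable<T> call, int maxRetries, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (ConnectException e) {   // NM not up yet during rolling upgrade
        last = e;
        TimeUnit.MILLISECONDS.sleep(sleepMs);
      }
    }
    throw last;                        // give up after maxRetries + 1 attempts
  }
}
{code}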



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154934#comment-14154934
 ] 

Hadoop QA commented on YARN-2180:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672206/YARN-2180-trunk-v6.patch
  against trunk revision 17d1202.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5195//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5195//console

This message is automatically generated.

> In-memory backing store for cache manager
> -
>
> Key: YARN-2180
> URL: https://issues.apache.org/jira/browse/YARN-2180
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
> YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
> YARN-2180-trunk-v6.patch
>
>
> Implement an in-memory backing store for the cache manager.
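Purely as an illustration of what an in-memory backing store could look like (hypothetical names, not the attached patch), a concurrent map from resource key to the applications referencing it is enough to sketch the idea:

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory store: resource key -> applications referencing it.
public class InMemoryStoreSketch {
  private final ConcurrentMap<String, Set<String>> resources =
      new ConcurrentHashMap<>();

  public void addResourceReference(String resourceKey, String appId) {
    resources.computeIfAbsent(resourceKey, k -> ConcurrentHashMap.newKeySet())
             .add(appId);
  }

  public boolean removeResourceReference(String resourceKey, String appId) {
    Set<String> refs = resources.get(resourceKey);
    return refs != null && refs.remove(appId);
  }

  public boolean isResourceReferenced(String resourceKey) {
    Set<String> refs = resources.get(resourceKey);
    return refs != null && !refs.isEmpty();
  }
}
{code}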



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2634) Test failure for TestClientRMTokens

2014-10-01 Thread Junping Du (JIRA)
Junping Du created YARN-2634:


 Summary: Test failure for TestClientRMTokens
 Key: YARN-2634
 URL: https://issues.apache.org/jira/browse/YARN-2634
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du


The test fails as below:
{noformat}
---
Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
---
Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
  Time elapsed: 22.693 sec  <<< FAILURE!
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)

testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
  Time elapsed: 20.087 sec  <<< FAILURE!
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)

testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
  Time elapsed: 0.031 sec  <<< ERROR!
java.lang.NullPointerException: null
at 
org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
at 
org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
at org.apache.hadoop.security.token.Token.renew(Token.java:377)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)

testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
  Time elapsed: 0.061 sec  <<< FAILURE!
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)

testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
  Time elapsed: 0.07 sec  <<< ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
at 
org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)

   

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2634) Test failure for TestClientRMTokens

2014-10-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2634:
-
Target Version/s: 2.6.0

> Test failure for TestClientRMTokens
> ---
>
> Key: YARN-2634
> URL: https://issues.apache.org/jira/browse/YARN-2634
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>
> The test fails as below:
> {noformat}
> ---
> Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> ---
> Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 22.693 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)
> testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 20.087 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)
> testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.031 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)
> testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.061 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)
> testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.07 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)
>   
>   
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-10-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154946#comment-14154946
 ] 

Hong Zhiguo commented on YARN-2545:
---

How about the state of the appAttempt? Should it finally be FAILED instead of 
FINISHED?

> RMApp should transit to FAILED when AM calls finishApplicationMaster with 
> FAILED
> 
>
> Key: YARN-2545
> URL: https://issues.apache.org/jira/browse/YARN-2545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
>
> If AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, 
> and then exits, the corresponding RMApp and RMAppAttempt transition to the 
> FINISHED state.
> I think this is wrong and confusing. On RM WebUI, this application is 
> displayed as "State=FINISHED, FinalStatus=FAILED", and is counted as "Apps 
> Completed", not as "Apps Failed".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2615:
-
Attachment: YARN-2615.patch

Uploaded the first patch, which includes the changes to ClientToAMTokenIdentifier (and 
its test), RMDelegationTokenIdentifier, and TimelineDelegationTokenIdentifier. The 
compatibility tests for RMDelegationTokenIdentifier haven't been completed because 
TestClientRMTokens fails on trunk (without the code here); filed YARN-2634 to fix 
that before getting the test in.

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.
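As a generic illustration of the extensible-fields idea (not the actual YARN-668/YARN-2615 wire format), framing the identifier as a length-prefixed payload lets newer writers append optional fields while older readers keep working, similar in spirit to what a protobuf-backed identifier provides:

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: a length-prefixed payload leaves room for new fields.
public class ExtensibleIdentifierSketch {
  static byte[] write(String owner, String renewer) throws IOException {
    ByteArrayOutputStream payload = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(payload)) {
      out.writeUTF(owner);
      out.writeUTF(renewer);
      // Future fields can be appended here without changing the framing.
    }
    ByteArrayOutputStream framed = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(framed)) {
      out.writeInt(payload.size());      // length prefix
      payload.writeTo(out);
    }
    return framed.toByteArray();
  }

  static String readOwner(byte[] bytes) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      in.readInt();                      // consume the length prefix
      return in.readUTF();               // old readers ignore unknown trailing bytes
    }
  }
}
{code}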



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2634) Test failure for TestClientRMTokens

2014-10-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2634:
-
Priority: Blocker  (was: Major)

> Test failure for TestClientRMTokens
> ---
>
> Key: YARN-2634
> URL: https://issues.apache.org/jira/browse/YARN-2634
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Priority: Blocker
>
> The test fails as below:
> {noformat}
> ---
> Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> ---
> Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 22.693 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)
> testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 20.087 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)
> testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.031 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)
> testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.061 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)
> testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.07 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)
>   
>   
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-01 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-015.patch

Patch -15; this is patch -14 rebased against trunk with a conflict fixed.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.
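A minimal sketch of the publish side using the plain ZooKeeper client (illustrative only; the real registry paths, record format and ACL model are what the attached design documents define): a service instance creates an ephemeral znode holding its endpoint, and clients resolve it by path.

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical illustration: publish a service endpoint as an ephemeral znode.
public class RegistryPublishSketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 30_000, event -> { });
    // Parent path assumed to exist; a real registry would manage paths and ACLs
    // (e.g. only the RM holding write access, as the description suggests).
    String path = zk.create("/services/myapp/instance-",
        "host.example.com:8080".getBytes("UTF-8"),
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL);
    System.out.println("registered at " + path);
    zk.close();
  }
}
{code}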



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155093#comment-14155093
 ] 

Vinod Kumar Vavilapalli commented on YARN-1063:
---

Tx for the updates [~rusanu]! I am committing this now to unblock the follow-up 
patches, trusting [~ivanmi]'s reviews on the Windows side of things.

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: Windows
>Reporter: Kyle Leckie
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
> YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch
>
>
> h1. Summary:
> Securing a Hadoop cluster requires constructing some form of security 
> boundary around the processes executed in YARN containers. Isolation based on 
> Windows user isolation seems most feasible. This approach is similar to the 
> approach taken by the existing LinuxContainerExecutor. The current patch to 
> winutils.exe adds the ability to create a process as a domain user. 
> h1. Alternative Methods considered:
> h2. Process rights limited by security token restriction:
> On Windows access decisions are made by examining the security token of a 
> process. It is possible to spawn a process with a restricted security token. 
> Any of the rights granted by SIDs of the default token may be restricted. It 
> is possible to see this in action by examining the security token of a 
> sandboxed process launched by a web browser. Typically the launched process 
> will have a fully restricted token and need to access machine resources 
> through a dedicated broker process that enforces a custom security policy. 
> This broker process mechanism would break compatibility with the typical 
> Hadoop container process. The Container process must be able to utilize 
> standard function calls for disk and network IO. I performed some work 
> looking at ways to ACL the local files to the specific launched process without 
> granting rights to other processes launched on the same machine, but found 
> this to be an overly complex solution. 
> h2. Relying on APP containers:
> Recent versions of windows have the ability to launch processes within an 
> isolated container. Application containers are supported for execution of 
> WinRT based executables. This method was ruled out due to the lack of 
> official support for standard windows APIs. At some point in the future 
> windows may support functionality similar to BSD jails or Linux containers, 
> at that point support for containers should be added.
> h1. Create As User Feature Description:
> h2. Usage:
> A new sub command was added to the set of task commands. Here is the syntax:
> winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
> Some notes:
> * The username specified is in the format of "user@domain"
> * The machine executing this command must be joined to the domain of the user 
> specified
> * The domain controller must allow the account executing the command access 
> to the user information. For this join the account to the predefined group 
> labeled "Pre-Windows 2000 Compatible Access"
> * The account running the command must have several rights on the local 
> machine. These can be managed manually using secpol.msc: 
> ** "Act as part of the operating system" - SE_TCB_NAME
> ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME
> ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME
> * The launched process will not have rights to the desktop so will not be 
> able to display any information or create UI.
> * The launched process will have no network credentials. Any access of 
> network resources that requires domain authentication will fail.
> h2. Implementation:
> Winutils performs the following steps:
> # Enable the required privileges for the current process.
> # Register as a trusted process with the Local Security Authority (LSA).
> # Create a new logon for the user passed on the command line.
> # Load/Create a profile on the local machine for the new logon.
> # Create a new environment for the new logon.
> # Launch the new process in a job with the task name specified and using the 
> created logon.
> # Wait for the JOB to exit.
> h2. Future work:
> The following work was scoped out of this check in:
> * Support for non-domain users or machine that are not domain joined.
> * Support for privilege isolation by running the task launcher in a high 
> privilege service with access over an ACLed named pipe.
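Only as an illustration of how a caller might assemble the documented command line (hypothetical code, not the actual container executor), using the syntax given above:

{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: build the documented winutils invocation.
public class CreateAsUserSketch {
  static List<String> buildCommand(String taskName, String user, String commandLine) {
    return Arrays.asList(
        "winutils", "task", "createAsUser", taskName, user, commandLine);
  }

  public static void main(String[] args) {
    // user must be in "user@domain" form, per the notes above (values illustrative).
    System.out.println(buildCommand(
        "container_01", "hadoopuser@EXAMPLE", "cmd /c echo hello"));
  }
}
{code}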



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155098#comment-14155098
 ] 

Steve Loughran commented on YARN-2616:
--

thanks.

I'm going to pull this down into the main YARN-913 patch & sync up with 
changes, but will then post the patch here for it to be reviewed/completed in 
isolation.

# I'll set things up for tests to go in, though I won't do the tests...I'll 
leave that as half the challenge.
# Here's my evolving [Updated Hadoop style 
guide|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md]

> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: yarn-2616-v1.patch, yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2624:
---
 Priority: Blocker  (was: Major)
 Target Version/s: 2.6.0
Affects Version/s: 2.5.1

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
>
> We have found that resource localization fails on a cluster with the following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}
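A generic sketch of the kind of defensive handling that avoids "Rename cannot overwrite non empty destination directory" (illustrative only, not necessarily the eventual fix): clear a stale destination directory left over from an earlier run before renaming the freshly downloaded resource into place.

{code}
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: clear a stale cache directory before the final rename.
public class LocalizeRenameSketch {
  static void moveIntoCache(FileContext fc, Path downloaded, Path cacheDir)
      throws Exception {
    if (fc.util().exists(cacheDir)) {
      // A leftover directory from an earlier run would make the rename fail with
      // "Rename cannot overwrite non empty destination directory".
      fc.delete(cacheDir, true);
    }
    fc.rename(downloaded, cacheDir);
  }
}
{code}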



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2632) Document NM Restart feature

2014-10-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2632:
---
Priority: Blocker  (was: Major)

Marking this a blocker to ensure we don't miss it in 2.6. 

> Document NM Restart feature
> ---
>
> Key: YARN-2632
> URL: https://issues.apache.org/jira/browse/YARN-2632
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Junping Du
>Priority: Blocker
>
> As this is a new YARN feature, we should document its behavior, 
> configuration, and things to pay attention to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1972:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-732

> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, 
> YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 
> alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * change the DCE created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command 
> of winutils task instead of 'create'
> * runs the localization as standalone process instead of an in-process Java 
> method call. This in turn relies on the winutil createAsUser feature to run 
> the localization as the job user.
>  
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files
> The approach on the WCE came from a practical trial-and-error approach. I had 
> to iron out some issues around the Windows script shell limitations (command 
> line length) to get it to work, the biggest issue being the huge CLASSPATH 
> that is commonplace in Hadoop environment container executions. The job 
> container itself is already dealing with this via a so called 'classpath 
> jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch 
> as a separate container the same issue had to be resolved and I used the same 
> 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set the 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
> and set the `yarn.nodemanager.windows-secure-container-executor.group` to a 
> Windows security group name that the nodemanager service principal is a 
> member of (equivalent of the LCE 
> `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
> does not require any configuration outside of Hadoop's own yarn-site.xml.
> For WCE to work, the nodemanager must run as a service principal that is a 
> member of the local Administrators group or LocalSystem. This is derived from 
> the need to invoke the LoadUserProfile API, which mentions these requirements in 
> its specification. This is in addition to the SE_TCB privilege mentioned in 
> YARN-1063, but this requirement will automatically imply that the SE_TCB 
> privilege is held by the nodemanager. For the Linux speakers in the audience, 
> the requirement is basically to run NM as root.
> h2. Dedicated high privilege Service
> Due to the high privilege required by the WCE we had discussed the need to 
> isolate the high privilege operations into a separate process, an 'executor' 
> service that is solely responsible for starting the containers (including the 
> localizer). The NM would have to authenticate, authorize and communicate with 
> this service via an IPC mechanism and use this service to launch the 
> containers. I still believe we'll end up deploying such a service, but the 
> effort to onboard such a new platform-specific service onto the project is 
> not trivial.
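For the deployment requirements above, a minimal sketch of setting the two named properties programmatically (the group name is a placeholder; a real deployment would put these in yarn-site.xml):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: the two WCE properties named in the description (group is a placeholder).
public class WceConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
    conf.set("yarn.nodemanager.windows-secure-container-executor.group",
        "hadoop-nm-admins");  // placeholder Windows security group
    System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
  }
}
{code}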



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-732) YARN support for container isolation on Windows

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-732:
-
Fix Version/s: (was: trunk-win)

> YARN support for container isolation on Windows
> ---
>
> Key: YARN-732
> URL: https://issues.apache.org/jira/browse/YARN-732
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: trunk-win
>Reporter: Kyle Leckie
>  Labels: security
> Attachments: winutils.diff
>
>
> There is no ContainerExecutor on windows that can launch containers in a 
> manner that creates:
> 1) container isolation
> 2) container execution with reduced rights
> I am working on patches that will add the ability to launch containers in a 
> process with a reduced access token. 
> Update: After examining several approaches I have settled on launching the 
> task as a domain user. I have attached the current winutils diff which is a 
> work in progress. 
> Work remaining:
> - Create isolated desktop for task processes.
> - Set integrity of spawned processes to low.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2129) Add scheduling priority to the WindowsSecureContainerExecutor

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2129:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-732

> Add scheduling priority to the WindowsSecureContainerExecutor
> -
>
> Key: YARN-2129
> URL: https://issues.apache.org/jira/browse/YARN-2129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-2129.1.patch, YARN-2129.2.patch
>
>
> The WCE (YARN-1972) could and should honor 
> NM_CONTAINER_EXECUTOR_SCHED_PRIORITY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155116#comment-14155116
 ] 

Hudson commented on YARN-1063:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6164 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6164/])
YARN-1063. Augmented Hadoop common winutils to have the ability to create 
containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 
5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda)
* hadoop-yarn-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
* hadoop-common-project/hadoop-common/src/main/winutils/symlink.c
* hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
* hadoop-common-project/hadoop-common/src/main/winutils/task.c
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
* hadoop-common-project/hadoop-common/src/main/winutils/chown.c


> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: Windows
>Reporter: Kyle Leckie
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
> YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch
>
>
> h1. Summary:
> Securing a Hadoop cluster requires constructing some form of security 
> boundary around the processes executed in YARN containers. Isolation based on 
> Windows user isolation seems most feasible. This approach is similar to the 
> approach taken by the existing LinuxContainerExecutor. The current patch to 
> winutils.exe adds the ability to create a process as a domain user. 
> h1. Alternative Methods considered:
> h2. Process rights limited by security token restriction:
> On Windows access decisions are made by examining the security token of a 
> process. It is possible to spawn a process with a restricted security token. 
> Any of the rights granted by SIDs of the default token may be restricted. It 
> is possible to see this in action by examining the security token of a 
> sandboxed process launched by a web browser. Typically the launched process 
> will have a fully restricted token and need to access machine resources 
> through a dedicated broker process that enforces a custom security policy. 
> This broker process mechanism would break compatibility with the typical 
> Hadoop container process. The Container process must be able to utilize 
> standard function calls for disk and network IO. I performed some work 
> looking at ways to ACL the local files to the specific launched process without 
> granting rights to other processes launched on the same machine, but found 
> this to be an overly complex solution. 
> h2. Relying on APP containers:
> Recent versions of windows have the ability to launch processes within an 
> isolated container. Application containers are supported for execution of 
> WinRT based executables. This method was ruled out due to the lack of 
> official support for standard windows APIs. At some point in the future 
> windows may support functionality similar to BSD jails or Linux containers, 
> at that point support for containers should be added.
> h1. Create As User Feature Description:
> h2. Usage:
> A new sub command was added to the set of task commands. Here is the syntax:
> winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
> Some notes:
> * The username specified is in the format of "user@domain"
> * The machine executing this command must be joined to the domain of the user 
> specified
> * The domain controller must allow the account executing the command access 
> to the user information. For this join the account to the predefined group 
> labeled "Pre-Windows 2000 Compatible Access"
> * The account running the command must have several rights on the local 
> machine. These can be managed manually using secpol.msc: 
> ** "Act as part of the operating system" - SE_TCB_NAME
> ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME
> ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME
> * The launched process will not have rights to the desktop so will not be 
> able to display any information or create UI.
> * The launched process will have no network credentials. Any access of 
> network resources that requires domain authentication will fail.
> h2. Implementation:
> Winutils performs the following steps:
> # Enable the required privileges for the current process.
> # Register as a trusted process with the Local Security Authority (LSA).
> # Create a new logon for the user passed on the command line.
> # Load

[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155127#comment-14155127
 ] 

Vinod Kumar Vavilapalli commented on YARN-1972:
---

bq. Remus Rusanu Vinod Kumar Vavilapalli, as on YARN-1063, we can go ahead and 
address these comments as part of the YARN-2198 effort, it's not necessary to 
resolve these before these patches are committed.
+1 for tracking the remaining issues at YARN-1063.

This looks good, checking this in.

> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, 
> YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 
> alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * change the DCE created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command 
> of winutils task instead of 'create'
> * runs the localization as standalone process instead of an in-process Java 
> method call. This in turn relies on the winutil createAsUser feature to run 
> the localization as the job user.
>  
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files
> The approach on the WCE came from a practical trial-and-error approach. I had 
> to iron out some issues around the Windows script shell limitations (command 
> line length) to get it to work, the biggest issue being the huge CLASSPATH 
> that is commonplace in Hadoop environment container executions. The job 
> container itself is already dealing with this via a so called 'classpath 
> jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch 
> as a separate container the same issue had to be resolved and I used the same 
> 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set the 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
> and set the `yarn.nodemanager.windows-secure-container-executor.group` to a 
> Windows security group name that the nodemanager service principal is a 
> member of (equivalent of the LCE 
> `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
> does not require any configuration outside of Hadoop's own yarn-site.xml.
> For WCE to work, the nodemanager must run as a service principal that is a 
> member of the local Administrators group or LocalSystem. This is derived from 
> the need to invoke the LoadUserProfile API, which mentions these requirements in 
> its specification. This is in addition to the SE_TCB privilege mentioned in 
> YARN-1063, but this requirement will automatically imply that the SE_TCB 
> privilege is held by the nodemanager. For the Linux speakers in the audience, 
> the requirement is basically to run NM as root.
> h2. Dedicated high privilege Service
> Due to the high privilege required by the WCE we had discussed the need to 
> isolate the high privilege operations into a separate process, an 'executor' 
> service that is solely responsible for starting the containers (including the 
> localizer). The NM would have to authenticate, authorize and communicate with 
> this service via an IPC mechanism and use this service to launch the 
> containers. I still believe we'll end up deploying such a service, but the 
> effort to onboard such a new platform-specific service onto the project is 
> not trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155136#comment-14155136
 ] 

Zhijie Shen commented on YARN-2630:
---

Make sense. +1

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch
>
>
> The problem is that after YARN-1372, in a work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. But the 
> DistributedShell logic is not expecting this extra completed container.
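A hypothetical sketch of the tolerance the description implies (not the actual DistributedShell patch): when the work-preserving restart hands the new AM the old AM container's completion, skip it rather than counting it against the shell containers.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Hypothetical sketch: ignore the previous AM container's completion event.
public class IgnorePreviousAmContainer {
  static List<ContainerStatus> filterOutAmContainer(
      List<ContainerStatus> completed, ContainerId previousAmContainerId) {
    List<ContainerStatus> relevant = new ArrayList<>();
    for (ContainerStatus status : completed) {
      if (!status.getContainerId().equals(previousAmContainerId)) {
        relevant.add(status);   // only count the shell task containers
      }
    }
    return relevant;
  }
}
{code}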



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155139#comment-14155139
 ] 

Hadoop QA commented on YARN-2615:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672344/YARN-2615.patch
  against trunk revision 3f25d91.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.crypto.random.TestOsSecureRandom
  org.apache.hadoop.ha.TestZKFailoverControllerStress
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5196//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5196//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5196//console

This message is automatically generated.

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-01 Thread Wei Yan (JIRA)
Wei Yan created YARN-2635:
-

 Summary: TestRMRestart fails with FairScheduler
 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan


If we change the scheduler from the Capacity Scheduler to the Fair Scheduler, 
TestRMRestart fails.
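A minimal sketch of how a test can switch the RM to the Fair Scheduler (standard YARN configuration keys; the rest of the test setup is omitted):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;

// Sketch: run the RM (e.g. in TestRMRestart) with the Fair Scheduler instead
// of the default Capacity Scheduler.
public class FairSchedulerTestConf {
  static YarnConfiguration fairSchedulerConf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        FairScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{code}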



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155151#comment-14155151
 ] 

Hudson commented on YARN-1972:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6165 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6165/])
YARN-1972. Added a secure container-executor for Windows. Contributed by Remus 
Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java


> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, 
> YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 
> alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * change the DCE created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command 
> of winutils task instead of 'create'
> * runs the localization as standalone process instead of an in-process Java 
> method call. This in turn relies on the winutil createAsUser feature to run 
> the localization as the job user.
>  
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does no delegate the creation of the user cache directories to the 
> native implementation.
> * it does no require special handling to be able to delete user files
> The approach on the WCE came from a practical trial-and-error approach. I had 
> to iron out some issues around the Windows script shell limitations (command 
> line length) to get it to work, the biggest issue being the huge CLASSPATH 
> that is commonplace in Hadoop environment container executions. The job 
> container itself is already dealing with this via a so called 'classpath 
> jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch 
> as a separate container the same issue had to be resolved and I used the same 
> 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set the 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 

[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155169#comment-14155169
 ] 

Hadoop QA commented on YARN-1063:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657587/YARN-1063.6.patch
  against trunk revision 04b0843.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5197//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5197//console

This message is automatically generated.

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: Windows
>Reporter: Kyle Leckie
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
> YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch
>
>
> h1. Summary:
> Securing a Hadoop cluster requires constructing some form of security 
> boundary around the processes executed in YARN containers. Isolation based on 
> Windows user isolation seems most feasible. This approach is similar to the 
> approach taken by the existing LinuxContainerExecutor. The current patch to 
> winutils.exe adds the ability to create a process as a domain user. 
> h1. Alternative Methods considered:
> h2. Process rights limited by security token restriction:
> On Windows access decisions are made by examining the security token of a 
> process. It is possible to spawn a process with a restricted security token. 
> Any of the rights granted by SIDs of the default token may be restricted. It 
> is possible to see this in action by examining the security token of a 
> sandboxed process launched by a web browser. Typically the launched process 
> will have a fully restricted token and needs to access machine resources 
> through a dedicated broker process that enforces a custom security policy. 
> This broker process mechanism would break compatibility with the typical 
> Hadoop container process. The Container process must be able to utilize 
> standard function calls for disk and network IO. I performed some work 
> looking at ways to ACL the local files to the specific launched process without 
> granting rights to other processes launched on the same machine but found 
> this to be an overly complex solution. 
> h2. Relying on APP containers:
> Recent versions of Windows have the ability to launch processes within an 
> isolated container. Application containers are supported for execution of 
> WinRT-based executables. This method was ruled out due to the lack of 
> official support for standard Windows APIs. At some point in the future 
> Windows may support functionality similar to BSD jails or Linux containers; 
> at that point support for containers should be added.
> h1. Create As User Feature Description:
> h2. Usage:
> A new sub command was added to the set of task commands. Here is the syntax:
> winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
> Some notes:
> * The username specified is in the format of "user@domain"
> * The machine executing this command must be joined to the domain of the user 
> specified
> * The domain controller must allow the account executing the command access 
> to the user information. For this, join the account to the predefined group 
> labeled "Pre-Windows 2000 Compatible Access"
> * The account running the command must have several rights on the local 
> machine. These can be managed manually using secpol.msc: 
> ** "Act as part of the operating system" - SE_TCB_NAME
> ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME
> ** "Adjust memory quotas for a process" - SE_INCREASE_Q

[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1879:

Attachment: YARN-1879.16.patch

[~ozawa] I have updated your patch to compile with the latest trunk. [~jianhe], 
can you please take a look?

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2630:
--
Attachment: YARN-2630.3.patch

Uploaded a patch which renames 
NodeHeartbeatResponse#getFinishedContainersPulledByAM to 
getContainersToBeRemovedFromNM; if in the future we add one more channel (not 
just containers pulled by the AM) for removing containers from the NM, the 
latter name is more semantically correct.

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. But the 
> DistributedShell logic is not expecting this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155220#comment-14155220
 ] 

Jian He commented on YARN-2617:
---

looks good, +1

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.patch
>
>
> We ([~chenchun]) are testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce job, "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to 
> clean up already completed applications. But it will only remove the appId 
> from 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
> might not receive this event for a long time, or might never receive it. 
> * NonAggregatingLogHandler waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * LogAggregationService might fail (e.g. if the user does not have HDFS 
> write permission), and then it will not send the event.
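For context, YarnConfiguration.NM_LOG_RETAIN_SECONDS referenced above maps (as 
far as I can tell) to the yarn.nodemanager.log.retain-seconds property; a 
minimal yarn-site.xml sketch with the 3-hour default mentioned in the 
description, illustrative only and not part of the patch:

{code:xml}
<!-- Time the NonAggregatingLogHandler keeps container logs before deleting them
     and sending APPLICATION_LOG_HANDLING_FINISHED: 3 * 60 * 60 = 10800 seconds. -->
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value>
</property>
{code}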



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155227#comment-14155227
 ] 

Zhijie Shen commented on YARN-2630:
---

Would you please check whether "finishedContainersPulledByAM" has been 
completely replaced in the code base?
{code}
-if (this.finishedContainersPulledByAM != null) {
+if (this.containersToBeRemovedFromNM != null) {
   addFinishedContainersPulledByAMToProto();
 }
{code}
{code}
-  public void addFinishedContainersPulledByAM(
+  public void addContainersToBeRemovedFromNM(
      final List<ContainerId> finishedContainersPulledByAM) {
 if (finishedContainersPulledByAM == null)
   return;
 initFinishedContainersPulledByAM();
-this.finishedContainersPulledByAM.addAll(finishedContainersPulledByAM);
+this.containersToBeRemovedFromNM.addAll(finishedContainersPulledByAM);
{code}
{code}
-  nhResponse.addFinishedContainersPulledByAM(finishedContainersPulledByAM);
+  nhResponse.addContainersToBeRemovedFromNM(finishedContainersPulledByAM);
{code}
{code}
-  response.addFinishedContainersPulledByAM(
+  response.addContainersToBeRemovedFromNM(
    new ArrayList<ContainerId>(this.finishedContainersPulledByAM));
{code}

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. But the 
> DistributedShell logic is not expecting this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-01 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155235#comment-14155235
 ] 

Jian Fang commented on YARN-1680:
-

I may be wrong because I don't understand the logic fully. It seems your patch 
calculates the blacklisted resource for each application. Please clarify for me 
whether a blacklisted node is a cluster-level concept or an application-level 
one. What if multiple applications have different sets of blacklisted nodes? If 
the blacklisted node is at the cluster level, the blacklisted resource should 
seemingly be calculated at the cluster level; that is to say, you need to get 
the blacklisted nodes from other applications as well. If it is only at the 
application level, I wonder how the blacklist-task-tracker command works in 
Hadoop 1. 

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 Map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still reflects the cluster's free memory). 
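To make the hang concrete with the numbers from the description above 
(illustrative arithmetic only, not from the patch):

{noformat}
Cluster capacity                = 4 NMs x 8 GB = 32 GB
Memory held by running reducers = 29 GB
Headroom reported to the AM     = 32 GB - 29 GB = 3 GB
{noformat}

Because the headroom still counts the blacklisted NM-4's free memory, the AM 
sees positive headroom and never preempts a reducer, while the RM will not 
place any new container on NM-4; the job hangs.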



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2630:
--
Attachment: YARN-2630.4.patch

Thanks Zhijie! Updated the patch to fix the inconsistency.

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. But the 
> DistributedShell logic is not expecting this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor

2014-10-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1972:
--
Attachment: YARN-1972.delta.5-branch-2.patch

The patch doesn't apply on branch-2. Generated it myself, attaching now.

> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
> YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 and 
> the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * changing the DCE-created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changing the actual container run command to use the 'createAsUser' command 
> of the winutils task instead of 'create'.
> * running the localization as a standalone process instead of an in-process 
> Java method call. This in turn relies on the winutils createAsUser feature to 
> run the localization as the job user.
>  
> When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files.
> The approach to the WCE came from practical trial and error. I had to iron 
> out some issues around the Windows script shell limitations (command line 
> length) to get it to work, the biggest issue being the huge CLASSPATH that is 
> commonplace in Hadoop container executions. The job container itself already 
> deals with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 
> for details. For the WCE localizer, launched as a separate process, the same 
> issue had to be resolved, and I used the same 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
> and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
> Windows security group name that the nodemanager service principal is a 
> member of (the equivalent of the LCE 
> `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
> does not require any configuration outside of Hadoop's own yarn-site.xml.
> For the WCE to work, the nodemanager must run as a service principal that is 
> a member of the local Administrators group, or as LocalSystem. This is 
> derived from the need to invoke the LoadUserProfile API, whose specification 
> mentions these requirements. This is in addition to the SE_TCB privilege 
> mentioned in YARN-1063, but this requirement automatically implies that the 
> SE_TCB privilege is held by the nodemanager. For the Linux speakers in the 
> audience, the requirement is basically to run the NM as root.
> h2. Dedicated high privilege Service
> Due to the high privileges required by the WCE, we had discussed the need to 
> isolate the high-privilege operations into a separate process, an 'executor' 
> service that is solely responsible for starting the containers (including the 
> localizer). The NM would have to authenticate, authorize and communicate with 
> this service via an IPC mechanism and use this service to launch the 
> containers. I still believe we'll end up deploying such a service, but the 
> effort to onboard such a new platform-specific service on the project is not 
> trivial.
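For readers following along, a minimal yarn-site.xml sketch of the two 
properties named in the Deployment Requirements above; the group value is a 
placeholder to be replaced with the Windows security group the NM service 
principal belongs to:

{code:xml}
<!-- Switch the NM to the Windows Secure Container Executor. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
</property>
<!-- Placeholder group name; the WCE analog of yarn.nodemanager.linux-container-executor.group. -->
<property>
  <name>yarn.nodemanager.windows-secure-container-executor.group</name>
  <value>hadoopusers</value>
</property>
{code}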



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155259#comment-14155259
 ] 

Hadoop QA commented on YARN-1972:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672375/YARN-1972.delta.5-branch-2.patch
  against trunk revision 1f5b42a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5200//console

This message is automatically generated.

> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
> YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 and 
> the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * changing the DCE-created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changing the actual container run command to use the 'createAsUser' command 
> of the winutils task instead of 'create'.
> * running the localization as a standalone process instead of an in-process 
> Java method call. This in turn relies on the winutils createAsUser feature to 
> run the localization as the job user.
>  
> When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files.
> The approach to the WCE came from practical trial and error. I had to iron 
> out some issues around the Windows script shell limitations (command line 
> length) to get it to work, the biggest issue being the huge CLASSPATH that is 
> commonplace in Hadoop container executions. The job container itself already 
> deals with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 
> for details. For the WCE localizer, launched as a separate process, the same 
> issue had to be resolved, and I used the same 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
> and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
> Windows security group name that the nodemanager service principal is a 
> member of (the equivalent of the LCE 
> `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
> does not require any configuration outside of Hadoop's own yarn-site.xml.
> For the WCE to work, the nodemanager must run as a service principal that is 
> a member of the local Administrators group, or as LocalSystem. This is 
> derived from the need to invoke the LoadUserProfile API, whose specification 
> mentions these requirements. This is in addition to the SE_TCB privilege 
> mentioned in YARN-1063, but this requirement automatically implies that the 
> SE_TCB privilege is held by the nodemanager. For the Linux speakers in the 
> audience, the requirement is basically to run the NM as root.
> h2. Dedicated high privilege Service
> Due to the high privileges required by the WCE, we had discussed the need to 
> isolate the high-privilege operations into a separate process, an 'executor' 
> service that is solely responsible for starting the containers (including the 
> localizer). The NM would have to authenticate, authorize and communicate with 
> this service via an IPC mechanism and use this service to launch the 
> containers. I still believe we'll end up deploying such a service, but the 
> effort to onboard such a new platform-specific service on the project is not 
> trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-01 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155282#comment-14155282
 ] 

Jian Fang commented on YARN-1680:
-

Also, it seems the variable blackListedResources in SchedulerApplicationAttempt 
is not initialized in YARN-1680-WIP.patch, which causes an NPE. 

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 Map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still reflects the cluster's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2628:

Attachment: apache-yarn-2628.0.patch

Uploaded a patch with the fix and a test case.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though the node may not have enough CPU 
> or memory to actually run the container.
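A small self-contained sketch (not from the patch; the cluster and node numbers 
are made up) of why the greaterThanOrEqual check passes under the 
DominantResourceCalculator: the comparison is made on the dominant share 
relative to the cluster, so a node with no memory left but spare vcores still 
compares greater than or equal to the minimum allocation:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrcReservationSketch {
  public static void main(String[] args) {
    Resource cluster = Resource.newInstance(32 * 1024, 32);    // 32 GB, 32 vcores
    Resource nodeAvailable = Resource.newInstance(0, 8);       // no memory left, 8 vcores free
    Resource minimumAllocation = Resource.newInstance(1024, 1);
    ResourceCalculator drc = new DominantResourceCalculator();

    // Dominant share of nodeAvailable     = max(0/32768, 8/32)    = 0.25
    // Dominant share of minimumAllocation = max(1024/32768, 1/32) = 0.03125
    // 0.25 >= 0.03125, so the check passes although no 1 GB container can fit.
    System.out.println(Resources.greaterThanOrEqual(
        drc, cluster, nodeAvailable, minimumAllocation));      // prints: true
  }
}
{code}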



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-01 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155295#comment-14155295
 ] 

Craig Welch commented on YARN-1680:
---

As I recall, blacklisted nodes are application level.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 Map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still reflects the cluster's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155302#comment-14155302
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672365/YARN-1879.16.patch
  against trunk revision 737f280.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5198//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5198//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5198//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155318#comment-14155318
 ] 

Hadoop QA commented on YARN-2630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672368/YARN-2630.3.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5199//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5199//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5199//console

This message is automatically generated.

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. But the 
> DistributedShell logic is not expecting this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-01 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155344#comment-14155344
 ] 

Jian Fang commented on YARN-1680:
-

Is there any behavior change from Hadoop 1 to Hadoop 2 for blacklisted nodes? 
It seems HADOOP-5643 discussed the ability to blacklist a tasktracker. 

We have a use case for blacklisting a node at the cluster level before 
decommissioning it, so as to remove the node gracefully. If the blacklist is 
only at the application level, then we have to figure out something else.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 Map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still reflects the cluster's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155352#comment-14155352
 ] 

Hadoop QA commented on YARN-2630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5201//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5201//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5201//console

This message is automatically generated.

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. But the 
> DistributedShell logic is not expecting this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2636) Windows Secure Container Executor: add unit tests for WSCE

2014-10-01 Thread Remus Rusanu (JIRA)
Remus Rusanu created YARN-2636:
--

 Summary: Windows Secure Container Executor: add unit tests for WSCE
 Key: YARN-2636
 URL: https://issues.apache.org/jira/browse/YARN-2636
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Critical


As title says.

The WSCE has no checked-in unit tests. Much of the functionality depends on the 
elevated hadoopwinutilsvc service and cannot be tested, but let's test what can 
be mocked in Java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-01 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155360#comment-14155360
 ] 

Craig Welch commented on YARN-1680:
---

There are different kinds of blacklisting; the one at issue in this JIRA is the 
application-level one. The cluster-level one ends up with the node's resource 
value being removed from the cluster resource, so it doesn't need to be 
addressed here (removing it from the cluster resource already removes its 
resource amount from any headroom calculation). This patch addresses the 
application-level blacklist, which needs to be handled at this level.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 Map tasks got killed), so the 
> MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still counts the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but the availableResources it returns 
> still reflects the cluster's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-01 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Description: 
I’m proposing a new REST API for YARN which exposes a snapshot of the Resource 
Requests that exist inside of the Scheduler. My motivation behind this new 
feature is to allow external software to monitor the amount of resources being 
requested to gain more insightful information into cluster usage than is 
already provided. The API can also be used by external software to detect a 
starved application and alert the appropriate users and/or sys admin so that 
the problem may be remedied.

Here is the proposed API (a JSON counterpart is also available):
{code:xml}

  7680
  7
  
application_1412191664217_0001

appattempt_1412191664217_0001_01
default
6144
6
3

  
1024
1
6
true
20

  localMachine
  /default-rack
  *

  

  
  
  ...
  

{code}

  was:
I’m proposing a new REST API for YARN which exposes a snapshot of the Resource 
Requests that exist inside of the Scheduler. My motivation behind this new 
feature is to allow external software to monitor the amount of resources being 
requested to gain more insightful information into cluster usage than is 
already provided. The API can also be used by external software to detect a 
starved application and alert the appropriate users and/or sys admin so that 
the problem may be remedied.

Here is the proposed API:
{code:xml}

  96256
  94
  
application_
appattempt_
default
96256
94
3

  
1024
1
/default-rack
94
true
20
  
  
1024
1
*
94
true
20
  
  
1024
1
master
94
true
20
  

  

{code}


> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408-3.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-01 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: YARN-2408.4.patch

This version clusters together resource requests that have the same priority, 
number of containers, relax locality, and number of cores.

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408.4.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-01 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: (was: YARN-2408-3.patch)

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408.4.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-01 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155373#comment-14155373
 ] 

Remus Rusanu commented on YARN-1063:


Contributor credit should also go to Kyle Leckie.

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: Windows
>Reporter: Kyle Leckie
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
> YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch
>
>
> h1. Summary:
> Securing a Hadoop cluster requires constructing some form of security 
> boundary around the processes executed in YARN containers. Isolation based on 
> Windows user isolation seems most feasible. This approach is similar to the 
> approach taken by the existing LinuxContainerExecutor. The current patch to 
> winutils.exe adds the ability to create a process as a domain user. 
> h1. Alternative Methods considered:
> h2. Process rights limited by security token restriction:
> On Windows access decisions are made by examining the security token of a 
> process. It is possible to spawn a process with a restricted security token. 
> Any of the rights granted by SIDs of the default token may be restricted. It 
> is possible to see this in action by examining the security token of a 
> sandboxed process launched by a web browser. Typically the launched process 
> will have a fully restricted token and needs to access machine resources 
> through a dedicated broker process that enforces a custom security policy. 
> This broker process mechanism would break compatibility with the typical 
> Hadoop container process. The Container process must be able to utilize 
> standard function calls for disk and network IO. I performed some work 
> looking at ways to ACL the local files to the specific launched process without 
> granting rights to other processes launched on the same machine but found 
> this to be an overly complex solution. 
> h2. Relying on APP containers:
> Recent versions of Windows have the ability to launch processes within an 
> isolated container. Application containers are supported for execution of 
> WinRT-based executables. This method was ruled out due to the lack of 
> official support for standard Windows APIs. At some point in the future 
> Windows may support functionality similar to BSD jails or Linux containers, 
> at which point support for containers should be added.
> h1. Create As User Feature Description:
> h2. Usage:
> A new sub command was added to the set of task commands. Here is the syntax:
> winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
> Some notes:
> * The username specified is in the format of "user@domain"
> * The machine executing this command must be joined to the domain of the user 
> specified
> * The domain controller must allow the account executing the command access 
> to the user information. For this, join the account to the predefined group 
> labeled "Pre-Windows 2000 Compatible Access"
> * The account running the command must have several rights on the local 
> machine. These can be managed manually using secpol.msc: 
> ** "Act as part of the operating system" - SE_TCB_NAME
> ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME
> ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME
> * The launched process will not have rights to the desktop so will not be 
> able to display any information or create UI.
> * The launched process will have no network credentials. Any access of 
> network resources that requires domain authentication will fail.
> h2. Implementation:
> Winutils performs the following steps:
> # Enable the required privileges for the current process.
> # Register as a trusted process with the Local Security Authority (LSA).
> # Create a new logon for the user passed on the command line.
> # Load/Create a profile on the local machine for the new logon.
> # Create a new environment for the new logon.
> # Launch the new process in a job with the task name specified and using the 
> created logon.
> # Wait for the JOB to exit.
> h2. Future work:
> The following work was scoped out of this check in:
> * Support for non-domain users or machines that are not domain joined.
> * Support for privilege isolation by running the task launcher in a high 
> privilege service with access over an ACLed named pipe.
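As a concrete illustration of the syntax quoted above (the task name, user, and 
command below are made-up placeholders, not values from the patch):

{noformat}
winutils task createAsUser mytask001 testuser@EXAMPLE.COM "cmd /c echo hello"
{noformat}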



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2408) Resource Request REST API for YARN

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155372#comment-14155372
 ] 

Hadoop QA commented on YARN-2408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672388/YARN-2408.4.patch
  against trunk revision 1f5b42a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5203//console

This message is automatically generated.

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408.4.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-01 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155381#comment-14155381
 ] 

Jian Fang commented on YARN-1680:
-

Thanks, Craig, for the clarification. Is a cluster-level blacklisted node the 
same as an unhealthy node? I checked the Hadoop 2 code, but only found the 
cluster-level blacklist related to parameters such as 
yarn.nodemanager.health-checker.script.path. Are there any other code paths for 
the cluster-level blacklist in Hadoop 2?

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster 
> blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in 
> the cluster now.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom still includes the blacklisted node's 
> memory. This makes jobs hang forever (the ResourceManager does not assign any 
> new containers on blacklisted nodes, but returns an availableResource that is 
> based on the whole cluster's free memory).
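A rough sketch of the kind of headroom correction being discussed. This is illustrative only; the class and method names are assumptions and are not the actual MRAppMaster or RM code, though the Resource/Resources APIs used are the standard Hadoop 2.x ones.

{code}
// Hypothetical sketch: subtract the free capacity of blacklisted nodes from the
// headroom the RM reports, so reducer-preemption math is not inflated.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class HeadroomUtil {
  private HeadroomUtil() {}

  /**
   * @param reportedHeadroom headroom returned in the AM heartbeat
   * @param blacklistedFree  sum of available resources on nodes this AM blacklisted
   * @return headroom with the blacklisted capacity removed (never negative)
   */
  public static Resource effectiveHeadroom(Resource reportedHeadroom,
                                           Resource blacklistedFree) {
    // defensive copy so the caller's Resource is not mutated
    Resource adjusted = Resources.subtract(
        Resources.clone(reportedHeadroom), blacklistedFree);
    // clamp each dimension to zero (Hadoop 2.x int accessors)
    adjusted.setMemory(Math.max(0, adjusted.getMemory()));
    adjusted.setVirtualCores(Math.max(0, adjusted.getVirtualCores()));
    return adjusted;
  }
}
{code}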



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155393#comment-14155393
 ] 

Hadoop QA commented on YARN-2628:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672381/apache-yarn-2628.0.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5202//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5202//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5202//console

This message is automatically generated.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.
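A hedged sketch of the stricter per-dimension check the description hints at. The helper below is illustrative only and is not the fix in apache-yarn-2628.0.patch; it assumes the Hadoop 2.x Resource accessors getMemory()/getVirtualCores().

{code}
// Illustrative only: DominantResourceCalculator#greaterThanOrEqual compares
// dominant shares, so it can return true even when one dimension (CPU or memory)
// is smaller than minimumAllocation. A per-dimension check avoids that.
import org.apache.hadoop.yarn.api.records.Resource;

public final class NodeFitCheck {
  private NodeFitCheck() {}

  /** true only if the node has at least minimumAllocation in every dimension. */
  public static boolean hasRoomForMinimumAllocation(Resource available,
                                                    Resource minimumAllocation) {
    return available.getMemory() >= minimumAllocation.getMemory()
        && available.getVirtualCores() >= minimumAllocation.getVirtualCores();
  }
}
{code}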



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2617:
--
Attachment: YARN-2617.5.patch

Just added one more log statement myself; pending Jenkins.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) were testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM should guarantee that 
> already completed applications are cleaned up. But it only removes the appId 
> from 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
> might not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have 
> HDFS write permission), and then it will never send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155399#comment-14155399
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

Sorry for the delay, and thanks for updating the patch, [~adhoot]. The test 
failure does not look related to the patch. Let me attach a patch that includes 
the changes from your comments.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155398#comment-14155398
 ] 

Varun Vasudev commented on YARN-2628:
-

The release audit error comes from an HDFS file and is unrelated.

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155400#comment-14155400
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

About the release audit warning, it's also not related.

{quote}
 !? 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs/.gitattributes
Lines that start with ? in the release audit report indicate files that do 
not have an Apache license header
{quote}

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155407#comment-14155407
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

{quote}
>APIs that added trigger flag.
APIs that added Idempotent/AtOnce annotation?
{quote}

I think ">APIs that are added trigger flag." is correct, so I'm updating it.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, 
> YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.17.patch

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2312:
-
Attachment: YARN-2312.2-3.patch

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch
>
>
> After YARN-2229, {{ContainerId#getId}} only returns a partial value of the 
> container id: the sequence number without the epoch. We should mark 
> {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.
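A small usage sketch of the intended migration, assuming the post-YARN-2229 API where {{getContainerId()}} returns the full 64-bit id including the epoch. The class name is illustrative only.

{code}
// Illustrative migration: prefer the full 64-bit id over the deprecated int id.
import org.apache.hadoop.yarn.api.records.ContainerId;

public final class ContainerIdExample {
  private ContainerIdExample() {}

  public static String describe(ContainerId containerId) {
    long fullId = containerId.getContainerId(); // epoch + sequence number
    // int seqOnly = containerId.getId();       // deprecated: sequence number only
    return containerId.toString() + " (id=" + fullId + ")";
  }
}
{code}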



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155467#comment-14155467
 ] 

Karthik Kambatla commented on YARN-2254:


Patch looks mostly good. One nit: Can we rename ALLOC_FILE to FS_ALLOC_FILE and 
"test-queues.xml" to "test-fs-queues.xml" to clarify the files are used only 
for FairScheduler? 

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155473#comment-14155473
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

I cannot reproduce the findbugs warning. Let me check the reason on Jenkins.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch
>
>
> After YARN-2229, {{ContainerId#getId}} only returns a partial value of the 
> container id: the sequence number without the epoch. We should mark 
> {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155477#comment-14155477
 ] 

Hadoop QA commented on YARN-2617:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
  org.apache.hadoop.yarn.server.nodemanager.TestEventFlow
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5205//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5205//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5205//console

This message is automatically generated.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.patch
>
>
> We ([~chenchun]) were testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM should guarantee that 
> already completed applications are cleaned up. But it only removes the appId 
> from 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM 
> might not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have 
> HDFS write permission), and then it will never send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155481#comment-14155481
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672394/YARN-1879.17.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5206//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-01 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-016.patch

Patch -016: includes the registry CLI patch (-002) of YARN-2616.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up, or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond 
> to, and not any others in the cluster.
> Some kind of service registry, in the RM or in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155530#comment-14155530
 ] 

Steve Loughran commented on YARN-2616:
--

The patch I just posted doesn't {{stop()}} the registry service, so it will 
leak a Curator instance and its threads.

> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: yarn-2616-v1.patch, yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2616:
-
Attachment: YARN-2616-003.patch

> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, 
> yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2254:

Attachment: YARN-2254.004.patch

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1418#comment-1418
 ] 

zhihai xu commented on YARN-2254:
-

Hi [~kasha], good suggestion. I uploaded a new patch, YARN-2254.004.patch, to 
address the comments. 
Thanks

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.18.patch

Rebased on trunk.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, 
> YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155565#comment-14155565
 ] 

Hadoop QA commented on YARN-2630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch
  against trunk revision 1f5b42a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5204//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5204//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5204//console

This message is automatically generated.

> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in work-preserving AM restart, the 
> re-launched AM will also receive previously failed AM container. But 
> DistributedShell logic is not expecting this extra completed container.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155571#comment-14155571
 ] 

Karthik Kambatla commented on YARN-2254:


+1, pending Jenkins. 

I'll commit this later today. 

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155584#comment-14155584
 ] 

Jian He commented on YARN-2628:
---

Looks good, one minor comment on the test case:
- the following assertion depends on timing: since the allocation happens 
asynchronously, it might fail. Could you use a loop that checks whether the 
container is allocated and otherwise times out? A rough sketch follows the 
snippet below.
{code}
Thread.sleep(1000);
allocResponse = am1.schedule();
Assert.assertEquals(1, allocResponse.getAllocatedContainers().size());
{code}
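A minimal sketch of such a polling loop, assuming the MockAM/MockRM test harness the snippet above comes from; the timeout and poll interval values are placeholders.

{code}
// Illustrative polling loop: retry the allocate heartbeat until the container
// shows up or a timeout is hit, instead of relying on a fixed sleep.
int allocated = 0;
long deadline = System.currentTimeMillis() + 10 * 1000; // 10s timeout (placeholder)
while (allocated < 1 && System.currentTimeMillis() < deadline) {
  Thread.sleep(100);
  allocResponse = am1.schedule();
  allocated += allocResponse.getAllocatedContainers().size();
}
Assert.assertEquals(1, allocated);
{code}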

> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2628.0.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>   node.getAvailableResource(), minimumAllocation)) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>   ", available: " + node.getAvailableResource());
> }
> root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() + 
>   " is reserved by application " + 
>   
> node.getReservedContainer().getContainerId().getApplicationAttemptId()
>   );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries

2014-10-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155624#comment-14155624
 ] 

Steve Loughran commented on YARN-2616:
--

Features of the 003 patch:
# registry instance created via a factory
# uses the configuration instance built up on the command line (though it also 
creates a {{YarnConfiguration()}} around that)
# pulls all exception-to-error-text mapping out into a single method (a rough 
sketch of such a method follows below)
# covers the current set of errors
# also logs at debug if enabled.
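A rough sketch of what a single exception-to-error-text mapping method might look like. The class name, messages, and exit codes below are assumptions for illustration only, not the actual patch code.

{code}
// Illustrative only: map exceptions to an exit code and message in one place so
// every CLI action handles failures consistently, and dump the stack at debug.
import java.io.FileNotFoundException;
import java.io.PrintStream;

public final class RegistryCliErrors {
  private RegistryCliErrors() {}

  public static int mapToExitCode(Throwable e, PrintStream err, boolean debug) {
    if (debug) {
      e.printStackTrace(err);             // "log at debug if enabled"
    }
    if (e instanceof FileNotFoundException) {
      err.println("Not found: " + e.getMessage());
      return 44;                          // assumed exit code
    }
    err.println("Error: " + e);
    return 1;                             // generic failure (assumed)
  }
}
{code}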


> Add CLI client to the registry to list/view entries
> ---
>
> Key: YARN-2616
> URL: https://issues.apache.org/jira/browse/YARN-2616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Steve Loughran
>Assignee: Akshay Radia
> Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, 
> yarn-2616-v2.patch
>
>
> registry needs a CLI interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-01 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155638#comment-14155638
 ] 

Joep Rottinghuis commented on YARN-1414:


@sandyr could we get some love on this JIRA? We're essentially running with a 
forked FairScheduler and would like to reduce the tech debt we carry each time 
we uprev to a newer version.

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155643#comment-14155643
 ] 

Zhijie Shen commented on YARN-2583:
---

Per discussion offline:

1. In the AggregatedLogDeletionService of the JHS, we delete the log files of 
completed apps, and in the AppLogAggregatorImpl of the NM, we delete the log 
files of the running LRS. We need to add a test case to verify that 
AggregatedLogDeletionService won't delete the logs of a running LRS. 

2. We apply the same retention policy on both sides, using time to determine 
which log files need to be deleted (a rough sketch of this time-based check 
follows below).

3. For scalability, let's keep the criterion of a maximum number of logs per 
app, in case the rolling interval is small and too many log files are 
generated. But let's keep that config private to AppLogAggregatorImpl.
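A hedged sketch of the shared time-based retention check described in point 2, assuming the standard Hadoop FileSystem/FileStatus API; the class name, retention value, and directory layout are placeholders.

{code}
// Illustrative only: both the JHS deletion service and the NM-side aggregator
// would apply the same rule -- delete an aggregated log file once it is older
// than the retention cut-off.
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class LogRetentionCheck {
  private LogRetentionCheck() {}

  public static void deleteExpiredLogs(FileSystem fs, Path appLogDir,
                                       long retainMillis) throws Exception {
    long cutoff = System.currentTimeMillis() - retainMillis;
    for (FileStatus log : fs.listStatus(appLogDir)) {
      if (log.getModificationTime() < cutoff) {
        fs.delete(log.getPath(), false);   // single file, non-recursive
      }
    }
  }
}
{code}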

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time, and if all logs for an application are older than this 
> cut-off time, the app-log-dir in HDFS is deleted. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. 
> Two different scenarios: 
> 1) If we configured the rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure the rollingIntervalSeconds, the log file can only 
> be uploaded to HDFS after the application is finished. It is very possible 
> that the logs are uploaded after the cut-off time, which causes problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155649#comment-14155649
 ] 

Hadoop QA commented on YARN-1414:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12632578/YARN-1221-v2.patch
  against trunk revision dd1b8f2.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5211//console

This message is automatically generated.

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2637) maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation

2014-10-01 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2637:


 Summary: maximum-am-resource-percent will be violated when 
resource of AM is > minimumAllocation
 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Priority: Critical


Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app 
can be activated as follows:
{code}
for (Iterator i=pendingApplications.iterator(); 
 i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
break;
  }
  
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) 
{
user.activateApplication();
activeApplications.add(application);
i.remove();
LOG.info("Application " + application.getApplicationId() +
" from user: " + application.getUser() + 
" activated in queue: " + getQueueName());
  }
}
{code}

An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the 
maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up 
to 200 AMs can be launched. If each AM actually uses 5M (> minimum_allocation), 
all of those apps can still be activated, and their AMs can occupy 5M * 200 = 
1G, i.e. the entire queue, instead of only max_am_resource_percent of the 
queue.
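A hedged sketch of accounting for the actual AM resource rather than minimum_allocation when activating applications. The class, method, and parameter names are illustrative assumptions, not the CapacityScheduler code.

{code}
// Illustrative only: track AM resource usage directly and activate an app only
// while the queue's AM head-room (queue_max_capacity * max_am_resource_percent)
// is not exceeded, instead of counting applications against minimum_allocation.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class AmLimitCheck {
  private AmLimitCheck() {}

  public static boolean canActivate(Resource amLimit,        // e.g. 200M for 1G * 0.2
                                    Resource amUsedSoFar,    // sum of activated AM resources
                                    Resource nextAmResource) // AM container request, e.g. 5M
  {
    Resource afterActivation = Resources.add(amUsedSoFar, nextAmResource);
    return afterActivation.getMemory() <= amLimit.getMemory()
        && afterActivation.getVirtualCores() <= amLimit.getVirtualCores();
  }
}
{code}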



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-10-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2637:
-
Summary: maximum-am-resource-percent could be violated when resource of AM 
is > minimumAllocation  (was: maximum-am-resource-percent will be violated when 
resource of AM is > minimumAllocation)

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it checks whether the app 
> can be activated as follows:
> {code}
> for (Iterator i=pendingApplications.iterator(); 
>  i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
> break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
> user.activateApplication();
> activeApplications.add(application);
> i.remove();
> LOG.info("Application " + application.getApplicationId() +
> " from user: " + application.getUser() + 
> " activated in queue: " + getQueueName());
>   }
> }
> {code}
> An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
> the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 
> 1M, up to 200 AMs can be launched. If each AM actually uses 5M 
> (> minimum_allocation), all of those apps can still be activated, and their 
> AMs can occupy 5M * 200 = 1G, i.e. the entire queue, instead of only 
> max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155664#comment-14155664
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672406/YARN-913-016.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 36 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1265 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5208//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up, or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond 
> to, and not any others in the cluster.
> Some kind of service registry, in the RM or in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155669#comment-14155669
 ] 

Hadoop QA commented on YARN-2254:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672416/YARN-2254.004.patch
  against trunk revision 875aa79.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5209//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5209//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5209//console

This message is automatically generated.

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test, if the scheduler is not 
> CapacityScheduler.
> change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

