[jira] [Commented] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.

2014-08-19 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103382#comment-14103382
 ] 

Rohith commented on YARN-2409:
--

Thanks [~eepayne] and [~jianhe] for the review. :-)

> Active to StandBy transition does not stop rmDispatcher that causes 1 
> AsyncDispatcher thread leak. 
> ---
>
> Key: YARN-2409
> URL: https://issues.apache.org/jira/browse/YARN-2409
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.0.0
> Reporter: Nishan Shetty
> Assignee: Rohith
> Priority: Critical
> Fix For: 2.6.0
>
> Attachments: YARN-2409.patch
>
>
> {code}
>   at java.lang.Thread.run(Thread.java:662)
> 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>   at java.lang.Thread.run(Thread.java:662)
> 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>   at java.lang.Thread.run(Thread.java:662)
> 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
> {code}
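
The leak named in the issue title can be illustrated with a small, self-contained sketch. This is not the ResourceManager code or the attached patch: ToyDispatcher and ToyResourceManager are hypothetical stand-ins, and the only assumption carried over from the issue is that an active-to-standby transition replaces the dispatcher without stopping the previous one, leaving its thread running.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for an async event dispatcher; not Hadoop's AsyncDispatcher. */
class ToyDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean stopped = false;

    ToyDispatcher() {
        worker = new Thread(() -> {
            try {
                while (!stopped) {
                    queue.take().run(); // blocks until an event arrives
                }
            } catch (InterruptedException e) {
                // Interrupted by stop(): fall through and let the thread exit.
            }
        }, "toy-dispatcher");
        worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    /** Stops the worker thread and waits for it to exit. */
    void stop() {
        stopped = true;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isAlive() {
        return worker.isAlive();
    }
}

/** Hypothetical service that swaps its dispatcher on an active-to-standby transition. */
class ToyResourceManager {
    private ToyDispatcher dispatcher;

    ToyResourceManager() {
        dispatcher = new ToyDispatcher();
        dispatcher.start();
    }

    ToyDispatcher currentDispatcher() {
        return dispatcher;
    }

    /** Leaky variant: the old dispatcher thread keeps running after the swap. */
    void transitionToStandbyLeaky() {
        dispatcher = new ToyDispatcher();
        dispatcher.start();
    }

    /** Fixed variant: stop the old dispatcher before replacing it. */
    void transitionToStandbyFixed() {
        dispatcher.stop();
        dispatcher = new ToyDispatcher();
        dispatcher.start();
    }
}
```

With the leaky variant, every transition leaves one more dispatcher thread alive, which matches the "1 AsyncDispatcher thread leak" wording of the summary; the fixed variant keeps exactly one.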



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2174) Enabling HTTPs for the writer REST API of TimelineServer

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103365#comment-14103365
 ] 

Hadoop QA commented on YARN-2174:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662820/YARN-2174.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4673//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4673//console

This message is automatically generated.

> Enabling HTTPs for the writer REST API of TimelineServer
> 
>
> Key: YARN-2174
> URL: https://issues.apache.org/jira/browse/YARN-2174
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-2174.1.patch, YARN-2174.2.patch, YARN-2174.3.patch
>
>
> Since we'd like to allow the application to put the timeline data at the 
> client, the AM and even the containers, we need to provide a way to 
> distribute the keystore.





[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103352#comment-14103352
 ] 

Hadoop QA commented on YARN-2179:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12662898/YARN-2179-trunk-v3.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4672//console

This message is automatically generated.

> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, 
> YARN-2179-trunk-v3.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (i.e. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an SCM that is 
> backed by an in-memory store).





[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103351#comment-14103351
 ] 

Hadoop QA commented on YARN-2383:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12662875/YARN-2383.preview.3.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4671//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4671//console

This message is automatically generated.

> Add ability to renew ClientToAMToken
> 
>
> Key: YARN-2383
> URL: https://issues.apache.org/jira/browse/YARN-2383
> Project: Hadoop YARN
> Issue Type: Bug
> Components: applications, resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, 
> YARN-2383.preview.3.1.patch, YARN-2383.preview.3.2.patch, 
> YARN-2383.preview.3.patch
>
>






[jira] [Commented] (YARN-2174) Enabling HTTPs for the writer REST API of TimelineServer

2014-08-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103323#comment-14103323
 ] 

Varun Vasudev commented on YARN-2174:
-

+1 for the latest patch.






[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-19 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103237#comment-14103237
 ] 

Tsuyoshi OZAWA commented on YARN-2345:
--

[~haogao], yes, it's all yours.

> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Allen Wittenauer
> Labels: newbie
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103151#comment-14103151
 ] 

Ravi Prakash commented on YARN-2424:


Hi Tucu! It is a perfectly valid production configuration that is _helpful_ for 
troubleshooting. It is not for troubleshooting ONLY. In fact, I think it is a 
good idea for several reasons. IMHO we shouldn't dictate configurations that 
Hadoop should be run in without good reason. I am sorry I do not understand why 
you have reservations about this patch. We've already determined that when 
properly configured the security is the same. To prevent misconfiguration we've 
set defaults to secure. This used to be a valid hadoop configuration prior to 
2.3.


> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
> Reporter: Allen Wittenauer
> Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-1492:
---

Attachment: YARN-1492-all-trunk-v2.patch

Rebase of the all-inclusive patch. This v2 patch also contains a few fixes.

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.0.4-alpha
> Reporter: Sangjin Lee
> Assignee: Chris Trezzo
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, shared_cache_design.pdf, 
> shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, 
> shared_cache_design_v4.pdf, shared_cache_design_v5.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.





[jira] [Commented] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103082#comment-14103082
 ] 

Hudson commented on YARN-2409:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/])
YARN-2409. RM ActiveToStandBy transition missing stoping previous rmDispatcher. 
Contributed by Rohith (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618915)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java







[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103086#comment-14103086
 ] 

Hudson commented on YARN-2249:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/])
YARN-2249. Avoided AM release requests being lost on work preserving RM 
restart. Contributed by Jian He. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618972)
* /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


> AM release request may be lost on RM restart
> 
>
> Key: YARN-2249
> URL: https://issues.apache.org/jira/browse/YARN-2249
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Jian He
> Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
> YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch
>
>
> AM resync on RM restart will send outstanding container release requests back 
> to the new RM. In the meantime, NMs report the container statuses back to RM 
> to recover the containers. If RM receives the container release request  
> before the container is actually recovered in scheduler, the container won't 
> be released and the release request will be lost.
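
The race described above can be sketched in a few lines. This is an illustration only and not necessarily what the committed patch does: ToyScheduler and its method names are hypothetical, and the buffering strategy (remember a release that arrives before the container is recovered, then honor it at recovery time) is an assumption about one possible way to avoid losing the request.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical scheduler model; not the YARN scheduler API. */
class ToyScheduler {
    private final Set<String> liveContainers = new HashSet<>();
    private final Set<String> pendingReleases = new HashSet<>();

    /** Called when an AM asks to release a container. */
    void release(String containerId) {
        if (!liveContainers.remove(containerId)) {
            // Container not recovered yet: remember the request instead of dropping it.
            pendingReleases.add(containerId);
        }
    }

    /** Called when an NM reports a container during recovery. */
    void recover(String containerId) {
        if (pendingReleases.remove(containerId)) {
            // A release arrived earlier; honor it now, so the container is never tracked as live.
            return;
        }
        liveContainers.add(containerId);
    }

    boolean isLive(String containerId) {
        return liveContainers.contains(containerId);
    }
}
```

Either ordering of release and recover ends with the container released, which is the property the issue says is currently violated when the release arrives first.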





[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2236:
---

Attachment: YARN-2236-trunk-v3.patch

Rebase, with a slight modification to work with the updated recovered container 
code.

> Shared Cache uploader service on the Node Manager
> -
>
> Key: YARN-2236
> URL: https://issues.apache.org/jira/browse/YARN-2236
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, 
> YARN-2236-trunk-v3.patch
>
>
> Implement the shared cache uploader service on the node manager.





[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103068#comment-14103068
 ] 

Hudson commented on YARN-2249:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6088 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6088/])
YARN-2249. Avoided AM release requests being lost on work preserving RM 
restart. Contributed by Jian He. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618972)
* /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java







[jira] [Commented] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103070#comment-14103070
 ] 

Hudson commented on YARN-2409:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6088 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6088/])
YARN-2409. RM ActiveToStandBy transition missing stoping previous rmDispatcher. 
Contributed by Rohith (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618915)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java







[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103063#comment-14103063
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

You are saying this is proactive troubleshooting then, not meant for 
production? If so, then, as I said before:

* the property has 'use-only-for-troubleshooting' in its name.
* the NM logs print a WARN at startup and on every started container stating 
the flag and its insecure nature
* the container stdout/stderr also print a WARN to alert the user of the 
cluster setup.






[jira] [Updated] (YARN-2203) Web UI for cache manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2203:
---

Attachment: YARN-2203-trunk-v2.patch

Rebase.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2203-trunk-v1.patch, YARN-2203-trunk-v2.patch
>
>
> Implement the web server and web ui for the cache manager.





[jira] [Updated] (YARN-2189) Admin service for cache manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2189:
---

Attachment: YARN-2189-trunk-v2.patch

Rebase.

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.





[jira] [Updated] (YARN-2188) Client service for cache manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2188:
---

Attachment: YARN-2188-trunk-v2.patch

Rebase.

> Client service for cache manager
> 
>
> Key: YARN-2188
> URL: https://issues.apache.org/jira/browse/YARN-2188
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2188-trunk-v1.patch, YARN-2188-trunk-v2.patch
>
>
> Implement the client service for the shared cache manager. This service is 
> responsible for handling client requests to use and release resources.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103046#comment-14103046
 ] 

Ravi Prakash commented on YARN-2424:


Hi Tucu! The intention was for this to be a useful thing to do as an 
intermediate step when migrating from an insecure cluster to a Kerberized 
cluster. This would let people test their provisioning of unix users without 
having to deal with Kerberos issues.
Could you please answer my question?
bq. So if we enforced the use of several least privileged users (instead of 
only 1), is that not just as secure?

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Updated] (YARN-2186) Node Manager uploader service for cache manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2186:
---

Attachment: YARN-2186-trunk-v2.patch

Rebase.

> Node Manager uploader service for cache manager
> ---
>
> Key: YARN-2186
> URL: https://issues.apache.org/jira/browse/YARN-2186
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch
>
>
> Implement the node manager uploader service for the cache manager. This 
> service is responsible for communicating with the node manager when it 
> uploads resources to the shared cache.





[jira] [Updated] (YARN-2183) Cleaner service for cache manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2183:
---

Attachment: YARN-2183-trunk-v2.patch

Rebase.

> Cleaner service for cache manager
> -
>
> Key: YARN-2183
> URL: https://issues.apache.org/jira/browse/YARN-2183
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch
>
>
> Implement the cleaner service for the cache manager along with metrics for 
> the service. This service is responsible for cleaning up old resource 
> references in the manager and removing stale entries from the cache.





[jira] [Updated] (YARN-2180) In-memory backing store for cache manager

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2180:
---

Attachment: YARN-2180-trunk-v2.patch

Rebase.

> In-memory backing store for cache manager
> -
>
> Key: YARN-2180
> URL: https://issues.apache.org/jira/browse/YARN-2180
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch
>
>
> Implement an in-memory backing store for the cache manager.





[jira] [Updated] (YARN-2179) Initial cache manager structure and context

2014-08-19 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2179:
---

Attachment: YARN-2179-trunk-v3.patch

Rebase.

> Initial cache manager structure and context
> ---
>
> Key: YARN-2179
> URL: https://issues.apache.org/jira/browse/YARN-2179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, 
> YARN-2179-trunk-v3.patch
>
>
> Implement the initial shared cache manager structure and context. The 
> SCMContext will be used by a number of manager services (i.e. the backing 
> store and the cleaner service). The AppChecker is used to gather the 
> currently running applications on SCM startup (necessary for an scm that is 
> backed by an in-memory store).





[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken

2014-08-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2383:


Attachment: YARN-2383.preview.3.2.patch

Rebased the patch.

> Add ability to renew ClientToAMToken
> 
>
> Key: YARN-2383
> URL: https://issues.apache.org/jira/browse/YARN-2383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, 
> YARN-2383.preview.3.1.patch, YARN-2383.preview.3.2.patch, 
> YARN-2383.preview.3.patch
>
>






[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-19 Thread Hao Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102944#comment-14102944
 ] 

Hao Gao commented on YARN-2345:
---

Can I try this issue, as it is labeled 'newbie'?

> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>  Labels: newbie
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102941#comment-14102941
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Having more than one 'least privileged' user does not bring you any benefit as 
they can always step on each other by faking the username at job submission.


> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102920#comment-14102920
 ] 

Allen Wittenauer commented on YARN-2345:


Essentially, yes. For reference, here's the dfsadmin -report for a single-node 
setup:

{code}
Safe mode is ON
Configured Capacity: 402781270016 (375.12 GB)
Present Capacity: 394000707584 (366.94 GB)
DFS Remaining: 394000691200 (366.94 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-
Live datanodes (1):

Name: 10.248.2.155:50010 (10.248.2.155)
Hostname: 10.248.2.155
Decommission Status : Normal
Configured Capacity: 402781270016 (375.12 GB)
DFS Used: 16384 (16 KB)
Non DFS Used: 8780562432 (8.18 GB)
DFS Remaining: 394000691200 (366.94 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Aug 19 14:52:45 PDT 2014
{code}

There are obviously some details missing from this simple setup (rack 
topology!). An analog should report information similar to what is available 
from the RM UI.
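
As a rough illustration of what a per-node section of such a report could look 
like, here is a minimal Java sketch. Everything in it is an assumption made for 
illustration: NodeSummarySketch is a hypothetical stand-in class (not a YARN 
type), and the field names and layout are only loosely modeled on the dfsadmin 
output above. It is not a proposal for the actual rmadmin implementation.

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for per-node data; not a YARN class.
final class NodeSummarySketch {
    final String name;
    final String rack;
    final long usedMemoryMb;
    final long totalMemoryMb;
    final int containers;

    NodeSummarySketch(String name, String rack, long usedMemoryMb,
                      long totalMemoryMb, int containers) {
        this.name = name;
        this.rack = rack;
        this.usedMemoryMb = usedMemoryMb;
        this.totalMemoryMb = totalMemoryMb;
        this.containers = containers;
    }

    // Renders one node as "Key: value" lines, similar in spirit to the
    // per-datanode section printed by dfsadmin -report.
    String render() {
        double usedPct = totalMemoryMb == 0
            ? 0.0
            : 100.0 * usedMemoryMb / totalMemoryMb;
        return String.format(Locale.ROOT,
            "Name: %s\nRack: %s\nConfigured Memory: %d MB\n"
                + "Memory Used: %d MB\nMemory Used%%: %.2f%%\nContainers: %d\n",
            name, rack, totalMemoryMb, usedMemoryMb, usedPct, containers);
    }
}
```

In a real command the values would presumably come from the same source that 
feeds the RM UI; the sketch only shows the formatting step, including the rack 
field that the single-node HDFS example lacks.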

> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>  Labels: newbie
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.





[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-19 Thread Hao Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102884#comment-14102884
 ] 

Hao Gao commented on YARN-2345:
---

So yarn rmadmin -report will return the information about the NodeManager?

> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>  Labels: newbie
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102873#comment-14102873
 ] 

Ravi Prakash commented on YARN-2424:


Hi Tucu! I'd brought it up only because in the earlier comment you'd said
bq. Ravi, all the config in the container-executor.cfg is EXCLUSIVELY for 
enforcing constraints on the process to be launched, it does not restrict a 
launched JVM process from doing a System.setProperty("user.name", "ANY") to 
gain access to +*HDFS*+ as user ANY (if Kerberos is ON, setting 'user.name' 
property has no effect).
I'm glad we agree that YARN-1253 wasn't about protecting HDFS or YARN.

bq. it is about protecting the node at OS level by enforcing the use of a least 
privileged user.
So if we enforced the use of several least privileged users (instead of only 
1), is that not just as secure? Would you argue that with the proper use of 
blacklists and whitelists this cannot be achieved?

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2384) Document YARN multihoming settings

2014-08-19 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102807#comment-14102807
 ] 

Arpit Agarwal commented on YARN-2384:
-

Thanks for the contribution [~kj-ki]. Perhaps instead of three different docs 
and duplicated content we can have a single doc with HDFS, YARN and MR settings 
and have it under common. We can remove the content from the current HDFS doc 
and leave a forwarding link to the common doc. What do you think?

[~cwelch], would you be able to verify the yarn/mr content for accuracy?

> Document YARN multihoming settings
> --
>
> Key: YARN-2384
> URL: https://issues.apache.org/jira/browse/YARN-2384
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
> Attachments: YARN-2384.patch
>
>
> YARN-1994 introduced new settings to improve multihoming support in YARN/MR. 
> This Jira is to get the settings documented.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102794#comment-14102794
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Repeating myself from a previous comment: "YARN-1253 is not about protecting 
HDFS or YARN, it is about protecting the node at OS level by enforcing the use 
of a least privileged user."


> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102773#comment-14102773
 ] 

Zhijie Shen commented on YARN-2249:
---

+1 for the latest patch. Will commit it.

> AM release request may be lost on RM restart
> 
>
> Key: YARN-2249
> URL: https://issues.apache.org/jira/browse/YARN-2249
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
> YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch
>
>
> AM resync on RM restart will send outstanding container release requests back 
> to the new RM. In the meantime, NMs report the container statuses back to RM 
> to recover the containers. If RM receives the container release request  
> before the container is actually recovered in scheduler, the container won't 
> be released and the release request will be lost.
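
The race in the description above can be modeled in a few lines. The following 
is a minimal sketch under stated assumptions, not the code from the attached 
patches: PendingReleaseSketch and its method names are hypothetical, and 
containers are identified by plain strings. It shows one possible way to avoid 
losing a release request, namely remembering requests for containers that have 
not been recovered yet and applying them when the container is reported.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the release-vs-recovery race; not scheduler code.
final class PendingReleaseSketch {
    private final Set<String> liveContainers = new HashSet<>();
    private final Set<String> pendingReleases = new HashSet<>();

    // Called when the AM asks to release a container.
    // Returns true if the container was known and is released immediately.
    synchronized boolean release(String containerId) {
        if (liveContainers.remove(containerId)) {
            return true;
        }
        // Container not recovered yet: remember the request instead of
        // dropping it.
        pendingReleases.add(containerId);
        return false;
    }

    // Called when an NM reports a container during recovery.
    // Returns true if the container stays live after recovery.
    synchronized boolean recover(String containerId) {
        if (pendingReleases.remove(containerId)) {
            // A release arrived earlier, so the container is released on
            // arrival and never becomes live.
            return false;
        }
        liveContainers.add(containerId);
        return true;
    }

    synchronized boolean isLive(String containerId) {
        return liveContainers.contains(containerId);
    }
}
```

A real implementation would also have to bound how long a pending request is 
kept, since a container that is never reported would otherwise stay in the 
pending set forever.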





[jira] [Updated] (YARN-2174) Enabling HTTPs for the writer REST API of TimelineServer

2014-08-19 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2174:
--

Attachment: YARN-2174.3.patch

[~vvasudev], thanks for your comments. I updated the patch accordingly. In the 
response, I check whether the communication is really via HTTPS or not.

> Enabling HTTPs for the writer REST API of TimelineServer
> 
>
> Key: YARN-2174
> URL: https://issues.apache.org/jira/browse/YARN-2174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2174.1.patch, YARN-2174.2.patch, YARN-2174.3.patch
>
>
> Since we'd like to allow the application to put the timeline data at the 
> client, the AM and even the containers, we need to provide the way to 
> distribute the keystore.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102684#comment-14102684
 ] 

Ravi Prakash commented on YARN-2424:


Thanks Tucu! In YARN-1253, Vinod wrote:
bq. Even if the jobs run as a single 'yarnuser', security still isn't there - 
like Arun said, anybody can bomb HDFS directories of other users, any user can 
kill any other user's tasks/containers, anyone can delete anyone else's local 
dirs, log-dir and so on
Is that not true?


> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102629#comment-14102629
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Ravi, all the config in the container-executor.cfg is EXCLUSIVELY for enforcing 
constraints on the process to be launched, it does not restrict a launched JVM 
process from doing a {{System.setProperty("user.name", "ANY")}} to gain access 
to HDFS as user ANY (if Kerberos is ON, setting 'user.name' property has no 
effect).

BTW, I'm not OK with making this a valid configuration, it is not.
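
The System.setProperty call mentioned above is ordinary JVM behavior and can be 
shown with the JDK alone. The sketch below is illustrative only: the class name 
UserNamePropertySketch is made up, and it demonstrates just the JVM-level step 
(a process can overwrite its own 'user.name' property). How Hadoop consumes 
that value in a non-Kerberos cluster is as described in the comment and is not 
verified by this code.

```java
// Hypothetical demo class; uses only java.lang.System.
final class UserNamePropertySketch {

    // Overwrites this JVM's "user.name" system property and returns the
    // value the JVM reports afterwards. Nothing in container-executor.cfg
    // is consulted here, because the change happens inside the process.
    static String overrideUserName(String fakeUser) {
        System.setProperty("user.name", fakeUser);
        return System.getProperty("user.name");
    }
}
```

Calling UserNamePropertySketch.overrideUserName("ANY") returns "ANY" 
regardless of which OS user actually owns the process, which is why launch-time 
constraints cannot restrict it.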

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Created] (YARN-2431) NM restart: cgroup is not removed for reacquired containers

2014-08-19 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-2431:


 Summary: NM restart: cgroup is not removed for reacquired 
containers
 Key: YARN-2431
 URL: https://issues.apache.org/jira/browse/YARN-2431
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe


The cgroup for a reacquired container is not being removed when the container 
exits.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102554#comment-14102554
 ] 

Ravi Prakash commented on YARN-2424:


Hi Tucu!
bq. If security is OFF, I can simply submit a job as ANY user by simply doing 
-Duser.name=ANY. User ANY will be the one used by YARN and HDFS
Is this true? Are you suggesting that the blacklist (banned.users) in 
container-executor.cfg does not work? Could you not blacklist root, hdfs, 
mapred and yarn?

We are not doing this for security. We understand that +*with the right 
configuration*+, the level of security you provide is exactly the same as you 
would have in an unsecure cluster. Suppose only the users of the cluster are 
whitelisted, all other users like root / mapred / yarn / hdfs are blacklisted, 
and the whitelisted users don't enjoy any elevated privileges on the slave 
nodes. That is a perfectly valid configuration with the same level of security 
as would be provided if all YARN tasks ran as one user.

Could you please point out a technical concern with the security in this 
configuration? This would not be a configuration for "troubleshooting only". 
This would be a perfectly valid configuration.
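
The blacklist/whitelist configuration described above can be modeled as a 
simple check. This is a sketch under stated assumptions: the real check lives 
in the native container-executor binary, not in Java, and LceUserCheckSketch is 
a hypothetical class. It assumes the container-executor.cfg keys banned.users, 
min.user.id and allowed.system.users, and only models the decision of whether a 
container may be launched as a given user.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Java model of the launch-user check; the actual logic is
// implemented natively and may differ in detail.
final class LceUserCheckSketch {
    private final Set<String> bannedUsers;        // models banned.users
    private final Set<String> allowedSystemUsers; // models allowed.system.users
    private final int minUserId;                  // models min.user.id

    LceUserCheckSketch(Set<String> bannedUsers,
                       Set<String> allowedSystemUsers,
                       int minUserId) {
        this.bannedUsers = new HashSet<>(bannedUsers);
        this.allowedSystemUsers = new HashSet<>(allowedSystemUsers);
        this.minUserId = minUserId;
    }

    // Returns true if a container may be launched as this user.
    boolean mayLaunchAs(String user, int uid) {
        if (bannedUsers.contains(user)) {
            return false; // explicitly blacklisted
        }
        if (uid < minUserId && !allowedSystemUsers.contains(user)) {
            return false; // low uid and not explicitly whitelisted
        }
        return true;
    }
}
```

With root, hdfs, mapred and yarn in the banned set and a minimum uid of 1000, 
ordinary cluster users pass the check while service accounts and low-uid system 
users do not. The check constrains only the OS user of the launched process.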

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102538#comment-14102538
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

[~aw], I think you are missing the point.

I know that in an un-secure cluster you can fake the user name to interact with 
HDFS or YARN from anywhere at any time.

YARN-1253 is not about protecting HDFS or YARN, it is about protecting the node 
at OS level by enforcing the use of a least privileged user.

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2174) Enabling HTTPs for the writer REST API of TimelineServer

2014-08-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102528#comment-14102528
 ] 

Varun Vasudev commented on YARN-2174:
-

[~zjshen] is it possible to explicitly assert in the tests that the entities 
were posted using https? If there is some wrong configuration, the configurator 
silently falls back to http and the test will still pass. The reason I bring 
this up is that I saw a similar issue with webhdfs today.

{noformat}
2014-08-19 09:54:51,398 DEBUG web.URLConnectionFactory 
(URLConnectionFactory.java:newDefaultURLConnectionFactory(86)) - Cannot load 
customized ssl related configuration. Fallback to system-generic settings.
java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file 
or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.<init>(ReloadingX509TrustManager.java:81)
at 
org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:207)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:121)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:109)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newDefaultURLConnectionFactory(URLConnectionFactory.java:84)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:149)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at 
org.apache.hadoop.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:86)
at 
org.apache.hadoop.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:71)
at org.apache.hadoop.security.token.Token.renew(Token.java:377)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:478)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:475)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:474)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:392)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:658)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:639)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Note that this is a DEBUG-level log, so in a production scenario you would 
never see the fallback. I just want to make sure that we don't end up testing 
the http workflow because of a misconfiguration.
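
One way to make the kind of explicit assertion requested above is to fail fast 
when the endpoint in use is not https. The helper below is a minimal sketch 
using only the JDK; HttpsAssertionSketch and requireHttps are hypothetical 
names, not part of the patch or of any Hadoop API.

```java
import java.net.URI;

// Hypothetical test helper; not part of the YARN-2174 patch.
final class HttpsAssertionSketch {

    // Returns the endpoint unchanged if its scheme is https, otherwise
    // throws, so that a silent fallback to http fails the test instead of
    // passing.
    static URI requireHttps(URI endpoint) {
        if (!"https".equalsIgnoreCase(endpoint.getScheme())) {
            throw new IllegalStateException(
                "Expected https but got: " + endpoint);
        }
        return endpoint;
    }
}
```

A test could call requireHttps on the URI it actually posted to. Checking that 
the opened connection is an instance of javax.net.ssl.HttpsURLConnection would 
be a stricter variant of the same idea.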

> Enabling HTTPs for the writer REST API of TimelineServer
> 
>
> Key: YARN-2174
> URL: https://issues.apache.org/jira/browse/YARN-2174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2174.1.patch, YARN-2174.2.patch
>
>
> Since we'd like to allow the application to put the timeline data at the 
> client, the AM and even the containers, we need to provide the way to 
> distribute the keystore.





[jira] [Comment Edited] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102482#comment-14102482
 ] 

Allen Wittenauer edited comment on YARN-2424 at 8/19/14 5:13 PM:
-

You do understand that the current code in branch-2 that this patch modifies 
doesn't protect HDFS and YARN at all, correct? I can still set HADOOP_USER_NAME 
and kill any YARN job I want or delete any file in HDFS I want. 




was (Author: aw):
You do understand that the current code in branch-2 doesn't protect HDFS and 
YARN at all, correct? I can still set HADOOP_USER_NAME and kill any YARN job I 
want or delete any file in HDFS I want. 

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102482#comment-14102482
 ] 

Allen Wittenauer commented on YARN-2424:


You do understand that the current code in branch-2 doesn't protect HDFS and 
YARN at all, correct? I can still set HADOOP_USER_NAME and kill any YARN job I 
want or delete any file in HDFS I want. 

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.





[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102477#comment-14102477
 ] 

Hadoop QA commented on YARN-1297:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628664/YARN-1297-2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4670//console

This message is automatically generated.

> Miscellaneous Fair Scheduler speedups
> -
>
> Key: YARN-1297
> URL: https://issues.apache.org/jira/browse/YARN-1297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.patch, 
> YARN-1297.patch
>
>
> I ran the Fair Scheduler's core scheduling loop through a profiler tool and 
> identified a bunch of minimally invasive changes that can shave off a few 
> milliseconds.
> The main one is demoting a couple INFO log messages to DEBUG, which brought 
> my benchmark down from 16000 ms to 6000.
> A few others (which had way less of an impact) were
> * Most of the time in comparisons was being spent in Math.signum.  I switched 
> this to direct ifs and elses and it halved the percent of time spent in 
> comparisons.
> * I removed some unnecessary instantiations of Resource objects
> * I made it so that queues' usage wasn't calculated from the applications up 
> each time getResourceUsage was called.
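
The Math.signum change described above can be illustrated with a minimal sketch. ShareSketch and its usageRatio field are hypothetical stand-ins, not the Fair Scheduler's own classes; the two comparators only show the shape of the change, not the code in the attached patches.

```java
import java.util.Comparator;

// Hypothetical stand-in for a schedulable entity; not a Fair Scheduler class.
final class ShareSketch {
    final String name;
    final double usageRatio;

    ShareSketch(String name, double usageRatio) {
        this.name = name;
        this.usageRatio = usageRatio;
    }

    // Variant that goes through Math.signum on the difference.
    static final Comparator<ShareSketch> WITH_SIGNUM =
        (a, b) -> (int) Math.signum(a.usageRatio - b.usageRatio);

    // Variant that uses direct ifs and elses, as the description suggests.
    static final Comparator<ShareSketch> WITH_BRANCHES = (a, b) -> {
        if (a.usageRatio < b.usageRatio) {
            return -1;
        } else if (a.usageRatio > b.usageRatio) {
            return 1;
        }
        return 0;
    };
}
```

Both comparators give the same ordering; whether the branching form is actually faster depends on the JVM and workload, so it is worth re-profiling rather than assuming the numbers quoted above carry over.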





[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue

2014-08-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102451#comment-14102451
 ] 

Sunil G commented on YARN-2385:
---

I feel this JIRA itself can be updated to split getAppsInQueue into 
getRunningAppsInQueue + getPendingAppsInQueue.

I have a question, though: getAppsInQueue is used by killAllAppsInQueue and 
ClientRMService#getApplications. We need to define which API replaces it at 
each of these call sites. Or will it be getRunningAppsInQueue + 
getPendingAppsInQueue together?
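
One possible answer to the question above, sketched under assumptions: keep getAppsInQueue as the union of the two new methods, so existing callers keep their behavior. QueueAppsSketch, addRunning, and addPending are hypothetical names, and application IDs are plain strings here; this is not the AbstractYarnScheduler API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the scheduler's real data structures.
final class QueueAppsSketch {
    private final Map<String, List<String>> running = new HashMap<>();
    private final Map<String, List<String>> pending = new HashMap<>();

    void addRunning(String queue, String appId) {
        running.computeIfAbsent(queue, q -> new ArrayList<>()).add(appId);
    }

    void addPending(String queue, String appId) {
        pending.computeIfAbsent(queue, q -> new ArrayList<>()).add(appId);
    }

    List<String> getRunningAppsInQueue(String queue) {
        return new ArrayList<>(running.getOrDefault(queue, Collections.emptyList()));
    }

    List<String> getPendingAppsInQueue(String queue) {
        return new ArrayList<>(pending.getOrDefault(queue, Collections.emptyList()));
    }

    // Union of running and pending apps, for callers that need every app in the queue.
    List<String> getAppsInQueue(String queue) {
        List<String> all = getRunningAppsInQueue(queue);
        all.addAll(getPendingAppsInQueue(queue));
        return all;
    }
}
```

With this shape, a caller that kills or lists every application in a queue would keep calling getAppsInQueue, while callers that care about only one state would use the narrower methods.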

> Adding support for listing all applications in a queue
> --
>
> Key: YARN-2385
> URL: https://issues.apache.org/jira/browse/YARN-2385
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler
>Reporter: Subramaniam Krishnan
>  Labels: abstractyarnscheduler
>
> This JIRA proposes adding a method in AbstractYarnScheduler to get all the 
> pending/active applications. Fair scheduler already supports moving a single 
> application from one queue to another. Support for the same is being added to 
> Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition 
> of this method, we can transparently add support for moving all applications 
> from source queue to target queue and draining a queue, i.e. killing all 
> applications in a queue as proposed by YARN-2389





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102450#comment-14102450
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

If security is OFF, I can submit a job as ANY user simply by passing 
-Duser.name=ANY. User ANY will be the one used by YARN and HDFS (I'll leave it 
up to the reader to see how to do this).

I really don't like what this JIRA is proposing, and I've indicated what 
would have to be done for me not to -1.








[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102416#comment-14102416
 ] 

Allen Wittenauer commented on YARN-2424:


BTW, it should be pointed out that the current code doesn't actually protect 
non-RPCSEC NFSv3/v2 directories.  It only prevents them from being mounted 
using system facilities.  (I'll leave it up to the reader to see how to 
implement an exploit, not that it's particularly hard.) 

The only "security" thing the current code does is limit containers to run as 
one uid which in turn means preventing access to any elevated privs that any 
other user might have.  That's it. So if you have too many users with, say, 
passwordless sudo or if you don't want to publish user names to your compute 
nodes, the current code helps.  Otherwise, you're getting zero benefits.  For 
example, YARN scheduling and HDFS writes are still being done by the originally 
requested user.

The security aspects, as pointed out in the original JIRA, are a red herring.






[jira] [Commented] (YARN-2429) LCE should blacklist based upon group

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102367#comment-14102367
 ] 

Alejandro Abdelnur commented on YARN-2429:
--

Unless I'm mistaken, the blacklisting is done in the C code. Currently Hadoop 
uses the {{Groups}} class to fetch group info, and there are multiple plugins 
for it (shell, ldap, jni, ...). This means that you'd have to either get all 
groups of the user before calling the LCE and pass them as params, or the LCE 
would have to connect to the same group source as the Java side of things. 
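
The first option above (resolve the groups on the Java side, then hand them to the LCE) could look roughly like the following sketch. GroupBanSketch, groupsArgument, and the comma-separated argument format are assumptions made for illustration; the group lookup is passed in as a plain function, standing in for whatever group mapping plugin is configured.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; the lookup function stands in for the configured group source.
final class GroupBanSketch {
    private final Function<String, List<String>> groupLookup;
    private final Set<String> bannedGroups;

    GroupBanSketch(Function<String, List<String>> groupLookup, Set<String> bannedGroups) {
        this.groupLookup = groupLookup;
        this.bannedGroups = new HashSet<>(bannedGroups);
    }

    // True if the user belongs to at least one banned group.
    boolean isBanned(String user) {
        for (String group : groupLookup.apply(user)) {
            if (bannedGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    // Assumed format: the user's groups joined with commas, to pass to the native executor.
    String groupsArgument(String user) {
        return String.join(",", groupLookup.apply(user));
    }
}
```

Doing the check in Java avoids teaching the C code about each group source, at the cost of trusting whatever the Java side passes in.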

> LCE should blacklist based upon group
> -
>
> Key: YARN-2429
> URL: https://issues.apache.org/jira/browse/YARN-2429
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Allen Wittenauer
>
> It should be possible to list a group to ban, not just individual users.





[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102357#comment-14102357
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

[~raviprak], allowing sudo to more than one user in unsecure mode doesn't 
give you any extra security. Actually, it may give you a false sense of 
security.

On using groups in the LCE blacklist/whitelist, I'll comment in YARN-2429.






[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-08-19 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102058#comment-14102058
 ] 

Anubhav Dhoot commented on YARN-1372:
-

The tests that failed all pass individually, and the failures are all related 
to bind errors. The only real failure seems to be TestNodeManagerResync. 
Investigating that.

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-register with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.
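
The NM-side bookkeeping proposed above can be sketched as follows. CompletedContainerTrackerSketch and its method names are hypothetical, and container IDs are plain strings; this is a sketch of the described protocol, not the attached patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical NM-side bookkeeping for completed containers.
final class CompletedContainerTrackerSketch {
    // Completed containers not yet reported to the RM.
    private final Set<String> pendingReport = new LinkedHashSet<>();
    // Reported to the RM, but the RM has not yet confirmed the AM pulled them.
    private final Set<String> reportedUnacked = new LinkedHashSet<>();

    synchronized void containerCompleted(String containerId) {
        if (!reportedUnacked.contains(containerId)) {
            pendingReport.add(containerId);
        }
    }

    // Regular heartbeat: report newly completed containers and remember them.
    synchronized List<String> nextHeartbeatReport() {
        List<String> report = new ArrayList<>(pendingReport);
        reportedUnacked.addAll(pendingReport);
        pendingReport.clear();
        return report;
    }

    // The RM informs the NM that the AM has pulled these containers.
    synchronized void ackedByRM(Collection<String> containerIds) {
        reportedUnacked.removeAll(containerIds);
    }

    // Re-registration after an RM restart: resend everything not yet acked,
    // plus anything that completed while the RM was down.
    synchronized List<String> reportOnReRegister() {
        List<String> report = new ArrayList<>(reportedUnacked);
        report.addAll(pendingReport);
        reportedUnacked.addAll(pendingReport);
        pendingReport.clear();
        return report;
    }
}
```

As the description notes, a container can be reported more than once under this scheme, so the AM side has to tolerate duplicate completion notifications.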





[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102049#comment-14102049
 ] 

Hadoop QA commented on YARN-2249:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662609/YARN-2249.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4669//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4669//console

This message is automatically generated.

> AM release request may be lost on RM restart
> 
>
> Key: YARN-2249
> URL: https://issues.apache.org/jira/browse/YARN-2249
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
> YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch
>
>
> AM resync on RM restart will send outstanding container release requests back 
> to the new RM. In the meantime, NMs report the container statuses back to RM 
> to recover the containers. If RM receives the container release request  
> before the container is actually recovered in scheduler, the container won't 
> be released and the release request will be lost.
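
The race described above can be illustrated with a small sketch in which the RM remembers a release request that arrives before the container is recovered. PendingReleaseSketch and its methods are hypothetical, and this is only one way to close the gap, not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical RM-side sketch; container IDs are plain strings.
final class PendingReleaseSketch {
    private final Set<String> liveContainers = new HashSet<>();
    private final Set<String> pendingRelease = new HashSet<>();

    // AM asks to release a container. Returns true if it was released immediately.
    synchronized boolean release(String containerId) {
        if (liveContainers.remove(containerId)) {
            return true;
        }
        // Not recovered yet: remember the request instead of dropping it.
        pendingRelease.add(containerId);
        return false;
    }

    // An NM report recovers a container. Returns true if the container is kept.
    synchronized boolean recover(String containerId) {
        if (pendingRelease.remove(containerId)) {
            return false; // a release was already requested, so do not keep it
        }
        liveContainers.add(containerId);
        return true;
    }

    synchronized boolean isLive(String containerId) {
        return liveContainers.contains(containerId);
    }
}
```

A real implementation would also need to expire remembered requests, since a release for a container that is never recovered would otherwise stay in the set forever.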





[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2014-08-19 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101971#comment-14101971
 ] 

Beckham007 commented on YARN-1856:
--

We have worked on this for a few days. We will validate it in our production 
environment, which has 4000 nodes.
We set memory.limit_in_bytes for /cgroup/memory/hadoop-yarn and set 
memory.soft_limit_in_bytes for each container. Also, we use 
cgroup.event_control to handle OOM events.
Mesos uses a similar policy for memory isolation.
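
A minimal sketch of the limit-setting part of the policy above, assuming a cgroup v1 memory hierarchy with one subdirectory per container. CgroupMemorySketch and its methods are hypothetical, and the per-container directory layout is an assumption; OOM handling through cgroup.event_control needs an eventfd and is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; yarnRoot would be something like /cgroup/memory/hadoop-yarn.
final class CgroupMemorySketch {
    private final Path yarnRoot;

    CgroupMemorySketch(Path yarnRoot) {
        this.yarnRoot = yarnRoot;
    }

    // Hard limit shared by all containers under the YARN hierarchy.
    void setHardLimitForAllContainers(long bytes) throws IOException {
        write(yarnRoot.resolve("memory.limit_in_bytes"), bytes);
    }

    // Soft limit for one container (assumed layout: one subdirectory per container).
    void setSoftLimitForContainer(String containerId, long bytes) throws IOException {
        Path dir = yarnRoot.resolve(containerId);
        Files.createDirectories(dir);
        write(dir.resolve("memory.soft_limit_in_bytes"), bytes);
    }

    private static void write(Path file, long value) throws IOException {
        Files.write(file, Long.toString(value).getBytes(StandardCharsets.UTF_8));
    }
}
```

With only a soft limit per container, a container can exceed its own allocation as long as the hierarchy as a whole stays under the hard limit, which is why some separate OOM handling is still needed.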

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>



