[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2013-09-30 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781905#comment-13781905
 ] 

Ravi Prakash commented on YARN-90:
--

Hi nijel!

For testing, I would like to configure a USB drive to be one of the local and log 
dirs. We can then simulate failure by unplugging the USB drive. When we plug it 
back in, the NM should start using the recovered disk. Did you experience 
this behaviour yourself? I'll also try to test this as soon as I get some cycles.
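
As an aside, here is a rough sketch of the kind of periodic re-check the JIRA is 
asking for. Illustrative only, not the attached patch; the class and method names 
are made up.

{code}
// Illustrative only -- not the attached patch. Probe each dir currently marked
// failed and move it back to the good list if a write succeeds again.
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class FailedDirRecheck {
  /** Moves dirs from 'failed' back to 'good' if they are writable again. */
  public static void recheck(List<String> failed, List<String> good) {
    for (Iterator<String> it = failed.iterator(); it.hasNext();) {
      String dir = it.next();
      File probe = new File(dir, ".disk-probe");
      try {
        if (probe.createNewFile() || probe.exists()) {
          probe.delete();
          it.remove();
          good.add(dir);          // disk is usable again
        }
      } catch (IOException e) {
        // still bad; leave it in the failed list
      }
    }
  }
}
{code}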

 

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
 Attachments: YARN-90.1.patch, YARN-90.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
 reuse disks that become good again (after having been marked bad earlier).





[jira] [Updated] (YARN-1232) Configuration support for RM HA

2013-09-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1232:
---

Attachment: yarn-1232-5.patch

Discussed this with Bikas and Alejandro offline. The consensus was to have all 
rpc-addresses take the form rpc-address-conf.node-id. Uploading a patch 
that does that.
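
For illustration only, the per-node-id addressing could be resolved like this. The 
property names below are placeholders, not necessarily the ones the patch introduces; 
the point is the node-id suffix on each rpc-address config.

{code}
// Illustrative sketch only; key names are placeholders.
import org.apache.hadoop.conf.Configuration;

public class RMAddressLookup {
  public static String rpcAddressFor(Configuration conf,
                                     String addressConfKey, String nodeId) {
    // e.g. "yarn.resourcemanager.address" + "." + "rm1"
    return conf.get(addressConfKey + "." + nodeId);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
    System.out.println(
        rpcAddressFor(conf, "yarn.resourcemanager.address", "rm1"));
  }
}
{code}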

 Configuration support for RM HA
 ---

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch


 We should augment the configuration to allow users to specify two RMs and the 
 individual RPC addresses for them. This blocks 
 ConfiguredFailoverProxyProvider.





[jira] [Created] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception

2013-09-30 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-1252:
-

 Summary: Secure RM fails to start up in secure HA setup with 
Renewal request for unknown token exception
 Key: YARN-1252
 URL: https://issues.apache.org/jira/browse/YARN-1252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Arpit Gupta


{code}
2013-09-26 08:15:20,507 INFO  ipc.Server (Server.java:run(861)) - IPC Server 
Responder: starting
2013-09-26 08:15:20,521 ERROR security.UserGroupInformation 
(UserGroupInformation.java:doAs(1486)) - PriviledgedActionException 
as:rm/host@realm (auth:KERBEROS) 
cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal 
request for unknown token
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042
{code}





[jira] [Commented] (YARN-1232) Configuration support for RM HA

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782032#comment-13782032
 ] 

Hadoop QA commented on YARN-1232:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12605931/yarn-1232-5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2040//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2040//console

This message is automatically generated.

 Configuration support for RM HA
 ---

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch


 We should augment the configuration to allow users to specify two RMs and the 
 individual RPC addresses for them. This blocks 
 ConfiguredFailoverProxyProvider.





[jira] [Created] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created YARN-1253:


 Summary: Changes to LinuxContainerExecutor to use cgroups in 
unsecure mode
 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker
 Fix For: 2.1.1-beta


When using cgroups we require LCE to be configured in the cluster to start 
containers.

LCE starts containers as the user that submitted the job. While this works 
correctly in a secure setup, in an un-secure setup this presents a couple of 
issues:

* LCE requires all Hadoop users submitting jobs to be Unix users on all nodes
* Because users can impersonate other users, any user would have access to any 
local file of other users

The second issue in particular is undesirable, as a user could get access to 
the ssh keys of other users on the nodes or, if there are NFS mounts, to other 
users' data outside of the cluster.





[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782038#comment-13782038
 ] 

Alejandro Abdelnur commented on YARN-1253:
--

When using {{LinuxContainerExecutor.java}} in unsecure mode, we should have a 
{{yarn.nodemanager.linux-container-executor.unsecure-mode.local-user}} property 
(with {{yarnuser}} as the default) that defines the local user LCE should use to 
start containers in unsecure mode.

The {{container-executor.c}} should receive an extra parameter with the 
runAsUser, differentiating it from the user (which is used to create the 
usercache/$USER/ directory). The {{container-executor.c}} code is already 
prepared to handle this differentiation; the changes are minimal, just passing 
the extra parameter and wiring it in the right places.

The {{yarnuser}} should be provisioned as a system user on the nodes and added to 
the whitelisted system users in the {{container-executor.cfg}} configuration 
(YARN-1137).
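
A rough sketch of the proposed behaviour (not the actual LCE change; the helper 
method is hypothetical, while the property name and {{yarnuser}} default are the 
ones mentioned above):

{code}
// Sketch only. In secure mode run as the submitting user; in unsecure mode run
// as the fixed local user from configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class RunAsUserExample {
  static final String LOCAL_USER_KEY =
      "yarn.nodemanager.linux-container-executor.unsecure-mode.local-user";

  public static String runAsUser(Configuration conf, String submittingUser) {
    if (UserGroupInformation.isSecurityEnabled()) {
      return submittingUser;
    }
    return conf.get(LOCAL_USER_KEY, "yarnuser");
  }
}
{code}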

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker
 Fix For: 2.1.1-beta


 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users on all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 The second issue in particular is undesirable, as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, to other 
 users' data outside of the cluster.





[jira] [Updated] (YARN-1232) Configuration to support multiple RMs

2013-09-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1232:
---

Description: We should augment the configuration to allow users to specify two 
RMs and the individual RPC addresses for them.  (was: We should augment the 
configuration to allow users to specify two RMs and the individual RPC addresses 
for them. This blocks ConfiguredFailoverProxyProvider.)

 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch


 We should augment the configuration to allow users to specify two RMs and the 
 individual RPC addresses for them.





[jira] [Updated] (YARN-1232) Configuration to support multiple RMs

2013-09-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1232:
---

Summary: Configuration to support multiple RMs  (was: Configuration support 
for RM HA)

 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch


 We should augment the configuration to allow users to specify two RMs and the 
 individual RPC addresses for them. This blocks 
 ConfiguredFailoverProxyProvider.





[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-09-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782047#comment-13782047
 ] 

Karthik Kambatla commented on YARN-1232:


Summarizing the JIRA to make it simpler for folks to follow. As the description 
states, the focus of this JIRA is to add configs to allow specifying multiple 
RMs and their RPC addresses. The approach is to add the notion of RM-ids 
through {{yarn.resourcemanager.ha.nodes}}, and add a node-id suffix to each RPC 
address config. When starting the cluster, the server-side config should 
explicitly set {{yarn.resourcemanager.ha.node.id}} to specify the node-id of 
the RM being started.

I believe the patch is ready for review.
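
To illustrate the scheme ({{yarn.resourcemanager.ha.nodes}} and 
{{yarn.resourcemanager.ha.node.id}} are the names mentioned in this comment; the 
rpc-address key below is just a placeholder):

{code}
// Sketch of the server-side flow described above; not the actual patch code.
import org.apache.hadoop.conf.Configuration;

public class HAConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.ha.nodes", "rm1,rm2");
    conf.set("yarn.resourcemanager.ha.node.id", "rm1");   // set per RM host
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");

    String myId = conf.get("yarn.resourcemanager.ha.node.id");
    // The RM being started binds to the address keyed by its own node-id.
    String myAddress = conf.get("yarn.resourcemanager.address." + myId);
    System.out.println(myId + " -> " + myAddress);
  }
}
{code}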



 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch


 We should augment the configuration to allow users to specify two RMs and the 
 individual RPC addresses for them.





[jira] [Commented] (YARN-1215) Yarn URL should include userinfo

2013-09-30 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782066#comment-13782066
 ] 

Chuan Liu commented on YARN-1215:
-

I have the following test failures in my full test run. None of them seems to be 
a regression, i.e. I have the same failures with or without the patch.

Yarn:
{noformat}
TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey:114 null
TestFairScheduler.testSimpleFairShareCalculation:385 expected:3414 but was:0
TestDiskFailures.testLocalDirsFailures:99-testDirsFailures:179-verifyDisksHealth:247
 » NoSuchElement
TestContainerManagerSecurity.testContainerManager:113-testNMTokens:222 » 
IllegalArgument
TestContainerManagerSecurity.testContainerManager:113-testNMTokens:222 » 
IllegalArgument
TestNMClient.testNMClientNoCleanupOnStop:199-allocateContainers:233 » 
IndexOutOfBounds
{noformat}

Mapred:
{noformat}
TestFetchFailure.testFetchFailureMultipleReduces:332 expected:SUCCEEDED but 
was:SCHEDULED
TestMRApp.testUpdatedNodes:258 Expecting 2 more completion events for killed 
expected:4 but was:3
TestCommitterEventHandler.testBasic:263 null
TestMiniMRClientCluster.testRestart:146 Address before restart: chuanliu101:0 
is different from new address: chuanliu101:53368 expected:chuanliu101:[0] but 
was:chuanliu101:[53368]
TestClusterMRNotificationNotificationTestCase.testMR:163 expected:2 but 
was:0
TestJobListCache.testAddExisting:39 »  test timed out after 1000 milliseconds
TestLocalMRNotificationNotificationTestCase.testMR:178 » IO Job cleanup 
didn'...
TestMRJobsWithHistoryService.testJobHistoryData:153 » IO 
java.net.ConnectExcep...
{noformat}

 Yarn URL should include userinfo
 

 Key: YARN-1215
 URL: https://issues.apache.org/jira/browse/YARN-1215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.0.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch


 In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a 
 userinfo as part of the URL. When converting a {{java.net.URI}} object into 
 the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we 
 set the uri host as the url host. If the uri has a userinfo part, the userinfo 
 is discarded. This leads to information loss if the original uri has userinfo, 
 e.g. foo://username:passw...@example.com will be converted to 
 foo://example.com and the username/password information is lost during the 
 conversion.
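
For reference, a small {{java.net.URI}} snippet showing the userinfo that gets 
dropped (illustrative only; the password is a made-up placeholder):

{code}
// Not the patch itself; just shows what java.net.URI carries in getUserInfo().
import java.net.URI;

public class UserInfoExample {
  public static void main(String[] args) throws Exception {
    URI uri = new URI("foo://username:secret@example.com/path");
    System.out.println(uri.getUserInfo()); // "username:secret"
    System.out.println(uri.getHost());     // "example.com"
    // A conversion that copies only scheme/host/port/file, as described above,
    // ends up with the equivalent of "foo://example.com/path": userinfo is lost.
  }
}
{code}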





[jira] [Commented] (YARN-1111) NM containerlogs servlet can't handle logs of more than a GB

2013-09-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782078#comment-13782078
 ] 

Steve Loughran commented on YARN-1111:
--

investigate how the logs get copied back from YARN containers to HDFS

 NM containerlogs servlet can't handle logs of more than a GB
 

 Key: YARN-1111
 URL: https://issues.apache.org/jira/browse/YARN-1111
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.1.0-beta
 Environment: Long-lived service generating lots of log data from 
 HBase running in debug level
Reporter: Steve Loughran
Priority: Minor

 If a container is set up to log stdout to a file, the container log servlet 
 will list the file
 {code}
 err.txt : Total file length is 551 bytes.
 out.txt : Total file length is 1572099246 bytes.
 {code}
 If you actually click on out.txt, the tail logic takes a *very* long time to 
 react. There is also the question of what will happen if the log fills up 
 that volume.
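
For reference, a sketch of the usual way to serve only the tail of a huge log, 
i.e. seek near the end instead of streaming the whole file (not code from the NM):

{code}
// Illustration only: jump straight to the last tailBytes of a large file.
import java.io.RandomAccessFile;

public class TailExample {
  public static void printTail(String path, long tailBytes) throws Exception {
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      long start = Math.max(0, f.length() - tailBytes);
      f.seek(start);                      // skip everything before the tail
      byte[] buf = new byte[8192];
      int n;
      while ((n = f.read(buf)) > 0) {
        System.out.write(buf, 0, n);
      }
    }
  }
}
{code}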





[jira] [Commented] (YARN-1111) NM containerlogs servlet can't handle logs of more than a GB

2013-09-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782080#comment-13782080
 ] 

Steve Loughran commented on YARN-1111:
--

ignore that last comment, note to myself

 NM containerlogs servlet can't handle logs of more than a GB
 

 Key: YARN-1111
 URL: https://issues.apache.org/jira/browse/YARN-1111
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.1.0-beta
 Environment: Long-lived service generating lots of log data from 
 HBase running in debug level
Reporter: Steve Loughran
Priority: Minor

 If a container is set up to log stdout to a file, the container log servlet 
 will list the file
 {code}
 err.txt : Total file length is 551 bytes.
 out.txt : Total file length is 1572099246 bytes.
 {code}
 If you actually click on out.txt, the tail logic takes a *very* long time to 
 react. There is also the question of what will happen if the log fills up 
 that volume.





[jira] [Updated] (YARN-1221) With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely

2013-09-30 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1221:
--

Attachment: YARN1221_v6.patch

 With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
 -

 Key: YARN-1221
 URL: https://issues.apache.org/jira/browse/YARN-1221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
 Attachments: YARN1221_v1.patch.txt, YARN1221_v2.patch.txt, 
 YARN1221_v3.patch.txt, YARN1221_v4.patch, YARN1221_v5.patch, YARN1221_v6.patch








[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-09-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782241#comment-13782241
 ] 

Bikas Saha commented on YARN-1232:
--

We should probably de-link HA from the RM id. The RM id is a logical name for the 
RM that is currently used to separate config, translate tokens, etc. HA utilizes 
this logical-name concept to reference the RMs.

 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch


 We should augment the configuration to allow users specify two RMs and the 
 individual RPC addresses for them.





[jira] [Created] (YARN-1254) NM is polluting container's credentials

2013-09-30 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1254:
-

 Summary: NM is polluting container's credentials
 Key: YARN-1254
 URL: https://issues.apache.org/jira/browse/YARN-1254
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi


Before launching the container, NM is using the same credential object and so 
is polluting what the container should see. We should fix this.
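
For illustration, a defensive-copy sketch of the idea (not the attached patch; 
{{Credentials}} and {{addAll()}} are real Hadoop APIs, the surrounding method is 
made up):

{code}
// Sketch only: give the container its own Credentials so NM-side additions
// don't leak into what the container sees.
import org.apache.hadoop.security.Credentials;

public class CredentialsCopyExample {
  public static Credentials copyForContainer(Credentials nmCredentials) {
    Credentials containerCredentials = new Credentials();
    containerCredentials.addAll(nmCredentials);   // copy tokens and secret keys
    return containerCredentials;
  }
}
{code}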





[jira] [Commented] (YARN-1221) With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782250#comment-13782250
 ] 

Hadoop QA commented on YARN-1221:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12605964/YARN1221_v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2041//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2041//console

This message is automatically generated.

 With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
 -

 Key: YARN-1221
 URL: https://issues.apache.org/jira/browse/YARN-1221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
 Attachments: YARN1221_v1.patch.txt, YARN1221_v2.patch.txt, 
 YARN1221_v3.patch.txt, YARN1221_v4.patch, YARN1221_v5.patch, YARN1221_v6.patch








[jira] [Updated] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1241:
-

Attachment: YARN-1241.patch

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the total number of apps in all its subqueues can't exceed it.





[jira] [Created] (YARN-1255) RM fails to start up with Failed to load/recover state error in a HA setup

2013-09-30 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-1255:
-

 Summary: RM fails to start up with Failed to load/recover state 
error in a HA setup
 Key: YARN-1255
 URL: https://issues.apache.org/jira/browse/YARN-1255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Arpit Gupta


{code}
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: 
capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
vCores:0usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: 
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
vCores:0usedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: 
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
vCores:0usedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler with 
calculator=class 
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
minimumAllocation=memory:1024, vCores:1, maximumAllocation=memory:8192, 
vCores:32
2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:register(157)) - Registering class 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:register(157)) - Registering class 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
for class 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo 
(RMNMInfo.java:init(63)) - Registered RMNMInfo MBean
2013-09-30 09:12:09,253 INFO  util.HostsFileReader 
(HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
2013-09-30 09:12:09,278 INFO  security.UserGroupInformation 
(UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for 
user rm/hostname@realm using keytab file /etc/security/keytabs/rm.service.keytab
2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager 
(RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key 
for container-tokens
2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for 
amrm-tokens
2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM 
(NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for 
nm-tokens
2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
node: application_1380531989689_0002
2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
node: application_1380531989689_0003
2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager 
(RMDelegationTokenSecretManager.java:recover(181)) - recovering 
RMDelegationTokenSecretManager.
2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager 
(RMAppManager.java:recover(329)) - Recovering 2 applications
2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(640)) - Failed to load/recover state
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
Exiting with status 1
2013-09-30 09:17:20,144 INFO  resourcemanager.ResourceManager 
(StringUtils.java:startupShutdownMessage(601)) - STARTUP_MSG:
{code}





[jira] [Commented] (YARN-1252) Secure RM fails to start up in secure HA setup with Renewal request for unknown token exception

2013-09-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782263#comment-13782263
 ] 

Jian He commented on YARN-1252:
---

It could be that when the application finishes, the NN is failing over and is in 
SAFEMODE, so at that point the RM is not able to remove the application state 
(within which we store the HDFSDelegationToken) from the store. The RM goes 
ahead, finishes the app and adds the token to the cancel queue; when the new NN 
is up, the token is canceled. Then the RM shuts down. Since the token has 
already been removed from the HDFS tokenSecretManager, when the RM comes back it 
reads the application state (which it had failed to remove) and tries to renew a 
non-existent token.

 Secure RM fails to start up in secure HA setup with Renewal request for 
 unknown token exception
 ---

 Key: YARN-1252
 URL: https://issues.apache.org/jira/browse/YARN-1252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Arpit Gupta

 {code}
 2013-09-26 08:15:20,507 INFO  ipc.Server (Server.java:run(861)) - IPC Server 
 Responder: starting
 2013-09-26 08:15:20,521 ERROR security.UserGroupInformation 
 (UserGroupInformation.java:doAs(1486)) - PriviledgedActionException 
 as:rm/host@realm (auth:KERBEROS) 
 cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Renewal 
 request for unknown token
 at 
 org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:388)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5934)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:453)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:851)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59650)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1483)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042
 {code}





[jira] [Commented] (YARN-1255) RM fails to start up with Failed to load/recover state error in a HA setup

2013-09-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782274#comment-13782274
 ] 

Jian He commented on YARN-1255:
---

The RM might be killed while it's saving the app data (after the app file is 
created but before the data is written into the file). When the RM recovers, it 
loads an empty file and gets a NullPointerException. I reproduced this locally 
and see the same exception stack.
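
For reference, the usual guard against such partial files is to write to a 
temporary file and rename it into place only once the write completes. A minimal 
sketch of that pattern (not the actual fix; the real state store would do the 
equivalent with FileSystem rename):

{code}
// Sketch only: readers never observe a half-written state file.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicWriteExample {
  public static void writeState(Path file, byte[] data) throws IOException {
    Path tmp = Paths.get(file.toString() + ".tmp");
    Files.write(tmp, data);   // a crash here leaves only the .tmp file behind
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}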

 RM fails to start up with Failed to load/recover state error in a HA setup
 --

 Key: YARN-1255
 URL: https://issues.apache.org/jira/browse/YARN-1255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Arpit Gupta

 {code}
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: 
 capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
 vCores:0usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, 
 numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: 
 numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:0, vCores:0usedCapacity=0.0, numApps=0, numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: 
 numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:0, vCores:0usedCapacity=0.0, numApps=0, numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler 
 with calculator=class 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
 minimumAllocation=memory:1024, vCores:1, maximumAllocation=memory:8192, 
 vCores:32
 2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher 
 (AsyncDispatcher.java:register(157)) - Registering class 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
 2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher 
 (AsyncDispatcher.java:register(157)) - Registering class 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
 for class 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
 2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo 
 (RMNMInfo.java:init(63)) - Registered RMNMInfo MBean
 2013-09-30 09:12:09,253 INFO  util.HostsFileReader 
 (HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
 2013-09-30 09:12:09,278 INFO  security.UserGroupInformation 
 (UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for 
 user rm/hostname@realm using keytab file 
 /etc/security/keytabs/rm.service.keytab
 2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager 
 (RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key 
 for container-tokens
 2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager 
 (AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for 
 amrm-tokens
 2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM 
 (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for 
 nm-tokens
 2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore 
 (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
 node: application_1380531989689_0002
 2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore 
 (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
 node: application_1380531989689_0003
 2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager 
 (RMDelegationTokenSecretManager.java:recover(181)) - recovering 
 RMDelegationTokenSecretManager.
 2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager 
 (RMAppManager.java:recover(329)) - Recovering 2 applications
 2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
 2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
 Exiting with status 1
 2013-09-30 

[jira] [Commented] (YARN-1221) With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely

2013-09-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782283#comment-13782283
 ] 

Sandy Ryza commented on YARN-1221:
--

+1

 With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
 -

 Key: YARN-1221
 URL: https://issues.apache.org/jira/browse/YARN-1221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
 Attachments: YARN1221_v1.patch.txt, YARN1221_v2.patch.txt, 
 YARN1221_v3.patch.txt, YARN1221_v4.patch, YARN1221_v5.patch, YARN1221_v6.patch








[jira] [Updated] (YARN-1254) NM is polluting container's credentials

2013-09-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1254:


Attachment: YARN-1254.20131030.1.patch

 NM is polluting container's credentials
 ---

 Key: YARN-1254
 URL: https://issues.apache.org/jira/browse/YARN-1254
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1254.20131030.1.patch


 Before launching the container, NM is using the same credential object and so 
 is polluting what the container should see. We should fix this.





[jira] [Updated] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1241:
-

Attachment: YARN-1241-1.patch

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the total number of apps in all its subqueues can't exceed it.





[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782319#comment-13782319
 ] 

Sandy Ryza commented on YARN-1241:
--

Rebased on trunk

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the total number of apps in all its subqueues can't exceed it.





[jira] [Updated] (YARN-1221) With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1221:
-

Assignee: Siqi Li

 With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
 -

 Key: YARN-1221
 URL: https://issues.apache.org/jira/browse/YARN-1221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Siqi Li
 Attachments: YARN1221_v1.patch.txt, YARN1221_v2.patch.txt, 
 YARN1221_v3.patch.txt, YARN1221_v4.patch, YARN1221_v5.patch, YARN1221_v6.patch








[jira] [Commented] (YARN-1221) With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely

2013-09-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782322#comment-13782322
 ] 

Sandy Ryza commented on YARN-1221:
--

I just committed this to trunk, branch-2, and branch-2.1-beta.  Thanks Siqi!

 With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
 -

 Key: YARN-1221
 URL: https://issues.apache.org/jira/browse/YARN-1221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Siqi Li
 Fix For: 2.1.2-beta

 Attachments: YARN1221_v1.patch.txt, YARN1221_v2.patch.txt, 
 YARN1221_v3.patch.txt, YARN1221_v4.patch, YARN1221_v5.patch, YARN1221_v6.patch








[jira] [Commented] (YARN-1221) With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely

2013-09-30 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782324#comment-13782324
 ] 

Siqi Li commented on YARN-1221:
---

you are welcome

 With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
 -

 Key: YARN-1221
 URL: https://issues.apache.org/jira/browse/YARN-1221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Siqi Li
 Fix For: 2.1.2-beta

 Attachments: YARN1221_v1.patch.txt, YARN1221_v2.patch.txt, 
 YARN1221_v3.patch.txt, YARN1221_v4.patch, YARN1221_v5.patch, YARN1221_v6.patch








[jira] [Commented] (YARN-1254) NM is polluting container's credentials

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782325#comment-13782325
 ] 

Hadoop QA commented on YARN-1254:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12605983/YARN-1254.20131030.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2043//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2043//console

This message is automatically generated.

 NM is polluting container's credentials
 ---

 Key: YARN-1254
 URL: https://issues.apache.org/jira/browse/YARN-1254
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1254.20131030.1.patch


 Before launching the container, NM is using the same credential object and so 
 is polluting what the container should see. We should fix this.





[jira] [Created] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest

2013-09-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1256:


 Summary: NM silently ignores non-existent service in 
StartContainerRequest
 Key: YARN-1256
 URL: https://issues.apache.org/jira/browse/YARN-1256
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha
Priority: Critical
 Fix For: 2.1.2-beta


A container can set token service metadata for a service, say shuffle_service. 
If that service does not exist, the error is silently ignored. Later, when the 
next container wants to access data written to shuffle_service by the first 
task, it fails because the service does not have the token that was supposed to 
be set by the first task.
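
A hypothetical sketch of the missing check (not NM code): validate the 
service-data keys against the configured aux services and fail fast instead of 
ignoring the mismatch.

{code}
// Sketch only; names and wiring are made up.
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Set;

public class ServiceDataCheck {
  public static void validate(Map<String, ByteBuffer> serviceData,
                              Set<String> configuredAuxServices) {
    for (String service : serviceData.keySet()) {
      if (!configuredAuxServices.contains(service)) {
        throw new IllegalArgumentException(
            "Unknown auxiliary service '" + service
            + "' in StartContainerRequest");
      }
    }
  }
}
{code}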





[jira] [Commented] (YARN-1255) RM fails to start up with Failed to load/recover state error in a HA setup

2013-09-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782348#comment-13782348
 ] 

Jason Lowe commented on YARN-1255:
--

Is this a dup of YARN-1185?

 RM fails to start up with Failed to load/recover state error in a HA setup
 --

 Key: YARN-1255
 URL: https://issues.apache.org/jira/browse/YARN-1255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Arpit Gupta

 {code}
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: 
 capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
 vCores:0usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, 
 numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: 
 numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:0, vCores:0usedCapacity=0.0, numApps=0, numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: 
 numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:0, vCores:0usedCapacity=0.0, numApps=0, numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler 
 with calculator=class 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
 minimumAllocation=memory:1024, vCores:1, maximumAllocation=memory:8192, 
 vCores:32
 2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher 
 (AsyncDispatcher.java:register(157)) - Registering class 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
 2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher 
 (AsyncDispatcher.java:register(157)) - Registering class 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
 for class 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
 2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo 
 (RMNMInfo.java:init(63)) - Registered RMNMInfo MBean
 2013-09-30 09:12:09,253 INFO  util.HostsFileReader 
 (HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
 2013-09-30 09:12:09,278 INFO  security.UserGroupInformation 
 (UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for 
 user rm/hostname@realm using keytab file 
 /etc/security/keytabs/rm.service.keytab
 2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager 
 (RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key 
 for container-tokens
 2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager 
 (AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for 
 amrm-tokens
 2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM 
 (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for 
 nm-tokens
 2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore 
 (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
 node: application_1380531989689_0002
 2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore 
 (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
 node: application_1380531989689_0003
 2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager 
 (RMDelegationTokenSecretManager.java:recover(181)) - recovering 
 RMDelegationTokenSecretManager.
 2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager 
 (RMAppManager.java:recover(329)) - Recovering 2 applications
 2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
 2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
 Exiting with status 1
 2013-09-30 09:17:20,144 INFO  resourcemanager.ResourceManager 
 (StringUtils.java:startupShutdownMessage(601)) - STARTUP_MSG:
 {code}





[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782357#comment-13782357
 ] 

Hadoop QA commented on YARN-1241:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12605986/YARN-1241-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2044//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2044//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2044//console

This message is automatically generated.

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the total number of apps in all its subqueues can't exceed it.





[jira] [Commented] (YARN-1247) test-container-executor has gotten out of sync with the changes to container-executor

2013-09-30 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782359#comment-13782359
 ] 

Alejandro Abdelnur commented on YARN-1247:
--

+1 LGTM.

 test-container-executor has gotten out of sync with the changes to 
 container-executor
 -

 Key: YARN-1247
 URL: https://issues.apache.org/jira/browse/YARN-1247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: 
 0001-YARN-1247.-test-container-executor-has-gotten-out-of.patch


 If run under the super-user account, test-container-executor.c fails in 
 multiple places. It would be nice to fix it so that we have better testing of 
 LCE functionality.





[jira] [Resolved] (YARN-1255) RM fails to start up with Failed to load/recover state error in a HA setup

2013-09-30 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta resolved YARN-1255.
---

Resolution: Duplicate

Thanks [~jlowe], it is.

 RM fails to start up with Failed to load/recover state error in a HA setup
 --

 Key: YARN-1255
 URL: https://issues.apache.org/jira/browse/YARN-1255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Arpit Gupta

 {code}
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: 
 capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
 vCores:0usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, 
 numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: 
 numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:0, vCores:0usedCapacity=0.0, numApps=0, numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: 
 numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:0, vCores:0usedCapacity=0.0, numApps=0, numContainers=0
 2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler 
 with calculator=class 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
 minimumAllocation=memory:1024, vCores:1, maximumAllocation=memory:8192, 
 vCores:32
 2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher 
 (AsyncDispatcher.java:register(157)) - Registering class 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
 2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher 
 (AsyncDispatcher.java:register(157)) - Registering class 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
 for class 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
 2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo 
 (RMNMInfo.java:init(63)) - Registered RMNMInfo MBean
 2013-09-30 09:12:09,253 INFO  util.HostsFileReader 
 (HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
 2013-09-30 09:12:09,278 INFO  security.UserGroupInformation 
 (UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for 
 user rm/hostname@realm using keytab file 
 /etc/security/keytabs/rm.service.keytab
 2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager 
 (RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key 
 for container-tokens
 2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager 
 (AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for 
 amrm-tokens
 2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM 
 (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for 
 nm-tokens
 2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore 
 (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
 node: application_1380531989689_0002
 2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore 
 (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
 node: application_1380531989689_0003
 2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager 
 (RMDelegationTokenSecretManager.java:recover(181)) - recovering 
 RMDelegationTokenSecretManager.
 2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager 
 (RMAppManager.java:recover(329)) - Recovering 2 applications
 2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager 
 (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
 2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
 Exiting with status 1
 2013-09-30 09:17:20,144 INFO  resourcemanager.ResourceManager 
 (StringUtils.java:startupShutdownMessage(601)) - STARTUP_MSG:
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-09-30 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782363#comment-13782363
 ] 

Arpit Gupta commented on YARN-1185:
---

Here is the stack trace from the RM when it tries to recover partially written 
data

{code}
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: 
capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
vCores:0usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: 
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
vCores:0usedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: 
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, 
vCores:0usedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler with 
calculator=class 
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
minimumAllocation=memory:1024, vCores:1, maximumAllocation=memory:8192, 
vCores:32
2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:register(157)) - Registering class 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:register(157)) - Registering class 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
for class 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo 
(RMNMInfo.java:init(63)) - Registered RMNMInfo MBean
2013-09-30 09:12:09,253 INFO  util.HostsFileReader 
(HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
2013-09-30 09:12:09,278 INFO  security.UserGroupInformation 
(UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for 
user rm/hostname@realm using keytab file /etc/security/keytabs/rm.service.keytab
2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager 
(RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key 
for container-tokens
2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for 
amrm-tokens
2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM 
(NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for 
nm-tokens
2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
node: application_1380531989689_0002
2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from 
node: application_1380531989689_0003
2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager 
(RMDelegationTokenSecretManager.java:recover(181)) - recovering 
RMDelegationTokenSecretManager.
2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager 
(RMAppManager.java:recover(329)) - Recovering 2 applications
2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(640)) - Failed to load/recover state
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
Exiting with status 1
2013-09-30 09:17:20,144 INFO  resourcemanager.ResourceManager 
(StringUtils.java:startupShutdownMessage(601)) - STARTUP_MSG:
{code}

 FileSystemRMStateStore can leave partial files that prevent subsequent 
 recovery
 ---

 Key: YARN-1185
 URL: https://issues.apache.org/jira/browse/YARN-1185
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe

 FileSystemRMStateStore writes directly to the destination file when storing 
 state. However if the RM 

[jira] [Updated] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-09-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1185:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-128

 FileSystemRMStateStore can leave partial files that prevent subsequent 
 recovery
 ---

 Key: YARN-1185
 URL: https://issues.apache.org/jira/browse/YARN-1185
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe

 FileSystemRMStateStore writes directly to the destination file when storing 
 state. However if the RM were to crash in the middle of the write, the 
 recovery method could encounter a partially-written file and either outright 
 crash during recovery or silently load incomplete state.
 To avoid this, the data should be written to a temporary file and renamed to 
 the destination file afterwards.
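
 A minimal, hedged sketch of the temp-file-then-rename idea described above (the method and parameter names are illustrative assumptions, not the actual store code):
{code}
// Hedged sketch only -- names are illustrative, not the real FileSystemRMStateStore
// code. Uses org.apache.hadoop.fs.{FileSystem, Path, FSDataOutputStream}.
private void writeFileAtomically(FileSystem fs, Path dst, byte[] data)
    throws IOException {
  Path tmp = new Path(dst.getParent(), dst.getName() + ".tmp");
  FSDataOutputStream out = fs.create(tmp, true);
  try {
    out.write(data);
  } finally {
    out.close();
  }
  // Expose the final name only after the bytes are fully written, so a crash
  // mid-write leaves at most a stray .tmp file instead of a partial state file.
  fs.rename(tmp, dst);
}
{code}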



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data

2013-09-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-953:
---

Attachment: YARN-953-5.patch

Thanks [~zjshen] for the patch.

I am updating it with the latest YARN-321 branch and fixing some of the 
compilation failures.

Thanks,
Mayank

 [YARN-321] Change ResourceManager to use HistoryStorage to log history data
 ---

 Key: YARN-953
 URL: https://issues.apache.org/jira/browse/YARN-953
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-953.1.patch, YARN-953.2.patch, YARN-953.3.patch, 
 YARN-953.4.patch, YARN-953-5.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1247) test-container-executor has gotten out of sync with the changes to container-executor

2013-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782402#comment-13782402
 ] 

Hudson commented on YARN-1247:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4501 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4501/])
YARN-1247. test-container-executor has gotten out of sync with the changes to 
container-executor. (rvs via tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527813)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c


 test-container-executor has gotten out of sync with the changes to 
 container-executor
 -

 Key: YARN-1247
 URL: https://issues.apache.org/jira/browse/YARN-1247
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 2.1.2-beta

 Attachments: 
 0001-YARN-1247.-test-container-executor-has-gotten-out-of.patch


 If run under the super-user account, test-container-executor.c fails in 
 multiple different places. It would be nice to fix it so that we have better 
 testing of LCE functionality.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782404#comment-13782404
 ] 

Hadoop QA commented on YARN-953:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12605993/YARN-953-5.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2045//console

This message is automatically generated.

 [YARN-321] Change ResourceManager to use HistoryStorage to log history data
 ---

 Key: YARN-953
 URL: https://issues.apache.org/jira/browse/YARN-953
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-953.1.patch, YARN-953.2.patch, YARN-953.3.patch, 
 YARN-953.4.patch, YARN-953-5.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1257) Avro apps are failing with hadoop2

2013-09-30 Thread Mayank Bansal (JIRA)
Mayank Bansal created YARN-1257:
---

 Summary: Avro apps are failing with hadoop2
 Key: YARN-1257
 URL: https://issues.apache.org/jira/browse/YARN-1257
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Mayank Bansal
 Fix For: 2.1.2-beta


hi,

MR apps which are using Avro are not running.
These apps are compiled with the hadoop1 jars.

Exception in thread main java.lang.IncompatibleClassChangeError: Found 
interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.getNamedOutputsList(AvroMultipleOutputs.java:208)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.checkNamedOutputName(AvroMultipleOutputs.java:195)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.addNamedOutput(AvroMultipleOutputs.java:259)
at 
com.ebay.sojourner.HadoopJob.addAvroMultipleOutput(HadoopJob.java:157)
at com.ebay.sojourner.intraday.IntradayJob.initJob(IntradayJob.java:113)
at com.ebay.sojourner.HadoopJob.run(HadoopJob.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.ebay.sojourner.intraday.IntradayJob.main(IntradayJob.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at com.ebay.sojourner.JobDriver.main(JobDriver.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1257) Avro apps are failing with hadoop2

2013-09-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1257:


Priority: Blocker  (was: Major)

 Avro apps are failing with hadoop2
 --

 Key: YARN-1257
 URL: https://issues.apache.org/jira/browse/YARN-1257
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Mayank Bansal
Priority: Blocker
 Fix For: 2.1.2-beta


 hi,
 MR apps which are using Avro are not running.
 These apps are compiled with the hadoop1 jars.
 Exception in thread main java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.getNamedOutputsList(AvroMultipleOutputs.java:208)
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.checkNamedOutputName(AvroMultipleOutputs.java:195)
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.addNamedOutput(AvroMultipleOutputs.java:259)
 at 
 com.ebay.sojourner.HadoopJob.addAvroMultipleOutput(HadoopJob.java:157)
 at 
 com.ebay.sojourner.intraday.IntradayJob.initJob(IntradayJob.java:113)
 at com.ebay.sojourner.HadoopJob.run(HadoopJob.java:76)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at com.ebay.sojourner.intraday.IntradayJob.main(IntradayJob.java:165)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
 at com.ebay.sojourner.JobDriver.main(JobDriver.java:52)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1253:


Fix Version/s: (was: 2.1.1-beta)

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782426#comment-13782426
 ] 

Arun C Murthy commented on YARN-1253:
-

AFAIK, LCE already works in non-secure mode. 

Can you please help me understand what is the extra ask here? Is the intent to 
ensure only NM can use LCE to run as other users?

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1253:


Target Version/s:   (was: 2.1.1-beta)

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782431#comment-13782431
 ] 

Arun C Murthy commented on YARN-1253:
-

It would help to understand what LCE+nonsecure doesn't solve yet... 

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-09-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782434#comment-13782434
 ] 

Vinod Kumar Vavilapalli commented on YARN-1070:
---

Patch looks good to me, +1. Re-kicking Jenkins before committing it.

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch, 
 YARN-1070.4.patch, YARN-1070.5.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1010) FairScheduler: decouple container scheduling from nodemanager heartbeats

2013-09-30 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782437#comment-13782437
 ] 

Wei Yan commented on YARN-1010:
---

Updates in the patch.

(1) The {{FairScheduler}} launches a thread to do continuous scheduling.
(2) Several configuration fields (see the configuration sketch after this list):
{{yarn.scheduler.fair.continuous.scheduling.enabled}}. Whether to enable 
continuous scheduling. The default value is false.
{{yarn.scheduler.fair.continuous.scheduling.sleep.time.ms}}. The sleep time for 
each round of continuous scheduling; the default value is 5 ms.

Configurations for delay scheduling:
{{yarn.scheduler.fair.locality.threshold.node.time.ms}}. Time threshold for 
node locality. The default value is -1L.
{{yarn.scheduler.fair.locality.threshold.rack.time.ms}}. Time threshold for 
rack locality. The default value is -1L.
(3) Add test cases for continuous scheduling in {{TestFairScheduler}} and for 
the delay scheduling mechanism in {{TestFSSchedulerApp}}.
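
As an illustration, here is a hedged snippet of how these knobs could be set programmatically (the threshold values are arbitrary examples, not recommendations):
{code}
// Hedged example -- property names as listed above, values purely illustrative.
import org.apache.hadoop.conf.Configuration;

public class ContinuousSchedulingConf {
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.scheduler.fair.continuous.scheduling.enabled", true);
    conf.setLong("yarn.scheduler.fair.continuous.scheduling.sleep.time.ms", 5L);
    // Delay-scheduling thresholds; -1 keeps the existing (disabled) behavior.
    conf.setLong("yarn.scheduler.fair.locality.threshold.node.time.ms", 1000L);
    conf.setLong("yarn.scheduler.fair.locality.threshold.rack.time.ms", 3000L);
    return conf;
  }
}
{code}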


 FairScheduler: decouple container scheduling from nodemanager heartbeats
 

 Key: YARN-1010
 URL: https://issues.apache.org/jira/browse/YARN-1010
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Wei Yan
Priority: Critical
 Attachments: YARN-1010.patch


 Currently, scheduling for a node is done when that node heartbeats.
 For large clusters where the heartbeat interval is set to several seconds, 
 this delays scheduling of incoming allocations significantly.
 We could have a continuous loop scanning all nodes and doing scheduling; if 
 there is availability, AMs will get the allocation in the next heartbeat after 
 the one that placed the request.
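
 A rough, hedged sketch of what such a loop could look like ({{nodes}}, {{attemptScheduling}} and the sleep field are assumed names used only for illustration):
{code}
// Hedged sketch -- 'nodes', 'attemptScheduling' and 'continuousSchedulingSleepMs'
// are assumed names; the real patch may structure this differently.
Thread continuousScheduling = new Thread(new Runnable() {
  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      for (FSSchedulerNode node : nodes.values()) {
        attemptScheduling(node);   // try to place pending requests on this node
      }
      try {
        Thread.sleep(continuousSchedulingSleepMs);   // e.g. 5 ms between rounds
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
});
continuousScheduling.setDaemon(true);
continuousScheduling.start();
{code}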



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782435#comment-13782435
 ] 

Alejandro Abdelnur commented on YARN-1253:
--

LCE works in a non-secure setup, but it has 2 issues as stated in the 
description of the JIRA:

* LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
* Because users can impersonate other users, any user would have access to any 
local file of other users

Particularly, the second issue is not desirable as a user could get access to 
the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
other users' data outside of the cluster.

It could be argued that the first one could be a requirement (though, by 
analogy, it is not for HDFS permissions in unsecure mode).

The second issue is, IMO, the severe one, especially for the scenarios 
mentioned in the paragraph above starting with "Particularly".


 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782448#comment-13782448
 ] 

Sandy Ryza commented on YARN-1241:
--

Uploaded patch to fix findbugs warnings

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the sum of apps in all its subqueues can't exceed it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1258) Allow configuring the Fair Scheduler root queue

2013-09-30 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-1258:


 Summary: Allow configuring the Fair Scheduler root queue
 Key: YARN-1258
 URL: https://issues.apache.org/jira/browse/YARN-1258
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza


This would be useful for acls, maxRunningApps, scheduling modes, etc.

The allocation file should be able to accept both:
* An implicit root queue
* A root queue at the top of the hierarchy with all queues under/inside of it



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782453#comment-13782453
 ] 

Hadoop QA commented on YARN-1070:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12601671/YARN-1070.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2047//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2047//console

This message is automatically generated.

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch, 
 YARN-1070.4.patch, YARN-1070.5.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782474#comment-13782474
 ] 

Hudson commented on YARN-1070:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4502 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4502/])
YARN-1070. Fixed race conditions in NodeManager during container-kill. 
Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527827)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java


 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Fix For: 2.1.2-beta

 Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch, 
 YARN-1070.4.patch, YARN-1070.5.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1215) Yarn URL should include userinfo

2013-09-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782508#comment-13782508
 ] 

Bikas Saha commented on YARN-1215:
--

Hmm, OK. Let's take the patch then. +1.

 Yarn URL should include userinfo
 

 Key: YARN-1215
 URL: https://issues.apache.org/jira/browse/YARN-1215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.0.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch


 In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a 
 userinfo field as part of the URL. When converting a {{java.net.URI}} object 
 into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, 
 we set the uri host as the url host. If the uri has a userinfo part, the 
 userinfo is discarded. This leads to information loss if the original uri has 
 the userinfo, e.g. foo://username:passw...@example.com will be converted to 
 foo://example.com and the username/password information is lost during the 
 conversion.
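
 A short, hedged illustration of the loss described above (the URI here is a made-up example):
{code}
// Hedged illustration only -- demonstrates the reported behavior.
import java.net.URI;
import org.apache.hadoop.yarn.api.records.URL;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class UserInfoLossExample {
  public static void main(String[] args) throws Exception {
    URI uri = new URI("foo://user:secret@example.com/some/path");
    URL yarnUrl = ConverterUtils.getYarnUrlFromURI(uri);
    // The YARN URL record carries scheme/host/port/file but has no userinfo
    // field today, so "user:secret" is silently dropped here.
    System.out.println(yarnUrl.getScheme() + "://" + yarnUrl.getHost()
        + yarnUrl.getFile());
  }
}
{code}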



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-905) Add state filters to nodes CLI

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782511#comment-13782511
 ] 

Hadoop QA commented on YARN-905:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12606009/YARN-905-addendum.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2048//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2048//console

This message is automatically generated.

 Add state filters to nodes CLI
 --

 Key: YARN-905
 URL: https://issues.apache.org/jira/browse/YARN-905
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Wei Yan
 Attachments: YARN-905-addendum.patch, YARN-905-addendum.patch, 
 YARN-905-addendum.patch, Yarn-905.patch, YARN-905.patch, YARN-905.patch


 It would be helpful for the nodes CLI to have a node-states option that 
 allows it to return nodes that are not just in the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1257) Avro apps are failing with hadoop2

2013-09-30 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782513#comment-13782513
 ] 

Arun C Murthy commented on YARN-1257:
-

[~mayank_bansal]: seems like this is using o.a.h.mapreduce.* apis? If so, 
you'll have to recompile against hadoop-2...

 Avro apps are failing with hadoop2
 --

 Key: YARN-1257
 URL: https://issues.apache.org/jira/browse/YARN-1257
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Mayank Bansal
Priority: Blocker
 Fix For: 2.1.2-beta


 hi,
 MR apps which are using Avro are not running.
 These apps are compiled with the hadoop1 jars.
 Exception in thread main java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.getNamedOutputsList(AvroMultipleOutputs.java:208)
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.checkNamedOutputName(AvroMultipleOutputs.java:195)
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.addNamedOutput(AvroMultipleOutputs.java:259)
 at 
 com.ebay.sojourner.HadoopJob.addAvroMultipleOutput(HadoopJob.java:157)
 at 
 com.ebay.sojourner.intraday.IntradayJob.initJob(IntradayJob.java:113)
 at com.ebay.sojourner.HadoopJob.run(HadoopJob.java:76)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at com.ebay.sojourner.intraday.IntradayJob.main(IntradayJob.java:165)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
 at com.ebay.sojourner.JobDriver.main(JobDriver.java:52)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782542#comment-13782542
 ] 

Hadoop QA commented on YARN-1241:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606007/YARN-1241-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2049//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2049//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2049//console

This message is automatically generated.

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the sum of apps in all its subqueues can't exceed it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782544#comment-13782544
 ] 

Hadoop QA commented on YARN-1141:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606004/YARN-1141.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2050//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2050//console

This message is automatically generated.

 Updating resource requests should be decoupled with updating blacklist
 --

 Key: YARN-1141
 URL: https://issues.apache.org/jira/browse/YARN-1141
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1141.1.patch, YARN-1141.2.patch


 Currently, in CapacityScheduler and FifoScheduler, the blacklist is updated 
 together with resource requests, and only when the incoming resource requests 
 are not empty. Therefore, when the incoming resource requests are empty, the 
 blacklist will not be updated even when the blacklist additions and removals 
 are not empty.
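
 A hedged pseudo-sketch of the decoupling being asked for (the method and variable names are illustrative assumptions, not the actual scheduler code):
{code}
// Hedged pseudo-sketch -- 'application', 'updateResourceRequests' and
// 'updateBlacklist' are illustrative names; the real allocate() path may differ.
if (ask != null && !ask.isEmpty()) {
  application.updateResourceRequests(ask);
}
// The blacklist update is no longer nested inside the 'ask' check, so blacklist
// changes are applied even when the incoming resource requests are empty:
if ((blacklistAdditions != null && !blacklistAdditions.isEmpty())
    || (blacklistRemovals != null && !blacklistRemovals.isEmpty())) {
  application.updateBlacklist(blacklistAdditions, blacklistRemovals);
}
{code}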



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782548#comment-13782548
 ] 

Arun C Murthy commented on YARN-1253:
-

bq. Because users can impersonate other users, any user would have access to 
any local file of other users

You mean by submitting a job as someone else? 



I think we need to step back - is the requirement that we need to use cgroups 
in non-secure mode for resource isolation? If so, LCE in non-secure mode is 
sufficient.

Let's not confuse this with security; we already have the problem where someone 
can delete all data in HDFS in non-secure mode... which is why we have security 
in the first place. 

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782566#comment-13782566
 ] 

Sandy Ryza commented on YARN-1241:
--

Uploaded patch to fix the new findbugs warnings

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, 
 YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it so that 
 the sum of apps in all its subqueues can't exceed it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1257) Avro apps are failing with hadoop2

2013-09-30 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved YARN-1257.
-

Resolution: Not A Problem

I'm closing as won't fix; please re-open if necessary. Thanks.

 Avro apps are failing with hadoop2
 --

 Key: YARN-1257
 URL: https://issues.apache.org/jira/browse/YARN-1257
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Mayank Bansal
Priority: Blocker
 Fix For: 2.1.2-beta


 hi,
 MR apps which are using Avro are not running.
 These apps are compiled with the hadoop1 jars.
 Exception in thread main java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.getNamedOutputsList(AvroMultipleOutputs.java:208)
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.checkNamedOutputName(AvroMultipleOutputs.java:195)
 at 
 org.apache.avro.mapreduce.AvroMultipleOutputs.addNamedOutput(AvroMultipleOutputs.java:259)
 at 
 com.ebay.sojourner.HadoopJob.addAvroMultipleOutput(HadoopJob.java:157)
 at 
 com.ebay.sojourner.intraday.IntradayJob.initJob(IntradayJob.java:113)
 at com.ebay.sojourner.HadoopJob.run(HadoopJob.java:76)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at com.ebay.sojourner.intraday.IntradayJob.main(IntradayJob.java:165)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
 at com.ebay.sojourner.JobDriver.main(JobDriver.java:52)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782572#comment-13782572
 ] 

Vinod Kumar Vavilapalli commented on YARN-1253:
---

Agree with [~acmurthy], LCE + unsecure mode can already be used to do cgroups. 
If there are bugs, we should fix them.

bq. LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
Yes, this has always been a requirement. I think there is some effort going on 
in the Windows world of Hadoop to change this; you should look at it.

bq. Because users can impersonate other users, any user would have access to 
any local file of other users
Even if the jobs run as a single 'yarnuser', security still isn't there - like 
Arun said, anybody can bomb the HDFS directories of other users, any user can 
kill any other user's tasks/containers, anyone can delete anyone else's local 
dirs, log dirs and so on. We could argue which is worse - stealing users' 
passwords or deleting other users' data on DFS - it depends on who you ask. If 
you want security, you should enable security.


 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode

2013-09-30 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782571#comment-13782571
 ] 

Roman Shaposhnik commented on YARN-1253:


I've started doing some preliminary work on this JIRA, so hopefully I can 
explain some of the things that my patch is about to address:
  # the reason to use LCE in a non-secure mode is to be able to take advantage 
of the cgroups mechanism; perhaps cgroups functionality should be independent 
from the rest of LCE functionality, but re-using the current LCE design is also 
quite easy -- hence let's assume that for cgroups we need LCE
  # in a fully secure deployment, LCE works perfectly and makes YARN users 
correspond 1-1 with the local UNIX users provisioned on each worker node
  # in a non-secure deployment this 1-1 correspondence feels like a burden 
that doesn't necessarily have to be there

Thus, the proposal is really to add a tiny bit of functionality to LCE so that 
in the non-secure case it would be able to run all tasks under a single 
designated user (different from the user running the nodemanager). On top of 
that, the notion of the YARN user (which no longer has to have a corresponding 
UNIX user) gets preserved in everything else that LCE does (which really boils 
down to paths in the local filesystem used for localization).
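
To make that concrete, here is a minimal, hedged sketch of the user-selection step (the property name and the method are illustrative assumptions only, not necessarily what the patch will use):
{code}
// Hedged sketch only -- property name and method are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class NonSecureRunAsUser {
  /** Pick the UNIX user the container process should run as. */
  public static String runAsUser(Configuration conf, String appSubmitter) {
    if (UserGroupInformation.isSecurityEnabled()) {
      // Secure cluster: keep the existing behavior, run as the submitting user.
      return appSubmitter;
    }
    // Non-secure cluster: run everything as one designated local user, while
    // localization paths elsewhere keep using 'appSubmitter' as the YARN user.
    return conf.get(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user",
        "nobody");
  }
}
{code}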

 Changes to LinuxContainerExecutor to use cgroups in unsecure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker

 When using cgroups we require LCE to be configured in the cluster to start 
 containers. 
 LCE starts containers as the user that submitted the job. While this works 
 correctly in a secure setup, in an un-secure setup this presents a couple of 
 issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
 * Because users can impersonate other users, any user would have access to 
 any local file of other users
 Particularly, the second issue is not desirable as a user could get access to 
 the ssh keys of other users on the nodes or, if there are NFS mounts, get to 
 other users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified

2013-09-30 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-1260:


 Summary: RM_HOME link breaks when webapp.https.address related 
properties are not specified
 Key: YARN-1260
 URL: https://issues.apache.org/jira/browse/YARN-1260
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora


This issue happens in multiple node cluster where resource manager and node 
manager are running on different machines.

Steps to reproduce:
1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
2) set hadoop.ssl.enabled = true in core-site.xml
3) Do not specify below property in yarn-site.xml
yarn.nodemanager.webapp.https.address and 
yarn.resourcemanager.webapp.https.address
Here, the default value of above two property will be considered.
4) Go to nodemanager web UI https:nodemanager host:8044/node
5) Click on RM_HOME link 
This link redirects to https:nodemanager host:8090/cluster instead 
https:resourcemanager host:8090/cluster
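
A hedged workaround sketch in the meantime is to set the two properties from step 3 explicitly instead of relying on their defaults (host names and ports below are placeholders, not recommendations):
{code}
// Hedged workaround sketch -- host names and ports are placeholders.
import org.apache.hadoop.conf.Configuration;

public class ExplicitHttpsAddresses {
  public static void apply(Configuration conf) {
    // Point the RM https address at the actual resourcemanager host so web UI
    // links such as RM_HOME are generated against it rather than the NM host.
    conf.set("yarn.resourcemanager.webapp.https.address",
        "resourcemanagerhost:8090");
    conf.set("yarn.nodemanager.webapp.https.address", "0.0.0.0:8044");
  }
}
{code}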




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-976) Document the meaning of a virtual core

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-976:


Attachment: YARN-976.patch

 Document the meaning of a virtual core
 --

 Key: YARN-976
 URL: https://issues.apache.org/jira/browse/YARN-976
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-976.patch


 As virtual cores are a somewhat novel concept, it would be helpful to have 
 thorough documentation that clarifies their meaning.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified

2013-09-30 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-1260:
-

Description: 
This issue happens in multiple node cluster where resource manager and node 
manager are running on different machines.

Steps to reproduce:
1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
2) set hadoop.ssl.enabled = true in core-site.xml
3) Do not specify below property in yarn-site.xml
yarn.nodemanager.webapp.https.address and 
yarn.resourcemanager.webapp.https.address
Here, the default value of above two property will be considered.
4) Go to nodemanager web UI https://nodemanager host:8044/node
5) Click on RM_HOME link 
This link redirects to https://nodemanager host:8090/cluster instead 
https://resourcemanager host:8090/cluster


  was:
This issue happens in a multi-node cluster where the resource manager and node 
manager are running on different machines.

Steps to reproduce:
1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
2) set hadoop.ssl.enabled = true in core-site.xml
3) Do not specify the properties below in yarn-site.xml:
yarn.nodemanager.webapp.https.address and 
yarn.resourcemanager.webapp.https.address
Here, the default values of the above two properties will be used.
4) Go to the nodemanager web UI https:nodemanager host:8044/node
5) Click on the RM_HOME link 
This link redirects to https:nodemanager host:8090/cluster instead of 
https:resourcemanager host:8090/cluster



 RM_HOME link breaks when webapp.https.address related properties are not 
 specified
 --

 Key: YARN-1260
 URL: https://issues.apache.org/jira/browse/YARN-1260
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora

 This issue happens in a multi-node cluster where the resource manager and node 
 manager are running on different machines.
 Steps to reproduce:
 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
 2) set hadoop.ssl.enabled = true in core-site.xml
 3) Do not specify the properties below in yarn-site.xml:
 yarn.nodemanager.webapp.https.address and 
 yarn.resourcemanager.webapp.https.address
 Here, the default values of the above two properties will be used.
 4) Go to the nodemanager web UI https://nodemanager host:8044/node
 5) Click on the RM_HOME link 
 This link redirects to https://nodemanager host:8090/cluster instead of 
 https://resourcemanager host:8090/cluster



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-976) Document the meaning of a virtual core

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-976:


Issue Type: Sub-task  (was: Task)
Parent: YARN-1024

 Document the meaning of a virtual core
 --

 Key: YARN-976
 URL: https://issues.apache.org/jira/browse/YARN-976
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-976.patch


 As virtual cores are a somewhat novel concept, it would be helpful to have 
 thorough documentation that clarifies their meaning.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1089) Add YARN compute units alongside virtual cores

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1089:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-1024

 Add YARN compute units alongside virtual cores
 --

 Key: YARN-1089
 URL: https://issues.apache.org/jira/browse/YARN-1089
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1089-1.patch, YARN-1089.patch


 Based on discussion in YARN-1024, we will add YARN compute units as a 
 resource for requesting and scheduling CPU processing power.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1024) Define a CPU resource(s) unambiguously

2013-09-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1024:
-

Summary: Define a CPU resource(s) unambiguously  (was: Define a virtual 
core unambiguously)

 Define a CPU resource(s) unambiguously
 --

 Key: YARN-1024
 URL: https://issues.apache.org/jira/browse/YARN-1024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: CPUasaYARNresource.pdf


 We need to clearly define the meaning of a virtual core unambiguously so that 
 it's easy to migrate applications between clusters.
 For example, here is Amazon EC2's definition of an ECU: 
 http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
 Essentially we need to clearly define a YARN Virtual Core (YVC).
 Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
 equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*
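For reference, a minimal sketch (assuming the 2.1.x client API) of how CPU is requested through virtual cores today; the memory and vcore values are illustrative:

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class VcoreRequestSketch {
  public static void main(String[] args) {
    // Ask for a container with 1024 MB of memory and 2 virtual cores.
    // What "2 virtual cores" buys in real CPU capacity is exactly what this
    // JIRA wants to pin down (e.g. relative to an ECU-style baseline).
    Resource capability = Resource.newInstance(1024, 2);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    System.out.println("Requesting " + request.getCapability().getVirtualCores()
        + " vcores");
  }
}
{code}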



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782582#comment-13782582
 ] 

Hadoop QA commented on YARN-1241:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606033/YARN-1241-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2051//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2051//console

This message is automatically generated.

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, 
 YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should ensure that the 
 total number of running apps across all of its subqueues cannot exceed that limit.
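For clarity, a sketch of the allocations file this implies is below; the queue names and the limit of 5 are illustrative and not taken from the patch (the file is written from Java only to keep the example self-contained):

{code:java}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ParentQueueLimitSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical fair-scheduler allocations file: with this change, the
    // maxRunningApps on "parent" should cap the combined number of running
    // apps in child1 and child2 at 5.
    String allocations =
        "<?xml version=\"1.0\"?>\n"
        + "<allocations>\n"
        + "  <queue name=\"parent\">\n"
        + "    <maxRunningApps>5</maxRunningApps>\n"
        + "    <queue name=\"child1\"/>\n"
        + "    <queue name=\"child2\"/>\n"
        + "  </queue>\n"
        + "</allocations>\n";
    Files.write(Paths.get("fair-scheduler.xml"),
        allocations.getBytes(StandardCharsets.UTF_8));
  }
}
{code}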



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified

2013-09-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1260:


Priority: Blocker  (was: Major)

 RM_HOME link breaks when webapp.https.address related properties are not 
 specified
 --

 Key: YARN-1260
 URL: https://issues.apache.org/jira/browse/YARN-1260
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta, 2.1.2-beta
Reporter: Yesha Vora
Priority: Blocker

 This issue happens in a multi-node cluster where the resource manager and node 
 manager are running on different machines.
 Steps to reproduce:
 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
 2) set hadoop.ssl.enabled = true in core-site.xml
 3) Do not specify the properties below in yarn-site.xml:
 yarn.nodemanager.webapp.https.address and 
 yarn.resourcemanager.webapp.https.address
 Here, the default values of the above two properties will be used.
 4) Go to the nodemanager web UI https://nodemanager host:8044/node
 5) Click on the RM_HOME link 
 This link redirects to https://nodemanager host:8090/cluster instead of 
 https://resourcemanager host:8090/cluster



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified

2013-09-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1260:


Affects Version/s: 2.1.2-beta

 RM_HOME link breaks when webapp.https.address related properties are not 
 specified
 --

 Key: YARN-1260
 URL: https://issues.apache.org/jira/browse/YARN-1260
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta, 2.1.2-beta
Reporter: Yesha Vora
Priority: Blocker

 This issue happens in a multi-node cluster where the resource manager and node 
 manager are running on different machines.
 Steps to reproduce:
 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
 2) set hadoop.ssl.enabled = true in core-site.xml
 3) Do not specify the properties below in yarn-site.xml:
 yarn.nodemanager.webapp.https.address and 
 yarn.resourcemanager.webapp.https.address
 Here, the default values of the above two properties will be used.
 4) Go to the nodemanager web UI https://nodemanager host:8044/node
 5) Click on the RM_HOME link 
 This link redirects to https://nodemanager host:8090/cluster instead of 
 https://resourcemanager host:8090/cluster



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified

2013-09-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi reassigned YARN-1260:
---

Assignee: Omkar Vinit Joshi

 RM_HOME link breaks when webapp.https.address related properties are not 
 specified
 --

 Key: YARN-1260
 URL: https://issues.apache.org/jira/browse/YARN-1260
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta, 2.1.2-beta
Reporter: Yesha Vora
Assignee: Omkar Vinit Joshi
Priority: Blocker

 This issue happens in a multi-node cluster where the resource manager and node 
 manager are running on different machines.
 Steps to reproduce:
 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml
 2) set hadoop.ssl.enabled = true in core-site.xml
 3) Do not specify the properties below in yarn-site.xml:
 yarn.nodemanager.webapp.https.address and 
 yarn.resourcemanager.webapp.https.address
 Here, the default values of the above two properties will be used.
 4) Go to the nodemanager web UI https://nodemanager host:8044/node
 5) Click on the RM_HOME link 
 This link redirects to https://nodemanager host:8090/cluster instead of 
 https://resourcemanager host:8090/cluster



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-976) Document the meaning of a virtual core

2013-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782585#comment-13782585
 ] 

Hadoop QA commented on YARN-976:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606035/YARN-976.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2052//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2052//console

This message is automatically generated.

 Document the meaning of a virtual core
 --

 Key: YARN-976
 URL: https://issues.apache.org/jira/browse/YARN-976
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-976.patch


 As virtual cores are a somewhat novel concept, it would be helpful to have 
 thorough documentation that clarifies their meaning.



--
This message was sent by Atlassian JIRA
(v6.1#6144)