[jira] [Commented] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786950#comment-13786950
 ] 

Vinod Kumar Vavilapalli commented on YARN-1273:
---

Looks good. The test fails without the main code changes and passes with them.

+1, checking this in.

 Distributed shell does not account for start container failures reported 
 asynchronously.
 

 Key: YARN-1273
 URL: https://issues.apache.org/jira/browse/YARN-1273
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: YARN-1273.1_1.patch, YARN-1273.2.patch, YARN-1273.3.patch


 2013-10-04 22:09:15,234 ERROR 
 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] 
 distributedshell.ApplicationMaster 
 (ApplicationMaster.java:onStartContainerError(719)) - Failed to start 
 Container container_1380920347574_0018_01_06
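The summary and the log line above suggest that the distributed shell AM's NMClientAsync callback handler only logs onStartContainerError instead of feeding it back into the application's accounting. Below is a minimal, hedged sketch of that kind of handler (not the committed patch; the shared failure counter is hypothetical):

{code}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

/** Sketch: count asynchronously reported start-container failures in the AM. */
class LaunchCallbackHandler implements NMClientAsync.CallbackHandler {
  // Hypothetical counter the AM would consult when deciding overall success/failure.
  private final AtomicInteger numFailedContainers;

  LaunchCallbackHandler(AtomicInteger numFailedContainers) {
    this.numFailedContainers = numFailedContainers;
  }

  @Override
  public void onStartContainerError(ContainerId containerId, Throwable t) {
    // Account for the failure instead of only logging it.
    numFailedContainers.incrementAndGet();
  }

  @Override
  public void onContainerStarted(ContainerId containerId,
      Map<String, ByteBuffer> allServiceResponse) {
  }

  @Override
  public void onContainerStatusReceived(ContainerId containerId,
      ContainerStatus containerStatus) {
  }

  @Override
  public void onContainerStopped(ContainerId containerId) {
  }

  @Override
  public void onGetContainerStatusError(ContainerId containerId, Throwable t) {
  }

  @Override
  public void onStopContainerError(ContainerId containerId, Throwable t) {
  }
}
{code}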



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1274:
-

Attachment: YARN-1274.1.txt

Updated launch_container to create the app level local and log directories. 
Verified dir permissions on a secure cluster.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created by the LCE localization command only if there are
 resources to localize; if there are no resources to localize, LCE localization
 never executes, the launch fails with exit code 255, and the NM logs have
 something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}
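For illustration, here is a hedged sketch of the triggering condition described above: a ContainerLaunchContext whose local-resources map is empty, so the LCE localization step (which would otherwise create the usercache/USER tree) never runs. Class and variable names other than the YARN API types are hypothetical.

{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

public class NoLocalizationLaunchContext {
  public static ContainerLaunchContext build() {
    // No resources to localize: under LCE this skips the localization command
    // that would normally create the usercache/<user> directory tree.
    Map<String, LocalResource> localResources = Collections.emptyMap();
    Map<String, String> environment = Collections.emptyMap();
    List<String> commands = Collections.singletonList("sleep 30");

    return ContainerLaunchContext.newInstance(
        localResources, environment, commands,
        null /* serviceData */, null /* tokens */, null /* acls */);
  }
}
{code}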



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786953#comment-13786953
 ] 

Hadoop QA commented on YARN-1274:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606975/YARN-1274.1.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2121//console

This message is automatically generated.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created by the LCE localization command only if there are
 resources to localize; if there are no resources to localize, LCE localization
 never executes, the launch fails with exit code 255, and the NM logs have
 something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786957#comment-13786957
 ] 

Hudson commented on YARN-1273:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4545 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4545/])
YARN-1273. Fixed Distributed-shell to account for containers that failed to 
start. Contributed by Hitesh Shah. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529389)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/ContainerLaunchFailAppMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Distributed shell does not account for start container failures reported 
 asynchronously.
 

 Key: YARN-1273
 URL: https://issues.apache.org/jira/browse/YARN-1273
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Fix For: 2.1.2-beta

 Attachments: YARN-1273.1_1.patch, YARN-1273.2.patch, YARN-1273.3.patch


 2013-10-04 22:09:15,234 ERROR 
 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] 
 distributedshell.ApplicationMaster 
 (ApplicationMaster.java:onStartContainerError(719)) - Failed to start 
 Container container_1380920347574_0018_01_06



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1271) Text file busy errors launching containers again

2013-10-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786960#comment-13786960
 ] 

Sandy Ryza commented on YARN-1271:
--

Thanks Vinod!

 Text file busy errors launching containers again
 --

 Key: YARN-1271
 URL: https://issues.apache.org/jira/browse/YARN-1271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.2-beta

 Attachments: YARN-1271-branch-2.patch, YARN-1271.patch


 The error is shown below in the comments.
 MAPREDUCE-2374 fixed this by removing -c when running the container launch 
 script.  It looks like the -c got brought back during the windows branch 
 merge, so we should remove it again.
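As a hedged illustration of the point above (not the actual ContainerExecutor code): as I understand the bug, passing the launch script through "bash -c" makes bash exec() the script file itself, which can fail with "Text file busy" (ETXTBSY) if the file is still open for writing, whereas handing the script directly to bash makes it read the file as input. The path and class below are hypothetical.

{code}
public class LaunchCommand {
  // Sketch only; the real change removes "-c" from the command the NM builds
  // to run the container launch script.
  public static String[] buildCommand(String launchScriptPath, boolean useDashC) {
    return useDashC
        ? new String[] { "bash", "-c", launchScriptPath } // form that re-introduced the error
        : new String[] { "bash", launchScriptPath };      // form after removing "-c" again
  }
}
{code}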



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1225) FinishApplicationMasterRequest should also have a final IPC/RPC address.

2013-10-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1225:
-

Attachment: YARN-1225-v4.1.patch

Uploaded YARN-1225-v4.1.patch.
The test failure above is due to v4 changing RMCommunicator to catch specific
exceptions instead of Exception (to address an existing Findbugs warning).
Without catching the cast exception, TestLocalContainerAllocator fails because
the faked job class cannot be cast to JobImpl when stopping
LocalContainerAllocator (which calls RMCommunicator.unregister()).

 FinishApplicationMasterRequest should also have a final IPC/RPC address.
 

 Key: YARN-1225
 URL: https://issues.apache.org/jira/browse/YARN-1225
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Vinod Kumar Vavilapalli
Assignee: Junping Du
 Attachments: YARN-1225-kickOffTestDS.patch, YARN-1225-v1.patch, 
 YARN-1225-v2.patch, YARN-1225-v3.patch, YARN-1225-v4.1.patch, 
 YARN-1225-v4.patch


 AMs can already report a final HTTP URL via FinishApplicationMasterRequest, but
 there is no field to report an IPC/RPC address.
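For context, a hedged sketch of what an AM can report today at unregistration: the newInstance factory takes a final status, diagnostics, and a tracking (HTTP) URL, but nothing for a final IPC/RPC address, which is the gap this issue proposes to close. The URL below is hypothetical.

{code}
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;

public class UnregisterExample {
  public static FinishApplicationMasterRequest build() {
    // Today's fields: final status, diagnostics, and a final tracking (HTTP) URL.
    // There is no parameter for a final IPC/RPC address.
    return FinishApplicationMasterRequest.newInstance(
        FinalApplicationStatus.SUCCEEDED,
        "" /* diagnostics */,
        "http://history.example.com:19888/jobhistory/job" /* hypothetical URL */);
  }
}
{code}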



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787170#comment-13787170
 ] 

Hudson commented on YARN-1253:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #353 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/353/])
YARN-1253. Changes to LinuxContainerExecutor to run containers as a single 
dedicated user in non-secure mode. (rvs via tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529325)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java


 Changes to LinuxContainerExecutor to run containers as a single dedicated 
 user in non-secure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker
 Fix For: 2.3.0

 Attachments: YARN-1253.patch.txt


 When using cgroups, we require LCE to be configured in the cluster to start
 containers.
 LCE starts containers as the user that submitted the job. While this
 works correctly in a secure setup, in a non-secure setup this presents a
 couple of issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users on all nodes
 * Because users can impersonate other users, any user would have access to
 any local file of other users
 The second issue in particular is undesirable, as a user could get access to
 other users' ssh keys on the nodes or, if there are NFS mounts, to other
 users' data outside of the cluster.
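A hedged sketch of the kind of single-dedicated-user setting this change introduces; the key name and the "nobody" default reflect my reading of the committed yarn-default.xml and should be verified against the patch:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NonSecureLceConfig {
  public static Configuration build() {
    Configuration conf = new YarnConfiguration();
    // In non-secure mode, run every container as one dedicated local user
    // instead of the (possibly non-existent) submitting user.
    conf.set(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user",
        "nobody");
    return conf;
  }
}
{code}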



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1254) NM is polluting container's credentials

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787171#comment-13787171
 ] 

Hudson commented on YARN-1254:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #353 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/353/])
YARN-1254. Fixed NodeManager to not pollute container's credentials. 
Contributed by Omkar Vinit Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529382)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


 NM is polluting container's credentials
 ---

 Key: YARN-1254
 URL: https://issues.apache.org/jira/browse/YARN-1254
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.2-beta

 Attachments: YARN-1254.20131004.1.patch, YARN-1254.20131004.2.patch, 
 YARN-1254.20131030.1.patch


 Before launching the container, the NM uses the same credentials object and so
 pollutes what the container should see. We should fix this.
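The general fix pattern for this kind of problem is a defensive copy. A minimal sketch of that idea (not the committed patch), using the Credentials copy constructor:

{code}
import org.apache.hadoop.security.Credentials;

public class CredentialsCopyExample {
  public static Credentials forContainer(Credentials nmCredentials) {
    // Give the container its own copy; anything the NM later adds to its own
    // object no longer leaks into what the container sees.
    return new Credentials(nmCredentials);
  }
}
{code}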



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787174#comment-13787174
 ] 

Hudson commented on YARN-1273:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #353 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/353/])
YARN-1273. Fixed Distributed-shell to account for containers that failed to 
start. Contributed by Hitesh Shah. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529389)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/ContainerLaunchFailAppMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Distributed shell does not account for start container failures reported 
 asynchronously.
 

 Key: YARN-1273
 URL: https://issues.apache.org/jira/browse/YARN-1273
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Fix For: 2.1.2-beta

 Attachments: YARN-1273.1_1.patch, YARN-1273.2.patch, YARN-1273.3.patch


 2013-10-04 22:09:15,234 ERROR 
 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] 
 distributedshell.ApplicationMaster 
 (ApplicationMaster.java:onStartContainerError(719)) - Failed to start 
 Container container_1380920347574_0018_01_06



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787173#comment-13787173
 ] 

Hudson commented on YARN-1167:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #353 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/353/])
YARN-1167. Fixed Distributed Shell to not incorrectly show empty hostname on RM 
UI. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529376)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/RegisterApplicationMasterRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java


 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, 
 YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch, 
 YARN-1167.8.patch, YARN-1167.9.patch


 Submit a distributed shell application. Once the application reaches the
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 
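The appMasterHost shown in the report comes from what the AM passes when it registers with the RM (the committed fix touches RegisterApplicationMasterRequest and AMRMClientImpl, per the file list above). A hedged sketch of that registration call follows; registering with an empty host string is what shows up as the blank field.

{code}
import java.net.InetAddress;

import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RegisterAmExample {
  public static void register() throws Exception {
    AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
    amRMClient.init(new YarnConfiguration());
    amRMClient.start();

    // The first argument feeds the appMasterHost field reported by the RM.
    amRMClient.registerApplicationMaster(
        InetAddress.getLocalHost().getHostName(), // non-empty AM host
        -1,                                       // AM RPC port (-1 if none)
        "");                                      // tracking URL
  }
}
{code}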



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787217#comment-13787217
 ] 

Hudson commented on YARN-1167:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1543 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1543/])
YARN-1167. Fixed Distributed Shell to not incorrectly show empty hostname on RM 
UI. Contributed by Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529376)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/RegisterApplicationMasterRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java


 Submitted distributed shell application shows appMasterHost = empty
 ---

 Key: YARN-1167
 URL: https://issues.apache.org/jira/browse/YARN-1167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, 
 YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch, 
 YARN-1167.8.patch, YARN-1167.9.patch


 Submit a distributed shell application. Once the application reaches the
 RUNNING state, the app master host should not be empty. In reality, it is empty.
 ==console logs==
 distributedshell.Client: Got application report from ASM for, appId=12, 
 clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, 
 appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, 
 distributedFinalState=UNDEFINED, 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787214#comment-13787214
 ] 

Hudson commented on YARN-1253:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1543 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1543/])
YARN-1253. Changes to LinuxContainerExecutor to run containers as a single 
dedicated user in non-secure mode. (rvs via tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529325)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java


 Changes to LinuxContainerExecutor to run containers as a single dedicated 
 user in non-secure mode
 -

 Key: YARN-1253
 URL: https://issues.apache.org/jira/browse/YARN-1253
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik
Priority: Blocker
 Fix For: 2.3.0

 Attachments: YARN-1253.patch.txt


 When using cgroups, we require LCE to be configured in the cluster to start
 containers.
 LCE starts containers as the user that submitted the job. While this
 works correctly in a secure setup, in a non-secure setup this presents a
 couple of issues:
 * LCE requires all Hadoop users submitting jobs to be Unix users on all nodes
 * Because users can impersonate other users, any user would have access to
 any local file of other users
 The second issue in particular is undesirable, as a user could get access to
 other users' ssh keys on the nodes or, if there are NFS mounts, to other
 users' data outside of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1254) NM is polluting container's credentials

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787215#comment-13787215
 ] 

Hudson commented on YARN-1254:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1543 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1543/])
YARN-1254. Fixed NodeManager to not pollute container's credentials. 
Contributed by Omkar Vinit Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529382)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


 NM is polluting container's credentials
 ---

 Key: YARN-1254
 URL: https://issues.apache.org/jira/browse/YARN-1254
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.2-beta

 Attachments: YARN-1254.20131004.1.patch, YARN-1254.20131004.2.patch, 
 YARN-1254.20131030.1.patch


 Before launching the container, the NM uses the same credentials object and so
 pollutes what the container should see. We should fix this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787218#comment-13787218
 ] 

Hudson commented on YARN-1273:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1543 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1543/])
YARN-1273. Fixed Distributed-shell to account for containers that failed to 
start. Contributed by Hitesh Shah. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529389)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/ContainerLaunchFailAppMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Distributed shell does not account for start container failures reported 
 asynchronously.
 

 Key: YARN-1273
 URL: https://issues.apache.org/jira/browse/YARN-1273
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Fix For: 2.1.2-beta

 Attachments: YARN-1273.1_1.patch, YARN-1273.2.patch, YARN-1273.3.patch


 2013-10-04 22:09:15,234 ERROR 
 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] 
 distributedshell.ApplicationMaster 
 (ApplicationMaster.java:onStartContainerError(719)) - Failed to start 
 Container container_1380920347574_0018_01_06



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787234#comment-13787234
 ] 

Hudson commented on YARN-1273:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1569 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1569/])
YARN-1273. Fixed Distributed-shell to account for containers that failed to 
start. Contributed by Hitesh Shah. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529389)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/ContainerLaunchFailAppMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


 Distributed shell does not account for start container failures reported 
 asynchronously.
 

 Key: YARN-1273
 URL: https://issues.apache.org/jira/browse/YARN-1273
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Fix For: 2.1.2-beta

 Attachments: YARN-1273.1_1.patch, YARN-1273.2.patch, YARN-1273.3.patch


 2013-10-04 22:09:15,234 ERROR 
 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] 
 distributedshell.ApplicationMaster 
 (ApplicationMaster.java:onStartContainerError(719)) - Failed to start 
 Container container_1380920347574_0018_01_06



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1251) TestDistributedShell#TestDSShell failed with timeout

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787226#comment-13787226
 ] 

Hudson commented on YARN-1251:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1569 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1569/])
YARN-1251. TestDistributedShell#TestDSShell failed with timeout. Contributed by 
Xuan Gong. (hitesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529369)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


 TestDistributedShell#TestDSShell failed with timeout
 

 Key: YARN-1251
 URL: https://issues.apache.org/jira/browse/YARN-1251
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Junping Du
Assignee: Xuan Gong
 Fix For: 2.1.2-beta

 Attachments: error.log, YARN-1225-kickOffTestDS.patch, 
 YARN-1251.1.patch


 TestDistributedShell#TestDSShell has been failing consistently on trunk Jenkins
 recently.
 The stack trace is:
 {code}
 java.lang.Exception: test timed out after 9 milliseconds
   at 
 com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234)
   at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255)
   at 
 org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286)
   at 
 org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462)
   at 
 com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302)
   at 
 org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989)
   at org.apache.hadoop.ipc.Client.call(Client.java:1377)
   at org.apache.hadoop.ipc.Client.call(Client.java:1357)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at $Proxy70.getApplicationReport(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
   at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
   at $Proxy71.getApplicationReport(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125)
 {code}
 For details, please refer:
 https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1232) Configuration to support multiple RMs

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787229#comment-13787229
 ] 

Hudson commented on YARN-1232:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1569 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1569/])
YARN-1232. Configuration to support multiple RMs (Karthik Kambatla via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529251)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestHAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ServerRMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


 Configuration to support multiple RMs
 -

 Key: YARN-1232
 URL: https://issues.apache.org/jira/browse/YARN-1232
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, 
 yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch, yarn-1232-7.patch, 
 yarn-1232-7.patch


 We should augment the configuration to allow users to specify two RMs and the
 individual RPC addresses for them.
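A hedged sketch of the shape such a configuration takes: two logical RM ids, each with its own RPC address. The key names follow my understanding of the RM HA configuration scheme and the hosts are hypothetical; verify against the committed yarn-default.xml.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigExample {
  public static Configuration build() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    // Two logical RM ids...
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    // ...each with its own RPC address (hypothetical hosts).
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
    return conf;
  }
}
{code}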



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1274:
-

Attachment: YARN-1274.trunk.1.txt

Patch for trunk and branch-2. The previous patch applies to branch-2.1.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created by the LCE localization command only if there are
 resources to localize; if there are no resources to localize, LCE localization
 never executes, the launch fails with exit code 255, and the NM logs have
 something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787295#comment-13787295
 ] 

Hadoop QA commented on YARN-1274:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607008/YARN-1274.trunk.1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2124//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2124//console

This message is automatically generated.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created by the LCE localization command only if there are
 resources to localize; if there are no resources to localize, LCE localization
 never executes, the launch fails with exit code 255, and the NM logs have
 something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1090) Job does not get into Pending State

2013-10-05 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1090:
--

Attachment: YARN-1090.1.patch

Uploaded a new patch that reorganizes the scheduler UI page.

 Job does not get into Pending State
 ---

 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
 Attachments: YARN-1090.1.patch, YARN-1090.patch


 When there is no resource available to run a job, the next job should go into
 the pending state. The RM UI should show the next job as a pending app, and the
 pending-app counter should be incremented.
 But currently the next job stays in the ACCEPTED state, no AM has been assigned
 to this job, and the pending-app count is not incremented.
 Running 'job -status' on the next job shows job state=PREP.
 $ mapred job -status job_1377122233385_0002
 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
 host1/ip1
 Job: job_1377122233385_0002
 Job File: /ABC/.staging/job_1377122233385_0002/job.xml
 Job Tracking URL : http://host1:port1/application_1377122233385_0002/
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: PREP
 retired: false
 reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Reopened] (YARN-1276) Secure cluster can have random failure to launch a container.

2013-10-05 Thread Tassapol Athiapinya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tassapol Athiapinya reopened YARN-1276:
---


 Secure cluster can have random failure to launch a container.
 -

 Key: YARN-1276
 URL: https://issues.apache.org/jira/browse/YARN-1276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
 Fix For: 2.1.2-beta






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1276) Secure cluster can have random failure to launch a container.

2013-10-05 Thread Tassapol Athiapinya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tassapol Athiapinya resolved YARN-1276.
---

Resolution: Duplicate

Correction: closed as duplicate of YARN-1274

 Secure cluster can have random failure to launch a container.
 -

 Key: YARN-1276
 URL: https://issues.apache.org/jira/browse/YARN-1276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
 Fix For: 2.1.2-beta






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1276) Secure cluster can have random failure to launch a container.

2013-10-05 Thread Tassapol Athiapinya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tassapol Athiapinya resolved YARN-1276.
---

Resolution: Invalid

 Secure cluster can have random failure to launch a container.
 -

 Key: YARN-1276
 URL: https://issues.apache.org/jira/browse/YARN-1276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
 Fix For: 2.1.2-beta






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Moved] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas moved HDFS-5310 to YARN-1277:
-

Affects Version/s: (was: 2.0.0-alpha)
   2.0.0-alpha
  Key: YARN-1277  (was: HDFS-5310)
  Project: Hadoop YARN  (was: Hadoop HDFS)

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas

 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated YARN-1277:
--

Attachment: YARN-1277.patch

Here is an initial patch. [~ojoshi], can you please address one of the TODOs in 
the patch?

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
 Attachments: YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1090) Job does not get into Pending State

2013-10-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787339#comment-13787339
 ] 

Arun C Murthy commented on YARN-1090:
-

@[~jianhe] - Suggestion: can you please provide a simpler patch which only 
tweaks the UI for now? We can rename variables etc. later... thanks!

 Job does not get into Pending State
 ---

 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
 Attachments: YARN-1090.1.patch, YARN-1090.patch


 When there is no resource available to run a job, the next job should go into
 the pending state. The RM UI should show the next job as a pending app, and the
 pending-app counter should be incremented.
 But currently the next job stays in the ACCEPTED state, no AM has been assigned
 to this job, and the pending-app count is not incremented.
 Running 'job -status' on the next job shows job state=PREP.
 $ mapred job -status job_1377122233385_0002
 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
 host1/ip1
 Job: job_1377122233385_0002
 Job File: /ABC/.staging/job_1377122233385_0002/job.xml
 Job Tracking URL : http://host1:port1/application_1377122233385_0002/
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: PREP
 retired: false
 reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi reassigned YARN-1277:
---

Assignee: Omkar Vinit Joshi

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1090) Job does not get into Pending State

2013-10-05 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1090:
--

Attachment: YARN-1090.2.patch

Uploaded a patch that only renames:
Num Pending Apps -> Num Non-schedulable Apps
Num Active Apps -> Num Schedulable Apps

Also added a missing entry for Absolute Used Capacity in the UI.
Fixed a minor comment in QueueMetrics.

 Job does not get into Pending State
 ---

 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
 Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.patch


 When there is no resource available to run a job, the next job should go into
 the pending state. The RM UI should show the next job as a pending app, and the
 pending-app counter should be incremented.
 But currently the next job stays in the ACCEPTED state, no AM has been assigned
 to this job, and the pending-app count is not incremented.
 Running 'job -status' on the next job shows job state=PREP.
 $ mapred job -status job_1377122233385_0002
 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
 host1/ip1
 Job: job_1377122233385_0002
 Job File: /ABC/.staging/job_1377122233385_0002/job.xml
 Job Tracking URL : http://host1:port1/application_1377122233385_0002/
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: PREP
 retired: false
 reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1090) Job does not get into Pending State

2013-10-05 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1090:
--

Attachment: YARN-1090.3.patch

Uploaded a new patch that adds the missed renames.

 Job does not get into Pending State
 ---

 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
 Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.3.patch, 
 YARN-1090.patch


 When there is no resource available to run a job, the next job should go into
 the pending state. The RM UI should show the next job as a pending app, and the
 pending-app counter should be incremented.
 But currently the next job stays in the ACCEPTED state, no AM has been assigned
 to this job, and the pending-app count is not incremented.
 Running 'job -status' on the next job shows job state=PREP.
 $ mapred job -status job_1377122233385_0002
 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
 host1/ip1
 Job: job_1377122233385_0002
 Job File: /ABC/.staging/job_1377122233385_0002/job.xml
 Job Tracking URL : http://host1:port1/application_1377122233385_0002/
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: PREP
 retired: false
 reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1130) Improve the log flushing for tasks when mapred.userlog.limit.kb is set

2013-10-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1130:


Assignee: Paul Han

 Improve the log flushing for tasks when mapred.userlog.limit.kb is set
 --

 Key: YARN-1130
 URL: https://issues.apache.org/jira/browse/YARN-1130
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Paul Han
Assignee: Paul Han
 Fix For: 2.0.5-alpha

 Attachments: YARN-1130.patch


 When userlog limit is set with something like this:
 {code}
 <property>
   <name>mapred.userlog.limit.kb</name>
   <value>2048</value>
   <description>The maximum size of user-logs of each task in KB. 0 disables the
   cap.</description>
 </property>
 {code}
 the log entries will be truncated randomly for the jobs, leaving log sizes
 between 1.2 MB and 1.6 MB.
 Since the log is already limited, avoiding this truncation is crucial for users.
 The other issue with the current implementation
 (org.apache.hadoop.yarn.ContainerLogAppender) is that log entries are not
 flushed to the file until the container shuts down and the log manager closes
 all appenders. If a user wants to see the log during task execution, that is
 not supported.
 Will propose a patch to add a flush mechanism and also flush the log when the
 task is done.
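As a hedged sketch of the flush idea (not the proposed patch), a log4j 1.2 FileAppender subclass, the same base class ContainerLogAppender builds on as far as I can tell, can push buffered output to disk after each event so logs are visible while the task runs. The class name below is hypothetical.

{code}
import org.apache.log4j.FileAppender;
import org.apache.log4j.spi.LoggingEvent;

/** Sketch of a flushing appender. */
public class FlushingFileAppender extends FileAppender {
  @Override
  protected void subAppend(LoggingEvent event) {
    super.subAppend(event);
    // WriterAppender buffers through a QuietWriter ("qw"); flushing after each
    // event makes log lines visible on disk during task execution, at the cost
    // of extra I/O. A real patch would likely flush periodically or on close.
    if (qw != null) {
      qw.flush();
    }
  }
}
{code}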



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1130) Improve the log flushing for tasks when mapred.userlog.limit.kb is set

2013-10-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787370#comment-13787370
 ] 

Arun C Murthy commented on YARN-1130:
-

I'm reviewing this; meanwhile, [~paulhan], can you please look at the unit test 
failures? Thanks.

 Improve the log flushing for tasks when mapred.userlog.limit.kb is set
 --

 Key: YARN-1130
 URL: https://issues.apache.org/jira/browse/YARN-1130
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Paul Han
Assignee: Paul Han
 Fix For: 2.0.5-alpha

 Attachments: YARN-1130.patch


 When userlog limit is set with something like this:
 {code}
 <property>
   <name>mapred.userlog.limit.kb</name>
   <value>2048</value>
   <description>The maximum size of user-logs of each task in KB. 0 disables the
   cap.</description>
 </property>
 {code}
 the log entries will be truncated randomly for the jobs, leaving log sizes
 between 1.2 MB and 1.6 MB.
 Since the log is already limited, avoiding this truncation is crucial for users.
 The other issue with the current implementation
 (org.apache.hadoop.yarn.ContainerLogAppender) is that log entries are not
 flushed to the file until the container shuts down and the log manager closes
 all appenders. If a user wants to see the log during task execution, that is
 not supported.
 Will propose a patch to add a flush mechanism and also flush the log when the
 task is done.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1090) Job does not get into Pending State

2013-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787373#comment-13787373
 ] 

Hadoop QA commented on YARN-1090:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607021/YARN-1090.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2125//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2125//console

This message is automatically generated.

 Job does not get into Pending State
 ---

 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
 Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.3.patch, 
 YARN-1090.patch


 When there is no resource available to run a job, the next job should go into
 the pending state. The RM UI should show the next job as a pending app and
 increment the pending-app counter.
 Currently, however, the next job stays in the ACCEPTED state, no AM is assigned
 to it, and the pending-app count is not incremented.
 Running 'job status nextjob' shows job state=PREP.
 $ mapred job -status job_1377122233385_0002
 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
 host1/ip1
 Job: job_1377122233385_0002
 Job File: /ABC/.staging/job_1377122233385_0002/job.xml
 Job Tracking URL : http://host1:port1/application_1377122233385_0002/
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: PREP
 retired: false
 reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky

2013-10-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787377#comment-13787377
 ] 

Sandy Ryza commented on YARN-1268:
--

I just committed this to trunk and branch-2

 TestFairScheduler.testContinuousScheduling is flaky
 ---

 Key: YARN-1268
 URL: https://issues.apache.org/jira/browse/YARN-1268
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-1268-1.patch, YARN-1268.patch


 It looks like there's a timeout in it that's causing it to be flaky.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky

2013-10-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1268:
-

Hadoop Flags: Reviewed

 TestFairScheduler.testContinuousScheduling is flaky
 ---

 Key: YARN-1268
 URL: https://issues.apache.org/jira/browse/YARN-1268
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-1268-1.patch, YARN-1268.patch


 It looks like there's a timeout in it that's causing it to be flaky.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787379#comment-13787379
 ] 

Hudson commented on YARN-1268:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4547 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4547/])
Fix location of YARN-1268 in CHANGES.txt (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529531)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
YARN-1268. TestFairScheduer.testContinuousScheduling is flaky (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 TestFairScheduler.testContinuousScheduling is flaky
 ---

 Key: YARN-1268
 URL: https://issues.apache.org/jira/browse/YARN-1268
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-1268-1.patch, YARN-1268.patch


 It looks like there's a timeout in it that's causing it to be flaky.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787385#comment-13787385
 ] 

Hudson commented on YARN-1032:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4548 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4548/])
YARN-1032. Fixed NPE in RackResolver. Contributed by Lohit Vijayarenu. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529534)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestRackResolver.java


 NPE in RackResolve
 --

 Key: YARN-1032
 URL: https://issues.apache.org/jira/browse/YARN-1032
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
 Environment: linux
Reporter: Lohit Vijayarenu
Assignee: Lohit Vijayarenu
Priority: Critical
 Fix For: 2.1.2-beta

 Attachments: YARN-1032.1.patch, YARN-1032.2.patch, YARN-1032.3.patch


 We found a case where our rack-resolve script was not returning a rack because of
 a problem resolving the host address. This surfaced in
 RackResolver.java as an NPE, ultimately caught in RMContainerAllocator.
 {noformat}
 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
 CONTACTING RM. 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
   at 
 org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
   at 
 org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}
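
For context, a hedged sketch of the kind of guard that avoids this NPE. The helper below is hypothetical and only approximates the committed fix: when the topology mapping returns nothing for a host, fall back to the default rack instead of dereferencing a null result.
{code}
// Hypothetical helper (not a copy of the committed YARN-1032 patch): never let a
// failed topology script turn into an NPE; fall back to the default rack.
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.net.NodeBase;

public final class SafeRackResolve {
  private SafeRackResolve() {}

  public static Node resolve(DNSToSwitchMapping mapping, String hostName) {
    List<String> racks = mapping.resolve(Collections.singletonList(hostName));
    String rack = (racks == null || racks.isEmpty() || racks.get(0) == null)
        ? NetworkTopology.DEFAULT_RACK   // script failed or returned nothing
        : racks.get(0);
    return new NodeBase(hostName, rack);
  }
}
{code}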



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1090) Job does not get into Pending State

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787388#comment-13787388
 ] 

Hudson commented on YARN-1090:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4549 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4549/])
YARN-1090. Fixed CS UI to better reflect applications as non-schedulable and 
not as pending. Contributed by Jian He. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529538)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


 Job does not get into Pending State
 ---

 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Jian He
 Fix For: 2.1.2-beta

 Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.3.patch, 
 YARN-1090.patch


 When there is no resource available to run a job, the next job should go into
 the pending state. The RM UI should show the next job as a pending app and
 increment the pending-app counter.
 Currently, however, the next job stays in the ACCEPTED state, no AM is assigned
 to it, and the pending-app count is not incremented.
 Running 'job status nextjob' shows job state=PREP.
 $ mapred job -status job_1377122233385_0002
 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
 host1/ip1
 Job: job_1377122233385_0002
 Job File: /ABC/.staging/job_1377122233385_0002/job.xml
 Job Tracking URL : http://host1:port1/application_1377122233385_0002/
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: PREP
 retired: false
 reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-1278:


 Summary: New AM does not start after rm restart
 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Priority: Blocker


The new AM fails to start after the RM restarts: the job cannot launch a new
ApplicationMaster and fails with the error below.

 /usr/bin/mapred job -status job_1380985373054_0001
13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at hostname
Job: job_1380985373054_0001
Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
Job Tracking URL : 
http://hostname:8088/cluster/app/application_1380985373054_0001
Uber job : false
Number of maps: 0
Number of reduces: 0
map() completion: 0.0
reduce() completion: 0.0
Job state: FAILED
retired: false
reason for failure: There are no failed tasks for the job. Job is failed due to 
some other reason and reason can be found in the logs.
Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-451) Add more metrics to RM page

2013-10-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787396#comment-13787396
 ] 

Arun C Murthy commented on YARN-451:


Thinking aloud... we could track past allocations (#containers) per application 
in RM and also track future requests (sum numContainers for *) per application. 
Would showing those two help? 
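
Purely as an illustration of that idea (the class and method names below are made up and do not correspond to an existing RM data structure), a per-application snapshot could look like:
{code}
// Illustrative only: track containers allocated so far and containers currently
// requested (the "*"/ANY ask), per application, for display in the RM web UI.
import java.util.concurrent.atomic.AtomicInteger;

public class AppResourceUsageSnapshot {
  private final AtomicInteger allocatedContainers = new AtomicInteger(); // past allocations
  private final AtomicInteger requestedContainers = new AtomicInteger(); // outstanding asks

  public void onContainerAllocated() {
    allocatedContainers.incrementAndGet();
  }

  public void onResourceRequestUpdated(int numContainersForAny) {
    requestedContainers.set(numContainersForAny);
  }

  public int getAllocatedContainers() { return allocatedContainers.get(); }
  public int getRequestedContainers() { return requestedContainers.get(); }
}
{code}
A UI column could then show allocated vs. requested containers side by side for each running application.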

 Add more metrics to RM page
 ---

 Key: YARN-451
 URL: https://issues.apache.org/jira/browse/YARN-451
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Sangjin Lee
Priority: Blocker
 Attachments: in_progress_2x.png, yarn-451-trunk-20130916.1.patch


 The ResourceManager web UI shows the list of RUNNING applications, but it does not
 indicate which applications are requesting more resources than others. With a
 cluster running hundreds of applications at once, it would be useful to have
 some metric that separates high-resource-usage applications from low-resource-usage
 ones. At a minimum, showing the number of containers is a good option.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2013-10-05 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-1279:
---

 Summary: Expose a client API to allow clients to figure if log 
aggregation is complete
 Key: YARN-1279
 URL: https://issues.apache.org/jira/browse/YARN-1279
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.2-beta
Reporter: Arun C Murthy
Assignee: Arun C Murthy


Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1274:
--

Attachment: YARN-1274.trunk.2.txt

Same patch with corrected code comment.

Ran native test-container-executor test which passes. Based on Sid's testing, 
will check this in once Jenkins says okay.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt, 
 YARN-1274.trunk.2.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created only if there are resources to localize via the
 LCE localization command. If there are no resources to localize, LCE
 localization never executes, launching fails with exit code 255, and
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787404#comment-13787404
 ] 

Bikas Saha commented on YARN-1278:
--

It's hard to debug this without any AM/RM logs. Can some logs be uploaded to the
JIRA please?

 New AM does not start after rm restart
 --

 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Priority: Blocker

 The new AM fails to start after the RM restarts: the job cannot launch a new
 ApplicationMaster and fails with the error below.
  /usr/bin/mapred job -status job_1380985373054_0001
 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
 hostname
 Job: job_1380985373054_0001
 Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
 Job Tracking URL : 
 http://hostname:8088/cluster/app/application_1380985373054_0001
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: FAILED
 retired: false
 reason for failure: There are no failed tasks for the job. Job is failed due 
 to some other reason and reason can be found in the logs.
 Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1260:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 RM_HOME link breaks when webapp.https.address related properties are not 
 specified
 --

 Key: YARN-1260
 URL: https://issues.apache.org/jira/browse/YARN-1260
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.1-beta, 2.1.2-beta
Reporter: Yesha Vora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.2-beta

 Attachments: YARN-1260.20131030.1.patch


 This issue happens in a multi-node cluster where the resource manager and node
 manager run on different machines.
 Steps to reproduce:
 1) Set yarn.resourcemanager.hostname = <resourcemanager host> in yarn-site.xml
 2) Set hadoop.ssl.enabled = true in core-site.xml
 3) Do not specify yarn.nodemanager.webapp.https.address and
 yarn.resourcemanager.webapp.https.address in yarn-site.xml; their default
 values will then be used (see the workaround sketch after these steps).
 4) Go to the nodemanager web UI: https://<nodemanager host>:8044/node
 5) Click on the RM_HOME link.
 This link redirects to https://<nodemanager host>:8090/cluster instead of
 https://<resourcemanager host>:8090/cluster
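
A possible interim workaround, assuming the cluster can set the https webapp addresses explicitly in yarn-site.xml (hostnames below are placeholders), is to avoid relying on the defaults at all:
{code}
<!-- Workaround sketch: set the https webapp addresses explicitly so that links
     are not built from default values. Hostnames are placeholders. -->
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>resourcemanager.example.com:8090</value>
</property>
<property>
  <name>yarn.nodemanager.webapp.https.address</name>
  <value>0.0.0.0:8044</value>
</property>
{code}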



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1280) [Umbrella] Ensure YARN+MR works with https on the web interfaces

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1280:
-

 Summary: [Umbrella] Ensure YARN+MR works with https on the web 
interfaces
 Key: YARN-1280
 URL: https://issues.apache.org/jira/browse/YARN-1280
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi


Even though https is already supposed to work, it doesn't, and we have been fixing
a number of things related to enabling https in the various daemons.

This is an umbrella JIRA for tracking this work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1208) One of the WebUI Links redirected to http instead https protocol with ssl enabled

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1208:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 One of the WebUI Links redirected to http instead https protocol with ssl 
 enabled
 -

 Key: YARN-1208
 URL: https://issues.apache.org/jira/browse/YARN-1208
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.1-beta


 One of the webUI links is redirecting to the http link when https is enabled.
 Open Nodemanager UI (https://nodemanager:50060/node/allContainers) and click 
 on RM HOME link. This link redirects to http://resourcemanager:port; instead 
 https://resourcemanager:port;



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1203) Application Manager UI does not appear with Https enabled

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1203:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 Application Manager UI does not appear with Https enabled
 -

 Key: YARN-1203
 URL: https://issues.apache.org/jira/browse/YARN-1203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.2-beta

 Attachments: YARN-1203.20131017.1.patch, YARN-1203.20131017.2.patch, 
 YARN-1203.20131017.3.patch, YARN-1203.20131018.1.patch, 
 YARN-1203.20131018.2.patch, YARN-1203.20131019.1.patch


 Need to add support for disabling 'hadoop.ssl.enabled' for MR jobs.
 A job should be able to run over the http protocol by setting the
 'hadoop.ssl.enabled' property at the job level (a config sketch follows below).
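
A sketch of what such a per-job override could look like in the job configuration; whether this exact override is honored is precisely what this JIRA asks for:
{code}
<!-- Hypothetical per-job override: run this job's web links over http even when
     the cluster enables ssl. -->
<property>
  <name>hadoop.ssl.enabled</name>
  <value>false</value>
</property>
{code}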



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1204) Need to add https port related property in Yarn

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1204:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 Need to add https port related property in Yarn
 ---

 Key: YARN-1204
 URL: https://issues.apache.org/jira/browse/YARN-1204
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.2-beta

 Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, 
 YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, 
 YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch


 There is no YARN property available to configure the https port for the resource
 manager, nodemanager and history server. Currently, YARN services use the
 port defined for http [via
 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address',
 'yarn.resourcemanager.webapp.address'] even when running over the https protocol.
 YARN should have a set of properties for assigning https ports to the RM, NM and
 JHS, for example (see the sketch after this list):
 yarn.nodemanager.webapp.https.address
 yarn.resourcemanager.webapp.https.address
 mapreduce.jobhistory.webapp.https.address
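
For example, the requested properties could be set like this (host and port values below are placeholders, not defaults from any release):
{code}
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>rm.example.com:8090</value>
</property>
<property>
  <name>yarn.nodemanager.webapp.https.address</name>
  <value>0.0.0.0:8044</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.https.address</name>
  <value>jhs.example.com:19890</value>
</property>
{code}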



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1095) historyserver webapp config is missing yarn. prefix

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1095:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 historyserver webapp config is missing yarn. prefix
 ---

 Key: YARN-1095
 URL: https://issues.apache.org/jira/browse/YARN-1095
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ramya Sunil
Assignee: Omkar Vinit Joshi
Priority: Blocker
 Fix For: 2.1.1-beta


 The historyserver spnego webapp config is missing the yarn. prefix, and it should
 read historyserver or applicationhistoryserver instead of jobhistoryserver.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787410#comment-13787410
 ] 

Hadoop QA commented on YARN-1274:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607036/YARN-1274.trunk.2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2126//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2126//console

This message is automatically generated.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt, 
 YARN-1274.trunk.2.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created only if there are resources to localize via the
 LCE localization command. If there are no resources to localize, LCE
 localization never executes, launching fails with exit code 255, and
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1277:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1280

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787426#comment-13787426
 ] 

Hudson commented on YARN-1274:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4550 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4550/])
YARN-1274. Fixed NodeManager's LinuxContainerExecutor to create user, app-dir 
and log-dirs correctly even when there are no resources to localize for the 
container. Contributed by Siddharth Seth. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt, 
 YARN-1274.trunk.2.txt


 LCE container launch assumes the usercache/USER directory exists and is
 owned by the user running the container process.
 But the directory is created only if there are resources to localize via the
 LCE localization command. If there are no resources to localize, LCE
 localization never executes, launching fails with exit code 255, and
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1229) Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.

2013-10-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787433#comment-13787433
 ] 

Karthik Kambatla commented on YARN-1229:


Updated the JIRA summary to reflect the commit message - explains the issue and 
the solution better.

 Define constraints on Auxiliary Service names. Change ShuffleHandler service 
 name from mapreduce.shuffle to mapreduce_shuffle.
 --

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
 YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch


 I ran a sleep job. If the AM fails to start, this exception can occur (see the
 validation sketch after the trace):
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.
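
The export failure above is why the service name needs to be a valid shell/environment identifier. A hypothetical validation sketch (not the committed YARN-1229 code) of that constraint:
{code}
// Hypothetical check: aux service names end up in environment variable names such
// as NM_AUX_SERVICE_<name>, so they must be valid shell identifiers.
import java.util.regex.Pattern;

public final class AuxServiceNameCheck {
  // letters, digits and underscores, not starting with a digit
  private static final Pattern VALID = Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

  public static void validate(String name) {
    if (name == null || !VALID.matcher(name).matches()) {
      throw new IllegalArgumentException(
          "Invalid auxiliary service name '" + name
          + "': use letters, digits and underscores (e.g. mapreduce_shuffle)");
    }
  }

  private AuxServiceNameCheck() {}
}
{code}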



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1229) Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.

2013-10-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1229:
---

Summary: Define constraints on Auxiliary Service names. Change 
ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.  (was: 
Shell$ExitCodeException could happen if AM fails to start)

 Define constraints on Auxiliary Service names. Change ShuffleHandler service 
 name from mapreduce.shuffle to mapreduce_shuffle.
 --

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
 YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch


 I ran a sleep job. If the AM fails to start, this exception can occur:
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787440#comment-13787440
 ] 

Hadoop QA commented on YARN-1068:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607039/yarn-1068-8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2127//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2127//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2127//console

This message is automatically generated.

 Add admin support for HA operations
 ---

 Key: YARN-1068
 URL: https://issues.apache.org/jira/browse/YARN-1068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
 yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
 yarn-1068-8.patch, yarn-1068-prelim.patch


 Support HA admin operations to facilitate transitioning the RM to Active and 
 Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787447#comment-13787447
 ] 

Omkar Vinit Joshi commented on YARN-1277:
-

Updated with a patch containing only the YARN and MapReduce parts.

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1277:


Attachment: YARN-1277.20131005.1.patch

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787450#comment-13787450
 ] 

Vinod Kumar Vavilapalli commented on YARN-1278:
---

I debugged this with [~jianhe] directly on the cluster. This was caused by the 
patch for YARN-1149. Will upload logs and the analysis.

 New AM does not start after rm restart
 --

 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Priority: Blocker

 The new AM fails to start after the RM restarts: the job cannot launch a new
 ApplicationMaster and fails with the error below.
  /usr/bin/mapred job -status job_1380985373054_0001
 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
 hostname
 Job: job_1380985373054_0001
 Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
 Job Tracking URL : 
 http://hostname:8088/cluster/app/application_1380985373054_0001
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: FAILED
 retired: false
 reason for failure: There are no failed tasks for the job. Job is failed due 
 to some other reason and reason can be found in the logs.
 Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787454#comment-13787454
 ] 

Suresh Srinivas commented on YARN-1277:
---

[~ojoshi], why are you removing the common part of the patch?

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1277:


Attachment: YARN-1277.20131005.2.patch

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787456#comment-13787456
 ] 

Omkar Vinit Joshi commented on YARN-1277:
-

Updated the patch to include the common changes.

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787461#comment-13787461
 ] 

Omkar Vinit Joshi commented on YARN-1277:
-

Summary:
* To enable https (see the config sketch after this list):
** In the resourcemanager and nodemanager, set yarn.http.policy=HTTPS_ONLY. By
default it is HTTP_ONLY.
** In the job history server, set mapreduce.jobhistory.http.policy=HTTPS_ONLY. By
default it is HTTP_ONLY.
** The MR AM's configuration to turn on ssl is removed, since it is not supported
anyway; the AM remains http only. Note that all MR AM web links are served via the
proxy server, which is still controlled by the YARN http policy.
* Tested RM, NM, JHS and MR with the combinations below:
** Everything default (no policies): everything is http only, and ports are
selected accordingly.
** yarn.http.policy set to HTTPS_ONLY: RM and NM start on HTTPS ports and all
links to them use the https scheme; the JHS scheme and port remain http.
** Both yarn.http.policy and mapreduce.jobhistory.http.policy set to HTTPS_ONLY:
all links to the job history server, node manager and resource manager use the
https scheme and the https port.
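
A config sketch of the settings described above (the property names and values come from the summary; placing them in yarn-site.xml and mapred-site.xml is an assumption):
{code}
<!-- yarn-site.xml (assumed placement) -->
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value> <!-- default: HTTP_ONLY -->
</property>

<!-- mapred-site.xml (assumed placement) -->
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value> <!-- default: HTTP_ONLY -->
</property>
{code}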

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787477#comment-13787477
 ] 

Hadoop QA commented on YARN-1277:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12607047/YARN-1277.20131005.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2128//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2128//console

This message is automatically generated.

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787481#comment-13787481
 ] 

Xuan Gong commented on YARN-1278:
-

[~vinodkv] If that is the case, instead of directly using the deletion service to
delete the files, we can rename them first and then call the deletion service to
delete the renamed copies, just like we delete leftover files when the NM restarts
(a sketch of this follows below).
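
A hedged sketch of that rename-then-delete idea (the helper names are hypothetical and the single-thread executor merely stands in for the NM's asynchronous deletion service):
{code}
// Sketch only: move the directory aside synchronously so new localizations never
// see it, then delete the renamed copy asynchronously.
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RenameThenDelete {
  private final ExecutorService deleter = Executors.newSingleThreadExecutor();

  public void remove(File dir) throws IOException {
    File tombstone =
        new File(dir.getParent(), dir.getName() + "_DEL_" + System.nanoTime());
    if (!dir.renameTo(tombstone)) {   // synchronous: frees the original path now
      throw new IOException("Could not rename " + dir + " to " + tombstone);
    }
    deleter.submit(() -> deleteRecursively(tombstone)); // asynchronous cleanup
  }

  private static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        deleteRecursively(c);
      }
    }
    f.delete();
  }
}
{code}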

 New AM does not start after rm restart
 --

 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Priority: Blocker

 The new AM fails to start after the RM restarts: the job cannot launch a new
 ApplicationMaster and fails with the error below.
  /usr/bin/mapred job -status job_1380985373054_0001
 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
 hostname
 Job: job_1380985373054_0001
 Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
 Job Tracking URL : 
 http://hostname:8088/cluster/app/application_1380985373054_0001
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: FAILED
 retired: false
 reason for failure: There are no failed tasks for the job. Job is failed due 
 to some other reason and reason can be found in the logs.
 Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787490#comment-13787490
 ] 

Vinod Kumar Vavilapalli commented on YARN-1277:
---

I'm happy I spent time pushing for polished patches previously at YARN-1280.
The code is so much cleaner, making it easy to accommodate the changes from this
patch.

Patch looks good overall. Quick comments:
 - We aren't removing hadoop.ssl.enabled yet?
 - Why two methods getResolvedRMWebAppURLWithoutScheme* in YARN's 
WebAppUtils.java? Looks like only one is enough.
 - AmFilterInitializer changes are import only and can be skipped.
 - Earlier we punted on enabling https for proxy-server, but it should be easy 
to add now?

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787491#comment-13787491
 ] 

Vinod Kumar Vavilapalli commented on YARN-1278:
---

Here's what happened
{code}
2013-10-05 15:03:57,154 INFO  nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to 
/grid/0/hdp/yarn/local/usercache/hrt_qa/appcache/application_1380985373054_0001 
= 
file:/grid/0/hdp/yarn/local/usercache/hrt_qa/appcache/application_1380985373054_0001
2013-10-05 15:03:57,251 INFO  localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:update(910)) - DEBUG: FAILED { 
hdfs://HDFS:8020/user/hrt_qa/.staging/job_1380985373054_0001/job.jar, 
1380985387452, PATTERN, (?:classes/|lib/).* }, Rename cannot overwrite non 
empty destination directory 
/grid/4/hdp/yarn/local/usercache/hrt_qa/appcache/application_1380985373054_0001/filecache/10
2013-10-05 15:03:57,252 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(196)) - Resource 
hdfs://HDFS:8020/user/hrt_qa/.staging/job_1380985373054_0001/job.jar 
transitioned from DOWNLOADING to FAILED
2013-10-05 15:03:57,253 INFO  container.Container 
(ContainerImpl.java:handle(871)) - Container 
container_1380985373054_0001_02_01 transitioned from LOCALIZING to 
LOCALIZATION_FAILED
2013-10-05 15:03:57,253 INFO  localizer.LocalResourcesTrackerImpl 
(LocalResourcesTrackerImpl.java:handle(137)) - Container 
container_1380985373054_0001_02_01 sent RELEASE event on a resource request 
{ hdfs://HDFS:8020/user/hrt_qa/.staging/job_1380985373054_0001/job.jar, 
1380985387452, PATTERN, (?:classes/|lib/).* } not present in cache.
2013-10-05 15:03:57,254 INFO  localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:processHeartbeat(553)) - Unknown localizer 
with localizerId container_1380985373054_0001_02_01 is sending heartbeat. 
Ordering it to DIE
{code}

Basically, the RM restarted and all NMs were forced to resync. Because of
YARN-1149, all applications are then removed from the NM, but the deletion of app
resources is asynchronous. When the new AM starts, it tries to download the
resources all over again, but we generate the local destination path from
sequence numbers tracked via LocalResourcesTracker.nextUniqueNumber. Because the
original apps were removed, those sequence numbers are lost, so the same app
relocalizes and its local paths conflict with the directories that have not been
deleted yet (see the sketch below).

I think on resync we shouldn't destroy app resources. That is desirable anyway,
since there is no need to relocalize everything just because of an RM resync.
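
To make the collision concrete, an illustrative-only sketch (simplified names, not the actual NM classes) of a per-application counter choosing local cache directories; a recreated tracker restarts the counter and picks the same filecache/<n> path while the old directory is still awaiting asynchronous deletion, which is exactly the "Rename cannot overwrite non empty destination directory" failure in the log above:
{code}
// Illustrative only: the counter restarting after an app is recreated is what
// causes the path collision described above.
import java.io.File;
import java.util.concurrent.atomic.AtomicLong;

public class LocalCachePathPicker {
  private final AtomicLong uniqueNumber = new AtomicLong(10); // assumed start value
  private final File appFileCacheRoot;

  public LocalCachePathPicker(File appFileCacheRoot) {
    this.appFileCacheRoot = appFileCacheRoot;
  }

  public File nextDownloadDir() {
    // A fresh picker restarts at the same number, so it can collide with
    // directories left behind by the previous incarnation of the same app.
    return new File(appFileCacheRoot,
        String.valueOf(uniqueNumber.getAndIncrement()));
  }
}
{code}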

 New AM does not start after rm restart
 --

 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Priority: Blocker

 The new AM fails to start after the RM restarts: the job cannot launch a new
 ApplicationMaster and fails with the error below.
  /usr/bin/mapred job -status job_1380985373054_0001
 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
 hostname
 Job: job_1380985373054_0001
 Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
 Job Tracking URL : 
 http://hostname:8088/cluster/app/application_1380985373054_0001
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: FAILED
 retired: false
 reason for failure: There are no failed tasks for the job. Job is failed due 
 to some other reason and reason can be found in the logs.
 Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787497#comment-13787497
 ] 

Omkar Vinit Joshi commented on YARN-1277:
-

Thanks [~vinodkv]
bq. We aren't removing hadoop.ssl.enabled yet?
No; that will become the default for HttpServer in general, but for YARN/JHS/MR
we will override it based on the policies. As I understood from discussing with
[~sureshms], this is to preserve backward compatibility.

bq. Why two methods getResolvedRMWebAppURLWithoutScheme* in YARN's 
WebAppUtils.java? Looks like only one is enough.
It is mainly because we call them from two separate modules (YARN and MR). For MR
we disable SSL for the server, but we may still want it if it is enabled for
YARN.

bq. AmFilterInitializer changes are import only and can be skipped.
removed..

bq. Earlier we punted on enabling https for proxy-server, but it should be easy 
to add now?
yes.

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1277) Add http policy support for YARN daemons

2013-10-05 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1277:


Attachment: YARN-1277.20131005.3.patch

 Add http policy support for YARN daemons
 

 Key: YARN-1277
 URL: https://issues.apache.org/jira/browse/YARN-1277
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Suresh Srinivas
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
 YARN-1277.20131005.3.patch, YARN-1277.patch


 This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned YARN-1278:
-

Assignee: Hitesh Shah

 New AM does not start after rm restart
 --

 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Assignee: Hitesh Shah
Priority: Blocker

 The new AM fails to start after the RM restarts: the job cannot launch a new
 ApplicationMaster and fails with the error below.
  /usr/bin/mapred job -status job_1380985373054_0001
 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
 hostname
 Job: job_1380985373054_0001
 Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
 Job Tracking URL : 
 http://hostname:8088/cluster/app/application_1380985373054_0001
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: FAILED
 retired: false
 reason for failure: There are no failed tasks for the job. Job is failed due 
 to some other reason and reason can be found in the logs.
 Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1278) New AM does not start after rm restart

2013-10-05 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1278:
--

Attachment: YARN-1278.1.patch

 New AM does not start after rm restart
 --

 Key: YARN-1278
 URL: https://issues.apache.org/jira/browse/YARN-1278
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Yesha Vora
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: YARN-1278.1.patch


 The new AM fails to start after the RM restarts: the job cannot launch a new
 ApplicationMaster and fails with the error below.
  /usr/bin/mapred job -status job_1380985373054_0001
 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
 hostname
 Job: job_1380985373054_0001
 Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
 Job Tracking URL : 
 http://hostname:8088/cluster/app/application_1380985373054_0001
 Uber job : false
 Number of maps: 0
 Number of reduces: 0
 map() completion: 0.0
 reduce() completion: 0.0
 Job state: FAILED
 retired: false
 reason for failure: There are no failed tasks for the job. Job is failed due 
 to some other reason and reason can be found in the logs.
 Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)