[jira] [Created] (YARN-2845) MicroZookeeperService used in Yarn Registry tests doesn't shut down cleanly on windows
Steve Loughran created YARN-2845: Summary: MicroZookeeperService used in Yarn Registry tests doesn't shut down cleanly on windows Key: YARN-2845 URL: https://issues.apache.org/jira/browse/YARN-2845 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 2.7.0 It's not surfacing in YARN's own tests, but we are seeing this in Slider's Windows testing ... two test methods, each setting up its own ZK micro cluster, see the previous test's data. The class needs the same cleanup logic as HBASE-6820, as perhaps does its origin, Twill's mini ZK cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
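For context, a minimal sketch of the cleanup discipline being asked for here: stop the ZK instance before tearing down, then delete its data directory so the next test method starts from empty state. The MiniZKService interface below is a hypothetical stand-in for the registry's MicroZookeeperService, purely for illustration.
{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical stand-in for the micro ZK service; only the lifecycle matters here.
interface MiniZKService {
  void start() throws IOException;
  void stop() throws IOException;
}

public class MicroZKCleanupSketch {
  /**
   * Run one test's worth of work against a freshly started ZK instance and
   * guarantee the instance is stopped and its data directory removed, even on
   * Windows where open file handles block directory deletion.
   */
  static void runIsolated(MiniZKService zk, Path dataDir, Runnable testBody)
      throws IOException {
    zk.start();
    try {
      testBody.run();
    } finally {
      zk.stop();                  // release file handles before deleting
      deleteRecursively(dataDir); // fresh directory for the next test method
    }
  }

  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      paths.sorted(Comparator.reverseOrder())
           .map(Path::toFile)
           .forEach(File::delete);
    }
  }
}
{code}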
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206336#comment-14206336 ] Hudson commented on YARN-2841: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #2 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/2/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt RMProxy should retry EOFException -- Key: YARN-2841 URL: https://issues.apache.org/jira/browse/YARN-2841 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.7.0 Attachments: YARN-2841.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206347#comment-14206347 ] Hudson commented on YARN-2841: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #740 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/740/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt RMProxy should retry EOFException -- Key: YARN-2841 URL: https://issues.apache.org/jira/browse/YARN-2841 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.7.0 Attachments: YARN-2841.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206459#comment-14206459 ] Hudson commented on YARN-2841: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1930 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1930/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt RMProxy should retry EOFException -- Key: YARN-2841 URL: https://issues.apache.org/jira/browse/YARN-2841 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.7.0 Attachments: YARN-2841.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2780) Log aggregated resource allocation in rm-appsummary.log
[ https://issues.apache.org/jira/browse/YARN-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206532#comment-14206532 ] Jason Lowe commented on YARN-2780: -- +1 lgtm. Will commit this later today if there are no objections. Log aggregated resource allocation in rm-appsummary.log --- Key: YARN-2780 URL: https://issues.apache.org/jira/browse/YARN-2780 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.1 Reporter: Koji Noguchi Assignee: Eric Payne Priority: Minor Attachments: YARN-2780.v1.201411031728.txt, YARN-2780.v2.201411061601.txt YARN-415 added useful information about resource usage by applications. Asking to log that info inside rm-appsummary.log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206536#comment-14206536 ] Hudson commented on YARN-2841: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1954 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1954/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt RMProxy should retry EOFException -- Key: YARN-2841 URL: https://issues.apache.org/jira/browse/YARN-2841 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.7.0 Attachments: YARN-2841.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
Junping Du created YARN-2846: Summary: Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart. Key: YARN-2846 URL: https://issues.apache.org/jira/browse/YARN-2846 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Priority: Blocker The NM restart work preserving feature could make running AM container get LOST and killed during stop NM daemon. The exception is like below: {code} 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB physical memory used; 931.3 MB of 1.0 GB virtual memory used 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - Applications still running : [application_1415666714233_0001] 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 45454 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 45454 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(141)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log aggregation for application_1415666714233_0001 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for application application_1415666714233_0001 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(476)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(87)) - Unable to recover container container_1415666714233_0001_01_01 java.io.IOException: Interrupted while waiting for process 20001 to exit at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) ... 
6 more {code} In reacquireContainer() of ContainerExecutor.java, the while loop that checks on the container process (the AM container) is interrupted by the NM stop. An IOException is thrown, so no ExitCodeFile is generated for the still-running container. The IOException is then caught in the upper call (RecoveredContainerLaunch.call()) and the exit code (which defaults to LOST when never set) gets persisted in the NMStateStore. After the NM restarts, this container is recovered in COMPLETE state but with exit code LOST (154), which causes this (AM) container to be killed later. We should avoid recording an exit code for running containers when we detect that the process check was interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
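To make the failure mode concrete, here is a simplified, hypothetical sketch of the recovery path described above (not the actual ContainerExecutor or RecoveredContainerLaunch code). The point is that the interrupt arriving during NM shutdown means "stop waiting", and must not be turned into a persisted LOST exit code for a container that is still running.
{code}
import java.io.IOException;

// Simplified sketch; names and signatures are illustrative only.
public class ContainerReacquireSketch {

  static final int LOST = 154; // default used when no exit code was recorded

  /** Waits for an already-running container process to exit. */
  int reacquireContainer(String pid) throws IOException, InterruptedException {
    while (isProcessAlive(pid)) {
      // Let InterruptedException propagate: an NM shutdown interrupt means
      // "stop waiting", not "the container finished".
      Thread.sleep(1000);
    }
    return readExitCodeFile(pid);
  }

  /** Recovery task: only persist an exit code when the process really ended. */
  Integer recover(String pid) throws IOException {
    try {
      int exitCode = reacquireContainer(pid);
      persistExitCode(pid, exitCode);  // genuine completion
      return exitCode;
    } catch (InterruptedException e) {
      // NM is stopping; the container keeps running, so record nothing and
      // let the next NM instance re-acquire it.
      Thread.currentThread().interrupt();
      return null;
    }
  }

  // --- stubs so the sketch is self-contained ---
  boolean isProcessAlive(String pid) { return false; }
  int readExitCodeFile(String pid) { return 0; }
  void persistExitCode(String pid, int code) { }
}
{code}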
[jira] [Assigned] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-2846: Assignee: Junping Du Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart. --- Key: YARN-2846 URL: https://issues.apache.org/jira/browse/YARN-2846 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker The NM restart work preserving feature could make running AM container get LOST and killed during stop NM daemon. The exception is like below: {code} 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB physical memory used; 931.3 MB of 1.0 GB virtual memory used 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - Applications still running : [application_1415666714233_0001] 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 45454 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 45454 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(141)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log aggregation for application_1415666714233_0001 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for application application_1415666714233_0001 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(476)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(87)) - Unable to recover container container_1415666714233_0001_01_01 java.io.IOException: Interrupted while waiting for process 20001 to exit at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) ... 6 more {code} In reacquireContainer() of ContainerExecutor.java, the while loop of checking container process (AM container) will be interrupted by NM stop. The IOException get thrown and failed to generate an ExitCodeFile for the running container. Later, the IOException will be caught in upper call (RecoveredContainerLaunch.call()) and the ExitCode (by default to be LOST without any setting) get persistent in NMStateStore. After NM restart again, this container is recovered as COMPLETE state but exit code is LOST (154) - cause this (AM) container get killed later. We should get rid of recording the exit code of running containers if detecting process is interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2846: - Attachment: YARN-2846-demo.patch Upload the first demo patch to fix the problem. Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart. --- Key: YARN-2846 URL: https://issues.apache.org/jira/browse/YARN-2846 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2846-demo.patch The NM restart work preserving feature could make running AM container get LOST and killed during stop NM daemon. The exception is like below: {code} 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB physical memory used; 931.3 MB of 1.0 GB virtual memory used 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - Applications still running : [application_1415666714233_0001] 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 45454 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 45454 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(141)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log aggregation for application_1415666714233_0001 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for application application_1415666714233_0001 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(476)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(87)) - Unable to recover container container_1415666714233_0001_01_01 java.io.IOException: Interrupted while waiting for process 20001 to exit at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) ... 6 more {code} In reacquireContainer() of ContainerExecutor.java, the while loop of checking container process (AM container) will be interrupted by NM stop. The IOException get thrown and failed to generate an ExitCodeFile for the running container. Later, the IOException will be caught in upper call (RecoveredContainerLaunch.call()) and the ExitCode (by default to be LOST without any setting) get persistent in NMStateStore. After NM restart again, this container is recovered as COMPLETE state but exit code is LOST (154) - cause this (AM) container get killed later. We should get rid of recording the exit code of running containers if detecting process is interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206601#comment-14206601 ] Hadoop QA commented on YARN-2846: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680801/YARN-2846-demo.patch against trunk revision 58e9bf4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5814//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5814//console This message is automatically generated. Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart. --- Key: YARN-2846 URL: https://issues.apache.org/jira/browse/YARN-2846 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2846-demo.patch The NM restart work preserving feature could make running AM container get LOST and killed during stop NM daemon. 
The exception is like below: {code} 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB physical memory used; 931.3 MB of 1.0 GB virtual memory used 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - Applications still running : [application_1415666714233_0001] 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 45454 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 45454 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(141)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log aggregation for application_1415666714233_0001 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for application application_1415666714233_0001 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(476)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(87)) - Unable to recover container container_1415666714233_0001_01_01 java.io.IOException: Interrupted while waiting for process 20001 to exit at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) at
[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206616#comment-14206616 ] Jason Lowe commented on YARN-2846: -- Thanks for the report and patch, Junping! Nit: If reacquireContainer is going to allow InterruptedException to be thrown then I'd rather remove the try/catch around the Thread.sleep call and just let the exception be thrown directly from there. We can let the code catching the exception deal with any logging/etc as appropriate for that caller. In this case we can move the log message to RecoveredContainerLaunch when it fields the InterruptedException and chooses not to propagate it upwards. I'm curious why we're not seeing a similar issue with regular ContainerLaunch threads, as they should be interrupted as well. Are those threads silently swallowing the interrupt? Because otherwise I would expect us to log a container completion just like we were doing with a recovered container. Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart. --- Key: YARN-2846 URL: https://issues.apache.org/jira/browse/YARN-2846 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-2846-demo.patch The NM restart work preserving feature could make running AM container get LOST and killed during stop NM daemon. The exception is like below: {code} 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB physical memory used; 931.3 MB of 1.0 GB virtual memory used 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - Applications still running : [application_1415666714233_0001] 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 45454 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 45454 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(141)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log aggregation for application_1415666714233_0001 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for application application_1415666714233_0001 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(476)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(87)) - Unable to recover container container_1415666714233_0001_01_01 java.io.IOException: Interrupted while waiting for process 20001 to exit at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) ... 6 more {code} In reacquireContainer() of ContainerExecutor.java, the while loop of checking
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206626#comment-14206626 ] Naganarasimha G R commented on YARN-2495: - {quote} The benefits are 1) You don't have to update test cases for that 2) The semantics are clear, create a register request with label or not. {quote} True, and this will let me revert some unwanted testcase modifications. Have corrected it. bq. I suggest to have different option for script-based/config-based, even if we can combine them together. Ok, will have different config params for script-based and config-based. bq. IIUC, NM_NODE_LABELS_FROM_CONFIG is a list of labels, even if we want to separate the two properties, we cannot remove NM_NODE_LABELS_FROM_CONFIG, correct? I had searched for it wrongly and, as you mentioned, the name was not good enough for me to recall it either. Corrected it. bq. I think it's better to leverage existing utility class instead of implement your own. For example, you have set values but not check them, which is incorrect, but using utility class can avoid such problem. Even if you added new fields, tests will cover them without any changes: The problem is that ??TestPBImplRecords?? is in the ??hadoop-yarn-common?? project while ??NodeHeartbeatRequestPBImpl?? and the others are in the ??hadoop-yarn-server-common?? project. Since we can't add a dependency on ??hadoop-yarn-server-common?? in ??hadoop-yarn-common??, shall I create a new class extending TestPBImplRecords in the ??hadoop-yarn-server-common?? project? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml or using script suggested by [~aw]) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
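To illustrate the "separate options for script-based and config-based" point being agreed on above, here is a sketch of what the two sets of knobs might look like from an NM's point of view. The property names are hypothetical placeholders, since the actual keys were still under discussion in this thread.
{code}
import org.apache.hadoop.conf.Configuration;

public class NodeLabelProviderConfigSketch {
  // Hypothetical keys, for illustration only.
  static final String NM_LABELS_PROVIDER =
      "yarn.nodemanager.node-labels.provider";
  static final String NM_LABELS_FROM_CONFIG =
      "yarn.nodemanager.node-labels.provider.configured-node-labels";
  static final String NM_LABELS_SCRIPT_PATH =
      "yarn.nodemanager.node-labels.provider.script.path";

  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Config-based provider: labels listed directly in yarn-site.xml.
    conf.set(NM_LABELS_PROVIDER, "config");
    conf.set(NM_LABELS_FROM_CONFIG, "GPU,LARGE_MEM");

    // Script-based provider would instead point at an executable:
    // conf.set(NM_LABELS_PROVIDER, "script");
    // conf.set(NM_LABELS_SCRIPT_PATH, "/etc/hadoop/node-labels.sh");

    System.out.println(conf.get(NM_LABELS_FROM_CONFIG));
  }
}
{code}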
[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206634#comment-14206634 ] Naganarasimha G R commented on YARN-2838: - Hi [~zjshen], Can you please give feedback on these issues? Some of them require discussion before rectification... Issues with TimeLineServer (Application History) Key: YARN-2838 URL: https://issues.apache.org/jira/browse/YARN-2838 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: IssuesInTimelineServer.pdf Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: YARN-1964.patch fixed imports. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Target Version/s: 2.7.0 (was: 2.6.0) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Craig Welch Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running; its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom still includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still reflects the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2735: --- Priority: Trivial (was: Minor) Labels: newbie (was: ) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection --- Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: newbie Attachments: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
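For readers unfamiliar with the issue, this is the general shape of the problem: the same fields are assigned twice along one construction path, so one assignment is dead code. The snippet below is a hypothetical, simplified illustration of that pattern, not the actual DirectoryCollection source.
{code}
public class DirectoryCollectionSketch {
  private float diskUtilizationPercentageCutoff;
  private long diskUtilizationSpaceCutoff;

  public DirectoryCollectionSketch(float percentageCutoff, long spaceCutoff) {
    // First (redundant) initialization...
    this.diskUtilizationPercentageCutoff = percentageCutoff;
    this.diskUtilizationSpaceCutoff = spaceCutoff;
    // ...followed by a second assignment of the same fields (e.g. with
    // bounds checking); only one of the two assignments should remain.
    this.diskUtilizationPercentageCutoff =
        Math.max(0.0F, Math.min(100.0F, percentageCutoff));
    this.diskUtilizationSpaceCutoff = Math.max(0L, spaceCutoff);
  }
}
{code}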
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206765#comment-14206765 ] Karthik Kambatla commented on YARN-2735: Trivial patch. +1. Checking this in. diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection --- Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: newbie Attachments: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206768#comment-14206768 ] Hadoop QA commented on YARN-1964: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680823/YARN-1964.patch against trunk revision 58e9bf4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5815//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5815//console This message is automatically generated. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206788#comment-14206788 ] Hudson commented on YARN-2735: -- FAILURE: Integrated in Hadoop-trunk-Commit #6510 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6510/]) YARN-2735. diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection. (Zhihai Xu via kasha) (kasha: rev 061bc293c8dd3e2605cf150568988bde18407af6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection --- Key: YARN-2735 URL: https://issues.apache.org/jira/browse/YARN-2735 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-2735.000.patch diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2847) Linux native container executor segfaults if default banned user detected
Jason Lowe created YARN-2847: Summary: Linux native container executor segfaults if default banned user detected Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206801#comment-14206801 ] Jason Lowe commented on YARN-2847: -- The problem is in this code: {code}
char **banned_users = get_values(BANNED_USERS_KEY);
char **banned_user = (banned_users == NULL) ?
    (char**) DEFAULT_BANNED_USERS : banned_users;
for(; *banned_user; ++banned_user) {
  if (strcmp(*banned_user, user) == 0) {
    free(user_info);
    if (banned_users != (char**)DEFAULT_BANNED_USERS) {
      free_values(banned_users);
    }
    fprintf(LOGFILE, "Requested user %s is banned\n", user);
    return NULL;
  }
}
if (banned_users != NULL && banned_users != (char**)DEFAULT_BANNED_USERS) {
  free_values(banned_users);
}
{code} Note that in one case we check for banned_users != NULL and != DEFAULT_BANNED_USERS but in another case we're missing the NULL check. Lots of ways to fix it:
- free_values could check for NULL
- banned_users could always be non-NULL (i.e.: set it to DEFAULT_BANNED_USERS if get_values returns NULL)
- add a check for != NULL before calling free_values
Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206855#comment-14206855 ] Ravi Prakash commented on YARN-1964: Thanks Abin! The patch is looking really good now. However the documentation doesn't seem to be compiling for me. Once that is figured out, I'm a +1. I am looking to commit it EOD today to trunk, branch-2, branch-2.6. I'd like to commit it to 2.6 also and request a respin of the RC. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2817) Disk drive as a resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-2817. Resolution: Duplicate Disk drive as a resource in YARN Key: YARN-2817 URL: https://issues.apache.org/jira/browse/YARN-2817 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy As YARN continues to cover new ground in terms of new workloads, disk is becoming a very important resource to govern. It might be prudent to start with something very simple - allow applications to request entire drives (e.g. 2 drives out of the 12 available on a node), we can then also add support for specific iops, bandwidth etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2811: -- Attachment: YARN-2811.v5.patch Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) Add support for disk IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2139: --- Assignee: (was: Wei Yan) Add support for disk IO isolation/scheduling for containers --- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should support considering disk for scheduling tasks on nodes, and provide isolation for these allocations at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2139: --- Summary: [Umbrella] Support for Disk as a Resource in YARN (was: Add support for disk IO isolation/scheduling for containers) [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should support considering disk for scheduling tasks on nodes, and provide isolation for these allocations at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2139: --- Description: YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. (was: YARN should support considering disk for scheduling tasks on nodes, and provide isolation for these allocations at runtime.) [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206993#comment-14206993 ] Karthik Kambatla commented on YARN-2791: Thanks [~sdaingade] for sharing the design doc. Well articulated. The designs on YARN-2139 and YARN-2791 are very similar, except that the disk resources are called vdisks in YARN-2139 and spindles in YARN-2791. In addition to the items specified here, YARN-2139 talks about isolation as well. Other than that, do you see any major items YARN-2791 covers that YARN-2139 doesn't? The WebUI is good and very desirable; we should definitely include it. Also, I suggest we make this (as is - or split into multiple JIRAs) a sub-task of YARN-2139. Discussing the high-level details on one JIRA helps with aligning on one final design doc based on everyone's suggestions. Add Disk as a resource for scheduling - Key: YARN-2791 URL: https://issues.apache.org/jira/browse/YARN-2791 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.5.1 Reporter: Swapnil Daingade Assignee: Yuliya Feldman Attachments: DiskDriveAsResourceInYARN.pdf Currently, the number of disks present on a node is not considered a factor while scheduling containers on that node. Having large amount of memory on a node can lead to high number of containers being launched on that node, all of which compete for I/O bandwidth. This multiplexing of I/O across containers can lead to slower overall progress and sub-optimal resource utilization as containers starved for I/O bandwidth hold on to other resources like cpu and memory. This problem can be solved by considering disk as a resource and including it in deciding how many containers can be concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206994#comment-14206994 ] Karthik Kambatla commented on YARN-2139: Thanks for the prototype, Wei. In light of the updates on YARN-2791 and YARN-2817, I propose we incorporate suggestions from [~sdaingade] and [~acmurthy] before posting patches for sub-tasks. Updated JIRA title, description, and marked it unassigned as this is an umbrella JIRA. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
[ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207017#comment-14207017 ] Hudson commented on YARN-2843: -- FAILURE: Integrated in Hadoop-trunk-Commit #6511 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6511/]) YARN-2843. Fixed NodeLabelsManager to trim inputs for hosts and labels so as to make them work correctly. Contributed by Wangda Tan. (vinodkv: rev 0fd97f9c1989a793b882e6678285607472a3f75a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java NodeLabels manager should trim all inputs for hosts and labels -- Key: YARN-2843 URL: https://issues.apache.org/jira/browse/YARN-2843 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sushmitha Sreenivasan Assignee: Wangda Tan Attachments: YARN-2843-1.patch, YARN-2843-2.patch NodeLabels manager should trim all inputs for hosts and labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
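The fix is conceptually simple; below is a minimal sketch of the kind of trimming being added for user-supplied host and label strings (a hypothetical helper, not the actual CommonNodeLabelsManager code).
{code}
import java.util.HashSet;
import java.util.Set;

public class LabelInputTrimSketch {
  /** Normalize user-supplied host/label strings such as " host1 , gpu ". */
  static Set<String> parseAndTrim(String commaSeparated) {
    Set<String> result = new HashSet<String>();
    if (commaSeparated == null) {
      return result;
    }
    for (String token : commaSeparated.split(",")) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Prints the same two entries regardless of stray whitespace in the input.
    System.out.println(parseAndTrim(" host1 , gpu "));
  }
}
{code}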
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207046#comment-14207046 ] Ravi Prakash commented on YARN-1964: Hi Karthik! That's fair. I'll ask Arun if he is willing to re-spin 2.6.0. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
Craig Welch created YARN-2848: - Summary: (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit Key: YARN-2848 URL: https://issues.apache.org/jira/browse/YARN-2848 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom, userlimit, etc. (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and, when those have occurred, calculate an application-specific cluster resource by comparing cluster nodes to its own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time, and the single resource value reflecting both constraints could then be used for efficient, frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). For this purpose, the application submission's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom. (Cases where the application elected to request resources not using the application-level label expression are out of scope for this - but for the common use case of an application which uses a particular expression throughout, userlimit and headroom would be accurate.) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
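For illustration, here is a minimal sketch of the kind of calculation the description proposes: recomputing an application-specific "cluster" resource only when the node set changes, by summing the resources of nodes not excluded by the application's node/rack blacklist. The SchedulerNodeView interface and its method names are hypothetical stand-ins (real code would work against the scheduler's node objects and the application's blacklists); Resource and Resources are the standard YARN utility types.
{code}
// Illustrative only: SchedulerNodeView and its accessors are made-up stand-ins
// for the scheduler's node abstraction; the real field/method names would come
// from SchedulerNode / SchedulerApplicationAttempt.
import java.util.Collection;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AppClusterResourceCalculator {

  /**
   * Recompute the application-specific "cluster" resource by summing the
   * resources of all nodes that are neither individually blacklisted nor on a
   * blacklisted rack. Intended to run only when the node set changes, so the
   * cached value can be reused on every headroom/userlimit calculation.
   */
  public Resource computeAppClusterResource(
      Collection<? extends SchedulerNodeView> clusterNodes,
      Set<String> blacklistedNodes,
      Set<String> blacklistedRacks) {
    Resource available = Resources.createResource(0, 0);
    for (SchedulerNodeView node : clusterNodes) {
      if (blacklistedNodes.contains(node.getNodeName())
          || blacklistedRacks.contains(node.getRackName())) {
        continue; // excluded from this application's view of the cluster
      }
      Resources.addTo(available, node.getTotalResource());
    }
    return available;
  }

  /** Minimal view of a node; a hypothetical stand-in for SchedulerNode. */
  public interface SchedulerNodeView {
    String getNodeName();
    String getRackName();
    Resource getTotalResource();
  }
}
{code}
The cached result would then feed the existing headroom and userlimit formulas unchanged, which is what keeps the per-heartbeat cost low.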
[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
[ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207057#comment-14207057 ] Wangda Tan commented on YARN-2843: -- Thanks for [~vinodkv]'s review and commit! NodeLabels manager should trim all inputs for hosts and labels -- Key: YARN-2843 URL: https://issues.apache.org/jira/browse/YARN-2843 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sushmitha Sreenivasan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2843-1.patch, YARN-2843-2.patch NodeLabels manager should trim all inputs for hosts and labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207080#comment-14207080 ] Swapnil Daingade commented on YARN-2791: Thanks Karthik Kambatla. Sure, let's make this a sub-task of YARN-2139. Add Disk as a resource for scheduling - Key: YARN-2791 URL: https://issues.apache.org/jira/browse/YARN-2791 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.5.1 Reporter: Swapnil Daingade Assignee: Yuliya Feldman Attachments: DiskDriveAsResourceInYARN.pdf Currently, the number of disks present on a node is not considered a factor while scheduling containers on that node. Having a large amount of memory on a node can lead to a high number of containers being launched on that node, all of which compete for I/O bandwidth. This multiplexing of I/O across containers can lead to slower overall progress and sub-optimal resource utilization, as containers starved for I/O bandwidth hold on to other resources like CPU and memory. This problem can be solved by considering disk as a resource and including it in deciding how many containers can be concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207083#comment-14207083 ] Karthik Kambatla commented on YARN-570: --- The patch looks reasonable. +1, relying on others' testing. Checking this in, will add one comment in Times.java in the process. Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.2.0 Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, YARN-570.3.patch, YARN-570.4.patch, YARN-570.5.patch Time strings on different page are displayed in different timezone. If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56 Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2849) MRAppMaster: Add support for disk I/O request
Wei Yan created YARN-2849: - Summary: MRAppMaster: Add support for disk I/O request Key: YARN-2849 URL: https://issues.apache.org/jira/browse/YARN-2849 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2851) YarnClient: Add support for disk I/O resource/request information
Wei Yan created YARN-2851: - Summary: YarnClient: Add support for disk I/O resource/request information Key: YARN-2851 URL: https://issues.apache.org/jira/browse/YARN-2851 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2852) WebUI: Add disk I/O resource information to the web ui
Wei Yan created YARN-2852: - Summary: WebUI: Add disk I/O resource information to the web ui Key: YARN-2852 URL: https://issues.apache.org/jira/browse/YARN-2852 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2853) Killing app may hang while AM is unregistering
Jian He created YARN-2853: - Summary: Killing app may hang while AM is unregistering Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He When killing an app, the app first moves to the KILLING state. If RMAppAttempt receives the attempt_unregister event before the attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to the KILLED state and stays in the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
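To make the race concrete, below is a minimal sketch (not the attached patch) of how such a gap is typically closed in the RMApp state machine: an explicit arc is registered for the unregister event while the app is in KILLING, so the subsequent kill/finish event is not silently dropped. It assumes Hadoop's StateMachineFactory and the RMAppState/RMAppEventType enums as they existed around 2.6; the helper and transition body are hypothetical.
{code}
// Sketch only: the extra arc and its behavior are illustrative, not the
// committed YARN-2853 fix.
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.apache.hadoop.yarn.state.SingleArcTransition;
import org.apache.hadoop.yarn.state.StateMachineFactory;

final class KillingStateTransitions {

  // Extra arc for the KILLING state so that an attempt_unregister arriving
  // first does not leave the app stuck: the app stays in KILLING and the
  // later ATTEMPT_KILLED / ATTEMPT_FINISHED event can still complete it.
  static StateMachineFactory<RMAppImpl, RMAppState, RMAppEventType, RMAppEvent>
      addKillingTransitions(
          StateMachineFactory<RMAppImpl, RMAppState, RMAppEventType, RMAppEvent> factory) {
    return factory.addTransition(
        RMAppState.KILLING, RMAppState.KILLING,
        RMAppEventType.ATTEMPT_UNREGISTERED,
        new SingleArcTransition<RMAppImpl, RMAppEvent>() {
          @Override
          public void transition(RMAppImpl app, RMAppEvent event) {
            // Record that the attempt already unregistered; the kill can then
            // be finalized when the attempt reports its final state.
          }
        });
  }
}
{code}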
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207268#comment-14207268 ] Wangda Tan commented on YARN-2729: -- Hi [~Naganarasimha], IIRC, the script-based patch should be based on YARN-2495, and we should create a script-based labels provider extending NodeLabelsProviderService, correct? But I haven't seen much relationship between this and YARN-2495 besides configuration options. Please let me know if I understood incorrectly. Thanks, Wangda Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, YARN-2729.20141031-1.patch Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2853: -- Attachment: YARN-2853.1.patch Uploaded a patch to handle the possible attempt_unregistered, attempt_failed, attempt_finished events at the app_killing state. Killing app may hang while AM is unregistering -- Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2853.1.patch When killing an app, app first moves to KILLING state, If RMAppAttempt receives the attempt_unregister event before attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to KILLED state and stays at KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207292#comment-14207292 ] Jian He commented on YARN-2853: --- Instead, we could get rid of the KILLING state completely, let the app stay at its original state, and change RMApp to handle the attempt_killed event at each possible state. This way, we could avoid race conditions like this. I'll file a separate jira to do this. Killing app may hang while AM is unregistering -- Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2853.1.patch When killing an app, app first moves to KILLING state, If RMAppAttempt receives the attempt_unregister event before attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to KILLED state and stays at KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207303#comment-14207303 ] Hadoop QA commented on YARN-2853: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680930/YARN-2853.1.patch against trunk revision 163bb55. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5820//console This message is automatically generated. Killing app may hang while AM is unregistering -- Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2853.1.patch When killing an app, app first moves to KILLING state, If RMAppAttempt receives the attempt_unregister event before attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to KILLED state and stays at KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2853: -- Attachment: (was: YARN-2853.1.patch) Killing app may hang while AM is unregistering -- Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2853.1.patch, YARN-2853.1.patch When killing an app, app first moves to KILLING state, If RMAppAttempt receives the attempt_unregister event before attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to KILLED state and stays at KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207390#comment-14207390 ] Ravi Prakash commented on YARN-1964: I'm a +1 on this patch. I'll commit it to trunk and branch-2 soon. Soon as I get confirmation from Arun, I'll commit it into branch-2.6 as well. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207405#comment-14207405 ] Craig Welch commented on YARN-2637: --- I think the fix is fairly straightforward - there is an amResource property on the SchedulerApplicationAttempt / FiCaSchedulerApp; it does not appear to be populated in the CapacityScheduler case (but it should be, and the information is available in the submission / from the resource requests of the application) - populate this value, and then add a Resource property to LeafQueue which represents the resources used by active application masters - when an application starts, add its amResource value to the LeafQueue's active application master resource value; when an application ends, remove it. Before starting an application, compare the sum of the active application masters + the new application's resource to the resource represented by the percentage of cluster resource allowed to be used by AMs in the queue (this can differ by queue...) and if it exceeds the value, do not start the application. The existing trickle-down logic based on the minimum allocation should be removed; there is also logic regarding how many applications can be running based on explicit configuration, which should be retained.
{code}
if ((queue.activeApplicationMasterResourceTotal + readyToStartApplication.applicationMasterResource)
        <= queue.portionOfClusterResourceAllowedForApplicationMaster * clusterResource
    && maxAllowedApplications >= runningApplications + 1) {
  queue.startTheApp
}
{code}
maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Priority: Critical Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it will check whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}
For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only a max_am_resource_percent share of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
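As a concrete illustration of the check proposed above (not the committed fix), the sketch below expresses the AM-resource comparison with YARN's standard Resources helpers; the parameter names stand in for the LeafQueue/FiCaSchedulerApp fields that would be added.
{code}
// Illustrative sketch of the proposed activation check; the accessor-style
// parameters are hypothetical stand-ins for fields the comment proposes.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

final class AmLimitCheck {

  /**
   * Returns true if activating the given application would keep the total AM
   * resource in the queue within maximum-am-resource-percent of the cluster.
   */
  static boolean fitsAmLimit(ResourceCalculator rc, Resource clusterResource,
      Resource activeAMResource, Resource applicationAMResource,
      float amResourcePercent) {
    // The queue's AM limit is a configurable fraction of the cluster resource.
    Resource amLimit = Resources.multiply(clusterResource, amResourcePercent);
    Resource afterActivation =
        Resources.add(activeAMResource, applicationAMResource);
    // Compare against the limit using the queue's resource calculator.
    return Resources.lessThanOrEqual(rc, clusterResource, afterActivation, amLimit);
  }
}
{code}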
[jira] [Created] (YARN-2854) The document about timeline service and generic service needs to be updated
Zhijie Shen created YARN-2854: - Summary: The document about timeline service and generic service needs to be updated Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14206671#comment-14206671 ] Zhijie Shen edited comment on YARN-2838 at 11/12/14 12:44 AM: -- [~Naganarasimha], sorry for not responding to you immediately, as I've been busy finalizing 2.6. I took a quick scan through your issue document. Here's my clarification: 1. While the entry point of this sub-module is still called ApplicationHistoryServer, it is actually generalized to be TimelineServer right now (definitely we need to refactor the code at some point). The baseline service provided by the timeline server is to allow the cluster and its apps to store their history information, metrics and so on by complying with the defined timeline data model. Later on, users and admins can query this information to do the analysis. 2. Application history (or, as we prefer to call it, the generic history service) is now a built-in service in the timeline server to record the generic history information of YARN apps. It was on a separate store (on FS), but after YARN-2033, it has been moved to the timeline store too, as a payload. We replaced the old storage layer, but kept the existing interfaces (web UI, services, CLI) unchanged, to be the analog of what RM provides for running apps. We still haven't integrated TimelineClient and AHSClient, the latter of which is the RPC interface for getting generic history information. APPLICATION_HISTORY_ENABLED is the only remaining old config to control whether we also want to pull the app info from the generic history service inside the timeline server. You may want to take a look at YARN-2033 to get more context about the change. Moreover, given a number of limitations of the old history store, we're no longer going to support it. 3. The document is definitely stale. I'll file a separate documentation Jira; however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about the timeline service and its built-in generic history service (YARN-2854). Does it sound good? was (Author: zjshen): [~Naganarasimha], sorry for not responding you immediately as being busy on finalizing 2.6. A quick scan through your issue document. Here's my clarification: 1. While the entry point of the this sub-module is still called ApplicationHistoryServer, it is actually generalized to be TimelineServer right now (definitely we need to refactor the code at some point). The baseline service provided the the timeline server is to allow the cluster and its apps to store their history information, metrics and so on by complying with the defined timeline data model. Later on, users and admins can query this information to do the analysis. 2. Application history (or we prefer to call it generic history service) is now a built-in service in the timeline server to record the generic history information of YARN apps. It was on a separate store (on FS), but after YARN-2033, it has been moved to the timeline store too, as a payload. We replace the old storage layer, but keep the existing interfaces (web UI, services, CLI) not changed to be the analog of what RM provides for running apps. We still didn't integrate TimelineClient and AHSClient, the latter of which is RPC interface of getting generic history information via RPC interface. APPLICATION_HISTORY_ENABLED is the only remaining old config to control whether we also want to pull the app info from the generic history service inside the timeline server. 
You may want to take a look at YARN-2033 to get more context about the change. Moreover, as a number of limitation of the old history store, we're no longer going to support it. 3. The document is definitely staled. I'll file separate document Jira, however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about timeline service and its built-in generic history service. Does it sound good? Issues with TimeLineServer (Application History) Key: YARN-2838 URL: https://issues.apache.org/jira/browse/YARN-2838 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: IssuesInTimelineServer.pdf Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
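To make the timeline data model mentioned in the clarification above a bit more tangible, here is a minimal, illustrative sketch of how an application could publish an entity to the timeline store through the 2.6-era TimelineClient; the entity type, id, and event name are made up for the example.
{code}
// Minimal sketch of publishing data via the TimelineClient API; the
// "MY_APP_ENTITY" type and ids below are illustrative values only.
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelinePublishExample {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("MY_APP_ENTITY");      // made-up entity type
      entity.setEntityId("entity_0001");          // made-up entity id
      entity.setStartTime(System.currentTimeMillis());

      TimelineEvent event = new TimelineEvent();
      event.setEventType("STARTED");
      event.setTimestamp(System.currentTimeMillis());
      entity.addEvent(event);

      // Store the entity (history info, metrics, events) in the timeline store.
      client.putEntities(entity);
    } finally {
      client.stop();
    }
  }
}
{code}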
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207464#comment-14207464 ] Karthik Kambatla commented on YARN-2236: Sorry for the delay on this, Sangjin. Patch looks generally good, but for some minor comments:
# LocalResource - mark the methods Public-Unstable for now; we can mark them Public-Stable once the feature is complete.
# Unrelated to this patch, can we mark BuilderUtils @Private for clarity?
# Also, mark FSDownload#isPublic @Private
# Rename ContainerImpl#storeSharedCacheUploadPolicies to storeSharedCacheUploadPolicy? Also, it should use block comments instead of line comments.
# LocalResourceRequest - LOG is unused; we should probably get rid of it along with its imports.
# SharedCacheChecksumFactory
## In the map, can we use Class instead of String?
## getCheckSum should use conf.getClass for getting the classname, and ReflectionUtils.newInstance for instantiation, to go with the rest of the YARN code. Refer to RMProxy for further information.
# Nit: SharedCacheUploader#call - remove the TODOs
# Instead of creating an event and submitting through the event-handler, would it be simpler to synchronously submit it since we are queueing it up to the executor anyway?
Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
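For reference, the conf.getClass / ReflectionUtils.newInstance pattern suggested for the checksum factory looks roughly like the sketch below; the configuration key, the SharedCacheChecksum interface, and the default implementation are stand-ins, not the names from the actual patch.
{code}
// Sketch of resolving an implementation class from Configuration and
// instantiating it reflectively, as the review comment suggests. The key
// "yarn.sharedcache.checksum.algo.impl" and the types here are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public final class ChecksumFactorySketch {

  public interface SharedCacheChecksum {
    String computeChecksum(java.io.InputStream in) throws java.io.IOException;
  }

  public static SharedCacheChecksum getChecksum(Configuration conf) {
    // conf.getClass resolves the configured class (falling back to a default)
    // and verifies it implements the expected interface.
    Class<? extends SharedCacheChecksum> clazz = conf.getClass(
        "yarn.sharedcache.checksum.algo.impl",
        DefaultChecksum.class,
        SharedCacheChecksum.class);
    // ReflectionUtils.newInstance instantiates the class and injects the
    // Configuration if it is Configurable, matching the rest of the YARN code.
    return ReflectionUtils.newInstance(clazz, conf);
  }

  /** Hypothetical default implementation, present only to make the sketch compile. */
  public static class DefaultChecksum implements SharedCacheChecksum {
    @Override
    public String computeChecksum(java.io.InputStream in) {
      return "";
    }
  }
}
{code}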
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207485#comment-14207485 ] Sangjin Lee commented on YARN-2236: --- Thanks Karthik! Let me review them, and see what I can do. Just a quick question, in 2, did you mean marking the entire class BuilderUtils as Private or only the methods that are touched by this JIRA? Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2848: -- Description: Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and when those have occurred, calculate an application specific cluster resource by comparing cluster nodes to it's own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time and the single resource value reflecting both constraints could then be used for efficient frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). For this purpose, the application submissions's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom (Cases where the application elected to request resources not using the application level label expression are out of scope for this - but for the common usecase of an application which uses a particular expression throughout, userlimit and headroom would be accurate) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. (was: Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will change less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and when those have occurred, calculate a application specific cluster resource by comparing cluster nodes to it's own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time and the single resource value reflecting both constraints could then be used for efficient frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). 
For this purpose, the application submissions's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom (Cases where application elected to request resources not using the application level label expression are out of scope for this - but for the common usecase of an application which uses a particular expression throughout, userlimit and headroom would be accurate) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future.) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit -- Key: YARN-2848 URL: https://issues.apache.org/jira/browse/YARN-2848 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom,
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207486#comment-14207486 ] Hadoop QA commented on YARN-2853: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680948/YARN-2853.1.patch against trunk revision 163bb55. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5821//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5821//console This message is automatically generated. Killing app may hang while AM is unregistering -- Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2853.1.patch, YARN-2853.1.patch When killing an app, app first moves to KILLING state, If RMAppAttempt receives the attempt_unregister event before attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to KILLED state and stays at KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2855) Use local date format to show app date time ,
Li Junjun created YARN-2855: --- Summary: Use local date format to show app date time , Key: YARN-2855 URL: https://issues.apache.org/jira/browse/YARN-2855 Project: Hadoop YARN Issue Type: Wish Components: resourcemanager Affects Versions: 2.5.1 Reporter: Li Junjun Priority: Minor In yarn.dt.plugins.js, the function renderHadoopDate uses toUTCString. I'm in China, so I need to add 8 hours in my head every time! I wish it used toLocaleString() to format the Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Junjun updated YARN-2855: Summary: Wish yarn web app use local date format to show app date time (was: Use local date format to show app date time ,) Wish yarn web app use local date format to show app date time -- Key: YARN-2855 URL: https://issues.apache.org/jira/browse/YARN-2855 Project: Hadoop YARN Issue Type: Wish Components: resourcemanager Affects Versions: 2.5.1 Reporter: Li Junjun Priority: Minor in yarn.dt.plugins.js function renderHadoopDate use toUTCString . I'm in China, so I need to add 8 hours in my mind every time! I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207562#comment-14207562 ] Naganarasimha G R commented on YARN-2838: - Hi [~zjshen], I will go through YARN-2033, but I feel some of the issues still stand even if the plan is to continue with the timeline server itself.
{quote}
# Whatever CLI command the user executes, historyserver or timelineserver, it looks like only ApplicationHistoryServer runs. So can we rename the class ApplicationHistoryServer to TimelineHistoryServer (or any other suitable name), since ApplicationHistoryServer is started regardless of which command the user runs?
# Instead of the "Starting the History Server anyway..." deprecation message, can we have "Starting the Timeline History Server anyway..."?
# Based on start or stop, the deprecation message should change to "Starting the Timeline History Server anyway..." or "Stopping the Timeline History Server anyway..."
{quote}
So if you comment on the individual issues/points, I would like to start fixing them as part of this jira. There is also a 4th issue which I mentioned:
{quote}
Missed to add point 4: In YarnClientImpl, history data can be got either from the HistoryServer (old manager) or from the TimelineServer (new one). So the historyServiceEnabled flag needs to check both the Timeline server configurations and the ApplicationHistoryServer configurations, as data can be got from either of them.
{quote}
I think this is also related to the issue you mentioned: ??We still haven't integrated TimelineClient and AHSClient, the latter of which is the RPC interface for getting generic history information.?? But anyway, we need to fix this issue too, right? Has a jira already been raised, or shall I work on it as part of this jira? Also, please let me know if this issue needs to be split into multiple jiras (apart from the documentation one which you have already raised); I would like to split them out and work on them. As I have already started looking into these issues and was also planning to work on the documentation, if you don't mind, can you assign YARN-2854 to me? Issues with TimeLineServer (Application History) Key: YARN-2838 URL: https://issues.apache.org/jira/browse/YARN-2838 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: IssuesInTimelineServer.pdf Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207610#comment-14207610 ] Karthik Kambatla commented on YARN-2855: Duplicate of YARN-570? Wish yarn web app use local date format to show app date time -- Key: YARN-2855 URL: https://issues.apache.org/jira/browse/YARN-2855 Project: Hadoop YARN Issue Type: Wish Components: resourcemanager Affects Versions: 2.5.1 Reporter: Li Junjun Priority: Minor in yarn.dt.plugins.js function renderHadoopDate use toUTCString . I'm in China, so I need to add 8 hours in my mind every time! I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207624#comment-14207624 ] Li Junjun commented on YARN-2855: - yes! I closed it ! Wish yarn web app use local date format to show app date time -- Key: YARN-2855 URL: https://issues.apache.org/jira/browse/YARN-2855 Project: Hadoop YARN Issue Type: Wish Components: resourcemanager Affects Versions: 2.5.1 Reporter: Li Junjun Priority: Minor Fix For: 2.7.0 in yarn.dt.plugins.js function renderHadoopDate use toUTCString . I'm in China, so I need to add 8 hours in my mind every time! I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash reassigned YARN-1964: -- Assignee: Ravi Prakash (was: Abin Shahab) Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Ravi Prakash Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
Rohith created YARN-2856: Summary: Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED Key: YARN-2856 URL: https://issues.apache.org/jira/browse/YARN-2856 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith It is observed that recovering an application whose attempt has KILLED as its final state throws the below exception, and the application remains in the ACCEPTED state forever.
{code}
2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2236: -- Attachment: YARN-2236-trunk-v6.patch v.6 patch posted. Again, to see the diff against the trunk, see https://github.com/ctrezzo/hadoop/compare/trunk...sharedcache-5-YARN-2236-uploader To see the diff between v.5 and v.6, see https://github.com/ctrezzo/hadoop/commit/a74f38cf3e3de824b3c6ced327acbe8e3937aef0 Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch, YARN-2236-trunk-v6.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207744#comment-14207744 ] Rohith commented on YARN-2856: -- It is possible event ATTEMPT_KILLED can come to RMApp while recovering the attempt with KILLED state. This event need to be handled. Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED Key: YARN-2856 URL: https://issues.apache.org/jira/browse/YARN-2856 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith It is observed that recovering an application with its attempt KILLED final state throw below exception. And application remain in accepted state forever. {code} 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207748#comment-14207748 ] Sangjin Lee commented on YARN-2236: --- Karthik, the v.6 patch should address all of your comments except #8. As for #8, it is true that the event handler is a bit extraneous. But from the code standpoint, it is pretty clean and elegant. We just initialize the SharedCacheUploadService, and ContainerImpl can simply publish the event when needed. It also keeps the coupling between SharedCacheUploadService and ContainerImpl loose. It is possible to have ContainerImpl use SharedCacheUploadService directly, but then the SharedCacheUploadService needs to be passed into the ContainerImpl constructor so it can be invoked directly. So all in all, I feel that the current approach is as clean as the alternative, if not cleaner. Let me know your thoughts. Thanks! Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch, YARN-2236-trunk-v6.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
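To illustrate the decoupling described above (purely a sketch; the event and type names are hypothetical, not necessarily those in the v.6 patch), the container-side code only needs the NM dispatcher, while the uploader service registers for the event type and queues the work on its own executor.
{code}
// Sketch of event-based decoupling between ContainerImpl and the uploader
// service; SharedCacheUploadEvent/SharedCacheUploadEventType are made-up names.
import org.apache.hadoop.yarn.event.AbstractEvent;
import org.apache.hadoop.yarn.event.Dispatcher;

final class SharedCacheUploadSketch {

  enum SharedCacheUploadEventType { UPLOAD }

  static class SharedCacheUploadEvent extends AbstractEvent<SharedCacheUploadEventType> {
    private final String resourcePath; // stand-in for the localized resource

    SharedCacheUploadEvent(String resourcePath) {
      super(SharedCacheUploadEventType.UPLOAD);
      this.resourcePath = resourcePath;
    }

    String getResourcePath() {
      return resourcePath;
    }
  }

  // Inside ContainerImpl, the upload request is simply published on the NM
  // dispatcher; the uploader service registered for this event type picks it
  // up asynchronously, so ContainerImpl never holds a reference to it.
  static void requestUpload(Dispatcher dispatcher, String resourcePath) {
    dispatcher.getEventHandler().handle(new SharedCacheUploadEvent(resourcePath));
  }
}
{code}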
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207775#comment-14207775 ] Hadoop QA commented on YARN-2236: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681014/YARN-2236-trunk-v6.patch against trunk revision 53f64ee. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5823//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5823//console This message is automatically generated. Shared Cache uploader service on the Node Manager - Key: YARN-2236 URL: https://issues.apache.org/jira/browse/YARN-2236 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch, YARN-2236-trunk-v6.patch Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2856: - Attachment: YARN-2856.patch Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED Key: YARN-2856 URL: https://issues.apache.org/jira/browse/YARN-2856 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: YARN-2856.patch It is observed that recovering an application with its attempt KILLED final state throw below exception. And application remain in accepted state forever. {code} 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)