[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030284#comment-14030284 ] haosdent commented on YARN-2140: Thanks, [~ywskycn]. Looking forward to your work. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration
[ https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030286#comment-14030286 ] Svetozar Ivanov commented on YARN-2156: --- Hm, if it is as you said, I would expect a warning message in my logs, because my configuration settings are completely ignored. ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration --- Key: YARN-2156 URL: https://issues.apache.org/jira/browse/YARN-2156 Project: Hadoop YARN Issue Type: Bug Reporter: Svetozar Ivanov org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart() method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security authentication. It looks like this:
{code}
@Override
protected void serviceStart() throws Exception {
  Configuration conf = getConfig();
  YarnRPC rpc = YarnRPC.create(conf);

  InetSocketAddress masterServiceAddress = conf.getSocketAddr(
      YarnConfiguration.RM_SCHEDULER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);

  Configuration serverConf = conf;
  // If the auth is not-simple, enforce it to be token-based.
  serverConf = new Configuration(conf);
  serverConf.set(
      CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
      SaslRpcServer.AuthMethod.TOKEN.toString());
  ...
}
{code}
Obviously such code makes sense only if the CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is missing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030292#comment-14030292 ] Hadoop QA commented on YARN-2152: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650228/YARN-2152.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3979//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3979//console This message is automatically generated. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
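For readers following the recovery work in YARN-2152: a hedged sketch of the kind of per-container record the NM would need to report on re-registration so the RM can rebuild this state. The class and field names below are illustrative only, not the actual container-status API the patch touches.
{code}
// Illustrative only: the extra fields discussed above (priority, start time)
// carried alongside the container id when the NM re-registers after an RM restart.
public class RecoveredContainerInfo {
  private final String containerId;
  private final int priority;       // container priority at allocation time
  private final long creationTime;  // container start time on the NM

  public RecoveredContainerInfo(String containerId, int priority, long creationTime) {
    this.containerId = containerId;
    this.priority = priority;
    this.creationTime = creationTime;
  }

  public String getContainerId() { return containerId; }
  public int getPriority() { return priority; }
  public long getCreationTime() { return creationTime; }
}
{code}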
[jira] [Assigned] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-2144: Assignee: Wangda Tan Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1919) Log yarn.resourcemanager.cluster-id is required for HA instead of throwing NPE
[ https://issues.apache.org/jira/browse/YARN-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030433#comment-14030433 ] Tsuyoshi OZAWA commented on YARN-1919: -- Thanks for the review, Jian. [~kkambatl], could you check it? Log yarn.resourcemanager.cluster-id is required for HA instead of throwing NPE -- Key: YARN-1919 URL: https://issues.apache.org/jira/browse/YARN-1919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0, 2.4.0 Reporter: Devaraj K Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1919.1.patch {code:xml} 2014-04-09 16:14:16,392 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:122) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1038) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
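The fix under review here is about failing with a clear message instead of the NPE above. A minimal sketch of that idea, assuming YarnConfiguration.RM_CLUSTER_ID and YarnRuntimeException are available in scope; this is not the attached patch.
{code}
// Sketch only: validate the HA cluster id up front and fail with a readable
// message instead of hitting a NullPointerException later in EmbeddedElectorService.
String clusterId = conf.get(YarnConfiguration.RM_CLUSTER_ID);
if (clusterId == null || clusterId.isEmpty()) {
  throw new YarnRuntimeException(
      YarnConfiguration.RM_CLUSTER_ID + " is required when HA is enabled");
}
{code}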
[jira] [Commented] (YARN-2155) FairScheduler: Incorrect threshold check for preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030524#comment-14030524 ] Hudson commented on YARN-2155: -- FAILURE: Integrated in Hadoop-Yarn-trunk #582 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/582/]) YARN-2155. FairScheduler: Incorrect threshold check for preemption. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602295) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java FairScheduler: Incorrect threshold check for preemption --- Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.5.0 Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
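As a reading aid for the snippet above: a minimal sketch of the corrected check the fix describes, comparing the threshold against utilization (allocated / total) rather than against available headroom. It reuses the fields from the snippet; getAllocatedMB/getAllocatedVirtualCores are assumed to exist on the root queue metrics, and this is not the committed patch verbatim.
{code}
// Sketch of the corrected check: preempt only when cluster utilization
// (allocated / total) in either dimension exceeds the configured threshold.
private boolean shouldAttemptPreemption() {
  if (!preemptionEnabled) {
    return false;
  }
  float memoryUtilization =
      (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory();
  float coreUtilization =
      (float) rootMetrics.getAllocatedVirtualCores() / clusterResource.getVirtualCores();
  return preemptionUtilizationThreshold < Math.max(memoryUtilization, coreUtilization);
}
{code}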
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030528#comment-14030528 ] Hudson commented on YARN-1702: -- FAILURE: Integrated in Hadoop-Yarn-trunk #582 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/582/]) YARN-1702. Added kill app functionality to RM web services. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602298) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration
[ https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030639#comment-14030639 ] Daryn Sharp commented on YARN-2156: --- A warning doesn't make sense because it implies there is something you should change. There's not. The config setting, whether explicitly set or not, is entirely irrelevant. By design, yarn always uses tokens and these tokens carry essential information that is not otherwise obtainable for non-token authenticated connections. That's why token authentication is explicitly set. ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration --- Key: YARN-2156 URL: https://issues.apache.org/jira/browse/YARN-2156 Project: Hadoop YARN Issue Type: Bug Reporter: Svetozar Ivanov org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart() method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security authentication. It looks like that:
{code}
@Override
protected void serviceStart() throws Exception {
  Configuration conf = getConfig();
  YarnRPC rpc = YarnRPC.create(conf);

  InetSocketAddress masterServiceAddress = conf.getSocketAddr(
      YarnConfiguration.RM_SCHEDULER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);

  Configuration serverConf = conf;
  // If the auth is not-simple, enforce it to be token-based.
  serverConf = new Configuration(conf);
  serverConf.set(
      CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
      SaslRpcServer.AuthMethod.TOKEN.toString());
  ...
}
{code}
Obviously such code makes sense only if CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is missing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030643#comment-14030643 ] Robert Joseph Evans commented on YARN-2140: --- We are working on similar things for Storm. I am very interested in your design, because for any streaming system to truly have a chance on YARN, soft guarantees on network I/O are critical. There are several big problems with network I/O even if the user can effectively estimate what they will need. The first is that the resource is not limited to a single node in the cluster. The network has a topology, and a bottleneck can show up at any point in that topology. So you may think you are fine because each node in a rack is not scheduled to be using the full bandwidth that the network card(s) can support, but you can easily have saturated the top-of-rack switch without knowing it. To solve this problem you effectively have to know the topology of the application itself, so that you can schedule the node-to-node network connections within that application. If users don't know how much network they are going to use at a high level, they will never have any idea at a low level. But then you also have the big problem of batch being very bursty in its network usage. The only way to solve this is going to require network hardware support for prioritizing packets. But I'll wait for your design before writing too much more. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
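To make the top-of-rack point above concrete, here is a purely hypothetical sketch (not from any YARN patch) of the rack-level arithmetic: every node can be under its own NIC limit while the sum of per-node reservations still exceeds the rack's uplink. All names and units are invented for illustration.
{code}
import java.util.Map;

// Hypothetical illustration of the rack-level bottleneck described above.
public class RackBandwidthCheck {

  /**
   * @param scheduledEgressMbpsByNode scheduled egress bandwidth per node, in Mbit/s
   * @param torUplinkMbps capacity of the top-of-rack uplink, in Mbit/s
   * @return true if the aggregate reservation would saturate the uplink
   */
  public static boolean rackUplinkSaturated(
      Map<String, Long> scheduledEgressMbpsByNode, long torUplinkMbps) {
    long total = 0;
    for (long mbps : scheduledEgressMbpsByNode.values()) {
      total += mbps;  // each node may individually be well under its NIC limit
    }
    return total > torUplinkMbps;
  }
}
{code}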
[jira] [Commented] (YARN-2155) FairScheduler: Incorrect threshold check for preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030651#comment-14030651 ] Hudson commented on YARN-2155: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1773 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1773/]) YARN-2155. FairScheduler: Incorrect threshold check for preemption. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602295) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java FairScheduler: Incorrect threshold check for preemption --- Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.5.0 Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030655#comment-14030655 ] Hudson commented on YARN-1702: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1773 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1773/]) YARN-1702. Added kill app functionality to RM web services. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602298) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
Ted Yu created YARN-2158: Summary: TestRMWebServicesAppsModification sometimes fails in trunk Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2155) FairScheduler: Incorrect threshold check for preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030721#comment-14030721 ] Hudson commented on YARN-2155: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1800 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1800/]) YARN-2155. FairScheduler: Incorrect threshold check for preemption. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602295) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java FairScheduler: Incorrect threshold check for preemption --- Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.5.0 Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030725#comment-14030725 ] Hudson commented on YARN-1702: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1800 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1800/]) YARN-1702. Added kill app functionality to RM web services. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602298) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.5.patch Thank you, Mayank. I have updated the patch as per the comments. I also tested on a real cluster and found that AM containers are spared by the proportional policy. Basic scenarios were tested as part of this. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
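A hedged sketch of the "AM containers last" idea in the scenario above: when the policy picks victims, order the candidates so ordinary task containers are taken before any AM container. The Candidate type and its isAMContainer flag are invented for illustration; the real policy works on RMContainer objects, and this is not the attached patch.
{code}
import java.util.Comparator;
import java.util.List;

// Illustration only: sort preemption candidates so AM containers come last,
// i.e. J2/J3 map tasks above would be preempted before J3's AM.
public class PreemptionOrdering {

  /** Minimal stand-in for the container info the policy would inspect. */
  public static class Candidate {
    final String containerId;
    final boolean isAMContainer;  // assumed flag; real code would ask the RMContainer

    public Candidate(String containerId, boolean isAMContainer) {
      this.containerId = containerId;
      this.isAMContainer = isAMContainer;
    }
  }

  public static void orderForPreemption(List<Candidate> candidates) {
    // false (task container) sorts before true (AM container), so AM
    // containers are only reached when no task containers are left.
    candidates.sort(Comparator.comparing((Candidate c) -> c.isAMContainer));
  }
}
{code}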
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030750#comment-14030750 ] Wei Yan commented on YARN-2140: --- Thanks for the comments, [~revans2]. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1885: -- Summary: RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts (was: RM may not send the finished signal to some nodes where the application ran after RM restarts) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030800#comment-14030800 ] Hadoop QA commented on YARN-2022: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650318/YARN-2022.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3980//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3980//console This message is automatically generated. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030812#comment-14030812 ] Xuan Gong commented on YARN-2146: - [~airbots] Hey, Chen. Have you figured out why this happens? I am very curious. Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He When I run yarn logs -applicationId application_xxx /tmp/application_xxx, it creates the file, shows part of the logs on the terminal screen, and then reports the following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-2158: --- Assignee: Varun Vasudev TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.005.patch Addressed Jian's comments. Updated finishApplicationMaster to return resync when the application is not registered, as per the agreement. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030874#comment-14030874 ] Anubhav Dhoot commented on YARN-1365: - Hi [~jianhe], I addressed all your comments except "we can print the current state of RMAppAttempt also, which will be useful for debugging". There is no easy way to get to RMAppAttempt at that point, and I don't want to add a dependency on it just for logging. Let me know if you think there is an easy way to get to it. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2157: Attachment: YARN-2157.patch Attaching a patch. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030943#comment-14030943 ] Jian He commented on YARN-2157: --- [~ajisakaa], thanks for working on this! This will be useful. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2052: -- Target Version/s: 2.5.0 ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030955#comment-14030955 ] Hadoop QA commented on YARN-1365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650337/YARN-1365.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3981//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3981//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030958#comment-14030958 ] Hadoop QA commented on YARN-2157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650339/YARN-2157.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3982//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3982//console This message is automatically generated. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030966#comment-14030966 ] Vinod Kumar Vavilapalli commented on YARN-2052: --- bq. e.g. container_XXX_1000 after epoch 1. This scheme won't work with a single reserved digit for epochs and a large number of restarts over time. Here's my summary of what I think we should do: The current ContainerID format is {code} ContainerID { applicationAttemptID containerIDInt } {code} Let's just add a new field {code} + rmIdentifier {code} Old code (state-store, history-server etc) will not read it and that's fine. The only problem is users who are interpreting container_ID strings themselves. That is NOT supported. We should modify ConverterUtils to support the new-field, and that should do. Thoughts? ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030971#comment-14030971 ] Vinod Kumar Vavilapalli commented on YARN-2052: --- I forgot to add one more note that I myself ran into in an offline discussion with [~jianhe]: the new field can be the RMIdentifier, which today is backed by the start timestamp. But two RMs (active/standby) started at the same time can potentially clash w.r.t. timestamps. We can choose this to be timestamp+host-name etc., or simply a UUID. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
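For illustration of the format discussion in the two comments above, here is a hypothetical sketch of a container-id string that carries an extra RM identifier next to the existing applicationAttemptId + sequence number. The layout and field widths are assumptions made up for this example, not the format the project settled on.
{code}
// Hypothetical only: append an RM identifier so container ids stay unique
// across RM restarts; existing pieces are the cluster timestamp, app id,
// attempt id and the monotonically increasing container sequence number.
public class ContainerIdSketch {
  public static String toStringWithRmIdentifier(
      long rmIdentifier, long clusterTimestamp, int appId, int attemptId, long containerSeq) {
    return String.format("container_%d_%d_%04d_%02d_%06d",
        rmIdentifier, clusterTimestamp, appId, attemptId, containerSeq);
  }
}
{code}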
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030979#comment-14030979 ] Hadoop QA commented on YARN-1365: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650340/YARN-1365.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3983//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3983//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031001#comment-14031001 ] Jian He commented on YARN-1885: --- bq. The application list should only be respected when the node is not inactive? For nodes that expired but rejoin with earlier running applications, if the application has completed by that time, I think we should also send the app-finished signal? RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031007#comment-14031007 ] Hadoop QA commented on YARN-2074: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650352/YARN-2074.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3984//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3984//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3984//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2074: -- Attachment: YARN-2074.5.patch Fixed the findbugs warnings. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
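A minimal sketch of the rule this issue asks for, assuming the AM container's exit status is available and that ContainerExitStatus.PREEMPTED marks preempted containers; the helper name is invented for illustration and this is not the attached patch.
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

// Illustrative helper: a preempted AM container should not count toward
// the application's max-attempt limit.
public class AMFailureAccounting {
  public static boolean countsTowardsMaxAttempts(int amContainerExitStatus) {
    switch (amContainerExitStatus) {
      case ContainerExitStatus.PREEMPTED:
        // Preemption is a scheduler decision, not an application failure.
        return false;
      default:
        return true;
    }
  }
}
{code}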
[jira] [Updated] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2158: Attachment: apache-yarn-2158.0.patch Patch to add debugging information to the test. TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Attachments: apache-yarn-2158.0.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
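Not the attached patch, but an example of the kind of debugging information that helps with a failure like "app state incorrect": let the assertion report the observed state, e.g. by using assertEquals instead of a bare assertTrue, so a flaky run tells you which state was actually returned. The helper below is a fragment for a JUnit test class and is illustrative only.
{code}
import static org.junit.Assert.assertEquals;

// Illustrative only: assertEquals prints expected vs. actual on failure,
// so a flaky kill test reports the state it actually saw.
private void verifyAppState(String expectedState, String actualState) {
  assertEquals("app state incorrect", expectedState, actualState);
}
{code}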
[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM
[ https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031043#comment-14031043 ] Jian He commented on YARN-2000: --- Probably we can have the state-store stop last, so that all the other services are stopped first and won't accept more requests or send events to the state-store. Fix ordering of starting services inside the RM --- Key: YARN-2000 URL: https://issues.apache.org/jira/browse/YARN-2000 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He The order of starting services in RM would be: - Recovery of the app/attempts - Start the scheduler and add scheduler app/attempts - Start ResourceTrackerService and re-populate the containers in scheduler based on the containers info from NMs - ApplicationMasterService either don't start or start but block until all the previous NMs registers. Other than these, there are other services like ClientRMService, Webapps which we need to think about the order too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM
[ https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031080#comment-14031080 ] Tsuyoshi OZAWA commented on YARN-2000: -- It sounds reasonable to me. Fix ordering of starting services inside the RM --- Key: YARN-2000 URL: https://issues.apache.org/jira/browse/YARN-2000 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He The order of starting services in RM would be: - Recovery of the app/attempts - Start the scheduler and add scheduler app/attempts - Start ResourceTrackerService and re-populate the containers in scheduler based on the containers info from NMs - ApplicationMasterService either don’t start or start but block until all the previous NMs registers. Other than these, there are other services like ClientRMService, Webapps which we need to think about the order too. -- This message was sent by Atlassian JIRA (v6.2#6252)
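For context on the stop-ordering idea discussed above: CompositeService stops its child services in the reverse of the order they were added, so a service added first is stopped last. A generic sketch of that behaviour follows; the wiring is illustrative only and is not how the RM actually registers its state store.
{code}
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.service.Service;

// Generic illustration: children added first are stopped last, so adding the
// store first means it is stopped after everything that might still send it events.
public class OrderedServices extends CompositeService {
  public OrderedServices(Service stateStore, Service dispatcher, Service rpcServer) {
    super(OrderedServices.class.getName());
    addService(stateStore);   // stopped last
    addService(dispatcher);
    addService(rpcServer);    // stopped first
  }
}
{code}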
[jira] [Commented] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031082#comment-14031082 ] Hadoop QA commented on YARN-2158: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650365/apache-yarn-2158.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3985//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3985//console This message is automatically generated. TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Attachments: apache-yarn-2158.0.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031083#comment-14031083 ] Hadoop QA commented on YARN-2074: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650366/YARN-2074.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3986//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3986//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
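The core idea under discussion is to check why the AM container exited before counting the attempt against the max-attempts limit. A hedged sketch of that check, not necessarily how YARN-2074.5.patch implements it (YARN marks preempted containers with ContainerExitStatus.PREEMPTED):
{code}
// Hedged sketch of the idea, not the actual YARN-2074 patch: an AM attempt whose
// container was preempted should not count towards the AM failure limit.
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

final class AmFailurePolicy {
  /** Returns true only if this finished AM container should count as an AM failure. */
  static boolean countsTowardsAmFailures(ContainerStatus amContainerStatus) {
    return amContainerStatus.getExitStatus() != ContainerExitStatus.PREEMPTED;
  }
}
{code}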
[jira] [Created] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
Ray Chiang created YARN-2159: Summary: allocateContainer() in SchedulerNode needs a clearer LOG.info message Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor This bit of code: LOG.info("Assigned container " + container.getId() + " of capacity " + container.getResource() + " on host " + rmNode.getNodeAddress() + ", which currently has " + numContainers + " containers, " + getUsedResource() + " used and " + getAvailableResource() + " available"); results in a line like: 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity <memory:1536, vCores:1> on host machine.host.domain.com:8041, which currently has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Perhaps one of the following phrasings is better? - which has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2159: - Description: This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Perhaps one of the following phrasings is better? - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation was: This bit of code: LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); results in a line like: 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Perhaps one of the following phrasings is better? - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation allocateContainer() in SchedulerNode needs a clearer LOG.info message - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Perhaps one of the following phrasings is better? - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2159: - Description: This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation was: This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Perhaps one of the following phrasings is better? - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation allocateContainer() in SchedulerNode needs a clearer LOG.info message - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031120#comment-14031120 ] Chen He commented on YARN-2146: --- I think it is because of mismatching during log parsing. I found this problem while running a Pig on Tez job on Hadoop 2.4. Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He when I run yarn logs -applicationId application_xxx /tmp/application_xxx. It creates file, also shows part of logs on the terminal screen, and reports following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031162#comment-14031162 ] Xuan Gong commented on YARN-2146: - bq. because of mismatching during log parsing When we aggregate the logs into HDFS, we write the file_name and the size of each file before we write the log contents. When the reader tries to read back the size of the log, somehow a mismatch happens, and that causes the exception. Not sure why this can happen. Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He when I run yarn logs -applicationId application_xxx /tmp/application_xxx. It creates file, also shows part of logs on the terminal screen, and reports following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199 -- This message was sent by Atlassian JIRA (v6.2#6252)
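To make the failure mode concrete: each aggregated file is preceded by its name and its length recorded as text, and the reader trusts that length to know where the next header starts. If a previous entry's content ran past its declared length, the next Long.parseLong lands in the middle of log text and throws the NumberFormatException shown in the stack trace above. A simplified sketch of the header read (the real AggregatedLogFormat.LogReader differs in detail):
{code}
// Simplified sketch of the per-file header the reader expects; details differ from the
// real AggregatedLogFormat.LogReader, but the failure mode is the same: if the previous
// entry's content overran its declared length, parseLong() sees log text, not a number.
import java.io.DataInputStream;
import java.io.IOException;

final class LogEntryHeader {
  final String fileName;
  final long fileLength;

  LogEntryHeader(String fileName, long fileLength) {
    this.fileName = fileName;
    this.fileLength = fileLength;
  }

  static LogEntryHeader read(DataInputStream in) throws IOException {
    String name = in.readUTF();        // e.g. "stderr"
    String lengthStr = in.readUTF();   // length recorded at aggregation time
    return new LogEntryHeader(name, Long.parseLong(lengthStr));
  }
}
{code}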
[jira] [Commented] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031167#comment-14031167 ] Chen He commented on YARN-2146: --- If you take a look at the log aggregation code, you may get some hints there. What if the size of the file is not right? Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He when I run yarn logs -applicationId application_xxx /tmp/application_xxx. It creates file, also shows part of logs on the terminal screen, and reports following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031172#comment-14031172 ] Chen He commented on YARN-2146: --- It is the same problem as YARN-1670. Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He when I run yarn logs -applicationId application_xxx /tmp/application_xxx. It creates file, also shows part of logs on the terminal screen, and reports following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reopened YARN-2146: --- Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He when I run yarn logs -applicationId application_xxx /tmp/application_xxx. It creates file, also shows part of logs on the terminal screen, and reports following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2146) Yarn logs aggregation error
[ https://issues.apache.org/jira/browse/YARN-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He resolved YARN-2146. --- Resolution: Duplicate Yarn logs aggregation error --- Key: YARN-2146 URL: https://issues.apache.org/jira/browse/YARN-2146 Project: Hadoop YARN Issue Type: Bug Reporter: Chen He when I run yarn logs -applicationId application_xxx /tmp/application_xxx. It creates file, also shows part of logs on the terminal screen, and reports following error: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:430) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:566) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:139) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031175#comment-14031175 ] Chen He commented on YARN-1670: --- A similar error is reported in YARN-2146. [~mitdesai] is working on it. aggregated log writer can write more log data than it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Fix For: 3.0.0, 0.23.11, 2.4.0, 2.5.0 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here where, if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
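The fix described above boils down to capping the copy at the length already recorded in the header, instead of copying to the current end of a file that may still be growing. A minimal sketch of a length-bounded copy (an illustration of the idea, not the actual LogValue.write() code):
{code}
// Minimal sketch of a length-bounded copy, illustrating the proposed fix rather than
// quoting LogValue.write(): never emit more than declaredLength bytes even if the
// source log has grown since the length was recorded, so the reader stays aligned.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BoundedLogCopy {
  static void copyAtMost(InputStream in, OutputStream out, long declaredLength)
      throws IOException {
    byte[] buf = new byte[64 * 1024];
    long remaining = declaredLength;   // a long, not an int (cf. the curRead note above)
    while (remaining > 0) {
      int read = in.read(buf, 0, (int) Math.min(buf.length, remaining));
      if (read == -1) {
        break;                         // source ended before declaredLength bytes
      }
      out.write(buf, 0, read);
      remaining -= read;
    }
  }
}
{code}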
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031223#comment-14031223 ] Tsuyoshi OZAWA commented on YARN-2052: -- [~jianhe] and [~vinodkv], thank you for the comments and suggestions! {quote} This scheme won't work with a single reserved digit for epochs and a large number of restarts over time. {quote} Yes, this is a case where integer overflow happens. We need to take that case into account. {quote} Old code (state-store, history-server etc) will not read it and that's fine. The only problem is users who are interpreting container_ID strings themselves. That is NOT supported. We should modify ConverterUtils to support the new-field, and that should do. {quote} Adding RM Id + hostname as the epoch sounds like a reasonable approach to me. If we suffix the epoch to the container id, the following code is also valid with the old {{ConverterUtils.toContainerId}}: {code} ContainerId id = TestContainerId.newContainerId(0, 0, 0, 0); String cid = ConverterUtils.toString(id); ContainerId gen = ConverterUtils.toContainerId(cid + "_uuid_rm1"); assertEquals(gen, id); // valid to parse even with old code {code} Therefore, I think {{container_XXX_000_uuid_rm1}} is a better format. I'll create a patch based on the idea. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
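The backward-compatibility argument rests on parsers reading the leading fields of container_<clusterTimestamp>_<appId>_<attemptId>_<containerId> and ignoring anything appended after them. A hypothetical sketch of such suffix-tolerant parsing (the class and the exact suffix layout are assumptions, not the final YARN-2052 format):
{code}
// Hypothetical sketch of suffix-tolerant ContainerId parsing; the familiar layout is
// container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>, and anything after
// the fifth field (e.g. an epoch suffix such as "_uuid_rm1") is ignored, not rejected.
final class ContainerIdFields {
  final long clusterTimestamp;
  final int appId;
  final int attemptId;
  final long containerId;

  ContainerIdFields(long ts, int app, int attempt, long container) {
    this.clusterTimestamp = ts;
    this.appId = app;
    this.attemptId = attempt;
    this.containerId = container;
  }

  static ContainerIdFields parse(String s) {
    String[] parts = s.split("_");
    if (parts.length < 5 || !"container".equals(parts[0])) {
      throw new IllegalArgumentException("Invalid ContainerId: " + s);
    }
    // parts[5..], an optional epoch suffix, is deliberately ignored here.
    return new ContainerIdFields(
        Long.parseLong(parts[1]),
        Integer.parseInt(parts[2]),
        Integer.parseInt(parts[3]),
        Long.parseLong(parts[4]));
  }
}
{code}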
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031248#comment-14031248 ] Jian He commented on YARN-1365: --- Thanks for updating the patch. The debug logging can be wrapped with an isDebugEnabled condition: {code} LOG.debug("Skipping notifying ATTEMPT_ADDED"); {code} The following code is removed, but schedulers#addApplication is not handling the case of not sending APP_ACCEPTED events as we do for addApplicationAttempt. My point was we can do the same for both addApplication and addApplicationAttempt to not send dup events. Given this is not relevant to this patch itself, we can fix this separately if needed. {code} // ACCECPTED state can once again receive APP_ACCEPTED event, because on // recovery the app returns ACCEPTED state and the app once again go // through the scheduler and triggers one more APP_ACCEPTED event at // ACCEPTED state. .addTransition(RMAppState.ACCEPTE {code} This transition can never happen, right? Given that unregister also has to do a resync. {code} .addTransition(RMAppAttemptState.LAUNCHED, EnumSet.of(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINISHED), RMAppAttemptEventType.UNREGISTERED, new AMUnregisteredTransition()) {code} This piece of code is not needed; the previous launchAM already checks the app state internally. We can use MockRM.launchAndRegisterAM alternatively. The test case can be moved to TestWorkPreservingRMRestart {code} nm1.nodeHeartbeat(am0.getApplicationAttemptId(), 1, ContainerState.RUNNING); am0.waitForState(RMAppAttemptState.RUNNING); rm1.waitForState(app0.getApplicationId(), RMAppState.RUNNING); {code} *Just thinking*: Does it make sense to map AMCommand (shutdown, resync) to corresponding exceptions? The benefit is that we don't need to add extra fields in the AMS protocol response, and users not using AMRMClient will be forced to handle such conditions to work with RM restart. Thoughts? ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
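For the first review point, the conventional guard looks like the sketch below (the enclosing class is hypothetical; only the log message comes from the patch under review):
{code}
// Sketch of the requested guard: build and emit the debug message only when debug
// logging is enabled (commons-logging, as used by Hadoop at the time).
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

final class AttemptRecoveryLogging {
  private static final Log LOG = LogFactory.getLog(AttemptRecoveryLogging.class);

  static void logSkippedAttemptAdded() {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping notifying ATTEMPT_ADDED");
    }
  }
}
{code}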
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031258#comment-14031258 ] Tsuyoshi OZAWA commented on YARN-2052: -- {quote} The only problem is users who are interpreting container_ID strings themselves. That is NOT supported. {quote} Yeah, I think it is difficult to avoid the problem. But the interpreting logic itself doesn't change drastically with our approach because we don't change the order of attributes. IMHO, it's an acceptable approach. BTW, I found that ConverterUtils is marked as {{@Private}}. Should we make the class {{@Public}}? {code} @Private public class ConverterUtils { {code} ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031268#comment-14031268 ] Anubhav Dhoot commented on YARN-1365: - Agreed. I was trying to be consistent with AllocateResponse, but would prefer exceptions. The AM client will discover it automatically instead of it being hidden in a return value. I would prefer if AllocateResponse also used exceptions instead of AM commands; I can open a JIRA for it. Will address your other comments as well. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2159: - Attachment: YARN2159-01.patch Rearrange sentence as per initial suggestion. allocateContainer() in SchedulerNode needs a clearer LOG.info message - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
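Without quoting YARN2159-01.patch, a sketch of what the rearranged statement could look like, following the phrasing suggested in the description (string literals restored):
{code}
// Sketch of the suggested rephrasing: report the available resources as the state
// "after allocation", so "vCores:0 available" no longer reads like an error.
LOG.info("Assigned container " + container.getId() + " of capacity "
    + container.getResource() + " on host " + rmNode.getNodeAddress()
    + ", which has " + numContainers + " containers, " + getUsedResource()
    + " used and " + getAvailableResource() + " available after allocation");
{code}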
[jira] [Updated] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1885: - Attachment: YARN-1885.patch Thanks [~vinodkv] for your comments. I uploaded a patch that addresses all your comments. bq. AddNodeTransition: The application list should only be respected when the node is not inactive? Not sure if that is right or wrong, but that is how running-containers are treated today. Currently, the application list is respected in AddNodeTransition no matter whether the node is inactive or not. I think it's not a regression at least, and doing some extra clean-up is not bad, do you agree? RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but I can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031369#comment-14031369 ] Wangda Tan commented on YARN-1885: -- [~jianhe], bq. For nodes that expired but rejoin with earlier running applications, if the application by this time has completed, I think we should also send the app-finished signal ? This is the behavior in the existing patch. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but I can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031368#comment-14031368 ] Hadoop QA commented on YARN-2159: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650415/YARN2159-01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3987//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3987//console This message is automatically generated. allocateContainer() in SchedulerNode needs a clearer LOG.info message - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)