[jira] [Commented] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts

2014-07-12 Thread Dr. Martin Menzel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060011#comment-14060011
 ] 

Dr. Martin Menzel commented on YARN-1226:
-

I extended the test class given on the Oracle bug page to:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameTest {

    public static void main(String[] args) throws UnknownHostException {
        String version = System.getProperty("java.version");
        System.out.println(version + " getHostName(): "
            + InetAddress.getLocalHost().getHostName());
        System.out.println(version + " getCanonicalHostName(): "
            + InetAddress.getLocalHost().getCanonicalHostName());
        System.out.println(version + " www.google.com.getHostName(): "
            + InetAddress.getByName("www.google.com").getHostName());
    }
}


If I use getCanonicalHostName() instead of getHostName(), I get the FQDN on 
both IPv4 and IPv6 based hosts.

--

May I ask the community why, in such cases, the hostname is used instead of 
the always-unique IP address directly? I think round-robin DNS assignments of 
multiple IPs to the same name will not be relevant for Hadoop nodes (or am I 
wrong here?).



Most of the Hadoop programs running on datanodes are started by scripts using 
the entries in

etc/hadoop/slaves

Using those entries as a common basis, we could just ask the (forward) DNS 
which slaves entry matches the local IP address. This mapping could be cached 
for further use, as in the sketch below.
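
A minimal sketch of that lookup (illustration only, not part of any patch; the 
class name and caching policy are hypothetical, and a plain one-hostname-per-line 
slaves file is assumed):

{code}
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SlavesHostnameResolver {

    private static String cachedName; // cache the mapping for further use

    // Find the slaves-file entry whose forward DNS lookup yields a local IP.
    public static synchronized String resolveLocalSlaveName(String slavesFile)
            throws Exception {
        if (cachedName != null) {
            return cachedName;
        }
        List<String> entries = Files.readAllLines(Paths.get(slavesFile));
        for (String entry : entries) {
            entry = entry.trim();
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue;
            }
            try {
                // Forward DNS: resolve the slaves entry to its addresses.
                for (InetAddress addr : InetAddress.getAllByName(entry)) {
                    // Does any local interface carry this address?
                    if (NetworkInterface.getByInetAddress(addr) != null) {
                        cachedName = entry;
                        return cachedName;
                    }
                }
            } catch (UnknownHostException e) {
                // entry does not resolve; skip it
            }
        }
        return null; // no slaves entry matched a local address
    }
}
{code}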

> Inconsistent hostname leads to low data locality on IPv6 hosts
> --
>
> Key: YARN-1226
> URL: https://issues.apache.org/jira/browse/YARN-1226
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
> Environment: Linux, IPv6
>Reporter: Kaibo Zhou
>
> When I run a mapreduce job which uses TableInputFormat to scan an HBase table 
> on a YARN cluster with 140+ nodes, I consistently get very low data locality, 
> around 0~10%. 
> The scheduler is Capacity Scheduler. HBase and Hadoop are integrated in the 
> cluster, with NodeManager, DataNode and HRegionServer running on the same node.
> The reason for the low data locality is: most machines in the cluster use 
> IPv6, few machines use IPv4. NodeManager uses 
> "InetAddress.getLocalHost().getHostName()" to get the host name, but the 
> return value of this function depends on IPv4 or IPv6, see 
> ["InetAddress.getLocalHost().getHostName() returns 
> FQDN"|http://bugs.sun.com/view_bug.do?bug_id=7166687]. 
> On machines with IPv4, NodeManager gets the hostName as: 
> search042097.sqa.cm4.site.net
> But on machines with IPv6, NodeManager gets the hostName as: search042097.sqa.cm4
> If run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns 
> search042097.sqa.cm4.site.net.
> 
> For the mapred job which scans an HBase table, the InputSplit contains node 
> locations as [FQDN|http://en.wikipedia.org/wiki/FQDN]s, e.g. 
> search042097.sqa.cm4.site.net. This is because in HBase, the RegionServers' 
> hostnames are allocated by the HMaster. HMaster communicates with the 
> RegionServers and gets each region server's host name using Java NIO: 
> clientChannel.socket().getInetAddress().getHostName().
> Also see the startup log of the region server:
> 13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master 
> passed us hostname to use. Was=search042024.sqa.cm4, 
> Now=search042024.sqa.cm4.site.net
> 
> As you can see, most machines in the YARN cluster with IPv6 get the short 
> hostname, but HBase always gets the full hostname, so the hosts cannot be 
> matched (see RMContainerAllocator::assignToMap). This leads to poor locality.
> After I used java.net.preferIPv4Stack to force IPv4 in YARN, I got 70+% data 
> locality in the cluster.
> Thanks,
> Kaibo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-07-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1406#comment-1406
 ] 

Hadoop QA commented on YARN-2254:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655433/YARN-2254.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4288//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4288//console

This message is automatically generated.

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch
>
>
> TestRMWebServicesAppsModification skips the tests if the scheduler is not 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-07-12 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059990#comment-14059990
 ] 

zhihai xu commented on YARN-2254:
-

Karthik - Sorry, I have been busy the last several days; I got time to work on 
this issue today.
Your suggestion is really good; I already see the benefits now.
A new test (testAppSubmit) was added recently (after I submitted the patch on 
07/04).
This new test (testAppSubmit) fails with FairScheduler at 
"assertEquals(queueName, app.getQueue());".
So if someone runs TestRMWebServicesAppsModification with the Fair scheduler as 
the default scheduler, it will fail at testAppSubmit.
I have fixed this issue in my new patch, YARN-2254.001.patch.
Also, based on your suggestion, in the new patch I changed 
TestRMWebServicesAppsModification to make the test parametrized and run it on 
both CapacityScheduler and FairScheduler (see the sketch below).
In the future, if someone adds new test cases to 
TestRMWebServicesAppsModification, the same problem won't happen any more, 
because this unit test will also run on FairScheduler.
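
A rough sketch of what that parametrization looks like (a hedged illustration 
of the approach only, not the actual YARN-2254.001.patch; the class name and 
setup details here are hypothetical):

{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Sketch: run the same web-services tests once per scheduler class.
@RunWith(Parameterized.class)
public class TestRMWebServicesAppsModificationSketch {

  private final Class<? extends ResourceScheduler> schedulerClass;

  public TestRMWebServicesAppsModificationSketch(
      Class<? extends ResourceScheduler> schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        { CapacityScheduler.class },
        { FairScheduler.class }
    });
  }

  @Before
  public void setUp() {
    Configuration conf = new Configuration();
    // Each test method now runs against both schedulers.
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        schedulerClass, ResourceScheduler.class);
    // ... start the RM / web resources with this conf ...
  }
}
{code}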

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch
>
>
> TestRMWebServicesAppsModification skips the tests if the scheduler is not 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.

2014-07-12 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2254:


Attachment: YARN-2254.001.patch

> change TestRMWebServicesAppsModification to support FairScheduler.
> --
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch
>
>
> TestRMWebServicesAppsModification skips the tests if the scheduler is not 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059891#comment-14059891
 ] 

Hadoop QA commented on YARN-2242:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12655411/YARN-2242-071214.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4287//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4287//console

This message is automatically generated.

> Improve exception information on AM launch crashes
> --
>
> Key: YARN-2242
> URL: https://issues.apache.org/jira/browse/YARN-2242
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Fix For: 2.6.0
>
> Attachments: YARN-2242-070115-2.patch, YARN-2242-070814-1.patch, 
> YARN-2242-070814.patch, YARN-2242-071114.patch, YARN-2242-071214.patch
>
>
> Now, each time the AM Container crashes during launch, both the console and 
> the web UI only report a ShellExitCodeException. This is not only unhelpful, 
> but sometimes confusing. With the help of log aggregation, container logs are 
> actually aggregated and can be very helpful for debugging. One possible way 
> to improve the whole process is to send a "pointer" to the aggregated logs to 
> the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2242) Improve exception information on AM launch crashes

2014-07-12 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2242:


Attachment: YARN-2242-071214.patch

Thanks for [~djp]'s comments! I've addressed these issues in the latest patch. 

> Improve exception information on AM launch crashes
> --
>
> Key: YARN-2242
> URL: https://issues.apache.org/jira/browse/YARN-2242
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Fix For: 2.6.0
>
> Attachments: YARN-2242-070115-2.patch, YARN-2242-070814-1.patch, 
> YARN-2242-070814.patch, YARN-2242-071114.patch, YARN-2242-071214.patch
>
>
> Now, each time the AM Container crashes during launch, both the console and 
> the web UI only report a ShellExitCodeException. This is not only unhelpful, 
> but sometimes confusing. With the help of log aggregation, container logs are 
> actually aggregated and can be very helpful for debugging. One possible way 
> to improve the whole process is to send a "pointer" to the aggregated logs to 
> the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API

2014-07-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059838#comment-14059838
 ] 

Hadoop QA commented on YARN-1050:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655406/YARN-1050-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4286//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4286//console

This message is automatically generated.

> Document the Fair Scheduler REST API
> 
>
> Key: YARN-1050
> URL: https://issues.apache.org/jira/browse/YARN-1050
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sandy Ryza
>Assignee: Kenji Kikushima
> Attachments: YARN-1050-2.patch, YARN-1050.patch
>
>
> The documentation should be placed here along with the Capacity Scheduler 
> documentation: 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1050) Document the Fair Scheduler REST API

2014-07-12 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-1050:
--

Attachment: YARN-1050-2.patch

Rebased on trunk. Thanks for testing, [~ajisakaa]!

> Document the Fair Scheduler REST API
> 
>
> Key: YARN-1050
> URL: https://issues.apache.org/jira/browse/YARN-1050
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sandy Ryza
>Assignee: Kenji Kikushima
> Attachments: YARN-1050-2.patch, YARN-1050.patch
>
>
> The documentation should be placed here along with the Capacity Scheduler 
> documentation: 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059821#comment-14059821
 ] 

Hudson commented on YARN-2274:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5875 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5875/])
YARN-2274. FairScheduler: Add debug information about cluster capacity, 
availability and reservations. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler: Add debug information about cluster capacity, availability and 
> reservations
> --
>
> Key: YARN-2274
> URL: https://issues.apache.org/jira/browse/YARN-2274
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Trivial
> Fix For: 2.6.0
>
> Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch
>
>
> FairScheduler logs have little information on cluster capacity and 
> availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059774#comment-14059774
 ] 

Hadoop QA commented on YARN-2229:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655378/YARN-2229.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1297 javac 
compiler warnings (more than the trunk's current 1258 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.util.TestFSDownload

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4285//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4285//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4285//console

This message is automatically generated.

> ContainerId can overflow with RM restart
> 
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
> YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, 
> YARN-2229.7.patch, YARN-2229.8.patch
>
>
> On YARN-2052, we changed the containerId format: the upper 10 bits are for 
> the epoch, the lower 22 bits are for the sequence number of the ids. This is 
> to preserve the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
> after the RM restarts 1024 times.
> To avoid the problem, it's better to make containerId a long. We need to 
> define the new format of the container id while preserving backward 
> compatibility on this JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059754#comment-14059754
 ] 

Hadoop QA commented on YARN-1408:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655374/Yarn-1408.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4284//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4284//console

This message is automatically generated.

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, 
> Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, 
> Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable = true
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> The jobA task which uses queue b capacity has been preempted and killed.
> This caused the problem below:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was preempted immediately as per preemption.
> Here the ACQUIRED at KILLED invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) ContainerId can overflow with RM restart

2014-07-12 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229.8.patch

[~jianhe], thank you for the review. Updated the patch:
* Renamed setId(long id) to setContainerId().
* Changed the epoch type to long.
* Changed to use ComparisonChain to compare containerIds, because compareTo 
cannot return a long value (see the sketch below). 

{quote}
Or we may actually have a separate jira to rename this with the new method name.
{quote}

Yes, I agree with renaming {{getId}} to {{getContainerId}} in a separate JIRA, 
because the changes can be large. 
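
For illustration, a hedged sketch of those two points (the YARN-2052 bit layout 
plus a ComparisonChain-based compareTo; the class and field names are 
hypothetical, not the actual patch):

{code}
import com.google.common.collect.ComparisonChain;

// Sketch: container id with the epoch in the upper bits and the
// sequence number in the lower 22 bits, widened to long.
public class ContainerIdSketch implements Comparable<ContainerIdSketch> {

  private final long epoch;        // incremented on each RM restart
  private final long sequenceNum;  // per-application container counter

  public ContainerIdSketch(long epoch, long sequenceNum) {
    this.epoch = epoch;
    this.sequenceNum = sequenceNum;
  }

  public long getContainerId() {
    // 22 low bits for the sequence number, the remaining bits for
    // the epoch, so the epoch no longer overflows at 1024 restarts.
    return (epoch << 22) | sequenceNum;
  }

  @Override
  public int compareTo(ContainerIdSketch other) {
    // compareTo must return an int, so we cannot simply subtract the
    // two long ids; ComparisonChain sidesteps the narrowing problem.
    return ComparisonChain.start()
        .compare(getContainerId(), other.getContainerId())
        .result();
  }
}
{code}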

> ContainerId can overflow with RM restart
> 
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
> YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, 
> YARN-2229.7.patch, YARN-2229.8.patch
>
>
> On YARN-2052, we changed the containerId format: the upper 10 bits are for 
> the epoch, the lower 22 bits are for the sequence number of the ids. This is 
> to preserve the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
> after the RM restarts 1024 times.
> To avoid the problem, it's better to make containerId a long. We need to 
> define the new format of the container id while preserving backward 
> compatibility on this JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059730#comment-14059730
 ] 

Junping Du commented on YARN-2242:
--

[~gtCarrera9], thanks for delivering an update here. The patch is getting closer.
There is a small logical mistake here:
{code}
+Boolean shouldCheckURL = (null == applicationAttempt.getTrackingUrl());
{code}
should be:
{code}
boolean shouldCheckURL = (applicationAttempt.getTrackingUrl() != null);
{code}
In addition, we usually use boolean instead of Boolean (the object wrapper for 
boolean) to save the cost of boxing (except when we need to put a boolean 
value into an object container, like a List).

For my first comment above, I meant replacing StringBuilder.append(string1 + 
string2 + ...) with StringBuilder.append(string1).append(string2)... to save 
the cost of string concatenation, given we already have a StringBuilder here; 
e.g.
{code}
diagnosticsBuilder.append("AM Container for "
    + finishEvent.getApplicationAttemptId()
    + " exited with " + " exitCode: " + status.getExitStatus() + "\n");
if (this.getTrackingUrl() != null) {
  diagnosticsBuilder.append("For more detailed output,"
      + " check application tracking page:\n" + this.getTrackingUrl() + "\n"
      + "Then, click on links to logs of each attempt.\n");
}
diagnosticsBuilder.append("\nDiagnostics: " + status.getDiagnostics()
    + "Failing this attempt");
{code}
can be updated to the following code (also removing the unnecessary \n):
{code}
diagnosticsBuilder.append("AM Container for ")
    .append(finishEvent.getApplicationAttemptId())
    .append(" exited with ").append(" exitCode: ")
    .append(status.getExitStatus()).append("\n");
if (this.getTrackingUrl() != null) {
  diagnosticsBuilder.append("For more detailed output,")
      .append(" check application tracking page:")
      .append(this.getTrackingUrl())
      .append("Then, click on links to logs of each attempt.\n");
}
diagnosticsBuilder.append("Diagnostics: ").append(status.getDiagnostics())
    .append("Failing this attempt");
{code}


> Improve exception information on AM launch crashes
> --
>
> Key: YARN-2242
> URL: https://issues.apache.org/jira/browse/YARN-2242
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Fix For: 2.6.0
>
> Attachments: YARN-2242-070115-2.patch, YARN-2242-070814-1.patch, 
> YARN-2242-070814.patch, YARN-2242-071114.patch
>
>
> Now, each time the AM Container crashes during launch, both the console and 
> the web UI only report a ShellExitCodeException. This is not only unhelpful, 
> but sometimes confusing. With the help of log aggregation, container logs are 
> actually aggregated and can be very helpful for debugging. One possible way 
> to improve the whole process is to send a "pointer" to the aggregated logs to 
> the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-12 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: (was: Yarn-1408.10.patch)

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, 
> Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, 
> Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable = true
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> The jobA task which uses queue b capacity has been preempted and killed.
> This caused the problem below:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was preempted immediately as per preemption.
> Here the ACQUIRED at KILLED invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-12 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.10.patch

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, 
> Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, 
> Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable = true
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Assign a big jobA on queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> The jobA task which uses queue b capacity has been preempted and killed.
> This caused the problem below:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was preempted immediately as per preemption.
> Here the ACQUIRED at KILLED invalid state exception came when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container 
> was already killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)