[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.patch

Yarn Scheduler Load Simulator
-----------------------------
Key: YARN-1021
URL: https://issues.apache.org/jira/browse/YARN-1021
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf, YARN-1021.pdf

The Yarn Scheduler is a fertile area of interest, with different implementations such as the Fifo, Capacity, and Fair schedulers. Meanwhile, several optimizations have been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features and drives scheduling decisions by many factors, such as fairness, capacity guarantees, and resource availability. It is very important to evaluate a scheduler algorithm thoroughly before deploying it in a production cluster. Unfortunately, evaluating a scheduling algorithm is currently non-trivial: evaluating in a real cluster is time- and cost-consuming, and it is also very hard to find a large enough cluster. Hence, a simulator that can predict how well a scheduler algorithm performs for a specific workload would be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager while removing the network factor by simulating NodeManagers and ApplicationMasters, handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.

The simulator will produce real-time metrics while executing, including:
* Resource usage for the whole cluster and for each queue, which can be used to configure the cluster's and each queue's capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantees, etc.).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be used by Hadoop developers to find hot spots and scalability limits.

The simulator will provide real-time charts showing the behavior and performance of the scheduler. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.
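To make the scheduler-wrapper idea concrete, here is a minimal sketch of a wrapper that times each operation of the real scheduler; the {{Scheduler}} interface and the names below are illustrative stand-ins, not the actual YARN types:

{code}
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative stand-in for the real scheduler interface. */
interface Scheduler {
  Object allocate(Object request);
  void handle(Object event);
}

/** Delegates to the real scheduler and accumulates per-operation latency. */
class TimedSchedulerWrapper implements Scheduler {
  private final Scheduler real;
  private final AtomicLong allocateNanos = new AtomicLong();
  private final AtomicLong handleNanos = new AtomicLong();

  TimedSchedulerWrapper(Scheduler real) { this.real = real; }

  @Override public Object allocate(Object request) {
    long t0 = System.nanoTime();
    try { return real.allocate(request); }
    finally { allocateNanos.addAndGet(System.nanoTime() - t0); }
  }

  @Override public void handle(Object event) {
    long t0 = System.nanoTime();
    try { real.handle(event); }
    finally { handleNanos.addAndGet(System.nanoTime() - t0); }
  }

  long allocateNanos() { return allocateNanos.get(); }
  long handleNanos() { return handleNanos.get(); }
}
{code}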
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.pdf
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: (was: YARN-1021.pdf)
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaibo Zhou updated YARN-1226:
-----------------------------
    Summary: Inconsistent hostname leads to low data locality (was: Inconsistent hostname leads to poor data locality)

Inconsistent hostname leads to low data locality
------------------------------------------------
Key: YARN-1226
URL: https://issues.apache.org/jira/browse/YARN-1226
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
Reporter: Kaibo Zhou

When I run a mapreduce job that uses TableInputFormat to scan an HBase table on a YARN cluster with 140+ nodes, I consistently get very low data locality, around 0~10%. The scheduler is the Capacity Scheduler. HBase and Hadoop are integrated in the cluster, with NodeManager, DataNode, and HRegionServer running on the same node.

The reason for the low data locality is that most machines in the cluster use IPv6 and a few use IPv4. NodeManager uses InetAddress.getLocalHost().getHostName() to get the host name, but the result of this call depends on whether IPv4 or IPv6 is in use; see [InetAddress.getLocalHost().getHostName() returns FQDN|http://bugs.sun.com/view_bug.do?bug_id=7166687]. On machines with IPv4, NodeManager gets the hostName search042097.sqa.cm4.site.net, but on machines with IPv6 it gets search042097.sqa.cm4. If run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns search042097.sqa.cm4.site.net.

For a mapred job that scans an HBase table, the InputSplit contains node locations as [FQDNs|http://en.wikipedia.org/wiki/FQDN], e.g. search042097.sqa.cm4.site.net, because in HBase the RegionServers' hostnames are assigned by the HMaster. The HMaster communicates with the RegionServers and obtains each region server's host name using Java NIO: clientChannel.socket().getInetAddress().getHostName(). See also the startup log of a region server:

13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=search042024.sqa.cm4, Now=search042024.sqa.cm4.site.net

As you can see, most machines in the YARN cluster with IPv6 get the short hostname, while HBase always gets the full hostname, so the hosts cannot be matched (see RMContainerAllocator::assignToMap). This leads to poor locality. After I used java.net.preferIPv4Stack to force IPv4 in YARN, I got 70+% data locality in the cluster.

Thanks, Kaibo
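A quick way to observe the hostname discrepancy described above is to print what the same lookup returns on an affected host, once normally and once with -Djava.net.preferIPv4Stack=true (a minimal sketch, not part of the reporter's patch):

{code}
import java.net.InetAddress;

public class HostnameCheck {
  public static void main(String[] args) throws Exception {
    InetAddress local = InetAddress.getLocalHost();
    // On an affected IPv6 host this prints the short name, e.g. search042097.sqa.cm4;
    // with -Djava.net.preferIPv4Stack=true it prints the FQDN.
    System.out.println("getHostName():          " + local.getHostName());
    // getCanonicalHostName() asks the resolver for the FQDN explicitly,
    // which gives a baseline to compare against.
    System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
  }
}
{code}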
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to poor data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaibo Zhou updated YARN-1226:
-----------------------------
    Summary: Inconsistent hostname leads to poor data locality (was: ipv4 and ipv6 lead to poor data locality)
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776072#comment-13776072 ]

Siddharth Seth commented on YARN-1229:
--------------------------------------
I'm in favour of renaming the shuffle service id as well, and enforcing constraints on the names. Shell parameters apparently have name restrictions; http://stackoverflow.com/questions/2821043/allowed-characters-in-linux-environment-variable-names has some links to the standards. Basing the aux-service name restrictions on the shell name restrictions seems OK to me.

This is an incompatible change, though. Sites which have Hadoop 2 (or 0.23) deployed would need to change their configs to reflect the shuffle service name update. (The shuffle service isn't started when using the default Hadoop configuration files.) An alternative could be to use base32 encoding for the service name, but I would prefer not going there.

Shell$ExitCodeException could happen if AM fails to start
---------------------------------------------------------
Key: YARN-1229
URL: https://issues.apache.org/jira/browse/YARN-1229
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
Fix For: 2.1.1-beta

I run a sleep job. If the AM fails to start, this exception can occur:

13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
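The failure in the stack trace above comes from the shell rejecting an exported variable whose name contains a dot. A small, self-contained demonstration (assuming a POSIX shell is available; this class is mine, not NodeManager code):

{code}
public class ExportNameDemo {
  // Runs `export <name>=AAA0+gA=` in a shell; a non-zero exit status means
  // the shell rejected the name as an identifier.
  static int tryExport(String name) throws Exception {
    Process p = new ProcessBuilder("sh", "-c", "export " + name + "=AAA0+gA=")
        .inheritIO().start();
    return p.waitFor();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(tryExport("NM_AUX_SERVICE_mapreduce_shuffle")); // 0: accepted
    System.out.println(tryExport("NM_AUX_SERVICE_mapreduce.shuffle")); // non-zero: rejected
  }
}
{code}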
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated YARN-1156:
--------------------------------
    Assignee: Tsuyoshi OZAWA

Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
------------------------------------------------------------------------------
Key: YARN-1156
URL: https://issues.apache.org/jira/browse/YARN-1156
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
Labels: metrics, newbie
Fix For: 2.3.0
Attachments: YARN-1156.1.patch

The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500 MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int) 500/1024}}, which is 0. That is, the memory actually allocated is 2000 MB, but the metric shows 0 GB. Let's use a float type for these metrics.
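The rounding loss is easy to see in isolation (the field names here are mine, not the actual NodeManager metric fields):

{code}
public class GbMetricDemo {
  public static void main(String[] args) {
    int allocatedGbInt = 0;
    float allocatedGbFloat = 0f;
    for (int i = 0; i < 4; i++) {
      int allocMb = 500;
      allocatedGbInt += allocMb / 1024;    // integer division: adds 0 each time
      allocatedGbFloat += allocMb / 1024f; // adds ~0.488 each time
    }
    System.out.println(allocatedGbInt);    // 0 GB, despite 2000 MB allocated
    System.out.println(allocatedGbFloat);  // ~1.95 GB
  }
}
{code}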
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776113#comment-13776113 ]

nijel commented on YARN-90:
---------------------------
To handle this, we can check the failed dirs first in DirectoryCollection.checkDirs() and add them back to localDirs if the directories have recovered from the error.

NodeManager should identify failed disks becoming good back again
------------------------------------------------------------------
Key: YARN-90
URL: https://issues.apache.org/jira/browse/YARN-90
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Ravi Gummadi

MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good again), NodeManager needs a restart. This JIRA is to improve NodeManager to reuse good disks (which may have been bad some time back).
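A sketch of that suggestion (the types and fields below are illustrative, not the actual DirectoryCollection internals): on each checkDirs() pass, re-test previously failed directories and move any that pass the health check back into the good list.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DirHealthChecker {
  private final List<String> localDirs = new ArrayList<>();
  private final List<String> failedDirs = new ArrayList<>();

  /** Returns true if the directory is usable again. */
  private boolean isHealthy(String dir) {
    File f = new File(dir);
    return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
  }

  void checkDirs() {
    // First, give previously failed dirs a chance to come back.
    for (Iterator<String> it = failedDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      if (isHealthy(dir)) {
        it.remove();
        localDirs.add(dir);
      }
    }
    // Then run the usual health check over the good dirs.
    for (Iterator<String> it = localDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      if (!isHealthy(dir)) {
        it.remove();
        failedDirs.add(dir);
      }
    }
  }
}
{code}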
[jira] [Commented] (YARN-1226) Inconsistent hostname leads to low data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776128#comment-13776128 ]

Steve Loughran commented on YARN-1226:
--------------------------------------
Right now, "Don't use IPv6" is one of those installation rules ([http://wiki.apache.org/hadoop/HadoopIPv6]), precisely because of issues with IPv6 in Java. Now, if there are some bits of code that could be changed to make things work slightly better, they'd be welcome, but right now the focus is on IPv4; if this is an IPv6 problem, it's going to get low priority.
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated YARN-1226:
---------------------------------
    Environment: Linux, IPv6
    Summary: Inconsistent hostname leads to low data locality on IPv6 hosts (was: Inconsistent hostname leads to low data locality)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776142#comment-13776142 ]

Hadoop QA commented on YARN-1021:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604747/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1145 warnings).
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1995//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1995//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1995//console
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.patch
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776173#comment-13776173 ]

Hadoop QA commented on YARN-1021:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604767/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist:
org.apache.hadoop.yarn.sls.TestSLSRunner
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1996//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1996//console
[jira] [Updated] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nemon Lou updated YARN-1231:
----------------------------
    Attachment: YARN-1231.patch

A patch fixing the test cases in the hadoop-yarn-server-resourcemanager project.

Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
--------------------------------------------------------------------------------
Key: YARN-1231
URL: https://issues.apache.org/jira/browse/YARN-1231
Project: Hadoop YARN
Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou
Assignee: Nemon Lou
Labels: test
Attachments: YARN-1231.patch

Use a separate jira to fix YARN's test cases that will fail by hitting the max-am-used-resources-percent limit after YARN-276.
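For context, fixes of this kind typically adjust the AM resource limit in the test configuration. A hypothetical sketch (the exact property and value each test needs depend on the scheduler under test; this is not taken from the patch):

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical test setup: raise the AM resource cap so ApplicationMasters
// in a small test cluster are not blocked by the percentage limit.
Configuration conf = new Configuration();
conf.setFloat("yarn.scheduler.capacity.maximum-am-resource-percent", 0.5f);
{code}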
[jira] [Commented] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776302#comment-13776302 ]

Hadoop QA commented on YARN-1231:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604791/YARN-1231.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1997//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1997//console
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.patch
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776350#comment-13776350 ]

Chris Nauroth commented on YARN-1229:
-------------------------------------
BTW, if we use {{[a-zA-Z_]+[a-zA-Z0-9_]*}}, then that will be compatible with Windows too. It looks like Windows actually allows many more characters than that, but I think it makes sense to stick to a minimal set that we expect to work cross-platform.
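A minimal sketch of how such a constraint could be enforced when an aux service registers; the helper and the env-var prefix usage below are illustrative, not the actual NodeManager code:

{code}
import java.util.regex.Pattern;

public class AuxServiceNames {
  // The proposed pattern: a letter or underscore first, then letters,
  // digits, or underscores.
  private static final Pattern VALID = Pattern.compile("[a-zA-Z_]+[a-zA-Z0-9_]*");

  public static String envVarFor(String serviceName) {
    if (!VALID.matcher(serviceName).matches()) {
      throw new IllegalArgumentException(
          "Aux service name is not a valid shell identifier: " + serviceName);
    }
    return "NM_AUX_SERVICE_" + serviceName;
  }

  public static void main(String[] args) {
    System.out.println(envVarFor("mapreduce_shuffle")); // accepted
    System.out.println(envVarFor("mapreduce.shuffle")); // throws: '.' not allowed
  }
}
{code}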
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776357#comment-13776357 ]

Hadoop QA commented on YARN-1021:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604801/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1998//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1998//console
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776364#comment-13776364 ]

Wei Yan commented on YARN-1021:
-------------------------------
Uploaded a new patch according to [~tucu00]'s latest comments. It also lets the simulator support two types of input: (1) rumen traces, so users can feed their existing rumen traces directly to the simulator; and (2) the simulator's own trace format (sls), which is much simpler and lets users easily generate various workloads. The simulator also includes a tool to help users convert rumen traces to sls traces.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776367#comment-13776367 ] Alejandro Abdelnur commented on YARN-1021:
--
[~ywskycn], we shouldn't use /tmp, as that does not get cleaned up by the build. Instead we should use a temp subdir under target/, easily done by:
{code}
// Create a unique temp dir under target/ so the build's clean phase removes it.
File dir = new File("target", UUID.randomUUID().toString());
dir.mkdirs();
{code}
And the documentation, in the appendix, should have a complete, simple example of an sls JSON input file as a reference.
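A hedged sketch of what such an sls JSON job entry might look like; the field names below (am.type, job.start.ms, job.tasks, container.*) are illustrative only, modeled on the trace format discussed for the patch rather than a confirmed final schema:
{code}
{
  "am.type" : "mapreduce",
  "job.start.ms" : 0,
  "job.end.ms" : 95375,
  "job.queue.name" : "sls_queue_1",
  "job.id" : "job_1",
  "job.user" : "default",
  "job.tasks" : [ {
    "container.host" : "/default-rack/node1",
    "container.start.ms" : 6664,
    "container.end.ms" : 23707,
    "container.priority" : 20,
    "container.type" : "map"
  } ]
}
{code}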
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021:
--
Attachment: YARN-1021.patch
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021:
--
Attachment: (was: YARN-1021.pdf)
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021:
--
Attachment: YARN-1021.pdf
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776441#comment-13776441 ] Hadoop QA commented on YARN-1021:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604818/YARN-1021.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1999//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1999//console
This message is automatically generated.
[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776485#comment-13776485 ] Vinod Kumar Vavilapalli commented on YARN-1204:
---
The latest patch looks good to me. +1. Checking this in.

Need to add https port related property in Yarn
---
Key: YARN-1204 URL: https://issues.apache.org/jira/browse/YARN-1204 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch
There is no Yarn property available to configure the https port for the ResourceManager, NodeManager, and history server. Currently, Yarn services use the port defined for http (by 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', and 'yarn.resourcemanager.webapp.address') even when running services over the https protocol. Yarn should have a list of properties to assign https ports for the RM, NM, and JHS. It can be like below: yarn.nodemanager.webapp.https.address, yarn.resourcemanager.webapp.https.address, mapreduce.jobhistory.webapp.https.address.
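A hedged sketch of setting these properties in client code; the property names come from the issue description above, while the class name and host:port values are placeholders, not defaults taken from the patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HttpsAddressSketch {
  public static void main(String[] args) {
    // Property names are from the issue description; values are illustrative.
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.webapp.https.address", "rm.example.com:8090");
    conf.set("yarn.nodemanager.webapp.https.address", "0.0.0.0:8044");
    conf.set("mapreduce.jobhistory.webapp.https.address", "jhs.example.com:19890");
    System.out.println(conf.get("yarn.resourcemanager.webapp.https.address"));
  }
}
{code}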
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776484#comment-13776484 ] Karthik Kambatla commented on YARN-1068:
[~bikassaha], when you get a chance, can you review the latest patch?

Add admin support for HA operations
---
Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-prelim.patch
Support HA admin operations to facilitate transitioning the RM to Active and Standby states.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229:
Attachment: YARN-1229.1.patch
The attached patch changes mapreduce.shuffle to MapreduceShuffle, and also enforces the check (service name should contain only a-zA-Z0-9) in AuxService.

Shell$ExitCodeException could happen if AM fails to start
-
Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1229.1.patch
I run a sleep job. If the AM fails to start, this exception could occur:
13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
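The failure itself is a shell constraint: launch_container.sh exports one environment variable per aux service, and shell identifiers must match [A-Za-z_][A-Za-z0-9_]*, so a service name containing a dot (as in NM_AUX_SERVICE_mapreduce.shuffle) makes the export line invalid. A hedged sketch of the kind of name check the patch describes; the class and method names here are hypothetical, not the patch's actual code:
{code}
import java.util.regex.Pattern;

// Hypothetical illustration of the a-zA-Z0-9 service-name check.
public final class AuxServiceNameCheck {
  private static final Pattern VALID_NAME = Pattern.compile("^[A-Za-z0-9]+$");

  static void checkServiceName(String name) {
    if (!VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException(
          "Aux service name must contain only a-zA-Z0-9: " + name);
    }
  }

  public static void main(String[] args) {
    checkServiceName("MapreduceShuffle");   // passes
    checkServiceName("mapreduce.shuffle");  // throws: '.' is not allowed
  }
}
{code}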
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776503#comment-13776503 ] Arun C Murthy commented on YARN-1089:
-
I don't think we should put this in branch-2.1 or target this for hadoop-2.2. This is a major new feature which can be implemented in a compatible manner - let's target this for 2.3.0.

Add YARN compute units alongside virtual cores
--
Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch
Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776502#comment-13776502 ] Alejandro Abdelnur commented on YARN-1068:
--
One nit: in the RMHAProtocolService, {{serviceStop()}} should be symmetric with the start, in the sense that it should do the {{if (haEnabled)}} check to stop the HAAdmin server (instead of doing this check in the HAAdmin service itself).
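A minimal hedged sketch of that symmetry, assuming {{serviceStart()}} guards HAAdmin startup with the same flag; the class shape, the {{haEnabled}} field, and the HAAdmin start/stop helpers are inferred from the comment, not copied from the patch:
{code}
import org.apache.hadoop.service.AbstractService;

// Hypothetical sketch: names are assumptions based on the comment above.
class RMHAProtocolServiceSketch extends AbstractService {
  private final boolean haEnabled;

  RMHAProtocolServiceSketch(boolean haEnabled) {
    super("RMHAProtocolServiceSketch");
    this.haEnabled = haEnabled;
  }

  @Override
  protected synchronized void serviceStart() throws Exception {
    if (haEnabled) {
      startHAAdminServer();
    }
    super.serviceStart();
  }

  @Override
  protected synchronized void serviceStop() throws Exception {
    if (haEnabled) { // mirror the serviceStart() check
      stopHAAdminServer();
    }
    super.serviceStop();
  }

  private void startHAAdminServer() { /* elided */ }
  private void stopHAAdminServer() { /* elided */ }
}
{code}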
[jira] [Updated] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1089:
Target Version/s: 2.3.0 (was: 2.1.1-beta)
[jira] [Created] (YARN-1232) Configuration support for RM HA
Karthik Kambatla created YARN-1232:
--
Summary: Configuration support for RM HA
Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla
We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. This blocks ConfiguredFailoverProxyProvider.
[jira] [Updated] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232:
---
Attachment: yarn-1232-1.patch
Patch that adds the configs to YarnConfiguration and hooks them up to RM startup and RMProxy implementation through HAUtil.
[jira] [Commented] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776508#comment-13776508 ] Karthik Kambatla commented on YARN-1232:
Will post another patch that describes these configs in yarn-default.xml. I don't think we can have default values for these, though.
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776509#comment-13776509 ] Karthik Kambatla commented on YARN-1028:
Using the configs introduced in YARN-1232, we should be able to retry alternate RMs by setting {{yarn.resourcemanager.ha.nodes.id}}. [~devaraj.k], I hope it is okay if I take this up.

Add FailoverProxyProvider like capability to RMProxy
Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Devaraj K
RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
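A hedged sketch of how a client might name two RMs with these configs; only {{yarn.resourcemanager.ha.nodes.id}} comes from the comment above, and the per-node address keys and host:port values are assumptions that may differ from the final patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigSketch {
  public static void main(String[] args) {
    // Hypothetical sketch: the ha.nodes.id key is named in the comment;
    // the per-node address keys and values are illustrative only.
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.ha.nodes.id", "rm1,rm2");
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
  }
}
{code}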
[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776515#comment-13776515 ] Hudson commented on YARN-1204:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #4462 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4462/])
YARN-1204. Added separate configuration properties for https for RM and NM without which servers enabled with https will also start on http ports. Contributed by Omkar Vinit Joshi.
MAPREDUCE-5523. Added separate configuration properties for https for JHS without which even when https is enabled, it starts on http port itself. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525947)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/WebAppUtil.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229:
Attachment: YARN-1229.2.patch
Added a test case.
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068:
---
Attachment: yarn-1068-7.patch
Thanks [~tucu00]. Updated patch to address the comment.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776529#comment-13776529 ] Xuan Gong commented on YARN-1229:
-
Ran the full YARN test suite; all of the YARN tests pass. Ran the full MAPREDUCE test suite; some tests in the mapred package have timeout issues, which I do not think are caused by this patch.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776546#comment-13776546 ] Bikas Saha commented on YARN-1229:
--
base32 encoding is a good idea if we don't want to break compatibility. It basically boils down to that. Xuan, the AuxServiceHelper is still using the NM_AUX_SERVICE prefix, which has _ in it.
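As a hedged illustration of the base32 idea (this sketch uses Apache Commons Codec; stripping the '=' padding is an assumption here, since '=' is itself illegal in a shell identifier):
{code}
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base32;

public class AuxServiceEnvNameSketch {
  public static void main(String[] args) {
    // Base32 output uses only A-Z and 2-7, all legal identifier characters,
    // so an arbitrary service name can be embedded in an env var name.
    String serviceName = "mapreduce.shuffle";
    String encoded = new Base32()
        .encodeAsString(serviceName.getBytes(StandardCharsets.UTF_8))
        .replace("=", ""); // padding is not a legal identifier character
    System.out.println("NM_AUX_SERVICE_" + encoded);
  }
}
{code}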
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776571#comment-13776571 ] Hadoop QA commented on YARN-1068:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604842/yarn-1068-7.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2000//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2000//console
This message is automatically generated.
[jira] [Created] (YARN-1233) NodeManager doesn't renew krb5 creds
Allen Wittenauer created YARN-1233:
--
Summary: NodeManager doesn't renew krb5 creds
Key: YARN-1233 URL: https://issues.apache.org/jira/browse/YARN-1233 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer
In 2.1.0-beta-rc1 (sorry, haven't upgraded yet) the NM is not renewing krb5 TGTs after they expire.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229:
Attachment: YARN-1229.3.patch
Changed the NM_AUX_SERVICE prefix to NodeManagerAuxService to eliminate the _.
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157:
Attachment: YARN-1157.5.patch
Created the patch based on the latest trunk.

ResourceManager UI has invalid tracking URL link for distributed shell application
--
Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch
Submit a YARN distributed shell application, then go to the ResourceManager Web UI. The application appears. In the Tracking UI column, there will be a history link. Clicking on that link shows HTTP error 500 instead of the application master web UI.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776598#comment-13776598 ] Hadoop QA commented on YARN-1229:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604849/YARN-1229.3.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2002//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2002//console
This message is automatically generated.
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776629#comment-13776629 ] Hadoop QA commented on YARN-1157:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604851/YARN-1157.5.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2003//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2003//console
This message is automatically generated.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776628#comment-13776628 ] Alejandro Abdelnur commented on YARN-1021: -- +1 Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc). * Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157: Attachment: YARN-1157.6.patch Adding more comments in RegisterApplicationMasterRequest and FinishApplicationMasterRequest ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
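For context, a minimal sketch of where the tracking URL enters the system: the AM supplies it when registering with the RM, and the RM UI's Tracking UI column links to whatever was supplied, so an empty or unreachable URL is one way the link can end up serving an HTTP 500. Host, port, and URL below are placeholders, and this is not the distributed-shell code itself.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TrackingUrlSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    // The tracking URL registered here is what the RM UI link points at.
    rmClient.registerApplicationMaster("am-host.example.com", 0,
        "http://am-host.example.com:8080/status");
    // ... run the application, then unregister ...
    rmClient.stop();
  }
}
{code}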
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776634#comment-13776634 ] Bikas Saha commented on YARN-1068: -- It would be instructive to compare the HAAdmin server start code with an existing RM admin server like the AdminService. I notice 2 things. 1) AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern 2) AdminService does something with HADOOP_SECURITY_AUTHORIZATION which is missing in HAAdminService. This probably defines who has access to perform the admin operations. We will likely need that for the HAAdmin, right? Having thought about this, it seems to me that this jira is actually blocked by YARN-986. Without a concept of a logical name, how can we expect the CLI etc. to find the correct RM address from configuration? The client conf files would be expected to have entries for all RM instances and we would need to be able to issue admin commands to any one of them. So we need to be able to address them via a logical name, right? So the current approach that picks the RM_HA_ADMIN_SERVICE address does not seem like a viable solution. Similarly, server conf files would need to tell the server what its logical name is so that it can try to pick up instance-specific configurations. This is precisely why we have the HAAdmin.resolveTarget() method. Again, it would be instructive to look at NNHAServiceTarget for the client side and the constructor for NameNode, where it uses the logical name to translate and re-write the server-side conf. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
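A minimal sketch of the logical-name idea discussed above, assuming hypothetical configuration keys: the key pattern "yarn.resourcemanager.admin.address.<logical-name>" is invented for this illustration and is not necessarily the scheme this JIRA adopts.
{code}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: resolve an RM admin address from a logical name instead
// of a single fixed RM_HA_ADMIN_SERVICE address. The key naming below is
// an assumption made for this example only.
public final class LogicalNameResolutionSketch {
  private LogicalNameResolutionSketch() {}

  public static String resolveAdminAddress(Configuration conf, String logicalName) {
    String key = "yarn.resourcemanager.admin.address." + logicalName;
    String address = conf.get(key);
    if (address == null) {
      throw new IllegalArgumentException("No admin address configured for " + logicalName);
    }
    return address;
  }
}
{code}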
[jira] [Commented] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776637#comment-13776637 ] Bikas Saha commented on YARN-986: - This should be used to set the service address for tokens. This would also be needed to pick up the correct configs for HA scenarios. YARN should have a ClusterId/ServiceId -- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-986: Summary: YARN should have a ClusterId/ServiceId (was: YARN should have a ClusterId/ServiceId that should be used to set the service address for tokens) YARN should have a ClusterId/ServiceId -- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
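A rough sketch of the token-service flow implied by the description, with the resolution step deliberately left abstract. Token.setService is a real Hadoop API; the cluster-id string and the lookup map are assumptions for illustration only.
{code}
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public final class LogicalServiceIdSketch {
  private LogicalServiceIdSketch() {}

  // Server side: stamp the token with a logical ClusterId/ServiceId rather
  // than a concrete ip:port, so RM failover does not invalidate it.
  public static void stampLogicalService(Token<? extends TokenIdentifier> token,
      String clusterId) {
    token.setService(new Text(clusterId));
  }

  // Client side: translate the logical id to the currently active RM address
  // before selecting the token. The map lookup is a placeholder; defining
  // the real resolution mechanism is exactly what this JIRA is about.
  public static String resolve(String clusterId, Map<String, String> activeRms) {
    return activeRms.get(clusterId);
  }
}
{code}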
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776653#comment-13776653 ] Karthik Kambatla commented on YARN-1068: Thanks [~bikassaha], agree with most of your points. bq. AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern The reason for this is our attempt to reuse most of the common code - protos and client implementations. bq. Having thought about this, it seems to me that this jira is actually blocked by YARN-986. To fix the admin support in its entirety, I agree that we need YARN-1232 and YARN-986. That said, for ease of development, I would propose splitting the admin support into two parts (JIRAs) - basic support (this JIRA) to go in first to help test YARN-1232 and YARN-986, and complete admin support that adds the remaining parts. Otherwise, we would need to apply this on top of those other JIRAs to test. Thoughts? Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776663#comment-13776663 ] Hadoop QA commented on YARN-1157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604859/YARN-1157.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2004//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2004//console This message is automatically generated. ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776684#comment-13776684 ] Jian He commented on YARN-1157: --- Tests look much cleaner, thanks for the update. Patch looks good, +1. ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776765#comment-13776765 ] Bikas Saha commented on YARN-1229: -- Looks good to me. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776802#comment-13776802 ] Bikas Saha commented on YARN-1214: -- +1 Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1214: -- Attachment: YARN-1214.6.patch Patch rebased. Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
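The shape of the fix, as a hedged sketch: Store and SecretManager below are stand-in interfaces, not the actual RMStateStore/secret-manager types, and the actual patch may differ.
{code}
import java.io.IOException;

interface Store {
  void saveMasterKey(byte[] key) throws IOException;
}

interface SecretManager {
  void registerMasterKey(byte[] key);
}

final class RegisterAfterSaveSketch {
  // Persist the client token master key before registering it, so that a
  // crash between the two steps can no longer leave clients holding a
  // token whose key the restarted RM cannot reload.
  static void storeThenRegister(Store store, SecretManager sm, byte[] key)
      throws IOException {
    store.saveMasterKey(key);  // 1) durable first
    sm.registerMasterKey(key); // 2) only then visible to clients
  }
}
{code}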
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776843#comment-13776843 ] Hadoop QA commented on YARN-1214: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604886/YARN-1214.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2005//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2005//console This message is automatically generated. Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776847#comment-13776847 ] Carlo Curino commented on YARN-624: --- Hi guys, I would like to quantify the typical waste of resources while hoarding containers towards a gang, e.g., for Giraph or Storm. Does anyone have an intuition/measure of the typical time delay and container slot-time wasted while hoarding containers, before the useful part of the computation starts? Thanks. Support gang scheduling in the AM RM protocol - Key: YARN-624 URL: https://issues.apache.org/jira/browse/YARN-624 Project: Hadoop YARN Issue Type: Sub-task Components: api, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support. Currently, AMs can approximate this by holding on to containers until they get all the ones they need. However, this lends itself to deadlocks when different AMs are waiting on the same containers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
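As a rough back-of-envelope model (an assumption for illustration, not data from this thread): if a gang needs N containers and they are granted roughly uniformly over a hoarding window of length T, the container slot-time idled before the computation starts is about the sum of (T - t_i) over the N grant times, i.e. roughly N*T/2. For example, hoarding 100 containers over a 10-minute window wastes on the order of 100 * 10 / 2 = 500 container-minutes.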
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776852#comment-13776852 ] Vinod Kumar Vavilapalli commented on YARN-1229: --- *sigh* more incompatible changes. Thought for a while if we can do it in a compatible manner, but doesn't seem like there is any way. Looked at the patch, +1 for the changes. Let's get it in asap. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1229: - Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1204: -- Fix Version/s: 2.1.2-beta Need to add https port related property in Yarn --- Key: YARN-1204 URL: https://issues.apache.org/jira/browse/YARN-1204 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch There is no yarn property available to configure the https port for the ResourceManager, NodeManager and history server. Currently, Yarn services use the port defined for http [defined by 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', 'yarn.resourcemanager.webapp.address'] when running services on the https protocol. Yarn should have a list of properties to assign https ports for the RM, NM and JHS. It could be like below:
yarn.nodemanager.webapp.https.address
yarn.resourcemanager.webapp.https.address
mapreduce.jobhistory.webapp.https.address
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776865#comment-13776865 ] Siddharth Seth commented on YARN-1229: -- Just looked at the patch, it'd be nice to include underscores as well - provides for a separator in the allowed character set. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables
[ https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1128: - Fix Version/s: 2.1.2-beta FifoPolicy.computeShares throws NPE on empty list of Schedulables - Key: YARN-1128 URL: https://issues.apache.org/jira/browse/YARN-1128 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.2-beta Attachments: yarn-1128-1.patch FifoPolicy gives all of a queue's share to the earliest-scheduled application.
{code}
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
  if (earliest == null || schedulable.getStartTime() < earliest.getStartTime()) {
    earliest = schedulable;
  }
}
earliest.setFairShare(Resources.clone(totalResources));
{code}
If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
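A minimal sketch of the empty-list guard the report implies, using a stand-in Schedulable interface; the actual fix is in the attached yarn-1128-1.patch and may differ.
{code}
import java.util.Collection;

interface Schedulable {
  long getStartTime();
  void setFairShare(long share);
}

final class FifoShareSketch {
  static void computeShares(Collection<? extends Schedulable> schedulables,
      long totalResources) {
    if (schedulables.isEmpty()) {
      return; // nothing to assign; avoids the NPE on 'earliest' below
    }
    Schedulable earliest = null;
    for (Schedulable schedulable : schedulables) {
      if (earliest == null
          || schedulable.getStartTime() < earliest.getStartTime()) {
        earliest = schedulable;
      }
    }
    earliest.setFairShare(totalResources);
  }
}
{code}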
[jira] [Updated] (YARN-1203) Application Manager UI does not appear with Https enabled
[ https://issues.apache.org/jira/browse/YARN-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1203: -- Fix Version/s: 2.1.2-beta Application Manager UI does not appear with Https enabled - Key: YARN-1203 URL: https://issues.apache.org/jira/browse/YARN-1203 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta Attachments: YARN-1203.20131017.1.patch, YARN-1203.20131017.2.patch, YARN-1203.20131017.3.patch, YARN-1203.20131018.1.patch, YARN-1203.20131018.2.patch, YARN-1203.20131019.1.patch Need to add support to disable 'hadoop.ssl.enabled' for MR jobs. A job should be able to run on the http protocol by setting the 'hadoop.ssl.enabled' property at the job level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776872#comment-13776872 ] Chris Nauroth commented on YARN-1229: - Agreed on underscores. Various resources indicate that {{[a-zA-Z_]+[a-zA-Z0-9_]*}} is a good format that we can expect to work cross-platform. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
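A self-contained sketch of the regex-based validation being discussed, using the pattern from the comment above; the class and method names are illustrative, not the patch's actual code.
{code}
import java.util.regex.Pattern;

public final class AuxServiceNameSketch {
  // From the discussion above: a letter or underscore first, then letters,
  // digits, and underscores; safe to export as a shell variable name.
  private static final Pattern VALID_NAME =
      Pattern.compile("^[a-zA-Z_]+[a-zA-Z0-9_]*$");

  public static boolean isValid(String name) {
    // guard against null and empty names before matching
    return name != null && !name.isEmpty() && VALID_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("mapreduce_shuffle")); // true
    System.out.println(isValid("mapreduce.shuffle")); // false: '.' breaks export
    System.out.println(isValid("2fast"));             // false: leading digit
  }
}
{code}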
[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1214: - Priority: Critical (was: Major) Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.4.patch Allow _ as a valid character in auxServiceName, and disallow auxServiceName starting with a number. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1053: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Labels: newbie Fix For: 2.3.0, 2.1.2-beta Attachments: YARN-1053.20130809.patch If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
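A hedged sketch of the fix's intent: carry the diagnostic string from the exit event into the container's diagnostics rather than dropping it. The nested event type below stands in for ContainerExitEvent; this is not the ContainerImpl code.
{code}
final class ExitDiagnosticsSketch {
  static final class ExitEvent {
    final int exitCode;
    final String diagnostic;
    ExitEvent(int exitCode, String diagnostic) {
      this.exitCode = exitCode;
      this.diagnostic = diagnostic;
    }
  }

  private final StringBuilder diagnostics = new StringBuilder();

  void onContainerExit(ExitEvent event) {
    if (event.diagnostic != null) {
      diagnostics.append(event.diagnostic).append('\n'); // previously ignored
    }
  }

  String getDiagnostics() {
    return diagnostics.toString();
  }
}
{code}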
[jira] [Updated] (YARN-1158) ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout
[ https://issues.apache.org/jira/browse/YARN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1158: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout Key: YARN-1158 URL: https://issues.apache.org/jira/browse/YARN-1158 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Fix For: 2.1.2-beta Configure yarn-site.xml's yarn.nodemanager.local-dirs to multiple directories. Turn on log aggregation. Run a distributed shell application. If the application writes AppMaster.stdout in one directory and stdout in another, go to the ResourceManager web UI and open up the container logs: only AppMaster.stdout appears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1121: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Fix For: 2.1.2-beta On serviceStop, it should wait for all internal pending events to drain before stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776915#comment-13776915 ] Siddharth Seth commented on YARN-1229: -- Took a quick look.
- Can you please rename MapreduceShuffle to mapreduce_shuffle (closer to the old name)?
- The check can be regex based, rather than walking through all the characters.
- Include an empty check along with the null check.
Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1167: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Fix For: 2.1.2-beta Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1168) Cannot run echo \"Hello World\"
[ https://issues.apache.org/jira/browse/YARN-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1168: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Cannot run echo \"Hello World\" - Key: YARN-1168 URL: https://issues.apache.org/jira/browse/YARN-1168 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Priority: Critical Fix For: 2.1.2-beta Running $ ssh localhost echo \"Hello World\" with bash does succeed. Hello World is shown in stdout. Run distributed shell with a similar echo command. That is either $ /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar -shell_command echo -shell_args \"Hello World\" or $ /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar -shell_command echo -shell_args Hello World
{code:title=yarn logs -- only hello is shown}
LogType: stdout
LogLength: 6
Log Contents:
hello
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
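One plausible mechanism for the lost quoting, shown as a tiny standalone sketch: this is the general failure mode, not the distributed-shell code. If the shell args are joined into one string and later re-tokenized on whitespace, the quotes are gone and \"Hello World\" degrades into two arguments.
{code}
public class QuotingSketch {
  public static void main(String[] args) {
    String joined = "echo" + " " + "Hello World"; // quoting already lost here
    String[] retokenized = joined.split("\\s+");
    // Prints 3 (echo, Hello, World) instead of the intended 2 tokens.
    System.out.println(retokenized.length);
  }
}
{code}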
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1149: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch When the nodemanager receives a kill signal after an application has finished execution but log aggregation has not yet kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown:
{noformat}
2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local
2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:662)
2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
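The shape of the fix, as an illustrative sketch: a minimal hand-rolled state machine, not the YARN StateMachineFactory API. The idea is to give RUNNING a transition for APPLICATION_LOG_HANDLING_FINISHED instead of treating it as an invalid event.
{code}
final class AppStateSketch {
  enum State { RUNNING, FINISHED }
  enum Event { APPLICATION_LOG_HANDLING_FINISHED }

  private State state = State.RUNNING;

  void handle(Event event) {
    // Tolerate log aggregation finishing while the app is still RUNNING,
    // which happens when a kill arrives right after execution completes.
    if (state == State.RUNNING && event == Event.APPLICATION_LOG_HANDLING_FINISHED) {
      state = State.FINISHED;
      return;
    }
    throw new IllegalStateException("Invalid event " + event + " at " + state);
  }
}
{code}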
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1157: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1022) Unnecessary INFO logs in AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1022: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Unnecessary INFO logs in AMRMClientAsync Key: YARN-1022 URL: https://issues.apache.org/jira/browse/YARN-1022 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Priority: Minor Labels: newbie Fix For: 2.1.2-beta Logs like the following should be at DEBUG, or else every legitimate stop causes unnecessary exception traces in the logs.
2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Heartbeater interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249)
2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
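A sketch of the proposed change, assuming the usual commons-logging setup; the class and method names are illustrative, not the actual AMRMClientAsyncImpl code. Expected interrupts get logged at DEBUG so that a legitimate stop produces no stack traces at INFO.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class HeartbeaterSketch implements Runnable {
  private static final Log LOG = LogFactory.getLog(HeartbeaterSketch.class);
  private volatile boolean running = true;

  public void run() {
    while (running) {
      try {
        Thread.sleep(1000); // heartbeat interval placeholder
      } catch (InterruptedException e) {
        LOG.debug("Heartbeater interrupted", e); // was LOG.info(...)
        Thread.currentThread().interrupt();
        return;
      }
      // ... send heartbeat ...
    }
  }

  public void stopHeartbeating() {
    running = false;
  }
}
{code}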
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1142: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.2-beta When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1131) $ yarn logs should return a message that log aggregation is in progress if the YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1131: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta $ yarn logs should return a message that log aggregation is in progress if the YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
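A sketch of the two suggested behaviors; the helper names below are hypothetical and only illustrate where the checks would go in the log CLI:
{code}
// 1. Reject malformed application IDs with a message instead of an exception.
ApplicationId appId;
try {
  appId = ConverterUtils.toApplicationId(appIdStr);
} catch (Exception e) {
  System.err.println("Invalid ApplicationId specified: " + appIdStr);
  return -1;
}
// 2. Tell the user when logs are not yet aggregated for a running app.
if (isApplicationRunning(appId)) {  // hypothetical check against the RM
  System.out.println("Application " + appIdStr
      + " is still running; log aggregation is in progress.");
  return -1;
}
{code}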
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.5.patch 1. Change the service name to mapreduce_shuffle. 2. Use a regex to validate auxName. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
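A sketch of the regex check described in the attachment note; the class shape and the exact pattern are assumptions, modeled on shell-safe environment variable names:
{code}
import java.util.regex.Pattern;

public class AuxServiceNameCheck {
  // Aux service names end up in launch_container.sh as
  // export NM_AUX_SERVICE_<name>=..., so they must be valid shell identifiers.
  private static final Pattern NAME_PATTERN =
      Pattern.compile("^[A-Za-z_]+[A-Za-z0-9_]*$");

  public static void validate(String auxName) {
    if (!NAME_PATTERN.matcher(auxName).matches()) {
      throw new IllegalArgumentException("Invalid aux service name: " + auxName);
    }
  }
}
{code}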
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776945#comment-13776945 ] Siddharth Seth commented on YARN-1229: -- Patch looks good. Missed this earlier, but there are several references to mapreduce.shuffle in the documentation which need to be updated. Also, since it's being updated, can you make the Pattern final? Thanks Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.6.patch fix documentation Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776975#comment-13776975 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604917/YARN-1229.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2007//console This message is automatically generated. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. 
Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776979#comment-13776979 ] Siddharth Seth commented on YARN-1229: -- +1. Committing. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-2.patch Patch that adds descriptions and tests for HAUtil, to be applied on trunk. Configuration support for RM HA --- Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. This blocks ConfiguredFailoverProxyProvider. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
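To make the intent concrete, here is a sketch of the kind of settings such a patch would enable; the property keys below are illustrative assumptions, not the final names:
{code}
// Illustrative only: the final HA key names were still under review here.
Configuration conf = new YarnConfiguration();
conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
{code}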
[jira] [Moved] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi moved MAPREDUCE-5532 to YARN-1234: Key: YARN-1234 (was: MAPREDUCE-5532) Project: Hadoop YARN (was: Hadoop Map/Reduce) Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi When we run the ContainerLocalizer in a secured cluster, we do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1234: Fix Version/s: 2.1.2-beta Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta When we run the ContainerLocalizer in a secured cluster, we do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1234: Component/s: nodemanager Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta When we run the ContainerLocalizer in a secured cluster, we do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
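One possible shape of the fix, sketched with log4j 1.x (which Hadoop used at the time); the method, file name, and location are assumptions:
{code}
import java.io.IOException;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Attach a file appender before the localizer starts doing work,
// so its messages are captured even in a secured cluster.
public static void initLocalizerLog(String logDir) throws IOException {
  FileAppender appender = new FileAppender(
      new PatternLayout("%d{ISO8601} %p %c: %m%n"),
      logDir + "/container-localizer.log");  // path is illustrative
  Logger.getRootLogger().addAppender(appender);
}
{code}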
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777009#comment-13777009 ] Hudson commented on YARN-1229: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4463 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4463/]) YARN-1229. Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle. Contributed by Xuan Gong. (sseth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526065) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/INSTALL * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. 
If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777021#comment-13777021 ] Hadoop QA commented on YARN-1232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604931/yarn-1232-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.cli.TestYarnCLI org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.conf.TestYarnConfiguration org.apache.hadoop.yarn.logaggregation.TestLogDumper org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes 
org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice.TestApplicationMasterService org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777023#comment-13777023 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604922/YARN-1229.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2008//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2008//console This message is automatically generated. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. 
If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-899: --- Attachment: YARN-899.7.patch create patch based on the latest trunk Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
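A sketch of where such a check could plug in once the config is honored; the queue and exception handling here are illustrative, though QueueACL.ADMINISTER_QUEUE already exists in the YARN API:
{code}
// Before applying an administrative action (e.g. killing another user's app):
UserGroupInformation caller = UserGroupInformation.getCurrentUser();
if (!queue.hasAccess(QueueACL.ADMINISTER_QUEUE, caller)) {
  throw new AccessControlException("User " + caller.getShortUserName()
      + " cannot administer queue " + queue.getQueueName());
}
{code}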
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-899: --- Attachment: YARN-899.8.patch Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch, YARN-899.8.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777034#comment-13777034 ] Sandy Ryza commented on YARN-1089: -- I'm ok with waiting until 2.3. In case it's not clear, the consequence of this is that until then it will be impossible to place more tasks on a node than its number of virtual cores, which is essentially its number of physical cores. I think we should make YARN-976, documenting the meaning of vcores, a blocker for 2.2. Add YARN compute units alongside virtual cores -- Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777044#comment-13777044 ] Jian He commented on YARN-674: -- Is this related to the ClientRMService.renewDelegationToken method? Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as the RM may run out of RPC handlers due to blocked client submissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
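One way to keep the submission RPC handlers free, sketched with a plain executor (the actual fix in the RM's DelegationTokenRenewer may be structured differently; the helper below is hypothetical):
{code}
// Hand token renewal to a background pool so the submission handler
// returns immediately instead of blocking on a slow or down NameNode.
private final ExecutorService renewerPool = Executors.newFixedThreadPool(5);

void onApplicationSubmission(final ApplicationId appId) {
  renewerPool.submit(new Runnable() {
    public void run() {
      renewDelegationTokens(appId);  // hypothetical helper that talks to the NN
    }
  });
}
{code}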
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777051#comment-13777051 ] Bikas Saha commented on YARN-1089: -- At this point, I am not seeing the benefit of creating yet another cpu related configuration. While I am not against useful configurations, its already hard to configure YARN. Like Vinod and others said, can a summary of the discussions made elsewhere be placed here. Add YARN compute units alongside virtual cores -- Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777056#comment-13777056 ] Hudson commented on YARN-1214: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4464 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4464/]) YARN-1214. Register ClientToken MasterKey in SecretManager after it is saved (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526078) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenSecretManagerInRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: if the client gets the token and the RM crashes before the master key is saved, the RM cannot reload the master key after it restarts, since it was never saved. As a result, the client is left holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
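The ordering change itself is small; a sketch with illustrative method names (the real RMStateStore and secret-manager calls differ in detail):
{code}
// Before (problematic): the key is usable before it is durable.
//   secretManager.registerMasterKey(attemptId, masterKey);
//   store.storeApplicationAttempt(attempt);

// After: make the key durable first, then hand it out, so an RM restart
// can always recover any key a client might already hold.
store.storeApplicationAttempt(attempt);
secretManager.registerMasterKey(attemptId, masterKey);
{code}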
[jira] [Updated] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1215: Attachment: YARN-1215-trunk.2.patch Attaching a new patch that adds a userInfo field to org.apache.hadoop.yarn.api.records.URL. This appends an optional field to the existing .proto file, which is allowed according to the compatibility guide at: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility Yarn URL should include userinfo Key: YARN-1215 URL: https://issues.apache.org/jira/browse/YARN-1215 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we set the uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This will lead to information loss if the original uri has the userinfo, e.g. foo://username:password@example.com will be converted to foo://example.com, and the username/password information is lost during the conversion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
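A sketch of the conversion with the new field, assuming a setUserInfo setter on the YARN URL record as the patch description implies:
{code}
public static URL getYarnUrlFromURI(URI uri) {
  URL url = Records.newRecord(URL.class);
  url.setScheme(uri.getScheme());
  url.setHost(uri.getHost());
  url.setPort(uri.getPort());
  url.setFile(uri.getPath());
  url.setUserInfo(uri.getUserInfo());  // new: userinfo no longer discarded
  return url;
}
{code}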
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777069#comment-13777069 ] Hadoop QA commented on YARN-899: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604938/YARN-899.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2010//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2010//console This message is automatically generated. Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch, YARN-899.8.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777071#comment-13777071 ] Sandy Ryza commented on YARN-1089: -- As was requested, I posted a summary of the proposal on YARN-1024. In case it's not clear from the summary, here's the problem we're trying to solve: We want jobs to be portable between clusters. CPU is not a fluid resource in the way memory is. The number of cores on a machine is just as important as its total processing power when scheduling tasks. Imagine a cluster where every node has powerful CPUs with many cores. One type of task that will be run on the cluster saturates a full CPU, but another type of task that will be run on the cluster contains two threads, each of which can saturate only half a full CPU. If we have a single dimension for CPU requests, these tasks will request an equal amount of it. What happens if we then move those tasks to a cluster with CPUs whose cores are half as fast? The first task will run half as fast, and the second task will run in the same amount of time. It's in the first task's interest to only request half as many CPU resources on that cluster. I'm also afraid of things getting complicated, but I can't think of anything better that doesn't require having the meaning of a virtual core vary widely from cluster to cluster. Add YARN compute units alongside virtual cores -- Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables
[ https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1128: - Hadoop Flags: Reviewed Committed to trunk, branch-2, and branch-2.1-beta FifoPolicy.computeShares throws NPE on empty list of Schedulables - Key: YARN-1128 URL: https://issues.apache.org/jira/browse/YARN-1128 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.2-beta Attachments: yarn-1128-1.patch FifoPolicy gives all of a queue's share to the earliest-scheduled application. {code} Schedulable earliest = null; for (Schedulable schedulable : schedulables) { if (earliest == null || schedulable.getStartTime() < earliest.getStartTime()) { earliest = schedulable; } } earliest.setFairShare(Resources.clone(totalResources)); {code} If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
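A sketch of the guarded loop, assuming the fix simply skips the share assignment when the list is empty (the committed patch may differ):
{code}
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
  if (earliest == null
      || schedulable.getStartTime() < earliest.getStartTime()) {
    earliest = schedulable;
  }
}
if (earliest != null) {  // empty queue: no one to give the fair share to
  earliest.setFairShare(Resources.clone(totalResources));
}
{code}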