[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.patch

Yarn Scheduler Load Simulator
-----------------------------
Key: YARN-1021
URL: https://issues.apache.org/jira/browse/YARN-1021
Project: Hadoop YARN
Issue Type: New Feature
Components: scheduler
Reporter: Wei Yan
Assignee: Wei Yan
Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf, YARN-1021.pdf

The Yarn Scheduler is a fertile area of interest, with different implementations such as the Fifo, Capacity, and Fair schedulers. Meanwhile, several optimizations have been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features and drives scheduling decisions by many factors, such as fairness, capacity guarantees, and resource availability. It is very important to evaluate a scheduler algorithm thoroughly before deploying it in a production cluster. Unfortunately, evaluating a scheduling algorithm is currently non-trivial: evaluating in a real cluster is time- and cost-consuming, and it is also very hard to find a large enough cluster. Hence, a simulator that can predict how well a scheduler algorithm performs for a specific workload would be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager while removing the network factor by simulating NodeManagers and ApplicationMasters, handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.

The simulator will produce real-time metrics while executing, including:
* Resource usage for the whole cluster and for each queue, which can be used to configure the cluster's and each queue's capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantees, etc.).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be used by Hadoop developers to find hot spots and scalability limits.

The simulator will provide real-time charts showing the behavior and performance of the scheduler. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.
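To make the scheduler-wrapper idea concrete, here is a minimal sketch of a wrapper that times each operation of the real scheduler; the {{Scheduler}} interface and the names below are illustrative stand-ins, not the actual YARN types:

{code}
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative stand-in for the real scheduler interface. */
interface Scheduler {
  Object allocate(Object request);
  void handle(Object event);
}

/** Delegates to the real scheduler and accumulates per-operation latency. */
class TimedSchedulerWrapper implements Scheduler {
  private final Scheduler real;
  private final AtomicLong allocateNanos = new AtomicLong();
  private final AtomicLong handleNanos = new AtomicLong();

  TimedSchedulerWrapper(Scheduler real) { this.real = real; }

  @Override public Object allocate(Object request) {
    long t0 = System.nanoTime();
    try { return real.allocate(request); }
    finally { allocateNanos.addAndGet(System.nanoTime() - t0); }
  }

  @Override public void handle(Object event) {
    long t0 = System.nanoTime();
    try { real.handle(event); }
    finally { handleNanos.addAndGet(System.nanoTime() - t0); }
  }

  long allocateNanos() { return allocateNanos.get(); }
  long handleNanos() { return handleNanos.get(); }
}
{code}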
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.pdf
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: (was: YARN-1021.pdf)
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaibo Zhou updated YARN-1226:
-----------------------------
    Summary: Inconsistent hostname leads to low data locality (was: Inconsistent hostname leads to poor data locality)

Inconsistent hostname leads to low data locality
------------------------------------------------
Key: YARN-1226
URL: https://issues.apache.org/jira/browse/YARN-1226
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
Reporter: Kaibo Zhou

When I run a mapreduce job that uses TableInputFormat to scan an HBase table on a YARN cluster with 140+ nodes, I consistently get very low data locality, around 0~10%. The scheduler is the Capacity Scheduler. HBase and Hadoop are integrated in the cluster, with NodeManager, DataNode, and HRegionServer running on the same node.

The reason for the low data locality is that most machines in the cluster use IPv6 and a few use IPv4. NodeManager uses InetAddress.getLocalHost().getHostName() to get the host name, but the result of this call depends on whether IPv4 or IPv6 is in use; see [InetAddress.getLocalHost().getHostName() returns FQDN|http://bugs.sun.com/view_bug.do?bug_id=7166687]. On machines with IPv4, NodeManager gets the hostName search042097.sqa.cm4.site.net, but on machines with IPv6 it gets search042097.sqa.cm4. If run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns search042097.sqa.cm4.site.net.

For a mapred job that scans an HBase table, the InputSplit contains node locations as [FQDNs|http://en.wikipedia.org/wiki/FQDN], e.g. search042097.sqa.cm4.site.net, because in HBase the RegionServers' hostnames are assigned by the HMaster. The HMaster communicates with the RegionServers and obtains each region server's host name using Java NIO: clientChannel.socket().getInetAddress().getHostName(). See also the startup log of a region server:

13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=search042024.sqa.cm4, Now=search042024.sqa.cm4.site.net

As you can see, most machines in the YARN cluster with IPv6 get the short hostname, while HBase always gets the full hostname, so the hosts cannot be matched (see RMContainerAllocator::assignToMap). This leads to poor locality. After I used java.net.preferIPv4Stack to force IPv4 in YARN, I got 70+% data locality in the cluster.

Thanks, Kaibo
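A quick way to observe the hostname discrepancy described above is to print what the same lookup returns on an affected host, once normally and once with -Djava.net.preferIPv4Stack=true (a minimal sketch, not part of the reporter's patch):

{code}
import java.net.InetAddress;

public class HostnameCheck {
  public static void main(String[] args) throws Exception {
    InetAddress local = InetAddress.getLocalHost();
    // On an affected IPv6 host this prints the short name, e.g. search042097.sqa.cm4;
    // with -Djava.net.preferIPv4Stack=true it prints the FQDN.
    System.out.println("getHostName():          " + local.getHostName());
    // getCanonicalHostName() asks the resolver for the FQDN explicitly,
    // which gives a baseline to compare against.
    System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
  }
}
{code}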
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to poor data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaibo Zhou updated YARN-1226:
-----------------------------
    Summary: Inconsistent hostname leads to poor data locality (was: ipv4 and ipv6 lead to poor data locality)
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776072#comment-13776072 ]

Siddharth Seth commented on YARN-1229:
--------------------------------------
I'm in favour of renaming the shuffle service id as well, and enforcing constraints on the names. Shell parameters apparently have name restrictions; http://stackoverflow.com/questions/2821043/allowed-characters-in-linux-environment-variable-names has some links to the standards. Basing the aux-service name restrictions on the shell name restrictions seems OK to me.

This is an incompatible change, though. Sites which have Hadoop 2 (or 0.23) deployed would need to change their configs to reflect the shuffle service name update. (The shuffle service isn't started when using the default Hadoop configuration files.) An alternative could be to use base32 encoding for the service name, but I would prefer not going there.

Shell$ExitCodeException could happen if AM fails to start
---------------------------------------------------------
Key: YARN-1229
URL: https://issues.apache.org/jira/browse/YARN-1229
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
Fix For: 2.1.1-beta

I run a sleep job. If the AM fails to start, this exception can occur:

13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
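The failure in the stack trace above comes from the shell rejecting an exported variable whose name contains a dot. A small, self-contained demonstration (assuming a POSIX shell is available; this class is mine, not NodeManager code):

{code}
public class ExportNameDemo {
  // Runs `export <name>=AAA0+gA=` in a shell; a non-zero exit status means
  // the shell rejected the name as an identifier.
  static int tryExport(String name) throws Exception {
    Process p = new ProcessBuilder("sh", "-c", "export " + name + "=AAA0+gA=")
        .inheritIO().start();
    return p.waitFor();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(tryExport("NM_AUX_SERVICE_mapreduce_shuffle")); // 0: accepted
    System.out.println(tryExport("NM_AUX_SERVICE_mapreduce.shuffle")); // non-zero: rejected
  }
}
{code}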
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated YARN-1156:
--------------------------------
    Assignee: Tsuyoshi OZAWA

Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
------------------------------------------------------------------------------
Key: YARN-1156
URL: https://issues.apache.org/jira/browse/YARN-1156
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
Labels: metrics, newbie
Fix For: 2.3.0
Attachments: YARN-1156.1.patch

The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500 MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int) 500/1024}}, which is 0. That is, the memory actually allocated is 2000 MB, but the metric shows 0 GB. Let's use a float type for these metrics.
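The rounding loss is easy to see in isolation (the field names here are mine, not the actual NodeManager metric fields):

{code}
public class GbMetricDemo {
  public static void main(String[] args) {
    int allocatedGbInt = 0;
    float allocatedGbFloat = 0f;
    for (int i = 0; i < 4; i++) {
      int allocMb = 500;
      allocatedGbInt += allocMb / 1024;    // integer division: adds 0 each time
      allocatedGbFloat += allocMb / 1024f; // adds ~0.488 each time
    }
    System.out.println(allocatedGbInt);    // 0 GB, despite 2000 MB allocated
    System.out.println(allocatedGbFloat);  // ~1.95 GB
  }
}
{code}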
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776113#comment-13776113 ]

nijel commented on YARN-90:
---------------------------
To handle this, we can check the failed dirs first in DirectoryCollection.checkDirs() and add them back to localDirs if the directories have recovered from the error.

NodeManager should identify failed disks becoming good back again
------------------------------------------------------------------
Key: YARN-90
URL: https://issues.apache.org/jira/browse/YARN-90
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Ravi Gummadi

MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good again), NodeManager needs a restart. This JIRA is to improve NodeManager to reuse good disks (which may have been bad some time back).
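A sketch of that suggestion (the types and fields below are illustrative, not the actual DirectoryCollection internals): on each checkDirs() pass, re-test previously failed directories and move any that pass the health check back into the good list.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DirHealthChecker {
  private final List<String> localDirs = new ArrayList<>();
  private final List<String> failedDirs = new ArrayList<>();

  /** Returns true if the directory is usable again. */
  private boolean isHealthy(String dir) {
    File f = new File(dir);
    return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
  }

  void checkDirs() {
    // First, give previously failed dirs a chance to come back.
    for (Iterator<String> it = failedDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      if (isHealthy(dir)) {
        it.remove();
        localDirs.add(dir);
      }
    }
    // Then run the usual health check over the good dirs.
    for (Iterator<String> it = localDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      if (!isHealthy(dir)) {
        it.remove();
        failedDirs.add(dir);
      }
    }
  }
}
{code}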
[jira] [Commented] (YARN-1226) Inconsistent hostname leads to low data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776128#comment-13776128 ]

Steve Loughran commented on YARN-1226:
--------------------------------------
Right now, "Don't use IPv6" is one of those installation rules ([http://wiki.apache.org/hadoop/HadoopIPv6]), precisely because of issues with IPv6 in Java. Now, if there are some bits of code that could be changed to make things work slightly better, they'd be welcome, but right now the focus is on IPv4; if this is an IPv6 problem, it's going to get low priority.
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated YARN-1226:
---------------------------------
    Environment: Linux, IPv6
    Summary: Inconsistent hostname leads to low data locality on IPv6 hosts (was: Inconsistent hostname leads to low data locality)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776142#comment-13776142 ]

Hadoop QA commented on YARN-1021:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604747/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1145 warnings).
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1995//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1995//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1995//console
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.patch
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776173#comment-13776173 ]

Hadoop QA commented on YARN-1021:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604767/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist:
org.apache.hadoop.yarn.sls.TestSLSRunner
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1996//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1996//console
[jira] [Updated] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nemon Lou updated YARN-1231:
----------------------------
    Attachment: YARN-1231.patch

A patch fixing the test cases in the hadoop-yarn-server-resourcemanager project.

Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
--------------------------------------------------------------------------------
Key: YARN-1231
URL: https://issues.apache.org/jira/browse/YARN-1231
Project: Hadoop YARN
Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou
Assignee: Nemon Lou
Labels: test
Attachments: YARN-1231.patch

Use a separate jira to fix YARN's test cases that will fail by hitting the max-am-used-resources-percent limit after YARN-276.
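For context, fixes of this kind typically adjust the AM resource limit in the test configuration. A hypothetical sketch (the exact property and value each test needs depend on the scheduler under test; this is not taken from the patch):

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical test setup: raise the AM resource cap so ApplicationMasters
// in a small test cluster are not blocked by the percentage limit.
Configuration conf = new Configuration();
conf.setFloat("yarn.scheduler.capacity.maximum-am-resource-percent", 0.5f);
{code}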
[jira] [Commented] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776302#comment-13776302 ]

Hadoop QA commented on YARN-1231:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604791/YARN-1231.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1997//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1997//console
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-1021:
--------------------------
    Attachment: YARN-1021.patch
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776350#comment-13776350 ]

Chris Nauroth commented on YARN-1229:
-------------------------------------
BTW, if we use {{[a-zA-Z_]+[a-zA-Z0-9_]*}}, then that will be compatible with Windows too. It looks like Windows actually allows many more characters than that, but I think it makes sense to stick to a minimal set that we expect to work cross-platform.
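A minimal sketch of how such a constraint could be enforced when an aux service registers; the helper and the env-var prefix usage below are illustrative, not the actual NodeManager code:

{code}
import java.util.regex.Pattern;

public class AuxServiceNames {
  // The proposed pattern: a letter or underscore first, then letters,
  // digits, or underscores.
  private static final Pattern VALID = Pattern.compile("[a-zA-Z_]+[a-zA-Z0-9_]*");

  public static String envVarFor(String serviceName) {
    if (!VALID.matcher(serviceName).matches()) {
      throw new IllegalArgumentException(
          "Aux service name is not a valid shell identifier: " + serviceName);
    }
    return "NM_AUX_SERVICE_" + serviceName;
  }

  public static void main(String[] args) {
    System.out.println(envVarFor("mapreduce_shuffle")); // accepted
    System.out.println(envVarFor("mapreduce.shuffle")); // throws: '.' not allowed
  }
}
{code}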
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776357#comment-13776357 ]

Hadoop QA commented on YARN-1021:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604801/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1998//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1998//console
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776364#comment-13776364 ]

Wei Yan commented on YARN-1021:
-------------------------------
Uploaded a new patch according to [~tucu00]'s latest comments. It also lets the simulator support two types of input: (1) rumen traces, so users can feed their existing rumen traces directly to the simulator; and (2) the simulator's own trace format (sls), which is much simpler and lets users easily generate various workloads. The simulator also includes a tool to help users convert rumen traces to sls traces.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776367#comment-13776367 ] Alejandro Abdelnur commented on YARN-1021:
--
[~ywskycn], we shouldn't use /tmp, as that does not get cleaned up by the build. Instead we should use a temp subdir under target/, easily done by:
{code}
// Create a unique temp dir under target/ so the build's clean phase removes it.
File dir = new File("target", UUID.randomUUID().toString());
dir.mkdirs();
{code}
And the documentation, in the appendix, should have a complete, simple example of an sls JSON input file as a reference.
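A hedged sketch of what such an sls JSON job entry might look like; the field names below (am.type, job.start.ms, job.tasks, container.*) are illustrative only, modeled on the trace format discussed for the patch rather than a confirmed final schema:
{code}
{
  "am.type" : "mapreduce",
  "job.start.ms" : 0,
  "job.end.ms" : 95375,
  "job.queue.name" : "sls_queue_1",
  "job.id" : "job_1",
  "job.user" : "default",
  "job.tasks" : [ {
    "container.host" : "/default-rack/node1",
    "container.start.ms" : 6664,
    "container.end.ms" : 23707,
    "container.priority" : 20,
    "container.type" : "map"
  } ]
}
{code}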
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021:
--
Attachment: YARN-1021.patch
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021:
--
Attachment: (was: YARN-1021.pdf)
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021:
--
Attachment: YARN-1021.pdf
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776441#comment-13776441 ] Hadoop QA commented on YARN-1021:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604818/YARN-1021.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1999//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1999//console
This message is automatically generated.
[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776485#comment-13776485 ] Vinod Kumar Vavilapalli commented on YARN-1204:
---
The latest patch looks good to me. +1. Checking this in.

Need to add https port related property in Yarn
---
Key: YARN-1204 URL: https://issues.apache.org/jira/browse/YARN-1204 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch
There is no Yarn property available to configure the https port for the ResourceManager, NodeManager, and history server. Currently, Yarn services use the port defined for http (by 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', and 'yarn.resourcemanager.webapp.address') even when running services over the https protocol. Yarn should have a list of properties to assign https ports for the RM, NM, and JHS. It can be like below: yarn.nodemanager.webapp.https.address, yarn.resourcemanager.webapp.https.address, mapreduce.jobhistory.webapp.https.address.
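A hedged sketch of setting these properties in client code; the property names come from the issue description above, while the class name and host:port values are placeholders, not defaults taken from the patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HttpsAddressSketch {
  public static void main(String[] args) {
    // Property names are from the issue description; values are illustrative.
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.webapp.https.address", "rm.example.com:8090");
    conf.set("yarn.nodemanager.webapp.https.address", "0.0.0.0:8044");
    conf.set("mapreduce.jobhistory.webapp.https.address", "jhs.example.com:19890");
    System.out.println(conf.get("yarn.resourcemanager.webapp.https.address"));
  }
}
{code}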
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776484#comment-13776484 ] Karthik Kambatla commented on YARN-1068:
[~bikassaha], when you get a chance, can you review the latest patch?

Add admin support for HA operations
---
Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-prelim.patch
Support HA admin operations to facilitate transitioning the RM to Active and Standby states.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229:
Attachment: YARN-1229.1.patch
The attached patch changes mapreduce.shuffle to MapreduceShuffle, and also enforces the check (service name should contain only a-zA-Z0-9) in AuxService.

Shell$ExitCodeException could happen if AM fails to start
-
Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1229.1.patch
I run a sleep job. If the AM fails to start, this exception could occur:
13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
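The failure itself is a shell constraint: launch_container.sh exports one environment variable per aux service, and shell identifiers must match [A-Za-z_][A-Za-z0-9_]*, so a service name containing a dot (as in NM_AUX_SERVICE_mapreduce.shuffle) makes the export line invalid. A hedged sketch of the kind of name check the patch describes; the class and method names here are hypothetical, not the patch's actual code:
{code}
import java.util.regex.Pattern;

// Hypothetical illustration of the a-zA-Z0-9 service-name check.
public final class AuxServiceNameCheck {
  private static final Pattern VALID_NAME = Pattern.compile("^[A-Za-z0-9]+$");

  static void checkServiceName(String name) {
    if (!VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException(
          "Aux service name must contain only a-zA-Z0-9: " + name);
    }
  }

  public static void main(String[] args) {
    checkServiceName("MapreduceShuffle");   // passes
    checkServiceName("mapreduce.shuffle");  // throws: '.' is not allowed
  }
}
{code}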
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776503#comment-13776503 ] Arun C Murthy commented on YARN-1089:
-
I don't think we should put this in branch-2.1 or target this for hadoop-2.2. This is a major new feature which can be implemented in a compatible manner - let's target this for 2.3.0.

Add YARN compute units alongside virtual cores
--
Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch
Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776502#comment-13776502 ] Alejandro Abdelnur commented on YARN-1068:
--
One nit: in the RMHAProtocolService, {{serviceStop()}} should be symmetric with the start, in the sense that it should do the {{if (haEnabled)}} check to stop the HAAdmin server (instead of doing this check in the HAAdmin service itself).
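A minimal hedged sketch of that symmetry, assuming {{serviceStart()}} guards HAAdmin startup with the same flag; the class shape, the {{haEnabled}} field, and the HAAdmin start/stop helpers are inferred from the comment, not copied from the patch:
{code}
import org.apache.hadoop.service.AbstractService;

// Hypothetical sketch: names are assumptions based on the comment above.
class RMHAProtocolServiceSketch extends AbstractService {
  private final boolean haEnabled;

  RMHAProtocolServiceSketch(boolean haEnabled) {
    super("RMHAProtocolServiceSketch");
    this.haEnabled = haEnabled;
  }

  @Override
  protected synchronized void serviceStart() throws Exception {
    if (haEnabled) {
      startHAAdminServer();
    }
    super.serviceStart();
  }

  @Override
  protected synchronized void serviceStop() throws Exception {
    if (haEnabled) { // mirror the serviceStart() check
      stopHAAdminServer();
    }
    super.serviceStop();
  }

  private void startHAAdminServer() { /* elided */ }
  private void stopHAAdminServer() { /* elided */ }
}
{code}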
[jira] [Updated] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1089:
Target Version/s: 2.3.0 (was: 2.1.1-beta)
[jira] [Created] (YARN-1232) Configuration support for RM HA
Karthik Kambatla created YARN-1232:
--
Summary: Configuration support for RM HA
Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla
We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. This blocks ConfiguredFailoverProxyProvider.
[jira] [Updated] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232:
---
Attachment: yarn-1232-1.patch
Patch that adds the configs to YarnConfiguration and hooks them up to RM startup and RMProxy implementation through HAUtil.
[jira] [Commented] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776508#comment-13776508 ] Karthik Kambatla commented on YARN-1232:
Will post another patch that describes these configs in yarn-default.xml. I don't think we can have default values for these, though.
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776509#comment-13776509 ] Karthik Kambatla commented on YARN-1028:
Using the configs introduced in YARN-1232, we should be able to retry alternate RMs by setting {{yarn.resourcemanager.ha.nodes.id}}. [~devaraj.k], I hope it is okay if I take this up.

Add FailoverProxyProvider like capability to RMProxy
Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Devaraj K
RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
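A hedged sketch of how a client might name two RMs with these configs; only {{yarn.resourcemanager.ha.nodes.id}} comes from the comment above, and the per-node address keys and host:port values are assumptions that may differ from the final patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigSketch {
  public static void main(String[] args) {
    // Hypothetical sketch: the ha.nodes.id key is named in the comment;
    // the per-node address keys and values are illustrative only.
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.ha.nodes.id", "rm1,rm2");
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
  }
}
{code}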
[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776515#comment-13776515 ] Hudson commented on YARN-1204:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #4462 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4462/])
YARN-1204. Added separate configuration properties for https for RM and NM without which servers enabled with https will also start on http ports. Contributed by Omkar Vinit Joshi.
MAPREDUCE-5523. Added separate configuration properties for https for JHS without which even when https is enabled, it starts on http port itself. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525947)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/WebAppUtil.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229:
Attachment: YARN-1229.2.patch
Added a test case.
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068:
---
Attachment: yarn-1068-7.patch
Thanks [~tucu00]. Updated patch to address the comment.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776529#comment-13776529 ] Xuan Gong commented on YARN-1229:
-
Ran the full YARN test suite; all of the YARN tests pass. Ran the full MAPREDUCE test suite; some tests in the mapred package have timeout issues, which I do not think are caused by this patch.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776546#comment-13776546 ] Bikas Saha commented on YARN-1229:
--
base32 encoding is a good idea if we don't want to break compatibility. It basically boils down to that. Xuan, the AuxServiceHelper is still using the NM_AUX_SERVICE prefix, which has _ in it.
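As a hedged illustration of the base32 idea (this sketch uses Apache Commons Codec; stripping the '=' padding is an assumption here, since '=' is itself illegal in a shell identifier):
{code}
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base32;

public class AuxServiceEnvNameSketch {
  public static void main(String[] args) {
    // Base32 output uses only A-Z and 2-7, all legal identifier characters,
    // so an arbitrary service name can be embedded in an env var name.
    String serviceName = "mapreduce.shuffle";
    String encoded = new Base32()
        .encodeAsString(serviceName.getBytes(StandardCharsets.UTF_8))
        .replace("=", ""); // padding is not a legal identifier character
    System.out.println("NM_AUX_SERVICE_" + encoded);
  }
}
{code}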
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776571#comment-13776571 ] Hadoop QA commented on YARN-1068:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604842/yarn-1068-7.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2000//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2000//console
This message is automatically generated.
[jira] [Created] (YARN-1233) NodeManager doesn't renew krb5 creds
Allen Wittenauer created YARN-1233:
--
Summary: NodeManager doesn't renew krb5 creds
Key: YARN-1233 URL: https://issues.apache.org/jira/browse/YARN-1233 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer
In 2.1.0-beta-rc1 (sorry, haven't upgraded yet) the NM is not renewing krb5 TGTs after they expire.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229:
Attachment: YARN-1229.3.patch
Changed the NM_AUX_SERVICE prefix to NodeManagerAuxService to eliminate the _.
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157:
Attachment: YARN-1157.5.patch
Created the patch based on the latest trunk.

ResourceManager UI has invalid tracking URL link for distributed shell application
--
Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch
Submit a YARN distributed shell application, then go to the ResourceManager Web UI. The application appears. In the Tracking UI column, there will be a history link. Clicking on that link shows HTTP error 500 instead of the application master web UI.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776598#comment-13776598 ] Hadoop QA commented on YARN-1229:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604849/YARN-1229.3.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2002//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2002//console
This message is automatically generated.
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776629#comment-13776629 ] Hadoop QA commented on YARN-1157:
-
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604851/YARN-1157.5.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2003//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2003//console
This message is automatically generated.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776628#comment-13776628 ] Alejandro Abdelnur commented on YARN-1021: -- +1 Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc). * Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157: Attachment: YARN-1157.6.patch Adding more comments in RegisterApplicationMasterRequest and FinishApplicationMasterRequest ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
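For context, a minimal sketch of where the tracking URL enters the system: the AM supplies it when registering with the RM, and the RM UI's Tracking UI column links to whatever was supplied, so an empty or unreachable URL is one way the link can end up serving an HTTP 500. Host, port, and URL below are placeholders, and this is not the distributed-shell code itself.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TrackingUrlSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    // The tracking URL registered here is what the RM UI link points at.
    rmClient.registerApplicationMaster("am-host.example.com", 0,
        "http://am-host.example.com:8080/status");
    // ... run the application, then unregister ...
    rmClient.stop();
  }
}
{code}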
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776634#comment-13776634 ] Bikas Saha commented on YARN-1068: -- It would be instructive to compare the HAAdmin server start code with an existing RM admin server like the AdminService. I notice 2 things. 1) AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern 2) AdminService does something with HADOOP_SECURITY_AUTHORIZATION which is missing in HAAdminService. This probably defines who has access to perform the admin operations. We will likely need that for the HAAdmin, right? Having thought about this, it seems to me that this jira is actually blocked by YARN-986. Without a concept of a logical name, how can we expect the CLI etc. to find the correct RM address from configuration? The client conf files would be expected to have entries for all RM instances and we would need to be able to issue admin commands to any one of them. So we need to be able to address them via a logical name, right? So the current approach that picks the RM_HA_ADMIN_SERVICE address does not seem like a viable solution. Similarly, server conf files would need to tell the server what its logical name is so that it can try to pick up instance-specific configurations. This is precisely why we have the HAAdmin.resolveTarget() method. Again, it would be instructive to look at NNHAServiceTarget for the client side and the constructor for NameNode, where it uses the logical name to translate and re-write the server-side conf. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
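A minimal sketch of the logical-name idea discussed above, assuming hypothetical configuration keys: the key pattern "yarn.resourcemanager.admin.address.<logical-name>" is invented for this illustration and is not necessarily the scheme this JIRA adopts.
{code}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: resolve an RM admin address from a logical name instead
// of a single fixed RM_HA_ADMIN_SERVICE address. The key naming below is
// an assumption made for this example only.
public final class LogicalNameResolutionSketch {
  private LogicalNameResolutionSketch() {}

  public static String resolveAdminAddress(Configuration conf, String logicalName) {
    String key = "yarn.resourcemanager.admin.address." + logicalName;
    String address = conf.get(key);
    if (address == null) {
      throw new IllegalArgumentException("No admin address configured for " + logicalName);
    }
    return address;
  }
}
{code}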
[jira] [Commented] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776637#comment-13776637 ] Bikas Saha commented on YARN-986: - This should be used to set the service address for tokens. This would also be needed to pick up the correct configs for HA scenarios. YARN should have a ClusterId/ServiceId -- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-986: Summary: YARN should have a ClusterId/ServiceId (was: YARN should have a ClusterId/ServiceId that should be used to set the service address for tokens) YARN should have a ClusterId/ServiceId -- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
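A rough sketch of the token-service flow implied by the description, with the resolution step deliberately left abstract. Token.setService is a real Hadoop API; the cluster-id string and the lookup map are assumptions for illustration only.
{code}
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public final class LogicalServiceIdSketch {
  private LogicalServiceIdSketch() {}

  // Server side: stamp the token with a logical ClusterId/ServiceId rather
  // than a concrete ip:port, so RM failover does not invalidate it.
  public static void stampLogicalService(Token<? extends TokenIdentifier> token,
      String clusterId) {
    token.setService(new Text(clusterId));
  }

  // Client side: translate the logical id to the currently active RM address
  // before selecting the token. The map lookup is a placeholder; defining
  // the real resolution mechanism is exactly what this JIRA is about.
  public static String resolve(String clusterId, Map<String, String> activeRms) {
    return activeRms.get(clusterId);
  }
}
{code}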
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776653#comment-13776653 ] Karthik Kambatla commented on YARN-1068: Thanks [~bikassaha], agree with most of your points. bq. AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern The reason for this is our attempt to reuse most of the common code - protos and client implementations. bq. Having thought about this, it seems to me that this jira is actually blocked by YARN-986. To fix the admin support in its entirety, I agree that we need YARN-1232 and YARN-986. That said, for ease of development, I would propose splitting the admin support into two parts (JIRAs) - basic support (this JIRA) to go in first to help test YARN-1232 and YARN-986, and complete admin support that adds the remaining parts. Otherwise, we would need to apply this on top of those other JIRAs to test. Thoughts? Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776663#comment-13776663 ] Hadoop QA commented on YARN-1157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604859/YARN-1157.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2004//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2004//console This message is automatically generated. ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776684#comment-13776684 ] Jian He commented on YARN-1157: --- Tests look much cleaner, thanks for the update. Patch looks good, +1. ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776765#comment-13776765 ] Bikas Saha commented on YARN-1229: -- Looks good to me. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776802#comment-13776802 ] Bikas Saha commented on YARN-1214: -- +1 Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1214: -- Attachment: YARN-1214.6.patch Patch rebased. Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
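The shape of the fix, as a hedged sketch: Store and SecretManager below are stand-in interfaces, not the actual RMStateStore/secret-manager types, and the actual patch may differ.
{code}
import java.io.IOException;

interface Store {
  void saveMasterKey(byte[] key) throws IOException;
}

interface SecretManager {
  void registerMasterKey(byte[] key);
}

final class RegisterAfterSaveSketch {
  // Persist the client token master key before registering it, so that a
  // crash between the two steps can no longer leave clients holding a
  // token whose key the restarted RM cannot reload.
  static void storeThenRegister(Store store, SecretManager sm, byte[] key)
      throws IOException {
    store.saveMasterKey(key);  // 1) durable first
    sm.registerMasterKey(key); // 2) only then visible to clients
  }
}
{code}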
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776843#comment-13776843 ] Hadoop QA commented on YARN-1214: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604886/YARN-1214.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2005//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2005//console This message is automatically generated. Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776847#comment-13776847 ] Carlo Curino commented on YARN-624: --- Hi guys, I would like to quantify the typical waste of resources while hoarding containers towards a gang, e.g., for Giraph or Storm. Does anyone have an intuition/measure of the typical time delay and container slot-time wasted while hoarding containers, before the useful part of the computation starts? Thanks. Support gang scheduling in the AM RM protocol - Key: YARN-624 URL: https://issues.apache.org/jira/browse/YARN-624 Project: Hadoop YARN Issue Type: Sub-task Components: api, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support. Currently, AMs can approximate this by holding on to containers until they get all the ones they need. However, this lends itself to deadlocks when different AMs are waiting on the same containers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
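As a rough back-of-envelope model (an assumption for illustration, not data from this thread): if a gang needs N containers and they are granted roughly uniformly over a hoarding window of length T, the container slot-time idled before the computation starts is about the sum of (T - t_i) over the N grant times, i.e. roughly N*T/2. For example, hoarding 100 containers over a 10-minute window wastes on the order of 100 * 10 / 2 = 500 container-minutes.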
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776852#comment-13776852 ] Vinod Kumar Vavilapalli commented on YARN-1229: --- *sigh* more incompatible changes. Thought for a while if we can do it in a compatible manner, but doesn't seem like there is any way. Looked at the patch, +1 for the changes. Let's get it in asap. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1229: - Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1204: -- Fix Version/s: 2.1.2-beta Need to add https port related property in Yarn --- Key: YARN-1204 URL: https://issues.apache.org/jira/browse/YARN-1204 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch There is no yarn property available to configure the https port for the ResourceManager, NodeManager and history server. Currently, Yarn services use the port defined for http [defined by 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', 'yarn.resourcemanager.webapp.address'] when running services on the https protocol. Yarn should have a list of properties to assign https ports for the RM, NM and JHS. It could be like below:
yarn.nodemanager.webapp.https.address
yarn.resourcemanager.webapp.https.address
mapreduce.jobhistory.webapp.https.address
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776865#comment-13776865 ] Siddharth Seth commented on YARN-1229: -- Just looked at the patch, it'd be nice to include underscores as well - provides for a separator in the allowed character set. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables
[ https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1128: - Fix Version/s: 2.1.2-beta FifoPolicy.computeShares throws NPE on empty list of Schedulables - Key: YARN-1128 URL: https://issues.apache.org/jira/browse/YARN-1128 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.2-beta Attachments: yarn-1128-1.patch FifoPolicy gives all of a queue's share to the earliest-scheduled application.
{code}
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
  if (earliest == null || schedulable.getStartTime() < earliest.getStartTime()) {
    earliest = schedulable;
  }
}
earliest.setFairShare(Resources.clone(totalResources));
{code}
If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
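A minimal sketch of the empty-list guard the report implies, using a stand-in Schedulable interface; the actual fix is in the attached yarn-1128-1.patch and may differ.
{code}
import java.util.Collection;

interface Schedulable {
  long getStartTime();
  void setFairShare(long share);
}

final class FifoShareSketch {
  static void computeShares(Collection<? extends Schedulable> schedulables,
      long totalResources) {
    if (schedulables.isEmpty()) {
      return; // nothing to assign; avoids the NPE on 'earliest' below
    }
    Schedulable earliest = null;
    for (Schedulable schedulable : schedulables) {
      if (earliest == null
          || schedulable.getStartTime() < earliest.getStartTime()) {
        earliest = schedulable;
      }
    }
    earliest.setFairShare(totalResources);
  }
}
{code}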
[jira] [Updated] (YARN-1203) Application Manager UI does not appear with Https enabled
[ https://issues.apache.org/jira/browse/YARN-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1203: -- Fix Version/s: 2.1.2-beta Application Manager UI does not appear with Https enabled - Key: YARN-1203 URL: https://issues.apache.org/jira/browse/YARN-1203 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta Attachments: YARN-1203.20131017.1.patch, YARN-1203.20131017.2.patch, YARN-1203.20131017.3.patch, YARN-1203.20131018.1.patch, YARN-1203.20131018.2.patch, YARN-1203.20131019.1.patch Need to add support to disable 'hadoop.ssl.enabled' for MR jobs. A job should be able to run on the http protocol by setting the 'hadoop.ssl.enabled' property at the job level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776872#comment-13776872 ] Chris Nauroth commented on YARN-1229: - Agreed on underscores. Various resources indicate that {{[a-zA-Z_]+[a-zA-Z0-9_]*}} is a good format that we can expect to work cross-platform. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
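A self-contained sketch of the regex-based validation being discussed, using the pattern from the comment above; the class and method names are illustrative, not the patch's actual code.
{code}
import java.util.regex.Pattern;

public final class AuxServiceNameSketch {
  // From the discussion above: a letter or underscore first, then letters,
  // digits, and underscores; safe to export as a shell variable name.
  private static final Pattern VALID_NAME =
      Pattern.compile("^[a-zA-Z_]+[a-zA-Z0-9_]*$");

  public static boolean isValid(String name) {
    // guard against null and empty names before matching
    return name != null && !name.isEmpty() && VALID_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("mapreduce_shuffle")); // true
    System.out.println(isValid("mapreduce.shuffle")); // false: '.' breaks export
    System.out.println(isValid("2fast"));             // false: leading digit
  }
}
{code}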
[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1214: - Priority: Critical (was: Major) Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: the client may get the token before the master key is saved, and if the RM then crashes, it cannot reload the master key after restart since the key was never saved. As a result, the client is holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.4.patch Allow _ as a valid character in auxServiceName, and disallow auxServiceName starting with a number. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1053: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Labels: newbie Fix For: 2.3.0, 2.1.2-beta Attachments: YARN-1053.20130809.patch If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
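A hedged sketch of the fix's intent: carry the diagnostic string from the exit event into the container's diagnostics rather than dropping it. The nested event type below stands in for ContainerExitEvent; this is not the ContainerImpl code.
{code}
final class ExitDiagnosticsSketch {
  static final class ExitEvent {
    final int exitCode;
    final String diagnostic;
    ExitEvent(int exitCode, String diagnostic) {
      this.exitCode = exitCode;
      this.diagnostic = diagnostic;
    }
  }

  private final StringBuilder diagnostics = new StringBuilder();

  void onContainerExit(ExitEvent event) {
    if (event.diagnostic != null) {
      diagnostics.append(event.diagnostic).append('\n'); // previously ignored
    }
  }

  String getDiagnostics() {
    return diagnostics.toString();
  }
}
{code}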
[jira] [Updated] (YARN-1158) ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout
[ https://issues.apache.org/jira/browse/YARN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1158: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout Key: YARN-1158 URL: https://issues.apache.org/jira/browse/YARN-1158 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Fix For: 2.1.2-beta Configure yarn-site.xml's yarn.nodemanager.local-dirs to multiple directories. Turn on log aggregation. Run a distributed shell application. If the application writes AppMaster.stdout in one directory and stdout in another, go to the ResourceManager web UI and open up the container logs: only AppMaster.stdout appears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1121: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Fix For: 2.1.2-beta On serviceStop, it should wait for all internal pending events to drain before stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776915#comment-13776915 ] Siddharth Seth commented on YARN-1229: -- Took a quick look.
- Can you please rename MapreduceShuffle to mapreduce_shuffle (closer to the old name)?
- The check can be regex based, rather than walking through all the characters.
- Include an empty check along with the null check.
Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1167: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Fix For: 2.1.2-beta Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1168) Cannot run echo \"Hello World\"
[ https://issues.apache.org/jira/browse/YARN-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1168: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Cannot run echo \"Hello World\" - Key: YARN-1168 URL: https://issues.apache.org/jira/browse/YARN-1168 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Priority: Critical Fix For: 2.1.2-beta Running $ ssh localhost echo \"Hello World\" with bash does succeed. Hello World is shown in stdout. Run distributed shell with a similar echo command. That is either $ /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar -shell_command echo -shell_args \"Hello World\" or $ /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar -shell_command echo -shell_args Hello World
{code:title=yarn logs -- only hello is shown}
LogType: stdout
LogLength: 6
Log Contents:
hello
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
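One plausible mechanism for the lost quoting, shown as a tiny standalone sketch: this is the general failure mode, not the distributed-shell code. If the shell args are joined into one string and later re-tokenized on whitespace, the quotes are gone and \"Hello World\" degrades into two arguments.
{code}
public class QuotingSketch {
  public static void main(String[] args) {
    String joined = "echo" + " " + "Hello World"; // quoting already lost here
    String[] retokenized = joined.split("\\s+");
    // Prints 3 (echo, Hello, World) instead of the intended 2 tokens.
    System.out.println(retokenized.length);
  }
}
{code}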
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1149: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch When the nodemanager receives a kill signal after an application has finished execution but log aggregation has not yet kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown:
{noformat}
2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp
2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local
2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:662)
2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
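The shape of the fix, as an illustrative sketch: a minimal hand-rolled state machine, not the YARN StateMachineFactory API. The idea is to give RUNNING a transition for APPLICATION_LOG_HANDLING_FINISHED instead of treating it as an invalid event.
{code}
final class AppStateSketch {
  enum State { RUNNING, FINISHED }
  enum Event { APPLICATION_LOG_HANDLING_FINISHED }

  private State state = State.RUNNING;

  void handle(Event event) {
    // Tolerate log aggregation finishing while the app is still RUNNING,
    // which happens when a kill arrives right after execution completes.
    if (state == State.RUNNING && event == Event.APPLICATION_LOG_HANDLING_FINISHED) {
      state = State.FINISHED;
      return;
    }
    throw new IllegalStateException("Invalid event " + event + " at " + state);
  }
}
{code}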
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1157: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta ResourceManager UI has invalid tracking URL link for distributed shell application -- Key: YARN-1157 URL: https://issues.apache.org/jira/browse/YARN-1157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch Submit YARN distributed shell application. Goto ResourceManager Web UI. The application definitely appears. In Tracking UI column, there will be history link. Click on that link. Instead of showing application master web UI, HTTP error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1022) Unnecessary INFO logs in AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1022: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta Unnecessary INFO logs in AMRMClientAsync Key: YARN-1022 URL: https://issues.apache.org/jira/browse/YARN-1022 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Priority: Minor Labels: newbie Fix For: 2.1.2-beta Logs like the following should be at DEBUG, or else every legitimate stop causes unnecessary exception traces in the logs.
2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Heartbeater interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249)
2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
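A sketch of the proposed change, assuming the usual commons-logging setup; the class and method names are illustrative, not the actual AMRMClientAsyncImpl code. Expected interrupts get logged at DEBUG so that a legitimate stop produces no stack traces at INFO.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class HeartbeaterSketch implements Runnable {
  private static final Log LOG = LogFactory.getLog(HeartbeaterSketch.class);
  private volatile boolean running = true;

  public void run() {
    while (running) {
      try {
        Thread.sleep(1000); // heartbeat interval placeholder
      } catch (InterruptedException e) {
        LOG.debug("Heartbeater interrupted", e); // was LOG.info(...)
        Thread.currentThread().interrupt();
        return;
      }
      // ... send heartbeat ...
    }
  }

  public void stopHeartbeating() {
    running = false;
  }
}
{code}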
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1142: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.2-beta When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1131) $ yarn logs should return a message that log aggregation is in progress if the YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1131: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta $ yarn logs should return a message that log aggregation is in progress if the YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
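A sketch of the two suggested behaviors; the helper names below are hypothetical and only illustrate where the checks would go in the log CLI:
{code}
// 1. Reject malformed application IDs with a message instead of an exception.
ApplicationId appId;
try {
  appId = ConverterUtils.toApplicationId(appIdStr);
} catch (Exception e) {
  System.err.println("Invalid ApplicationId specified: " + appIdStr);
  return -1;
}
// 2. Tell the user when logs are not yet aggregated for a running app.
if (isApplicationRunning(appId)) {  // hypothetical check against the RM
  System.out.println("Application " + appIdStr
      + " is still running; log aggregation is in progress.");
  return -1;
}
{code}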
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.5.patch 1. Change the service name to mapreduce_shuffle. 2. Use a regex to validate auxName. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
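A sketch of the regex check described in the attachment note; the class shape and the exact pattern are assumptions, modeled on shell-safe environment variable names:
{code}
import java.util.regex.Pattern;

public class AuxServiceNameCheck {
  // Aux service names end up in launch_container.sh as
  // export NM_AUX_SERVICE_<name>=..., so they must be valid shell identifiers.
  private static final Pattern NAME_PATTERN =
      Pattern.compile("^[A-Za-z_]+[A-Za-z0-9_]*$");

  public static void validate(String auxName) {
    if (!NAME_PATTERN.matcher(auxName).matches()) {
      throw new IllegalArgumentException("Invalid aux service name: " + auxName);
    }
  }
}
{code}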
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776945#comment-13776945 ] Siddharth Seth commented on YARN-1229: -- Patch looks good. Missed this earlier, but there are several references to mapreduce.shuffle in the documentation which need to be updated. Also, since it's being updated, can you make the Pattern final? Thanks Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.6.patch fix documentation Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776975#comment-13776975 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604917/YARN-1229.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2007//console This message is automatically generated. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. 
Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776979#comment-13776979 ] Siddharth Seth commented on YARN-1229: -- +1. Committing. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-2.patch Patch that adds descriptions and tests for HAUtil, to be applied on trunk. Configuration support for RM HA --- Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. This blocks ConfiguredFailoverProxyProvider. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
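To make the intent concrete, here is a sketch of the kind of settings such a patch would enable; the property keys below are illustrative assumptions, not the final names:
{code}
// Illustrative only: the final HA key names were still under review here.
Configuration conf = new YarnConfiguration();
conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
{code}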
[jira] [Moved] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi moved MAPREDUCE-5532 to YARN-1234: Key: YARN-1234 (was: MAPREDUCE-5532) Project: Hadoop YARN (was: Hadoop Map/Reduce) Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi When we run the ContainerLocalizer in a secured cluster, we do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1234: Fix Version/s: 2.1.2-beta Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta When we run the ContainerLocalizer in a secured cluster, we do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1234: Component/s: nodemanager Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta When we run the ContainerLocalizer in a secured cluster, we do not create any log file to track log messages. Such a log file would be helpful in identifying ContainerLocalization issues in a secured cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
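One possible shape of the fix, sketched with log4j 1.x (which Hadoop used at the time); the method, file name, and location are assumptions:
{code}
import java.io.IOException;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Attach a file appender before the localizer starts doing work,
// so its messages are captured even in a secured cluster.
public static void initLocalizerLog(String logDir) throws IOException {
  FileAppender appender = new FileAppender(
      new PatternLayout("%d{ISO8601} %p %c: %m%n"),
      logDir + "/container-localizer.log");  // path is illustrative
  Logger.getRootLogger().addAppender(appender);
}
{code}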
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777009#comment-13777009 ] Hudson commented on YARN-1229: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4463 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4463/]) YARN-1229. Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle. Contributed by Xuan Gong. (sseth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526065) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/INSTALL * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. 
If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777021#comment-13777021 ] Hadoop QA commented on YARN-1232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604931/yarn-1232-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.cli.TestYarnCLI org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.conf.TestYarnConfiguration org.apache.hadoop.yarn.logaggregation.TestLogDumper org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes 
org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice.TestApplicationMasterService org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777023#comment-13777023 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604922/YARN-1229.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2008//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2008//console This message is automatically generated. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. 
If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-899: --- Attachment: YARN-899.7.patch create patch based on the latest trunk Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
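A sketch of where such a check could plug in once the config is honored; the queue and exception handling here are illustrative, though QueueACL.ADMINISTER_QUEUE already exists in the YARN API:
{code}
// Before applying an administrative action (e.g. killing another user's app):
UserGroupInformation caller = UserGroupInformation.getCurrentUser();
if (!queue.hasAccess(QueueACL.ADMINISTER_QUEUE, caller)) {
  throw new AccessControlException("User " + caller.getShortUserName()
      + " cannot administer queue " + queue.getQueueName());
}
{code}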
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-899: --- Attachment: YARN-899.8.patch Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch, YARN-899.8.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777034#comment-13777034 ] Sandy Ryza commented on YARN-1089: -- I'm ok with waiting until 2.3. In case it's not clear, the consequence of this is that until then it will be impossible to place more tasks on a node than its number of virtual cores, which is essentially its number of physical cores. I think we should make YARN-976, documenting the meaning of vcores, a blocker for 2.2. Add YARN compute units alongside virtual cores -- Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777044#comment-13777044 ] Jian He commented on YARN-674: -- Is this related to the ClientRMService.renewDelegationToken method? Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as the RM may run out of RPC handlers due to blocked client submissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
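One way to keep the submission RPC handlers free, sketched with a plain executor (the actual fix in the RM's DelegationTokenRenewer may be structured differently; the helper below is hypothetical):
{code}
// Hand token renewal to a background pool so the submission handler
// returns immediately instead of blocking on a slow or down NameNode.
private final ExecutorService renewerPool = Executors.newFixedThreadPool(5);

void onApplicationSubmission(final ApplicationId appId) {
  renewerPool.submit(new Runnable() {
    public void run() {
      renewDelegationTokens(appId);  // hypothetical helper that talks to the NN
    }
  });
}
{code}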
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777051#comment-13777051 ] Bikas Saha commented on YARN-1089: -- At this point, I am not seeing the benefit of creating yet another cpu related configuration. While I am not against useful configurations, its already hard to configure YARN. Like Vinod and others said, can a summary of the discussions made elsewhere be placed here. Add YARN compute units alongside virtual cores -- Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777056#comment-13777056 ] Hudson commented on YARN-1214: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4464 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4464/]) YARN-1214. Register ClientToken MasterKey in SecretManager after it is saved (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526078) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenSecretManagerInRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java Register ClientToken MasterKey in SecretManager after it is saved - Key: YARN-1214 URL: https://issues.apache.org/jira/browse/YARN-1214 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: if the client gets the token and the RM crashes before the master key is saved, the RM cannot reload the master key after it restarts, since it was never saved. As a result, the client is left holding an invalid token. We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
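The ordering change itself is small; a sketch with illustrative method names (the real RMStateStore and secret-manager calls differ in detail):
{code}
// Before (problematic): the key is usable before it is durable.
//   secretManager.registerMasterKey(attemptId, masterKey);
//   store.storeApplicationAttempt(attempt);

// After: make the key durable first, then hand it out, so an RM restart
// can always recover any key a client might already hold.
store.storeApplicationAttempt(attempt);
secretManager.registerMasterKey(attemptId, masterKey);
{code}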
[jira] [Updated] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1215: Attachment: YARN-1215-trunk.2.patch Attaching a new patch that adds a userInfo field to org.apache.hadoop.yarn.api.records.URL. This appends an optional field to the existing .proto file, which is allowed according to the compatibility guide at: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility Yarn URL should include userinfo Key: YARN-1215 URL: https://issues.apache.org/jira/browse/YARN-1215 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we set the uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This will lead to information loss if the original uri has the userinfo, e.g. foo://username:password@example.com will be converted to foo://example.com, and the username/password information is lost during the conversion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
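A sketch of the conversion with the new field, assuming a setUserInfo setter on the YARN URL record as the patch description implies:
{code}
public static URL getYarnUrlFromURI(URI uri) {
  URL url = Records.newRecord(URL.class);
  url.setScheme(uri.getScheme());
  url.setHost(uri.getHost());
  url.setPort(uri.getPort());
  url.setFile(uri.getPath());
  url.setUserInfo(uri.getUserInfo());  // new: userinfo no longer discarded
  return url;
}
{code}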
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777069#comment-13777069 ] Hadoop QA commented on YARN-899: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604938/YARN-899.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2010//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2010//console This message is automatically generated. Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch, YARN-899.8.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777071#comment-13777071 ] Sandy Ryza commented on YARN-1089: -- As was requested, I posted a summary of the proposal on YARN-1024. In case it's not clear from the summary, here's the problem we're trying to solve: We want jobs to be portable between clusters. CPU is not a fluid resource in the way memory is. The number of cores on a machine is just as important as its total processing power when scheduling tasks. Imagine a cluster where every node has powerful CPUs with many cores. One type of task that will be run on the cluster saturates a full CPU, but another type of task that will be run on the cluster contains two threads, each of which can saturate only half a full CPU. If we have a single dimension for CPU requests, these tasks will request an equal amount of it. What happens if we then move those tasks to a cluster with CPUs whose cores are half as fast? The first task will run half as fast, and the second task will run in the same amount of time. It's in the first task's interest to only request half as many CPU resources on that cluster. I'm also afraid of things getting complicated, but I can't think of anything better that doesn't require having the meaning of a virtual core vary widely from cluster to cluster. Add YARN compute units alongside virtual cores -- Key: YARN-1089 URL: https://issues.apache.org/jira/browse/YARN-1089 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1089-1.patch, YARN-1089.patch Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables
[ https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1128: - Hadoop Flags: Reviewed Committed to trunk, branch-2, and branch-2.1-beta FifoPolicy.computeShares throws NPE on empty list of Schedulables - Key: YARN-1128 URL: https://issues.apache.org/jira/browse/YARN-1128 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.2-beta Attachments: yarn-1128-1.patch FifoPolicy gives all of a queue's share to the earliest-scheduled application. {code} Schedulable earliest = null; for (Schedulable schedulable : schedulables) { if (earliest == null || schedulable.getStartTime() < earliest.getStartTime()) { earliest = schedulable; } } earliest.setFairShare(Resources.clone(totalResources)); {code} If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
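A sketch of the guarded loop, assuming the fix simply skips the share assignment when the list is empty (the committed patch may differ):
{code}
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
  if (earliest == null
      || schedulable.getStartTime() < earliest.getStartTime()) {
    earliest = schedulable;
  }
}
if (earliest != null) {  // empty queue: no one to give the fair share to
  earliest.setFairShare(Resources.clone(totalResources));
}
{code}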