[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917511#comment-13917511 ] Hitesh Shah commented on YARN-1758: --- [~xgong] Have you given thought to YARN-1759 as part of this fix? MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1758.1.patch, YARN-1758.2.patch NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917585#comment-13917585 ] Qi Zhang commented on YARN-1021: Hi @Wei Yan. I am trying to use SLS but always hit the following exception. Can you tell me what causes it? Thank you!

-bash-3.2$ sudo sh share/hadoop/tools/sls/bin/slsrun.sh --input-rumen=share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json --output-dir=share/hadoop/tools/sls/sample_output
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.
java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.web.SLSWebApp.init(SLSWebApp.java:82)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:463)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:162)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:230)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:355)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:775)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:197)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
Exception in thread pool-2-thread-72 java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.addAMRuntime(ResourceSchedulerWrapper.java:721)
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.lastStep(AMSimulator.java:196)
    at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.lastStep(MRAMSimulator.java:390)
    at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Exception in thread pool-2-thread-98 java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.addAMRuntime(ResourceSchedulerWrapper.java:721)
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.lastStep(AMSimulator.java:196)
    at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.lastStep(MRAMSimulator.java:390)
    at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917634#comment-13917634 ] Wei Yan commented on YARN-1021: --- [~qzhang90]. Check the resource simulate.info.html.template. It looks like SLS cannot find it. Step into the sls directory and try again: cd share/hadoop/tools/sls; bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json --output-dir=sample_output

Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf

The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations have been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is important to evaluate a scheduler algorithm thoroughly before deploying it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time-consuming and costly, and it is also very hard to find a large-enough cluster. Hence, a simulator that can predict how well a scheduler algorithm performs for a specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine.
This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including:
* Resource usage for the whole cluster and each queue, which can be used to configure cluster and queue capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual job turnaround time, throughput, fairness, capacity guarantee, etc).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits.
The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917641#comment-13917641 ] Qi Zhang commented on YARN-1021: Wei Yan, thank you for your suggestion; it solves the problem! I had tried running slsrun.sh from many other directories, but not from share/hadoop/tools/sls. I think it would be more straightforward if slsrun.sh could be executed from any path. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
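The NullPointerException discussed in this thread is the kind of failure that occurs when a resource lookup silently returns null. A minimal Java sketch of failing with a descriptive message instead is shown here. It assumes the template is looked up through a class loader, which the thread does not confirm; TemplateLoader and loadTemplate are hypothetical names and are not part of SLS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SLS: loads a classpath resource and reports
// a missing resource by name instead of failing later with a bare NPE.
class TemplateLoader {
    static String loadTemplate(ClassLoader loader, String resourceName) {
        // getResourceAsStream returns null (it does not throw) when the resource is missing.
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalStateException("Resource not found on classpath: " + resourceName
                        + " (was the launcher started from the expected directory?)");
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            loadTemplate(TemplateLoader.class.getClassLoader(), "simulate.info.html.template");
        } catch (IllegalStateException e) {
            // Reached when the template is not on the classpath.
            System.err.println(e.getMessage());
        }
    }
}
```

With a check like this, a user who starts the launcher from the wrong directory would see which resource is missing rather than a stack trace ending in SLSWebApp.init.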
[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1761: Attachment: YARN-1761.1.patch RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917648#comment-13917648 ] Xuan Gong commented on YARN-1761: - Created a patch that uses ConfigurationProvider to load the Configuration and checks whether RM HA is enabled. RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
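The check described in the comment above can be sketched roughly as follows. This is an illustration only, not the contents of YARN-1761.1.patch: it uses java.util.Properties as a stand-in for Hadoop's Configuration, and HaGuard, isHaEnabled and runHaCommand are hypothetical names. The property key yarn.resourcemanager.ha.enabled is the real YARN setting, which defaults to false.

```java
import java.util.Properties;

// Illustrative sketch only; the real patch is not shown in this thread.
class HaGuard {
    // Real YARN property name; everything else in this class is hypothetical.
    static final String RM_HA_ENABLED = "yarn.resourcemanager.ha.enabled";

    static boolean isHaEnabled(Properties conf) {
        // HA is treated as disabled unless the property is explicitly "true".
        return Boolean.parseBoolean(conf.getProperty(RM_HA_ENABLED, "false"));
    }

    // Returns 0 if the command may proceed, -1 if it is rejected.
    static int runHaCommand(Properties conf, String command) {
        boolean haCommand = command.equals("-transitionToActive")
                || command.equals("-transitionToStandby");
        if (haCommand && !isHaEnabled(conf)) {
            System.err.println("Cannot run " + command + " when ResourceManager HA is not enabled");
            return -1;
        }
        return 0; // the actual transition logic would run here
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(runHaCommand(conf, "-transitionToActive")); // prints -1
        conf.setProperty(RM_HA_ENABLED, "true");
        System.out.println(runHaCommand(conf, "-transitionToActive")); // prints 0
    }
}
```

Rejecting the command up front keeps the CLI from attempting an RPC that cannot succeed on a non-HA cluster.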
[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917660#comment-13917660 ] Hadoop QA commented on YARN-1761: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632168/YARN-1761.1.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3227//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3227//console
This message is automatically generated.
RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1759) Configuration settings can potentially disappear post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917755#comment-13917755 ] Xuan Gong commented on YARN-1759: - Could you explain why you think this will cause issues? I do not think there will be any. We have two ConfigurationProviders right now. The first one is LocalConfigurationProvider. With it, we load the local core-site.xml and the local yarn-site.xml twice. I think this should be fine; it will not change any property values. The other one is FileSystemBasedConfigurationProvider. With it, we load the local core-site.xml and the local yarn-site.xml first as the bootstrap configurations, then load the remote configurations to overwrite everything. And I think that if we choose to use FileSystemBasedConfigurationProvider, we should upload the configurations that we want to use to the remote FileSystem. Configuration settings can potentially disappear post YARN-1666 --- Key: YARN-1759 URL: https://issues.apache.org/jira/browse/YARN-1759 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah By implicitly loading core-site and yarn-site again in the RM::serviceInit(), some configs may be unintentionally overridden. -- This message was sent by Atlassian JIRA (v6.2#6252)
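The two positions in this ticket can be illustrated with a toy model in which each configuration source is a map and later sources overwrite earlier ones. This is a sketch under that assumption only: it does not model Hadoop's Configuration class, and ConfigLayering, merge and example.key are hypothetical names. Whether the ResourceManager actually behaves like the third case is the question the ticket raises.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of layered configuration loading; not Hadoop's Configuration class.
class ConfigLayering {
    // Applies each layer in order; later layers overwrite earlier ones.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> local = Map.of("example.key", "from-file");
        Map<String, String> remote = Map.of("example.key", "from-remote");
        Map<String, String> programmatic = Map.of("example.key", "set-in-code");

        // Loading the same local files twice changes nothing.
        System.out.println(merge(local, local).equals(merge(local)));             // prints true
        // Remote configuration loaded after the bootstrap files wins.
        System.out.println(merge(local, remote).get("example.key"));              // prints from-remote
        // A value set in code between two loads of the same file is lost in this model.
        System.out.println(merge(local, programmatic, local).get("example.key")); // prints from-file
    }
}
```

The first two cases match the behaviour described in the comment; the third shows how a naive reload could override a setting made in code, which is the concern stated in the issue description.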
[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917757#comment-13917757 ] Xuan Gong commented on YARN-1758: - bq. Xuan Gong Have you given thought to YARN-1759 as part of this fix? I have some comments in YARN-1759. We can start the discussion there. These two tickets are not closely related. MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1758.1.patch, YARN-1758.2.patch NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
Rajesh Balamohan created YARN-1775: -- Summary: Create SMAPBasedProcessTree to get PSS information Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Reporter: Rajesh Balamohan Priority: Minor Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
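As a rough illustration of what computing memory usage from PSS involves, the sketch below sums the "Pss:" fields of text in the Linux /proc/<pid>/smaps format. It is not the proposed SMAPBasedProcessTree; SmapsPss and totalPssKb are hypothetical names, and the sample mappings are made up for the example.

```java
// Hypothetical sketch: sums PSS over all mappings in smaps-formatted text.
class SmapsPss {
    // Returns the total of all "Pss:" fields, in kB.
    static long totalPssKb(String smapsText) {
        long totalKb = 0;
        for (String line : smapsText.split("\n")) {
            String trimmed = line.trim();
            // Match only "Pss:"; fields such as "SwapPss:" or "Pss_Anon:" are skipped.
            if (trimmed.startsWith("Pss:")) {
                // Field format: "Pss:                  20 kB"
                String[] parts = trimmed.substring("Pss:".length()).trim().split("\\s+");
                totalKb += Long.parseLong(parts[0]);
            }
        }
        return totalKb;
    }

    public static void main(String[] args) {
        // Made-up sample with two mappings.
        String sample = String.join("\n",
                "00400000-0040b000 r-xp 00000000 08:01 123 /bin/cat",
                "Size:                 44 kB",
                "Rss:                  40 kB",
                "Pss:                  20 kB",
                "7f0000000000-7f0000004000 rw-p 00000000 00:00 0",
                "Size:                 16 kB",
                "Rss:                  12 kB",
                "Pss:                  10 kB");
        System.out.println(totalPssKb(sample)); // prints 30
    }
}
```

Because PSS divides each shared page among the processes that map it, a total computed this way is lower than the RSS total (52 kB in the sample) whenever pages are shared.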
[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1389: Attachment: YARN-1389-2.patch attaching patch with compilation fix Thanks, Mayank ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)