[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1702: Attachment: apache-yarn-1702.5.patch Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
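[Editorial note] Once an endpoint like this lands, a kill request would presumably be an HTTP PUT against the application's state resource. The sketch below is a hypothetical client under that assumption; the path /ws/v1/cluster/apps/{appid}/state, the port, the placeholder application id, and the JSON body are guesses at the RM web services convention, not a description of what this particular patch implements.
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class KillAppSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint shape; the host, port, and appid are placeholders.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps/"
        + "application_1234567890123_0001/state");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    byte[] body = "{\"state\":\"KILLED\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream os = conn.getOutputStream()) {
      os.write(body); // ask the RM to move the app to the KILLED state
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}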
[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1702: Attachment: (was: apache-yarn-1702.5.patch) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910095#comment-13910095 ] Hadoop QA commented on YARN-1702: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630626/apache-yarn-1702.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3162//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1702: Attachment: apache-yarn-1702.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1702: Attachment: (was: apache-yarn-1702.5.patch) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910142#comment-13910142 ] Hadoop QA commented on YARN-1702: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630633/apache-yarn-1702.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3163//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3163//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1686: - Attachment: YARN-1686.2.patch Thank you Vinod for reviewing the patch. I have updated the patch to address all your comments. Please review the new patch. Jian He, thanks for the motivation. :-) NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang. Key: YARN-1686 URL: https://issues.apache.org/jira/browse/YARN-1686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Rohith Assignee: Rohith Fix For: 3.0.0 Attachments: YARN-1686.1.patch, YARN-1686.2.patch During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Consider the case where NM-1 is registered with the RM and the RM issues a resync to the NM. If any exception is thrown in resyncWithRM (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
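[Editorial note] A minimal sketch of the failure mode being fixed here, with illustrative method names (not the actual patch): the resync path spawns a thread, and any exception thrown inside it previously died with that thread, leaving the NM neither re-registered nor shut down.
{code}
public class ResyncSketch {
  // Called when the RM asks the NM to resync.
  public void resyncWithRM() {
    new Thread("resync") {
      @Override
      public void run() {
        try {
          reRegisterWithRM(); // may throw if registration fails
        } catch (Exception e) {
          // Without this catch, the exception kills only this thread and
          // the NM hangs; handling it lets the NM shut down cleanly.
          System.err.println("Error while resyncing with RM: " + e);
          shutDown();
        }
      }
    }.start();
  }

  private void reRegisterWithRM() throws Exception { /* RPC to the RM */ }

  private void shutDown() { /* stop NM services */ }
}
{code}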
[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910219#comment-13910219 ] Hadoop QA commented on YARN-1686: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630643/YARN-1686.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3164//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3164//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910411#comment-13910411 ] Jason Lowe commented on YARN-221: - bq. We can have the RM/AM wait for notification as in container exit: NM notifies RM, RM notifies AM. That will create some delay for the AM to declare the job is done. With the NM-RM heartbeat value used in big clusters, it could add a couple seconds of delay for the job. That might not be a big deal for regular MR jobs. The NM does out-of-band heartbeats when containers exit, so the turnaround time can be shorter than a full NM heartbeat interval. If we're really concerned about any additional time added for graceful task exit, we can also have the AM unregister when the job succeeds/fails but before all tasks exit, and eventually the RM will kill all containers of the application when the AM eventually exits (or times out waiting). In that sense it would not add any time from the job client's perspective, as the job could report completion at the same time it did before. However, it would add some time from the YARN perspective, as the application lingers on the cluster a few extra seconds in the FINISHING state. bq. One thing to add: we need the definition and policy on how to handle those tasks that are in the finishing state that the MR AM ends up stopping because they don't exit by themselves. I don't think we need to get too tricky here. The NM will see the container return a non-zero exit code and assume that's a failure. If tasks are succeeding but returning non-zero exit codes, then that's probably a bug, and it's arguably a good thing we're grabbing the logs to show what went wrong when it tried to tear down. IMHO we should fix what's causing the non-zero exit code rather than add a mechanism to prevent logs from being aggregated in what should be a rare and abnormal case. NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Robert Joseph Evans Assignee: Chris Trezzo Attachments: YARN-221-trunk-v1.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to not aggregate logs in some cases, and avoid connecting to the NN at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1336: - Attachment: YARN-1336-rollup.patch Attaching a rollup patch for the prototype that [~raviprak] and I developed. This recovers resource localization state, applications and containers, tokens, log aggregation, the deletion service, and the MR shuffle auxiliary service. A quick high-level overview:
- Restart functionality is enabled by configuring yarn.nodemanager.recovery.enabled to true and yarn.nodemanager.recovery.dir to a directory on the local filesystem where the state will be stored (see the sketch after this message).
- Containers are launched with an additional shell layer which places the exit code of the container in an .exitcode file. This allows the restarted NM instance to recover containers that are already running or have exited since the last NM instance.
- NMStateStoreService is the abstraction layer for the state store. NMNullStateStoreService is used when recovery is disabled and NMLevelDBStateStoreService is used when it is enabled.
- Rather than explicitly record localized resource reference counts, resources are recovered with no references, and recovered containers re-request their resources as during a normal container lifecycle to restore the reference counts.
Some things that are still missing:
- ability to distinguish shutdown for restart vs. decommission
- proper handling of state store errors
- adding unit tests
- adding formal documentation
Feedback is greatly appreciated. I'll be working on addressing the missing items and splitting the patch into smaller pieces across the appropriate subtasks to simplify reviews. Work-preserving nodemanager restart --- Key: YARN-1336 URL: https://issues.apache.org/jira/browse/YARN-1336 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Attachments: YARN-1336-rollup.patch This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
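[Editorial note] To make the two recovery keys above concrete, here is a minimal sketch of enabling the prototype's recovery mode through a plain Hadoop Configuration. The directory value is only an example, and the keys are taken verbatim from the comment; defaults may differ from what the rollup patch chooses.
{code}
import org.apache.hadoop.conf.Configuration;

public class EnableNMRecovery {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The two keys named in the overview above.
    conf.setBoolean("yarn.nodemanager.recovery.enabled", true);
    conf.set("yarn.nodemanager.recovery.dir", "/var/lib/hadoop-yarn/nm-recovery");
    System.out.println("recovery enabled: "
        + conf.getBoolean("yarn.nodemanager.recovery.enabled", false));
  }
}
{code}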
[jira] [Assigned] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-1336: Assignee: Jason Lowe Work-preserving nodemanager restart --- Key: YARN-1336 URL: https://issues.apache.org/jira/browse/YARN-1336 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1336-rollup.patch This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910615#comment-13910615 ] Robert Kanter commented on YARN-1490: - By the way, the issue I mentioned a few comments [up|https://issues.apache.org/jira/browse/YARN-1490?focusedCommentId=13895329&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13895329] is actually now fixed by YARN-1689. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.11.patch, YARN-1490.12.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch, org.apache.oozie.service.TestRecoveryService_thread-dump.txt This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910612#comment-13910612 ] Billie Rinaldi commented on YARN-1730: -- I don't think using hold count will be sufficient. The hold count only returns the number of holds that have been obtained by the current thread. So as soon as the current thread is done with the lock, it would drop the lock from the lock map, which is not what we want. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
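[Editorial note] A sketch of what the comment implies: a per-entity lock-map entry has to carry its own cross-thread reference count, because {{ReentrantLock.getHoldCount()}} only reflects the calling thread. Class and method names here are illustrative, not the actual patch.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class EntityLocks<K> {
  private static final class Entry {
    final ReentrantLock lock = new ReentrantLock();
    int refs; // holders + waiters across ALL threads, guarded by EntityLocks
  }

  private final Map<K, Entry> map = new HashMap<K, Entry>();

  ReentrantLock acquire(K key) {
    Entry e;
    synchronized (this) {
      e = map.get(key);
      if (e == null) {
        e = new Entry();
        map.put(key, e);
      }
      e.refs++; // count this thread even while it is still waiting
    }
    e.lock.lock();
    return e.lock;
  }

  void release(K key) {
    Entry e;
    synchronized (this) {
      e = map.get(key);
      if (--e.refs == 0) {
        map.remove(key); // safe: no other thread holds or waits on it
      }
    }
    e.lock.unlock();
  }
}
{code}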
[jira] [Created] (YARN-1755) Add support for web services to the WebApp proxy
Varun Vasudev created YARN-1755: --- Summary: Add support for web services to the WebApp proxy Key: YARN-1755 URL: https://issues.apache.org/jira/browse/YARN-1755 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev The RM currently has an inbuilt web proxy that is used to serve requests. The web proxy is necessary for security reasons, which are described on the Apache Hadoop website (http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html). The web application proxy is a part of YARN and can be configured to run as a standalone proxy. Currently, the RM itself supports web services. Adding support for all the web service calls to the web app proxy allows it to support failover and retry for all web services. The changes involved are the following: a. Add support for web service calls to the RM web application proxy and have it make the equivalent RPC calls. b. Add support for failover and retry to the web application proxy. We can refactor a lot of the existing client code from the YARN client. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1686: -- Attachment: YARN-1686.3.patch Same patch as before but with a test time-out. Will check it in once Jenkins says okay. NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang. Key: YARN-1686 URL: https://issues.apache.org/jira/browse/YARN-1686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Rohith Assignee: Rohith Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Consider the case where NM-1 is registered with the RM and the RM issues a resync to the NM. If any exception is thrown in resyncWithRM (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-986) YARN should use cluster-id as token service address
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910690#comment-13910690 ] Vinod Kumar Vavilapalli commented on YARN-986: -- Couldn't find time last week; will look at it today. YARN should use cluster-id as token service address --- Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-986-1.patch, yarn-986-prelim-0.patch This needs to be done to support non-IP-based failover of the RM. Once the server sets the token service address to this generic ClusterId/ServiceId, clients can translate it to the appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1754) Container process is not really killed
[ https://issues.apache.org/jira/browse/YARN-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910693#comment-13910693 ] Gera Shegalov commented on YARN-1754: - Get https://github.com/jerrykuch/ersatz-setsid and make sure that setsid is on your standard PATH. Container process is not really killed -- Key: YARN-1754 URL: https://issues.apache.org/jira/browse/YARN-1754 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: Mac Reporter: Jeff Zhang I tested the following distributed shell example on my Mac: hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -appname shell -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -shell_command=sleep -shell_args=10 -num_containers=1 It starts two processes for one container: one is the shell process, the other is the real command I execute (here, sleep 10). When I then kill this application by running the command yarn application -kill app_id, it kills the shell process but not the real command process. The reason is that YARN uses the kill command to kill the process, which does not kill its child processes; using pkill could resolve this issue. IMHO, this is a very important case: it makes resource usage inconsistent and is a potential security problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
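[Editorial note] A small demo of the orphaned-child behavior described in the report; this is a sketch, and exact behavior varies by shell and OS. Destroying the shell process delivers a signal to the shell only, and the forked command keeps running, which is why a process-group kill (setsid plus signaling the group) or pkill is needed:
{code}
import java.io.IOException;

public class OrphanDemo {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    // The trailing "true" forces bash to fork "sleep" instead of exec'ing it,
    // mirroring the container's shell-plus-command process pair.
    Process shell =
        new ProcessBuilder("/bin/bash", "-c", "sleep 60; true").start();
    Thread.sleep(1000);
    shell.destroy(); // SIGTERM reaches bash only; the forked sleep survives
    // Reaping the child requires killing the whole process group, e.g.
    // launching under setsid and signaling the group, as suggested above.
  }
}
{code}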
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910697#comment-13910697 ] Vinod Kumar Vavilapalli commented on YARN-1490: --- Thanks for the update [~rkanter]. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1741) XInclude support broken for YARN ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910703#comment-13910703 ] Eric Sirianni commented on YARN-1741: - Yes, this was the approach I was planning on investigating with a potential patch. The trick is how to most cleanly get that to work with the {{ConfigurationProvider}} API. Two main approaches seem possible: # Change {{ConfigurationProvider.getConfigurationInputStream()}} to return a {{(String, InputStream)}} pair. # Change {{ConfigurationProvider}} to provide directly into the {{Configuration}} object itself. Something like {{ConfigurationProvider.provideTo(Configuration conf)}}. With this approach, the different {{ConfigurationProvider}} subclasses could invoke the specific {{conf.addResource()}} overload that made sense for the subclass. Based on investigating the usages of {{ConfigurationProvider.getConfigurationInputStream()}}, I was leaning towards the 2nd approach. XInclude support broken for YARN ResourceManager Key: YARN-1741 URL: https://issues.apache.org/jira/browse/YARN-1741 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Eric Sirianni Priority: Minor Labels: regression The XInclude support in Hadoop configuration files (introduced via HADOOP-4944) was broken by the recent {{ConfigurationProvider}} changes to YARN ResourceManager. Specifically, YARN-1459 and, more generally, the YARN-1611 family of JIRAs for ResourceManager HA. The issue is that {{ConfigurationProvider}} provides a raw {{InputStream}} as a {{Configuration}} resource for what was previously a {{Path}}-based resource. For {{Path}} resources, the absolute file path is used as the {{systemId}} for the {{DocumentBuilder.parse()}} call:
{code}
} else if (resource instanceof Path) {          // a file resource
  ...
  doc = parse(builder, new BufferedInputStream(
      new FileInputStream(file)), ((Path) resource).toString());
}
{code}
The {{systemId}} is used to resolve XIncludes (among other things):
{code}
/**
 * Parse the content of the given <code>InputStream</code> as an
 * XML document and return a new DOM Document object.
 ...
 * @param systemId Provide a base for resolving relative URIs.
 ...
 */
public Document parse(InputStream is, String systemId)
{code}
However, for loading raw {{InputStream}} resources, the {{systemId}} is set to {{null}}:
{code}
} else if (resource instanceof InputStream) {
  doc = parse(builder, (InputStream) resource, null);
{code}
causing XInclude resolution to fail. In our particular environment, we make extensive use of XIncludes to standardize common configuration parameters across multiple Hadoop clusters. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
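[Editorial note] A rough sketch of what approach #2 from the comment above could look like; class and method names are illustrative, and the real patch may differ. Each provider adds its resource to the Configuration itself, so the file-based provider can keep using the Path overload (which preserves the systemId for XInclude resolution):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

abstract class ConfigurationProviderSketch {
  abstract void provideTo(Configuration conf, String resource) throws IOException;
}

class LocalProviderSketch extends ConfigurationProviderSketch {
  @Override
  void provideTo(Configuration conf, String resource) {
    // Path-based overload: the absolute path becomes the systemId,
    // so relative XIncludes keep resolving.
    conf.addResource(new Path(resource));
  }
}

class FileSystemProviderSketch extends ConfigurationProviderSketch {
  @Override
  void provideTo(Configuration conf, String resource) throws IOException {
    Path path = new Path(resource);
    FileSystem fs = path.getFileSystem(conf);
    // InputStream overload: no systemId, hence the XInclude limitation
    // described in this issue.
    conf.addResource(fs.open(path));
  }
}
{code}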
[jira] [Updated] (YARN-1740) Redirection from AM-URL is broken with HTTPS_ONLY policy
[ https://issues.apache.org/jira/browse/YARN-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1740: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1280 Redirection from AM-URL is broken with HTTPS_ONLY policy Key: YARN-1740 URL: https://issues.apache.org/jira/browse/YARN-1740 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Attachments: YARN-1740.1.patch Steps to reproduce: 1) Run a sleep job 2) Run the yarn application -list command to find the AM URL. root@host1:~# yarn application -list Total number of applications (application-types: [] and states: SUBMITTED, ACCEPTED, RUNNING):1 Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL application_1383251398986_0003 Sleep job MAPREDUCE hdfs default RUNNING UNDEFINED 5% http://host1:40653 3) Try to access the http://host1:40653/ws/v1/mapreduce/info URL. This URL redirects to http://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info Here, the HTTP protocol is used with the RM's HTTPS port. The expected URL is https://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1515: Attachment: YARN-1515.v05.patch v05 adds an automatic thread dump for stuck AMs as well. Ability to dump the container threads and stop the containers in a single RPC - Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: New Feature Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910731#comment-13910731 ] Hadoop QA commented on YARN-1686: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630777/YARN-1686.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3165//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3165//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910744#comment-13910744 ] Jian He commented on YARN-1734: --- ServiceFailedException is also a type of IOException that will be retried at the RPC level by RMProxy. RM should get the updated Configurations when it transitions from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is enabled, the RM cannot get the updated configuration when it transitions from Standby to Active. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910756#comment-13910756 ] Xuan Gong commented on YARN-1734: - bq. ServiceFailedException is also a type of IOException that will be retried at the RPC level by RMProxy In HA, we provide a different RetryPolicy, which is failoverOnNetworkException. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1741) XInclude support broken for YARN ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910759#comment-13910759 ] Xuan Gong commented on YARN-1741: - Note that ConfigurationProvider not only provides the InputStream for configuration files; it also provides the InputStream for the include_node and exclude_node files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1741) XInclude support broken for YARN ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910766#comment-13910766 ] Eric Sirianni commented on YARN-1741: - OK, approach 2 would not work then. I thought when I did a usage search that all callers of {{ConfigurationProvider.getConfigurationInputStream()}} were immediately handing the returned {{InputStream}} to a {{Configuration}} object. Guess I missed some usages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1619) Add cli to kill yarn container
[ https://issues.apache.org/jira/browse/YARN-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1619: Fix Version/s: (was: 2.3.0) 2.4.0 Add cli to kill yarn container -- Key: YARN-1619 URL: https://issues.apache.org/jira/browse/YARN-1619 Project: Hadoop YARN Issue Type: New Feature Reporter: Ramya Sunil Fix For: 2.4.0 It will be useful to have a generic cli tool to kill containers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1621: Fix Version/s: (was: 2.3.0) 2.4.0 Add CLI to list states of yarn container-IDs/hosts -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.4.0 As more applications are moved to YARN, we need a generic CLI to list the states of YARN containers and their hosts. Today, if a YARN application running in a container hangs, there is no way to deal with it other than manually killing its process. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers.
{code:title=proposed yarn cli}
$ yarn application -list-containers <appId> <status>
where <status> is one of running/succeeded/killed/failed/all
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
[ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1334: Fix Version/s: (was: 2.3.0) 2.4.0 YARN should give more info on errors when running failed distributed shell command -- Key: YARN-1334 URL: https://issues.apache.org/jira/browse/YARN-1334 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1334.1.patch Running an incorrect command such as: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./ shows a shell exit code exception with no useful message. It should print out the sysout/syserr of the containers/AM to show why it is failing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1514: Fix Version/s: (was: 2.3.0) 2.4.0 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.4.0 ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1147: Fix Version/s: (was: 2.3.0) 2.4.0 Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.4.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1301) Need to log the blacklist additions/removals when YarnScheduler#allocate
[ https://issues.apache.org/jira/browse/YARN-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1301: Fix Version/s: (was: 2.3.0) 2.4.0 Need to log the blacklist additions/removals when YarnScheduler#allocate --- Key: YARN-1301 URL: https://issues.apache.org/jira/browse/YARN-1301 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.4.0 Attachments: YARN-1301.1.patch, YARN-1301.2.patch, YARN-1301.3.patch, YARN-1301.4.patch, YARN-1301.5.patch Without the log, it's hard to debug whether the blacklist is updated on the scheduler side or not. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1561) Fix a generic type warning in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1561: Fix Version/s: (was: 2.3.0) 2.4.0 Fix a generic type warning in FairScheduler --- Key: YARN-1561 URL: https://issues.apache.org/jira/browse/YARN-1561 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Junping Du Assignee: Chen He Priority: Minor Labels: newbie Fix For: 2.4.0 Attachments: yarn-1561.patch The Comparator below should be specified with type: private Comparator nodeAvailableResourceComparator = new NodeAvailableResourceComparator(); -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1142: Fix Version/s: (was: 2.3.0) 2.4.0 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.4.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring
[ https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1375: Fix Version/s: (was: 2.3.0) 2.4.0 RM logs get filled with scheduler monitor logs when we enable scheduler monitoring -- Key: YARN-1375 URL: https://issues.apache.org/jira/browse/YARN-1375 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: haosdent Fix For: 2.4.0 Attachments: YARN-1375.patch When we enable the scheduler monitor, it fills the RM logs with the same queue states periodically. We could log only when there is a difference from the previous state instead of logging the same message (see the sketch after this message).
{code:xml}
2013-10-30 23:30:08,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:11,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:14,465 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:17,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:20,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:23,467 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:26,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:29,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:32,469 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
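[Editorial note] A minimal sketch of the dedup idea suggested above: remember the last queue-state payload (minus the timestamp) and log only when it changes. Class and method names are hypothetical, not the attached patch.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class QueueStateLogger {
  private static final Log LOG = LogFactory.getLog(QueueStateLogger.class);
  private String lastState; // queue-state CSV without the timestamp column

  void maybeLog(String stateCsv) {
    // Skip the log line when nothing changed since the last one.
    if (!stateCsv.equals(lastState)) {
      LOG.info("QUEUESTATE: " + System.currentTimeMillis() + ", " + stateCsv);
      lastState = stateCsv;
    }
  }
}
{code}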
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-745: --- Fix Version/s: (was: 2.3.0) 2.4.0 Move UnmanagedAMLauncher to yarn client package --- Key: YARN-745 URL: https://issues.apache.org/jira/browse/YARN-745 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 2.4.0 It's currently sitting in the yarn applications project, which seems wrong. The client project sounds better since it contains the utilities/libraries that clients use to write and debug YARN applications. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
[ https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1330: Fix Version/s: (was: 2.3.0) 2.4.0 Fair Scheduler: defaultQueueSchedulingPolicy does not take effect - Key: YARN-1330 URL: https://issues.apache.org/jira/browse/YARN-1330 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.4.0 Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations file doesn't take effect. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1477) No Submit time on AM web pages
[ https://issues.apache.org/jira/browse/YARN-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1477: Fix Version/s: (was: 2.3.0) 2.4.0 No Submit time on AM web pages -- Key: YARN-1477 URL: https://issues.apache.org/jira/browse/YARN-1477 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chen He Assignee: Chen He Labels: features Fix For: 2.4.0 Similar to MAPREDUCE-5052, this is a fix on the AM side: add a submitTime field to the AM's web services REST API. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1234: Fix Version/s: (was: 2.3.0) 2.4.0 Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.4.0 When we run the ContainerLocalizer in a secured cluster, we are potentially not creating any log file to track log messages. Having one would help identify ContainerLocalization issues in a secured cluster. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1156: Fix Version/s: (was: 2.3.0) 2.4.0 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.4.0 Attachments: YARN-1156.1.patch The AllocatedGB and AvailableGB metrics are currently integers. If there are four 500MB memory allocations to containers, AllocatedGB is incremented four times by {{(int) 500/1024}}, which is 0. That is, the memory actually allocated is 2000MB, but the metric shows 0GB. Let's use a float type for these metrics (see the sketch after this message). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
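[Editorial note] The arithmetic in the description, as a runnable illustration of the integer-truncation problem and the proposed float fix:
{code}
public class GbTruncationDemo {
  public static void main(String[] args) {
    int allocatedGB = 0;
    for (int i = 0; i < 4; i++) {
      allocatedGB += 500 / 1024;      // integer division adds 0 each time
    }
    System.out.println(allocatedGB);  // prints 0, though 2000MB are allocated

    float allocatedGBf = 4 * (500f / 1024f);
    System.out.println(allocatedGBf); // prints ~1.95 with the float metric
  }
}
{code}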
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-650: --- Fix Version/s: (was: 2.3.0) 2.4.0 User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.4.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-153: --- Fix Version/s: (was: 2.3.0) 2.4.0 PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Fix For: 2.4.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop has already been widely adopted and deployed, and its deployment will only increase in the future, we saw good potential for it to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-314: --- Fix Version/s: (was: 2.3.0) 2.4.0 Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.4.0 Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-160: --- Fix Version/s: (was: 2.3.0) 2.4.0 nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Fix For: 2.4.0 As mentioned in YARN-2 *NM memory and CPU configs*: currently these values come from the NM's config. We should be able to obtain those values from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be available as a YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-113: --- Fix Version/s: (was: 2.3.0) 2.4.0 WebAppProxyServlet must use SSLFactory for the HttpClient connections - Key: YARN-113 URL: https://issues.apache.org/jira/browse/YARN-113 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.4.0 The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS, otherwise the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1064: Fix Version/s: (was: 2.3.0) 2.4.0 YarnConfiguration scheduler configuration constants are not consistent -- Key: YARN-1064 URL: https://issues.apache.org/jira/browse/YARN-1064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Blocker Labels: newbie Fix For: 2.4.0 Some of the scheduler configuration constants in YarnConfiguration have RM_PREFIX and others YARN_PREFIX. For consistency we should move all under the same prefix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
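To illustrate the inconsistency YARN-1064 is about (constant names abridged; exact members in YarnConfiguration may differ):
{code}
// Illustration only; not a verbatim copy of YarnConfiguration.
public final class PrefixInconsistency {
  public static final String YARN_PREFIX = "yarn.";
  public static final String RM_PREFIX = "yarn.resourcemanager.";

  // One scheduler knob hangs off the RM prefix...
  public static final String RM_SCHEDULER = RM_PREFIX + "scheduler.class";

  // ...while another hangs off the bare YARN prefix.
  public static final String SCHEDULER_MIN_ALLOC_MB =
      YARN_PREFIX + "scheduler.minimum-allocation-mb";
}
{code}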
[jira] [Updated] (YARN-322) Add cpu information to queue metrics
[ https://issues.apache.org/jira/browse/YARN-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-322: --- Fix Version/s: (was: 2.3.0) 2.4.0 Add cpu information to queue metrics Key: YARN-322 URL: https://issues.apache.org/jira/browse/YARN-322 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.4.0 Post YARN-2 we need to add cpu information to queue metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-965: --- Fix Version/s: (was: 2.3.0) 2.4.0 NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed -- Key: YARN-965 URL: https://issues.apache.org/jira/browse/YARN-965 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Environment: suse linux Reporter: Li Yuan Fix For: 2.4.0 When a container is successfully launched, its state moves from LOCALIZED to RUNNING and containersRunning is incremented. When the state moves from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. However, EXITED_WITH_FAILURE or KILLING can also be reached from LOCALIZING/LOCALIZED rather than RUNNING, which leaves containersRunning smaller than the actual number. Furthermore, the metrics no longer balance: containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting -- This message was sent by Atlassian JIRA (v6.1.5#6160)
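A hedged sketch of the accounting the YARN-965 description implies (a hypothetical helper, not the actual ContainerImpl/NodeManagerMetrics code): only decrement the running counter when the container actually reached RUNNING.
{code}
// Hypothetical sketch; names and structure are assumptions.
public class RunningCounterSketch {
  private int containersRunning;

  /** LOCALIZED -> RUNNING transition. */
  public synchronized void onLaunched() {
    containersRunning++;
  }

  /** EXITED_WITH_FAILURE/KILLING -> DONE transition. */
  public synchronized void onDone(boolean hadReachedRunning) {
    // Containers killed or failed while still localizing never incremented
    // the counter, so decrementing unconditionally skews it low.
    if (hadReachedRunning) {
      containersRunning--;
    }
  }

  public synchronized int get() {
    return containersRunning;
  }
}
{code}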
[jira] [Updated] (YARN-308) Improve documentation about what asks means in AMRMProtocol
[ https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-308: --- Fix Version/s: (was: 2.3.0) 2.4.0 Improve documentation about what asks means in AMRMProtocol - Key: YARN-308 URL: https://issues.apache.org/jira/browse/YARN-308 Project: Hadoop YARN Issue Type: Sub-task Components: api, documentation, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.4.0 Attachments: YARN-308.patch It's unclear to me from reading the javadoc exactly what asks means when the AM sends a heartbeat to the RM. Is the AM supposed to send a list of all resources that it is waiting for? Or just inform the RM about new ones that it wants? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910783#comment-13910783 ] Hadoop QA commented on YARN-1515: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630783/YARN-1515.v05.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3166//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3166//console This message is automatically generated. Ability to dump the container threads and stop the containers in a single RPC - Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: New Feature Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910788#comment-13910788 ] Bikas Saha commented on YARN-1410: -- Yes. I would like to understand why we are proposing a custom solution that only works for application submission instead of laying down a common pattern (using Retry Cache) that can subsequently be used in a uniform manner for all other remaining non-idempotent operations. Given that HDFS already uses that layer, it would be good to depend on a common framework that has already been debugged and proven to work on HDFS. Given that YARN and HDFS will commonly be deployed together, sharing these basic pieces will go a long way in making it easier to build, deploy, and operate. Given so many pros for this approach, why should we not invest in adopting it? Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
[ https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910789#comment-13910789 ] Sandy Ryza commented on YARN-1330: -- The above issue was fixed by the AllocationFileLoaderService work. Re-resolving this. Fair Scheduler: defaultQueueSchedulingPolicy does not take effect - Key: YARN-1330 URL: https://issues.apache.org/jira/browse/YARN-1330 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations file doesn't take effect. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1330) Fair Scheduler: defaultQueueSchedulingPolicy does not take effect
[ https://issues.apache.org/jira/browse/YARN-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-1330. -- Resolution: Fixed Fix Version/s: (was: 2.4.0) 2.3.0 Fair Scheduler: defaultQueueSchedulingPolicy does not take effect - Key: YARN-1330 URL: https://issues.apache.org/jira/browse/YARN-1330 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-1330-1.patch, YARN-1330-1.patch, YARN-1330.patch The defaultQueueSchedulingPolicy property for the Fair Scheduler allocations file doesn't take effect. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1334) YARN should give more info on errors when running failed distributed shell command
[ https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910790#comment-13910790 ] Hadoop QA commented on YARN-1334: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609555/YARN-1334.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3169//console This message is automatically generated. YARN should give more info on errors when running failed distributed shell command -- Key: YARN-1334 URL: https://issues.apache.org/jira/browse/YARN-1334 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.3.0 Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1334.1.patch Running an incorrect command such as /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributedshell jar> -shell_command ./test1.sh -shell_script ./ shows a shell exit code exception with no useful message. It should print out the sysout/syserr of the containers/AM to explain why it is failing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1756) capture the time when newApplication is called in RM
Ming Ma created YARN-1756: - Summary: capture the time when newApplication is called in RM Key: YARN-1756 URL: https://issues.apache.org/jira/browse/YARN-1756 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma The application submission time (when submitApplication is called) is collected by the RM and the application history server. But it doesn't capture when the client calls the newApplication method. The delta between newApplication and submitApplication could be useful if the client submits large jar files. This metric will be useful for https://issues.apache.org/jira/browse/YARN-1492. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
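A minimal sketch of the capture YARN-1756 proposes; the class, field, and method names here are hypothetical, not committed RM code:
{code}
// Hypothetical sketch: stamp the moment the RM hands out a new app id so
// it can later report the client-side gap before submitApplication.
public class AppSubmissionTimesSketch {
  private final long newApplicationTime = System.currentTimeMillis();
  private volatile long submitTime;

  public void onSubmitApplication() {
    submitTime = System.currentTimeMillis();
  }

  /** The delta the issue asks for, e.g. time spent uploading large jars. */
  public long clientPreparationMillis() {
    return submitTime - newApplicationTime;
  }
}
{code}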
[jira] [Commented] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910815#comment-13910815 ] Hadoop QA commented on YARN-1327: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609276/nodemgr-portability.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3168//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3168//console This message is automatically generated. Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.4.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on FreeBSD. 1. libgen.h is not included. The correct function prototype is there, but Linux glibc has a workaround that defines it for the user if libgen.h is not directly included. Include this file directly. 2. Query the maximum size of the login name using sysconf; this follows the same code style as the rest of the code using sysconf. 3. cgroups are a Linux-only feature; make the compilation conditional and return an error if mount_cgroup is attempted on a non-Linux OS. 4. Do not use the POSIX function setpgrp() since it clashes with the same-named function from BSD 4.2; use an equivalent function. After inspecting the glibc sources, it is just a shortcut for setpgid(0,0). These changes make it compile on both Linux and FreeBSD. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring
[ https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910835#comment-13910835 ] Hadoop QA commented on YARN-1375: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611764/YARN-1375.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3167//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3167//console This message is automatically generated. RM logs get filled with scheduler monitor logs when we enable scheduler monitoring -- Key: YARN-1375 URL: https://issues.apache.org/jira/browse/YARN-1375 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: haosdent Fix For: 2.4.0 Attachments: YARN-1375.patch When we enable the scheduler monitor, it fills the RM logs with the same queue state periodically. We should log only when there is a difference from the previous state instead of repeating the same message.
{code:xml}
2013-10-30 23:30:08,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:11,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:14,465 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:17,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:20,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:23,467 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:26,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:29,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:32,469 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
{code} -- This message was sent by Atlassian JIRA
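A sketch of the de-duplication YARN-1375 suggests (illustrative only, not the committed ProportionalCapacityPreemptionPolicy change): remember the last logged queue state, excluding the timestamp, and stay silent while it is unchanged.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative sketch: emit the QUEUESTATE line at INFO only when the
// snapshot differs from what was last logged.
public class QueueStateLogSketch {
  private static final Log LOG = LogFactory.getLog(QueueStateLogSketch.class);
  private String lastLogged;

  /** stateCsv is the queue snapshot without the leading timestamp. */
  public void maybeLog(long now, String stateCsv) {
    if (!stateCsv.equals(lastLogged)) {
      LOG.info("QUEUESTATE: " + now + ", " + stateCsv);
      lastLogged = stateCsv;
    }
  }
}
{code}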
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910845#comment-13910845 ] Xuan Gong commented on YARN-1410: - I really doubt that the RetryCache would work for us. Look at the code for how they use RetryCache. Take FSNameSystem.delete() as an example:
{code}
boolean delete(String src, boolean recursive)
    throws AccessControlException, SafeModeException,
    UnresolvedLinkException, IOException {
  CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
  if (cacheEntry != null && cacheEntry.isSuccess()) {
    return true; // Return previous response
  }
  boolean ret = false;
  try {
    ret = deleteInt(src, recursive, cacheEntry != null);
  } catch (AccessControlException e) {
    logAuditEvent(false, "delete", src);
    throw e;
  } finally {
    RetryCache.setState(cacheEntry, ret);
  }
  return ret;
}
{code}
Before it starts the operation, it checks whether a previous attempt already succeeded. Before it sends the response, it marks the operation as successful. This works perfectly for these HDFS operations, because after we receive the operation response we can say that the operation is finished. But this does not work for the YARN operations. Take ApplicationSubmission as an example: can we say applicationSubmission is finished when we receive the response from ClientRMService? No, we cannot make that conclusion. Then how will we set the state for the cacheEntry in the RetryCache? Set it in YarnClientImpl#submitApplication? Then we need to find a way to expose the RetryCache to client code. Or maybe we can add extra logic in ClientRMService to check whether the app is submitted before returning the response? But that would add another hop and decrease performance, just like my old check-before-submission proposal. I think the overall logic of RetryCache does not work, or at least is not that useful, for the YARN operations, except that it can provide a globally unique ID for checking repeated operations. And just for providing such an ID, I really do not think we need such "complicated" structures. Also, as for "proposing a custom solution": I think the proposal that saves enough information, such as ClientId and ServiceId, in the ApplicationSubmissionContext and then reads them back to rebuild the RetryCache is a custom solution for ApplicationSubmission too. I do not think that approach can work for other non-idempotent APIs, such as renewDelegationToken(), etc. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910848#comment-13910848 ] Karthik Kambatla commented on YARN-1410: bq. can we say applicationSubmission is finished when we receive the response from ClientRMService? I think the response of ClientRMService#submitApplication() should tell us whether the submission is successful or not. If that is not the case, we should probably fix that first. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910849#comment-13910849 ] Hitesh Shah commented on YARN-1666: ---
{code}
-    if (!(this.configurationProvider instanceof LocalConfigurationProvider)) {
-      // load yarn-site.xml
-      this.conf =
-          this.configurationProvider.getConfiguration(this.conf,
-              YarnConfiguration.YARN_SITE_XML_FILE);
-      // load core-site.xml
-      this.conf =
-          this.configurationProvider.getConfiguration(this.conf,
-              YarnConfiguration.CORE_SITE_CONFIGURATION_FILE);
-      // Do refreshUserToGroupsMappings with loaded core-site.xml
-      Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(this.conf)
-          .refresh();
-    }
+
+    // load yarn-site.xml
+    this.conf.addResource(this.configurationProvider
+        .getConfigurationInputStream(this.conf,
+            YarnConfiguration.YARN_SITE_CONFIGURATION_FILE));
+    // load core-site.xml
+    this.conf.addResource(this.configurationProvider
+        .getConfigurationInputStream(this.conf,
+            YarnConfiguration.CORE_SITE_CONFIGURATION_FILE));
+    // Do refreshUserToGroupsMappings with loaded core-site.xml
+    Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(this.conf)
+        .refresh();
{code}
The above code seems to be breaking MiniClusters. Is the expectation now that anyone using a MiniCluster has to create the appropriate config files and add them into the unit test class path? Stack trace below:
{code}
Exception: null
java.lang.NullPointerException
    at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:182)
    at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:751)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:193)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:268)
    at org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:90)
    at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceInit(MiniYARNCluster.java:419)
{code}
Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910869#comment-13910869 ] Hitesh Shah commented on YARN-1666: --- http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/ doesn't show those files. Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910871#comment-13910871 ] Hitesh Shah commented on YARN-1666: --- My point is that those files should be in the same jar that contains MiniYARNCluster. Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910873#comment-13910873 ] Xuan Gong commented on YARN-1666: - But I did include them in the YARN-1666.6.patch Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1757) Auxiliary service support for nodemanager recovery
Jason Lowe created YARN-1757: Summary: Auxiliary service support for nodemanager recovery Key: YARN-1757 URL: https://issues.apache.org/jira/browse/YARN-1757 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe There needs to be a mechanism for communicating to auxiliary services whether nodemanager recovery is enabled and where they should store their state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
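Purely as a sketch of the kind of hook YARN-1757 describes (the real API shape was decided in the patch, not here):
{code}
import java.io.File;

// Hypothetical sketch, not the committed AuxiliaryService API: tell an
// auxiliary service that NM recovery is enabled and where to keep state.
public interface RecoverableAuxService {
  /**
   * Invoked before service start when nodemanager recovery is enabled;
   * recoveryDir is a directory that survives an NM restart.
   */
  void setRecoveryDirectory(File recoveryDir);
}
{code}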
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910882#comment-13910882 ] Karthik Kambatla commented on YARN-1410: I guess we need to define what it means for an application submission to be successful. As a user, I would assume the submission is successful if the RM has stored it in a place where it is not going to be lost. In a restart/HA setup, this translates to the app being saved to the store. So, ClientRMService#submitApplication should ideally return only after the app is saved. When a scheduler rejects an application, we should probably kick it out of the store or add a REJECTED final state so we don't try recovering a rejected app in case of a failover. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910888#comment-13910888 ] Hitesh Shah commented on YARN-1666: --- [~xgong] Those newly added files are in the wrong location. [~vinodkv] In any case, the above committed patch seems a bit wrong to me. If someone is using a Configuration object with loaded resources, say core-site, yarn-site and foo-site followed by some Configuration::set() calls, the above code will override all conflicting settings. This seems wrong especially in the MiniYARNCluster case. Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910896#comment-13910896 ] Hitesh Shah commented on YARN-1666: --- Done. See related jiras for the new issues filed. Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910894#comment-13910894 ] Hitesh Shah commented on YARN-1758: ---
Exception: null
java.lang.NullPointerException
    at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:182)
    at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:751)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:193)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:268)
    at org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:90)
    at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceInit(MiniYARNCluster.java:419)
MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-1758: -- Description: NPE seen when trying to use MiniYARNCluster MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910889#comment-13910889 ] Vinod Kumar Vavilapalli commented on YARN-1666: --- [~hitesh]/[~xgong], can you file a ticket? Unless it's a minor tweak to the committed patch. Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch, YARN-1666.6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910898#comment-13910898 ] Bikas Saha commented on YARN-1410: -- There is considerable confusion here. I haven't seen the latest code, but here is my understanding of app submission in YARN. 1) The client calls submitApp(); this submits the app context and returns success or failure after initial static checks. 2) If success is returned, the client calls getAppReport() and waits for the app to be accepted. If the app gets accepted, the client reports to the user that the app has been successfully submitted. Else app submission fails. Now there can be retries in step 1) or step 2). Step 2) is idempotent; we don't need to worry about that. Step 1) is non-idempotent. With the retry cache approach, upon retry (directly to the same RM or to a failed-over RM), a correctly working RetryCache will return the same response as was originally sent by the RM. So if the RM returned success, the RetryCache will return success. If the RM returned immediate failure (based on static checks), then the RetryCache will return failure. It's not clear to me why this would cause issues or why it won't work in YARN. The RetryCache is used for per-RPC retries. It is not related to the 2-step process that we use in YARN where each step is a different RPC request. Final success for the user is based on the completion of both steps. The RetryCache can be used to return the same RPC response for step 1 as many times as the client retries that same RPC request. That's exactly what we want. The crucial piece is storing what's needed to re-populate the RetryCache upon failover. Here, we are piggy-backing on ApplicationSubmissionContext storage just like HDFS piggybacks on the edit log entry. I hope this makes things clear. [~sureshms] Does this make sense? Side note: RetryCache also has an option to store a payload along with the response. This is useful when the response has a large internal object that is hard/expensive to re-create and can be fetched from the RetryCache directly. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
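For illustration, here is the same waitForCompletion/setState pattern from the HDFS snippet earlier in this thread, transplanted onto step 1 of app submission. Everything around the RetryCache calls (the store callback, the cache sizing constants) is an assumption for the sketch, not committed ClientRMService code:
{code}
import org.apache.hadoop.ipc.RetryCache;
import org.apache.hadoop.ipc.RetryCache.CacheEntry;

// Sketch only: a per-RPC retry cache around the non-idempotent step 1.
public class SubmitRetryCacheSketch {
  private final RetryCache retryCache =
      new RetryCache("submitApplication", 0.03, 10 * 60 * 1000L);

  public boolean submitApplication(Runnable staticChecksAndStore) {
    CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
    if (cacheEntry != null && cacheEntry.isSuccess()) {
      return true; // retried RPC: hand back the original positive answer
    }
    boolean success = false;
    try {
      staticChecksAndStore.run(); // validate + persist the submission context
      success = true;
      return true;
    } finally {
      // Record the outcome so retries of this call id see the same result.
      RetryCache.setState(cacheEntry, success);
    }
  }
}
{code}
Note that RetryCache keys entries by the RPC client id and call id, which is why re-populating it after failover requires persisting those alongside the ApplicationSubmissionContext, as discussed above.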
[jira] [Created] (YARN-1758) MiniYARNCluster broken post YARN-1666
Hitesh Shah created YARN-1758: - Summary: MiniYARNCluster broken post YARN-1666 Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
Karthik Kambatla created YARN-1760: -- Summary: TestRMAdminService assumes the use of CapacityScheduler Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring
[ https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1375: -- Description: When we enable scheduler monitor, it is filling the RM logs with the same queue states periodically. We can log only when any difference with the previous state instead of logging the same message. {code:xml} 2013-10-30 23:30:08,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:11,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:14,465 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:17,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:20,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:23,467 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:26,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:29,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:32,469 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 {code} was: When we enable scheduler monitor, it is filling the RM logs with the same queue states periodically. We can log only when any difference with the previous state instead of logging the same message. 
{code:xml} 2013-10-30 23:30:08,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:11,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:14,465 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:17,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:20,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:23,467 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:26,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:29,468 INFO
[jira] [Updated] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1760: --- Priority: Trivial (was: Major) TestRMAdminService assumes the use of CapacityScheduler --- Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1760: --- Attachment: yarn-1760-1.patch Trivial patch - the test explicitly sets the scheduler to CS. TestRMAdminService assumes the use of CapacityScheduler --- Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: test Attachments: yarn-1760-1.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910937#comment-13910937 ] Hudson commented on YARN-1678: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5216 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5216/]) YARN-1678. Fair scheduler gabs incessantly about reservations (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1571468) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Fair scheduler gabs incessantly about reservations -- Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910936#comment-13910936 ] Hudson commented on YARN-1686: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5216 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5216/]) YARN-1686. Fixed NodeManager to properly handle any errors during re-registration after a RESYNC and thus avoid hanging. Contributed by Rohith Sharma. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1571474) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang. Key: YARN-1686 URL: https://issues.apache.org/jira/browse/YARN-1686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Rohith Assignee: Rohith Fix For: 2.4.0 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch During NodeManager start, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Consider the case where NM-1 is registered with the RM and the RM issues a RESYNC to the NM. If any exception is thrown in resyncWithRM (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager enters a hung state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
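A hedged sketch of the hardening YARN-1686 describes (the hooks passed in stand for NM internals; these names are assumptions, not the committed NodeManager change): the resync thread must catch everything so a failed re-registration shuts the NM down instead of dying silently and leaving it hung.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative sketch only.
public class ResyncSketch {
  private static final Log LOG = LogFactory.getLog(ResyncSketch.class);

  void resyncWithRM(final Runnable reRegister, final Runnable shutdown) {
    new Thread("nm-resync") {
      @Override
      public void run() {
        try {
          reRegister.run(); // re-register with the RM after a RESYNC order
        } catch (Throwable t) {
          // Without this catch the thread dies silently and the NM hangs.
          LOG.error("Error while re-registering after RESYNC; shutting down", t);
          shutdown.run();
        }
      }
    }.start();
  }
}
{code}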
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910944#comment-13910944 ] Sandy Ryza commented on YARN-1760: -- A couple of nits: * The same configuration is used for all the tests. If the goal is to only use the capacity scheduler for a couple of tests, then it should be instantiated in setup().
{code}
+    configuration.set(YarnConfiguration.RM_SCHEDULER,
+        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
{code}
* It looks like this goes over 80 characters. Also, it is probably better to use CapacityScheduler.class.getName(). TestRMAdminService assumes the use of CapacityScheduler --- Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
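Applying both review comments, the test setup could look like the following; this is a sketch of the suggested change, not the committed yarn-1760-2.patch:
{code}
import org.junit.Before;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

public class TestRMAdminServiceSetupSketch {
  private YarnConfiguration configuration;

  @Before
  public void setup() {
    configuration = new YarnConfiguration();
    // Pin the scheduler under test without an over-80-character literal.
    configuration.set(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class.getName());
  }
}
{code}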
[jira] [Updated] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1760: --- Attachment: yarn-1760-2.patch Thanks Sandy. Here is an updated patch. TestRMAdminService assumes the use of CapacityScheduler --- Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch, yarn-1760-2.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910984#comment-13910984 ] Hadoop QA commented on YARN-1760: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630822/yarn-1760-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3170//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3170//console This message is automatically generated. TestRMAdminService assumes the use of CapacityScheduler --- Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch, yarn-1760-2.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911009#comment-13911009 ] Xuan Gong commented on YARN-1734: - bq. we will retry in the nonHA case? That also seems unwanted. AdminService#transitionToActive/transitionToStandby can only be called when HA is enabled. bq. One other comment related to the patch: The RefreshContext code is adding unnecessary complexity, let's just directly call each of the individual refresh methods? Sure. Removed. RM should get the updated Configurations when it transitions from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch Currently, we have a ConfigurationProvider that can support LocalConfiguration and FileSystemBasedConfiguration. When HA and FileSystemBasedConfiguration are both enabled, the RM cannot get the updated configurations when it transitions from Standby to Active.
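The approach agreed on above, calling each refresh method directly, can be sketched roughly as follows. This is a hedged sketch, not the committed patch: it assumes the admin protocol records in org.apache.hadoop.yarn.server.api.protocolrecords and their no-argument newInstance() factories, and the surrounding wiring is invented for illustration:
{noformat}
import java.io.IOException;

import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocol;
import org.apache.hadoop.yarn.server.api.protocolrecords.RefreshAdminAclsRequest;
import org.apache.hadoop.yarn.server.api.protocolrecords.RefreshNodesRequest;
import org.apache.hadoop.yarn.server.api.protocolrecords.RefreshQueuesRequest;
import org.apache.hadoop.yarn.server.api.protocolrecords.RefreshServiceAclsRequest;
import org.apache.hadoop.yarn.server.api.protocolrecords.RefreshSuperUserGroupsConfigurationRequest;
import org.apache.hadoop.yarn.server.api.protocolrecords.RefreshUserToGroupsMappingsRequest;

public class RefreshAllSketch {
  // Re-read every refreshable piece of configuration after a
  // standby-to-active transition, so changes made while the RM was in
  // standby are picked up. Each call is made directly, with no
  // RefreshContext indirection.
  static void refreshAll(ResourceManagerAdministrationProtocol admin)
      throws YarnException, IOException {
    admin.refreshQueues(RefreshQueuesRequest.newInstance());
    admin.refreshNodes(RefreshNodesRequest.newInstance());
    admin.refreshSuperUserGroupsConfiguration(
        RefreshSuperUserGroupsConfigurationRequest.newInstance());
    admin.refreshUserToGroupsMappings(
        RefreshUserToGroupsMappingsRequest.newInstance());
    admin.refreshAdminAcls(RefreshAdminAclsRequest.newInstance());
    admin.refreshServiceAcls(RefreshServiceAclsRequest.newInstance());
  }
}
{noformat}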
[jira] [Updated] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1734: Attachment: YARN-1734.7.patch
[jira] [Created] (YARN-1761) RMAdminCLI should check whether HA is enabled before executing transitionToActive/transitionToStandby
Xuan Gong created YARN-1761: --- Summary: RMAdminCLI should check whether HA is enabled before executing transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
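A minimal sketch of the proposed guard, assuming HAUtil.isHAEnabled and YarnConfiguration.RM_HA_ENABLED; the exception type and message are illustrative, not necessarily what the eventual patch will do:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.HAUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HaGuardSketch {
  // Refuse transitionToActive/transitionToStandby up front when HA is off,
  // instead of failing later with a confusing server-side error.
  static void ensureHaEnabled(Configuration conf) {
    if (!HAUtil.isHAEnabled(conf)) {
      throw new UnsupportedOperationException(
          "transitionToActive/transitionToStandby require "
              + YarnConfiguration.RM_HA_ENABLED + " to be true");
    }
  }
}
{noformat}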
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911034#comment-13911034 ] Hadoop QA commented on YARN-1760: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630834/yarn-1760-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3171//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3171//console This message is automatically generated.
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911038#comment-13911038 ] Sandy Ryza commented on YARN-1760: -- Thanks. One more thing: Configuration.addDefaultResource is a static method that applies to all configurations. So it should either go in setup or the non-static configuration.addResource should be used.
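To illustrate the distinction being drawn here (the resource file name below is hypothetical):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class ResourceScopeSketch {
  static void globalScope() {
    // Static: registers the file as a default resource for every
    // Configuration object created in this JVM from now on, so in a test
    // class it belongs in one-time setup, not in an individual test.
    Configuration.addDefaultResource("test-scheduler.xml");
  }

  static Configuration localScope() {
    Configuration conf = new Configuration();
    // Instance method: affects only this Configuration object, which is
    // safer inside a single test method.
    conf.addResource("test-scheduler.xml");
    return conf;
  }
}
{noformat}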
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911052#comment-13911052 ] Hadoop QA commented on YARN-1734: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630839/YARN-1734.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3172//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3172//console This message is automatically generated.
[jira] [Commented] (YARN-1363) Get / Cancel / Renew delegation token APIs should be non-blocking
[ https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911074#comment-13911074 ] Zhijie Shen commented on YARN-1363: --- Talked to Jian offline. Canceled the patch; we'll seek a lighter-weight solution. Get / Cancel / Renew delegation token APIs should be non-blocking Key: YARN-1363 URL: https://issues.apache.org/jira/browse/YARN-1363 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Zhijie Shen Attachments: YARN-1363.1.patch, YARN-1363.2.patch, YARN-1363.3.patch, YARN-1363.4.patch, YARN-1363.5.patch, YARN-1363.6.patch, YARN-1363.7.patch Today GetDelegationToken, CancelDelegationToken and RenewDelegationToken are all blocking APIs. * As part of these calls we try to update the RMStateStore, and that may slow them down. * Since we have a limited number of client request handlers, we may fill them up quickly.
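For background only, one generic shape of a non-blocking variant; this is neither the canceled patch nor the lighter-weight solution being sought, and all names here are invented: hand the slow RMStateStore write to a background executor so RPC handler threads return immediately instead of being tied up.
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncTokenStoreSketch {
  private final ExecutorService storeDispatcher =
      Executors.newSingleThreadExecutor();

  // Called from an RPC handler: queue the (possibly slow) state-store
  // update and return without blocking the handler thread.
  Future<?> storeTokenAsync(Runnable stateStoreWrite) {
    return storeDispatcher.submit(stateStoreWrite);
  }
}
{noformat}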
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911086#comment-13911086 ] Vinod Kumar Vavilapalli commented on YARN-1760: --- If you agree, then we can close this as invalid.
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911085#comment-13911085 ] Vinod Kumar Vavilapalli commented on YARN-1760: --- Wait, from what I understand, Xuan will have a similar FairScheduler test via YARN-1679. That test was explicitly for CapacityScheduler; we will very likely rename it in YARN-1679.
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911094#comment-13911094 ] Sandy Ryza commented on YARN-1760: -- The goal here is just to make the use of the CapacityScheduler in the existing tests explicit, so that they will pass on distros that set other schedulers as the default.
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911100#comment-13911100 ] Vinod Kumar Vavilapalli commented on YARN-1760: --- I have seen other JIRAs like this and I think I understand the goal. But I don't see this JIRA adding any value once YARN-1679 adds a FairScheduler-specific test in the same class.
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911102#comment-13911102 ] Sandy Ryza commented on YARN-1760: -- I assume that YARN-1679 will have conf.setClass(YarnConfiguration.RM_SCHEDULER_CLASS, FairScheduler.class) in the FS-specific tests that it adds. This JIRA adds the same to the CS-specific tests. In some other JIRAs, I've tried to make it so that certain tests pass independently of whether the Fair or Capacity scheduler is used. But the goal with this patch is just to make the dependency of the existing tests on the CapacityScheduler explicit so that it will override a non-CS default.
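As a self-contained sketch of that pattern (note: in the Hadoop 2.x API the constant is YarnConfiguration.RM_SCHEDULER, and Configuration.setClass takes the expected interface as a third argument; the comment's RM_SCHEDULER_CLASS reads like shorthand for the same key):
{noformat}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

public class PinSchedulerSketch {
  static YarnConfiguration capacitySchedulerConf() {
    YarnConfiguration conf = new YarnConfiguration();
    // Pin the scheduler so the test no longer depends on the distro's
    // default; this overrides a FairScheduler (or other) default.
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{noformat}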
[jira] [Commented] (YARN-1410) Handle client failover during 2-step client APIs like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911105#comment-13911105 ] Vinod Kumar Vavilapalli commented on YARN-1410: --- Finally on to this. There are three types of fail-over conditions w.r.t. submission: # RM fails over after getApplicationID() and *before* submitApplication(). # RM fails over *during* the submitApplication call. # RM fails over *after* the submitApplication call and before the subsequent getApplicationReport(). This JIRA started out solving (1) above (as described in the description) and completely degenerated into (2). In the interest of making progress, can we focus only on (1) here and track (2) and (3) separately? (1) itself has implications on the user APIs depending on the implementation. I had looked at a few of the very early patches and I believe Xuan was trying to solve those in this JIRA. Handle client failover during 2-step client APIs like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may then submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app ids), the new RM may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations.
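To ground case (1), a client-side illustration of the failure mode and a naive mitigation; the retry policy here is hypothetical and is not the fix being designed in this JIRA:
{noformat}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ResubmitOnFailoverSketch {
  // Case (1): the RM fails over between createApplication() (which mints
  // the appId) and submitApplication(). A new RM with a different cluster
  // timestamp may reject the stale appId, so mint a fresh one and retry.
  static ApplicationId submitWithRetry(YarnClient client, int maxAttempts)
      throws IOException, YarnException {
    YarnException last = new YarnException("submission not attempted");
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      ApplicationSubmissionContext ctx =
          client.createApplication().getApplicationSubmissionContext();
      // ... populate ctx: app name, queue, AM container launch spec ...
      try {
        return client.submitApplication(ctx);
      } catch (YarnException e) {
        last = e; // e.g. the new RM does not recognize the stale appId
      }
    }
    throw last;
  }
}
{noformat}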
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1390#comment-1390 ] Vinod Kumar Vavilapalli commented on YARN-1734: --- bq. AdminService#transitionToActive/transitionToStandby can only be called when HA is enabled. Ah yes. That makes sense. The latest patch looks good. Checking this in.
[jira] [Commented] (YARN-1561) Fix a generic type warning in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1397#comment-1397 ] Junping Du commented on YARN-1561: -- Thanks Chen for the patch! It looks good to me. Will commit it shortly. Fix a generic type warning in FairScheduler --- Key: YARN-1561 URL: https://issues.apache.org/jira/browse/YARN-1561 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Junping Du Assignee: Chen He Priority: Minor Labels: newbie Fix For: 2.4.0 Attachments: yarn-1561.patch The Comparator below should be given an explicit type parameter: private Comparator nodeAvailableResourceComparator = new NodeAvailableResourceComparator();
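The shape of the fix, as a self-contained sketch; the element type below is a stand-in, since the thread does not spell out the real type parameter used in FairScheduler:
{noformat}
import java.util.Comparator;

public class TypedComparatorSketch {
  // Stand-in for the scheduler's node type; the real field lives in
  // FairScheduler.
  static class SchedulerNode {
    int availableMemory;
  }

  // Raw type -- produces the generic type warning this JIRA fixes:
  //   private Comparator nodeAvailableResourceComparator = ...;

  // Parameterized type -- warning-free:
  private final Comparator<SchedulerNode> nodeAvailableResourceComparator =
      new NodeAvailableResourceComparator();

  private static class NodeAvailableResourceComparator
      implements Comparator<SchedulerNode> {
    @Override
    public int compare(SchedulerNode a, SchedulerNode b) {
      // Stand-in comparison: node with the most available resource first.
      return Integer.compare(b.availableMemory, a.availableMemory);
    }
  }
}
{noformat}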
[jira] [Commented] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911122#comment-13911122 ] Junping Du commented on YARN-153: - Hi [~jaigak.song], any update on this JIRA? I happen to have some experience with Cloud Foundry and have some thoughts too. Would you mind having a discussion? PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Fix For: 2.4.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop has already been widely adopted and deployed, and its deployment will only grow, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application.
[jira] [Updated] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1588: -- Attachment: YARN-1588.4.patch Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch, YARN-1588.4.patch
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911130#comment-13911130 ] Vinod Kumar Vavilapalli commented on YARN-1760: --- hm.. okay.
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transitions from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911135#comment-13911135 ] Hudson commented on YARN-1734: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5218 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5218/]) YARN-1734. Fixed ResourceManager to update the configurations when it transits from standby to active mode so as to assimilate any changes that happened while it was in standby mode. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1571539) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java Fix For: 2.4.0