[jira] [Commented] (YARN-5777) TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails
[ https://issues.apache.org/jira/browse/YARN-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604291#comment-15604291 ] Akira Ajisaka commented on YARN-5777: - I ran git bisect and found HADOOP-7352 broke this. > TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails > -- > > Key: YARN-5777 > URL: https://issues.apache.org/jira/browse/YARN-5777 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka > > {noformat} > Running org.apache.hadoop.yarn.client.cli.TestLogsCLI > Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.876 sec > <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI > testFetchApplictionLogsAsAnotherUser(org.apache.hadoop.yarn.client.cli.TestLogsCLI) > Time elapsed: 0.199 sec <<< ERROR! > java.io.IOException: Invalid directory or I/O error occurred for dir: > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/logs/priority/logs/application_1477371285256_1000 > at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1148) > at > org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:469) > at > org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:169) > at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:519) > at > org.apache.hadoop.fs.AbstractFileSystem$1.<init>(AbstractFileSystem.java:890) > at > org.apache.hadoop.fs.AbstractFileSystem.listStatusIterator(AbstractFileSystem.java:888) > at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1492) > at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1487) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1494) > at > org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.getRemoteNodeFileDir(LogCLIHelpers.java:592) > at > org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:348) > at > 
org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:971) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:299) > at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:106) > at > org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogsAsAnotherUser(TestLogsCLI.java:868) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604287#comment-15604287 ] Bibin A Chundatt commented on YARN-5773: {quote} 3. As mentioned by Bibin A Chundatt, when each app fails to get activated due to the upper cut of resource limit, one INFO log is emitted (because amLimit is 0). During recovery, this is costly. {quote} Thanks [~sunilg] for mentioning the logging; I missed it in my earlier comment. The logging during recovery is costly since amLimit will always be zero: {noformat} LOG.info("Not activating application " + applicationId + " as amIfStarted: " + amIfStarted + " exceeds amLimit: " + amLimit); {noformat} > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit 10K applications to the default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked, resulting in the AM limit check being done even before Node managers are > getting registered. > Total iterations for N applications is about {{N(N+1)/2}}; for {{10K}} > applications that is {{50005000}} iterations, causing the time taken for RM to become > active to be more than 10 min. > Since NM resources are not yet added during recovery we should skip > {{activateApplication()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
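The {{N(N+1)/2}} growth in the issue description is easy to confirm with a small standalone sketch (illustrative arithmetic only, not the scheduler code): each recovered app triggers one activateApplications() pass over every app that is still pending, and since no NM has registered yet, nothing ever activates, so the passes sum to 1 + 2 + ... + N.

```java
public class ActivationCost {
  // Each recovered app triggers activateApplications(), which scans all
  // currently pending apps; with no NM resources registered, none activate,
  // so the pending list keeps growing. Total scans for n apps: n(n+1)/2.
  public static long totalIterations(long n) {
    long total = 0;
    long pending = 0;
    for (long i = 0; i < n; i++) {
      pending++;          // one more recovered app stays pending
      total += pending;   // activateApplications() scans all pending apps
    }
    return total;         // equals n * (n + 1) / 2
  }

  public static void main(String[] args) {
    System.out.println(totalIterations(10_000)); // prints 50005000
  }
}
```

For the 10K apps in the repro this is about 5 * 10^7 scans, each taken under the scheduler's write lock, which is why recovery stretches past 10 minutes.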
[jira] [Updated] (YARN-5754) Null check missing for earliest in FifoPolicy
[ https://issues.apache.org/jira/browse/YARN-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5754: --- Summary: Null check missing for earliest in FifoPolicy (was: Variable earliest missing null check in computeShares() in FifoPolicy.java) > Null check missing for earliest in FifoPolicy > - > > Key: YARN-5754 > URL: https://issues.apache.org/jira/browse/YARN-5754 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-5754.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5777) TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails
[ https://issues.apache.org/jira/browse/YARN-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-5777: Description: {noformat} Running org.apache.hadoop.yarn.client.cli.TestLogsCLI Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.876 sec <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI testFetchApplictionLogsAsAnotherUser(org.apache.hadoop.yarn.client.cli.TestLogsCLI) Time elapsed: 0.199 sec <<< ERROR! java.io.IOException: Invalid directory or I/O error occurred for dir: /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/logs/priority/logs/application_1477371285256_1000 at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1148) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:469) at org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:169) at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:519) at org.apache.hadoop.fs.AbstractFileSystem$1.<init>(AbstractFileSystem.java:890) at org.apache.hadoop.fs.AbstractFileSystem.listStatusIterator(AbstractFileSystem.java:888) at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1492) at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1487) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1494) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.getRemoteNodeFileDir(LogCLIHelpers.java:592) at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:348) at org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:971) at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:299) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:106) at org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogsAsAnotherUser(TestLogsCLI.java:868) {noformat} > 
TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails > -- > > Key: YARN-5777 > URL: https://issues.apache.org/jira/browse/YARN-5777 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka > > {noformat} > Running org.apache.hadoop.yarn.client.cli.TestLogsCLI > Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.876 sec > <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI > testFetchApplictionLogsAsAnotherUser(org.apache.hadoop.yarn.client.cli.TestLogsCLI) > Time elapsed: 0.199 sec <<< ERROR! > java.io.IOException: Invalid directory or I/O error occurred for dir: > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/logs/priority/logs/application_1477371285256_1000 > at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1148) > at > org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:469) > at > org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:169) > at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:519) > at > org.apache.hadoop.fs.AbstractFileSystem$1.<init>(AbstractFileSystem.java:890) > at > org.apache.hadoop.fs.AbstractFileSystem.listStatusIterator(AbstractFileSystem.java:888) > at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1492) > at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1487) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1494) > at > org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.getRemoteNodeFileDir(LogCLIHelpers.java:592) > at > org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:348) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:971) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:299) > at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:106) > at > 
org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogsAsAnotherUser(TestLogsCLI.java:868) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5313) TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk
[ https://issues.apache.org/jira/browse/YARN-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604224#comment-15604224 ] Akira Ajisaka commented on YARN-5313: - Now the unit test is failing for another reason, so I filed another jira: YARN-5777 > TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk > --- > > Key: YARN-5313 > URL: https://issues.apache.org/jira/browse/YARN-5313 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Xuan Gong >Priority: Blocker > > We have reverted HADOOP-12718 recently which caused this failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5777) TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails
Akira Ajisaka created YARN-5777: --- Summary: TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails Key: YARN-5777 URL: https://issues.apache.org/jira/browse/YARN-5777 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Akira Ajisaka -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5313) TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk
[ https://issues.apache.org/jira/browse/YARN-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved YARN-5313. - Resolution: Not A Problem Closing this issue because HADOOP-12718 has been reverted. > TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk > --- > > Key: YARN-5313 > URL: https://issues.apache.org/jira/browse/YARN-5313 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Xuan Gong >Priority: Blocker > > We have reverted HADOOP-12718 recently which caused this failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5575) Many classes use bare yarn. properties instead of the defined constants
[ https://issues.apache.org/jira/browse/YARN-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604194#comment-15604194 ] Akira Ajisaka commented on YARN-5575: - Hi [~templedf], would you fix checkstyle warnings? I'm +1 if that is addressed. Thanks. > Many classes use bare yarn. properties instead of the defined constants > --- > > Key: YARN-5575 > URL: https://issues.apache.org/jira/browse/YARN-5575 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-5575.001.patch, YARN-5575.002.patch, > YARN-5575.003.patch > > > MAPREDUCE-5870 introduced the following line: > {code} > conf.setInt("yarn.cluster.max-application-priority", 10); > {code} > It should instead be: > {code} > conf.setInt(YarnConfiguration.MAX_CLUSTER_LEVEL_APPLICATION_PRIORITY, > 10); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604144#comment-15604144 ] Sunil G edited comment on YARN-5773 at 10/25/16 4:46 AM: - *Issues in Recovery of apps:* 1. activateApplications works under a write lock. 2. If one application is found to overflow the AM resource limit, instead of breaking out of the loop, we continue and process all remaining apps from pendingOrderingPolicy. We may need to iterate over all apps because apps belong to different partitions and pendingOrderingPolicy does not provide any ordering of apps based on partition. 3. As mentioned by [~bibinchundatt], when each app fails to get activated due to the upper cut of the resource limit, one INFO log is emitted (because *amLimit* is 0). During recovery, this is costly. [~leftnoteasy] and [~rohithsharma] bq.If a given app's AM resource amount > AM headroom, should we skip the AM and activate following app which AM resource amount <= AM headroom? bq.But one point to be considered is for each Node registration, head room changes. So, user head room changes as new node registered. This need to be taken care. Currently activateApplications is invoked when there is a change in cluster resource. So any change in cluster resource will ensure a call to activateApplications, and we can recalculate this headroom. I am not very sure about the suggested map. Will this check come before the existing AM resource percentage check for queue/partition (not user based)? Or are we replacing those checks? was (Author: sunilg): *Issues in Recovery of apps:* 1. activateApplications works under a write lock. 2. If one application is found to overflow the AM resource limit, instead of breaking out of the loop, we continue and process all remaining apps from pendingOrderingPolicy. We may need to iterate over all apps because apps belong to different partitions and pendingOrderingPolicy does not provide any ordering of apps based on partition. 3. As mentioned by [~bibinchundatt], when each app fails to get activated due to the upper cut of the resource limit, one INFO log is emitted. During recovery, this is costly. [~leftnoteasy] and [~rohithsharma] bq.If a given app's AM resource amount > AM headroom, should we skip the AM and activate following app which AM resource amount <= AM headroom? bq.But one point to be considered is for each Node registration, head room changes. So, user head room changes as new node registered. This need to be taken care. Currently activateApplications is invoked when there is a change in cluster resource. So any change in cluster resource will ensure a call to activateApplications, and we can recalculate this headroom. I am not very sure about the suggested map. Will this check come before the existing AM resource percentage check for queue/partition (not user based)? Or are we replacing those checks? > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit 10K applications to the default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked, resulting in the AM limit check being done even before Node managers are > getting registered. > Total iterations for N applications is about {{N(N+1)/2}}; for {{10K}} > applications that is {{50005000}} iterations, causing the time taken for RM to become > active to be more than 10 min. > Since NM resources are not yet added during recovery we should skip > {{activateApplication()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604144#comment-15604144 ] Sunil G commented on YARN-5773: --- *Issues in Recovery of apps:* 1. activateApplications works under a write lock. 2. If one application is found to overflow the AM resource limit, instead of breaking out of the loop, we continue and process all remaining apps from pendingOrderingPolicy. We may need to iterate over all apps because apps belong to different partitions and pendingOrderingPolicy does not provide any ordering of apps based on partition. 3. As mentioned by [~bibinchundatt], when each app fails to get activated due to the upper cut of the resource limit, one INFO log is emitted. During recovery, this is costly. [~leftnoteasy] and [~rohithsharma] bq.If a given app's AM resource amount > AM headroom, should we skip the AM and activate following app which AM resource amount <= AM headroom? bq.But one point to be considered is for each Node registration, head room changes. So, user head room changes as new node registered. This need to be taken care. Currently activateApplications is invoked when there is a change in cluster resource. So any change in cluster resource will ensure a call to activateApplications, and we can recalculate this headroom. I am not very sure about the suggested map. Will this check come before the existing AM resource percentage check for queue/partition (not user based)? Or are we replacing those checks? > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit 10K applications to the default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked, resulting in the AM limit check being done even before Node managers are > getting registered. > Total iterations for N applications is about {{N(N+1)/2}}; for {{10K}} > applications that is {{50005000}} iterations, causing the time taken for RM to become > active to be more than 10 min. > Since NM resources are not yet added during recovery we should skip > {{activateApplication()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5587) Add support for resource profiles
[ https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604055#comment-15604055 ] Arun Suresh commented on YARN-5587: --- Thanks [~vvasudev]. First pass comments: # In {{Resources}}, you moved the Suppress deprecation warning from the {{setMemorySize(long)}} method to the {{setMemory(int)}}. Was that intentional ? # {{AMRMClient::ContainerRequest}} : Wondering if we need to allow a Container request to specify both a profile name and a Resource (capability). If they do specify both, what does that mean ? # Similarly, in the {{RemoteRequestTable}}, the RR should be keyed using the Resource (capability) derived from the profileName. > Add support for resource profiles > - > > Key: YARN-5587 > URL: https://issues.apache.org/jira/browse/YARN-5587 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-5587-YARN-3926.001.patch, > YARN-5587-YARN-3926.002.patch, YARN-5587-YARN-3926.003.patch, > YARN-5587-YARN-3926.004.patch, YARN-5587-YARN-3926.005.patch > > > Add support for resource profiles on the RM side to allow users to use > shorthands to specify resource requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603964#comment-15603964 ] Rohith Sharma K S commented on YARN-5773: - Thanks folks for the discussion. I went through the overall discussion above, and I have one doubt: how can *RM recovery* be too slow? Because in the current RM restart, there are 2 stages. # Recover : Read all the application data from ZooKeeper and replay it. Basically, for running/pending apps, an event will be triggered to the scheduler, and the scheduler has a *separate dispatcher* to handle it. # Service Start : Once the recover process is completed, all the RM services are started. IIUC, the RM service is up and able to accept new requests from clients. So the problem is that after RM service start, activating applications is delayed because Nodes are not yet registered; it is not the actual recovery that is slow. It would be better if the JIRA summary is updated to something like "Scheduler takes longer time for activating recovered apps when RM is restarted" or similar. As for the improvement, as Wangda suggested, maybe we can keep a Map which would optimize the head room computation in activateApplications. But one point to be considered is that for each Node registration, the head room changes. So, user head room changes as new nodes register. This needs to be taken care of. > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit 10K applications to the default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked, resulting in the AM limit check being done even before Node managers are > getting registered. 
> Total iterations for N applications is about {{N(N+1)/2}}; for {{10K}} > applications that is {{50005000}} iterations, causing the time taken for RM to become > active to be more than 10 min. > Since NM resources are not yet added during recovery we should skip > {{activateApplication()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603947#comment-15603947 ] Bibin A Chundatt commented on YARN-5773: {quote} If a given app's AM resource amount > AM headroom, should we skip the AM and activate following app which AM resource amount <= AM headroom? {quote} Skip all apps only when {{queueUsage.getAMUsed > amLimit}}. Since AMs can be from different partitions, and each partition can have a different AM limit, the AM limit for all partitions also has to be exceeded. Checking both cases before iterating through all the apps: {noformat}
if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
    lastClusterResource, Resources.none())
    && !(getNumActiveApplications() < 1)) {
  return;
}
Map<String, Resource> userAmPartitionLimit = new HashMap<>();
// AM Resource Limit for accessible labels can be pre-calculated.
// This will help in updating AMResourceLimit for all labels when queue
// is initialized for the first time (when no applications are present).
for (String nodePartition : getNodeLabelsForQueue()) {
  calculateAndGetAMResourceLimitPerPartition(nodePartition);
}
if (allPartitionLimitsExceeded() && !(getNumActiveApplications() < 1)) {
  return;
}
{noformat} > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit 10K applications to the default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked, resulting in the AM limit check being done even before Node managers are > getting registered. 
> Total iterations for N applications is about {{N(N+1)/2}}; for {{10K}} > applications that is {{50005000}} iterations, causing the time taken for RM to become > active to be more than 10 min. > Since NM resources are not yet added during recovery we should skip > {{activateApplication()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
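Bibin's early-exit idea above can be sketched as a compilable toy (the class, field, and method names here are hypothetical stand-ins, not the real LeafQueue API): the pending-app scan can be short-circuited only when AM usage has already reached the AM limit in every accessible partition, because headroom left in any one partition could still activate an app.

```java
import java.util.HashMap;
import java.util.Map;

public class ActivationShortCircuit {
  // Hypothetical stand-ins for the scheduler's per-partition AM limit and
  // current AM usage; in LeafQueue these would come from
  // calculateAndGetAMResourceLimitPerPartition() and queueUsage.getAMUsed().
  public final Map<String, Long> amLimitPerPartition = new HashMap<>();
  public final Map<String, Long> amUsedPerPartition = new HashMap<>();

  // Early exit: only if AM usage has reached the limit in EVERY accessible
  // partition is it pointless to scan the pending apps at all.
  public boolean allPartitionLimitsExceeded() {
    for (Map.Entry<String, Long> e : amLimitPerPartition.entrySet()) {
      long used = amUsedPerPartition.getOrDefault(e.getKey(), 0L);
      if (used < e.getValue()) {
        return false; // headroom left in at least one partition: must scan
      }
    }
    return true;
  }
}
```

With this shape, recovery with zero cluster resource (amLimit 0 everywhere, usage 0) still needs the separate `Resources.greaterThan(..., Resources.none())` guard from the quoted snippet, since a zero limit with zero usage does not count as exceeded here.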
[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603931#comment-15603931 ] Zephyr Guo commented on YARN-4743: -- [~yufeigu], thanks for reviewing. {quote} 5. Not sure why startTimeColloection and nameCollection are needed. Can you explain a little bit? {quote} Because some pieces of code involve these two variables. {code:title=FairShareComparator}
if (res == 0) {
  // Apps are tied in fairness ratio. Break the tie by submit time and job
  // name to get a deterministic ordering, which is useful for unit tests.
  res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
  if (res == 0)
    res = s1.getName().compareTo(s2.getName());
}
{code} I will submit a new patch this week. > ResourceManager crash because TimSort > - > > Key: YARN-4743 > URL: https://issues.apache.org/jira/browse/YARN-4743 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-alpha1 >Reporter: Zephyr Guo >Assignee: Zephyr Guo > Attachments: YARN-4743-v1.patch, YARN-4743-v2.patch, timsort.log > > > {code} > 2016-02-26 14:08:50,821 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeCollapse(TimSort.java:410) > at java.util.TimSort.sort(TimSort.java:214) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2016-02-26 14:08:50,822 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} > Actually, this bug found in 2.6.0-cdh. {{FairShareComparator}} is not > transitive. > We get NaN when memorySize=0 and weight=0. > {code:title=FairSharePolicy.java} > useToWeightRatio1 = s1.getResourceUsage().getMemorySize() / > s1.getWeights().getWeight(ResourceType.MEMORY) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
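The non-transitivity behind the crash above can be reproduced outside the scheduler. The sketch below mimics the shape of the usage/weight ratio comparison (simplified and illustrative, not the actual Hadoop code): when memorySize and weight are both 0, the ratio is 0.0/0.0 = NaN, every < and > test against NaN is false, and the comparator reports a tie with everything, which is exactly the Comparator contract violation TimSort detects.

```java
import java.util.Comparator;

public class NaNComparatorDemo {
  // Simplified use-to-weight ratio comparison, mirroring the shape of
  // FairShareComparator (illustrative, not the real implementation).
  // Each schedulable is {usage, weight}.
  public static final Comparator<double[]> BY_RATIO = (s1, s2) -> {
    double r1 = s1[0] / s1[1]; // usage / weight
    double r2 = s2[0] / s2[1];
    if (r1 < r2) return -1;    // false whenever either ratio is NaN
    if (r1 > r2) return 1;     // also false for NaN
    return 0;                  // so a NaN schedulable "ties" with everything
  };

  public static void main(String[] args) {
    double[] broken = {0.0, 0.0}; // memorySize = 0, weight = 0 -> 0.0/0.0 = NaN
    double[] light  = {1.0, 1.0}; // ratio 1.0
    double[] heavy  = {2.0, 1.0}; // ratio 2.0
    // broken ties with light AND with heavy, yet light < heavy: this breaks
    // the rule that compare(x,y)==0 implies sgn(compare(x,z))==sgn(compare(y,z)).
    System.out.println(BY_RATIO.compare(broken, light));  // prints 0
    System.out.println(BY_RATIO.compare(broken, heavy));  // prints 0
    System.out.println(BY_RATIO.compare(light, heavy));   // prints -1
  }
}
```

Sorting a large list containing such "NaN" schedulables with Collections.sort can then trigger TimSort's "Comparison method violates its general contract!" check, as in the attached timsort.log.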
[jira] [Commented] (YARN-5776) Checkstyle: MonitoringThread.Run method length is too long
[ https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603924#comment-15603924 ] Miklos Szegedi commented on YARN-5776: -- Note: The javac warning in the result file is in ContainerManagerImpl.java not in ContainersMonitorImpl.java that I changed > Checkstyle: MonitoringThread.Run method length is too long > -- > > Key: YARN-5776 > URL: https://issues.apache.org/jira/browse/YARN-5776 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Trivial > Attachments: YARN-5776.000.patch > > > YARN-5725 had a check style violation that should be resolved by refactoring > the function > Details: > ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method > length is 233 lines (max allowed is 150). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5711) Propogate exceptions back to client when using hedging RM failover provider
[ https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603925#comment-15603925 ] Hudson commented on YARN-5711: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10669 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10669/]) YARN-5711. Propogate exceptions back to client when using hedging RM (subru: rev 0a166b13472213db0a0cd2dfdaddb2b1746b3957) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RequestHedgingRMFailoverProxyProvider.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestHedgingRequestRMFailoverProxyProvider.java > Propogate exceptions back to client when using hedging RM failover provider > --- > > Key: YARN-5711 > URL: https://issues.apache.org/jira/browse/YARN-5711 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, > YARN-5711.v1.1.patch > > > When RM failsover, it does _not_ auto re-register running apps and so they > need to re-register when reconnecting to new primary. This is done by > catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and > re-registering. But *RequestHedgingRMFailoverProxyProvider* does _not_ > propagate {{YarnException}} as the actual invocation is done asynchronously > using seperate threads, so AMs cannot reconnect to RM after failover. > This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate > any {{YarnException}} that it encounters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
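The general shape of the fix described above — unwrap the cause of the asynchronous failure and rethrow it, instead of hiding it behind the hedging machinery — can be sketched with plain java.util.concurrent. All names here are hypothetical (AppLevelException stands in for YarnException; the real change is inside RequestHedgingRMFailoverProxyProvider):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgedInvoker {
  /** Stand-in for YarnException: a checked, application-level failure. */
  public static class AppLevelException extends Exception {
    public AppLevelException(String msg) { super(msg); }
  }

  // Invoke all candidate endpoints concurrently; the first success wins.
  // Crucially, if an attempt fails with an application-level exception,
  // unwrap the ExecutionException and propagate the real cause to the
  // caller (so an AM can see it and, e.g., re-register) instead of
  // swallowing it.
  public static <T> T invokeFirstSuccess(List<Callable<T>> candidates)
      throws AppLevelException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(candidates.size());
    try {
      ExecutorCompletionService<T> ecs = new ExecutorCompletionService<>(pool);
      for (Callable<T> c : candidates) {
        ecs.submit(c);
      }
      Exception last = null;
      for (int i = 0; i < candidates.size(); i++) {
        try {
          return ecs.take().get(); // first attempt to finish successfully wins
        } catch (ExecutionException e) {
          Throwable cause = e.getCause();
          if (cause instanceof AppLevelException) {
            throw (AppLevelException) cause; // propagate, don't swallow
          }
          last = e; // transport-level failure: keep waiting on other attempts
        }
      }
      throw new IllegalStateException("all candidates failed", last);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

The design point mirrors the JIRA: transport-level failures (the RM that is currently standby) are still tolerated as part of hedging, but application-level exceptions surface to the caller unchanged.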
[jira] [Updated] (YARN-5776) Checkstyle: MonitoringThread.Run method length is too long
[ https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5776: - Summary: Checkstyle: MonitoringThread.Run method length is too long (was: Checkstyle: MonitioringThread.Run method length is too long) > Checkstyle: MonitoringThread.Run method length is too long > -- > > Key: YARN-5776 > URL: https://issues.apache.org/jira/browse/YARN-5776 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Trivial > Attachments: YARN-5776.000.patch > > > YARN-5725 had a check style violation that should be resolved by refactoring > the function > Details: > ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method > length is 233 lines (max allowed is 150). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5776) Checkstyle: MonitioringThread.Run method length is too long
[ https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603905#comment-15603905 ] Hadoop QA commented on YARN-5776: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | 
{color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 1 new + 16 unchanged - 1 fixed = 17 total (was 17) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 0 unchanged - 18 fixed = 0 total (was 18) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 7s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 28m 9s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12835052/YARN-5776.000.patch | | JIRA Issue | YARN-5776 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 14bf13a50894 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dc3272b | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/13495/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13495/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13495/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (YARN-5711) Propogate exceptions back to client when using hedging RM failover provider
[ https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-5711: - Summary: Propogate exceptions back to client when using hedging RM failover provider (was: Propogate exceptions back to client after RM failover when using hedging failover provider) > Propogate exceptions back to client when using hedging RM failover provider > --- > > Key: YARN-5711 > URL: https://issues.apache.org/jira/browse/YARN-5711 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, > YARN-5711.v1.1.patch > > > When RM fails over, it does _not_ auto re-register running apps and so they > need to re-register when reconnecting to new primary. This is done by > catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and > re-registering. But *RequestHedgingRMFailoverProxyProvider* does _not_ > propagate {{YarnException}} as the actual invocation is done asynchronously > using separate threads, so AMs cannot reconnect to RM after failover. > This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate > any {{YarnException}} that it encounters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5711) Propogate exceptions back to client after RM failover when using RequestHedgingRMFailoverProxyProvider
[ https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-5711: - Summary: Propogate exceptions back to client after RM failover when using RequestHedgingRMFailoverProxyProvider (was: AM cannot reconnect to RM after failover when using RequestHedgingRMFailoverProxyProvider) > Propogate exceptions back to client after RM failover when using > RequestHedgingRMFailoverProxyProvider > -- > > Key: YARN-5711 > URL: https://issues.apache.org/jira/browse/YARN-5711 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, > YARN-5711.v1.1.patch > > > When RM fails over, it does _not_ auto re-register running apps and so they > need to re-register when reconnecting to new primary. This is done by > catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and > re-registering. But *RequestHedgingRMFailoverProxyProvider* does _not_ > propagate {{YarnException}} as the actual invocation is done asynchronously > using separate threads, so AMs cannot reconnect to RM after failover. > This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate > any {{YarnException}} that it encounters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5711) Propogate exceptions back to client after RM failover when using hedging failover provider
[ https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-5711: - Summary: Propogate exceptions back to client after RM failover when using hedging failover provider (was: Propogate exceptions back to client after RM failover when using RequestHedgingRMFailoverProxyProvider) > Propogate exceptions back to client after RM failover when using hedging > failover provider > -- > > Key: YARN-5711 > URL: https://issues.apache.org/jira/browse/YARN-5711 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, > YARN-5711.v1.1.patch > > > When RM fails over, it does _not_ auto re-register running apps and so they > need to re-register when reconnecting to new primary. This is done by > catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and > re-registering. But *RequestHedgingRMFailoverProxyProvider* does _not_ > propagate {{YarnException}} as the actual invocation is done asynchronously > using separate threads, so AMs cannot reconnect to RM after failover. > This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate > any {{YarnException}} that it encounters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5776) Checkstyle: MonitioringThread.Run method length is too long
[ https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5776: - Attachment: YARN-5776.000.patch Remove all relevant checkstyle violations for YARN-5725. Note: no unit test changes, the behavior should be identical. > Checkstyle: MonitioringThread.Run method length is too long > --- > > Key: YARN-5776 > URL: https://issues.apache.org/jira/browse/YARN-5776 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Trivial > Attachments: YARN-5776.000.patch > > > YARN-5725 had a check style violation that should be resolved by refactoring > the function > Details: > ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method > length is 233 lines (max allowed is 150). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5775) Bug fixes in swagger definition
[ https://issues.apache.org/jira/browse/YARN-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603837#comment-15603837 ] Hadoop QA commented on YARN-5775: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 25s {color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} yarn-native-services passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 10s {color} | {color:red} hadoop-yarn-services-api in yarn-native-services failed. 
{color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 7s {color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 14s {color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s {color} | {color:red} The patch generated 10 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 10m 33s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12835051/YARN-5775-yarn-native-services.001.patch | | JIRA Issue | YARN-5775 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit | | uname | Linux ff1ac33f8536 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | yarn-native-services / 023be93 | | Default Java | 1.8.0_101 | | mvnsite | https://builds.apache.org/job/PreCommit-YARN-Build/13494/artifact/patchprocess/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services-api.txt | | mvnsite | https://builds.apache.org/job/PreCommit-YARN-Build/13494/artifact/patchprocess/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services-api.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13494/testReport/ | | asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/13494/artifact/patchprocess/patch-asflicense-problems.txt | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13494/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Bug fixes in swagger definition > --- > > Key: YARN-5775 > URL: https://issues.apache.org/jira/browse/YARN-5775 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Fix For:
[jira] [Commented] (YARN-5725) Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host
[ https://issues.apache.org/jira/browse/YARN-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603810#comment-15603810 ] Miklos Szegedi commented on YARN-5725: -- All right, I opened YARN-5776 for the checkstyle violation. > Test uncaught exception in > TestContainersMonitorResourceChange.testContainersResourceChange when setting > IP and host > > > Key: YARN-5725 > URL: https://issues.apache.org/jira/browse/YARN-5725 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-5725.000.patch, YARN-5725.001.patch, > YARN-5725.002.patch, YARN-5725.003.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > The issue is a warning but it prevents container monitor to continue > 2016-10-12 14:38:23,280 WARN [Container Monitor] > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(594)) - > Uncaught exception in ContainersMonitorImpl while monitoring resource of > container_123456_0001_01_01 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:455) > 2016-10-12 14:38:23,281 WARN [Container Monitor] > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(613)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
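The NullPointerException in the log above arises when the monitoring loop dereferences per-container state (the IP and host being set by the test) before it has been initialized. A minimal sketch of the defensive pattern, with hypothetical names (`NullGuardSketch`, `monitorOnce`), not the actual ContainersMonitorImpl code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the failure mode: a monitoring pass that reads
// per-container state (IP/host) which may not have been set yet. A null guard
// lets the pass skip the container and retry on the next iteration instead of
// tripping the thread's catch-all with a NullPointerException.
public class NullGuardSketch {
    static List<String> monitorOnce(Map<String, String> containerIps) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> e : containerIps.entrySet()) {
            if (e.getValue() == null) {           // state not initialized yet
                warnings.add("skipping " + e.getKey() + ": no IP yet");
                continue;                         // try again on the next pass
            }
            // ...normal resource-usage bookkeeping would happen here...
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> ips = new LinkedHashMap<>();
        ips.put("container_1", null);             // not yet assigned an IP
        ips.put("container_2", "10.0.0.2");
        System.out.println(monitorOnce(ips));     // one skip warning, no NPE
    }
}
```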
[jira] [Created] (YARN-5776) Checkstyle: MonitioringThread.Run method length is too long
Miklos Szegedi created YARN-5776: Summary: Checkstyle: MonitioringThread.Run method length is too long Key: YARN-5776 URL: https://issues.apache.org/jira/browse/YARN-5776 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi Priority: Trivial YARN-5725 had a check style violation that should be resolved by refactoring the function Details: ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method length is 233 lines (max allowed is 150). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
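The usual remedy for a method-length violation is an extract-method refactoring that leaves behavior unchanged. A toy sketch of that shape, with hypothetical class and method names; the structure of the actual patch may differ:

```java
// Toy illustration of the extract-method fix: the long run() body is split
// into named helpers so each method stays well under checkstyle's 150-line
// cap. Names here are hypothetical, not the actual ContainersMonitorImpl.
public class MonitorSketch {
    private boolean running = true;
    private int iterations = 0;

    public void run() {
        while (running) {
            recordIteration();         // was: dozens of inline lines
            checkLimits();             // was: dozens more inline lines
            running = iterations < 3;  // stand-in for the real stop condition
        }
    }

    private void recordIteration() { iterations++; }

    private void checkLimits() { /* per-container limit checks would go here */ }

    public int getIterations() { return iterations; }

    public static void main(String[] args) {
        MonitorSketch m = new MonitorSketch();
        m.run();
        System.out.println(m.getIterations()); // 3
    }
}
```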
[jira] [Updated] (YARN-5775) Bug fixes in swagger definition
[ https://issues.apache.org/jira/browse/YARN-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-5775: Attachment: YARN-5775-yarn-native-services.001.patch > Bug fixes in swagger definition > --- > > Key: YARN-5775 > URL: https://issues.apache.org/jira/browse/YARN-5775 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Fix For: yarn-native-services > > Attachments: YARN-5775-yarn-native-services.001.patch > > > All enums have been listed in lowercase. Need to convert all of them to > uppercase. > For e.g. ContainerState: > {noformat} > enum: > - init > - ready > {noformat} > needs to be changed to - > {noformat} > enum: > - INIT > - READY > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5556) Support for deleting queues without requiring a RM restart
[ https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-5556: Attachment: YARN-5556.v1.003.patch Thanks for the review [~templedf], have attached a patch after fixing your review comments > Support for deleting queues without requiring a RM restart > -- > > Key: YARN-5556 > URL: https://issues.apache.org/jira/browse/YARN-5556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Xuan Gong >Assignee: Naganarasimha G R > Attachments: YARN-5556.v1.001.patch, YARN-5556.v1.002.patch, > YARN-5556.v1.003.patch > > > Today, we could add or modify queues without restarting the RM, via a CS > refresh. But for deleting queue, we have to restart the ResourceManager. We > could support for deleting queues without requiring a RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5775) Bug fixes in swagger definition
[ https://issues.apache.org/jira/browse/YARN-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha reassigned YARN-5775: --- Assignee: Gour Saha > Bug fixes in swagger definition > --- > > Key: YARN-5775 > URL: https://issues.apache.org/jira/browse/YARN-5775 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Fix For: yarn-native-services > > > All enums have been listed in lowercase. Need to convert all of them to > uppercase. > For e.g. ContainerState: > {noformat} > enum: > - init > - ready > {noformat} > needs to be changed to - > {noformat} > enum: > - INIT > - READY > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5123) SQL based RM state store
[ https://issues.apache.org/jira/browse/YARN-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603768#comment-15603768 ] Subru Krishnan commented on YARN-5123: -- [~lavkesh], are you planning to update the patch based on the above discussions (and addressing Yetus warnings)? I feel this will be a nice addition, hence following up. Thanks. > SQL based RM state store > > > Key: YARN-5123 > URL: https://issues.apache.org/jira/browse/YARN-5123 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-SQL-Based-RM-state-store-trunk.patch, High > Availability In YARN Resource Manager using SQL Based StateStore.pdf, > sqlstatestore.patch > > > In our setup, zookeeper based RM state store didn't work. We ended up > implementing our own SQL based state store. Here is a patch, if anybody else > wants to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5775) Bug fixes in swagger definition
Gour Saha created YARN-5775: --- Summary: Bug fixes in swagger definition Key: YARN-5775 URL: https://issues.apache.org/jira/browse/YARN-5775 Project: Hadoop YARN Issue Type: Sub-task Reporter: Gour Saha Fix For: yarn-native-services All enums have been listed in lowercase. Need to convert all of them to uppercase. For e.g. ContainerState: {noformat} enum: - init - ready {noformat} needs to be changed to - {noformat} enum: - INIT - READY {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603673#comment-15603673 ] Wangda Tan commented on YARN-4734: -- [~aw], Maybe, since you're pretty familiar with AltKerberos stuffs, probably you can also share what is the problem on YARN-4006. [~vvasudev] asked you a [question|https://issues.apache.org/jira/browse/YARN-4006?focusedCommentId=15297651=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15297651] but you may missed it. I will add a note to the doc to say running under security environment is not tested, which includes the AltKerberos setup I think. Anyway, I think our goal is to make sure it doesn't break any other components, plz let us know if you see any critical issues for the merge. Thanks, > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, > YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, > YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, > YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, > YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, > YARN-4734.9-NOT_READY.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603642#comment-15603642 ] Allen Wittenauer commented on YARN-4734: Oh oh oh. That means this likely won't work with AltKerberos deployments since YARN REST is completely broken with it too. The docs will need an update to mention that. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, > YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, > YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, > YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, > YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, > YARN-4734.9-NOT_READY.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if {{yarn.scheduler.minimum-allocation-mb}} is 0.
[ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5774: --- Description: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM. This happened when you configure {{yarn.scheduler.minimum-allocation-mb}} to zero. The problem is in the code used by both Capacity Scheduler and Fair Scheduler. {{scheduler.increment-allocation-mb}} is a concept in FS, but not CS. So the common code in class RMAppManager passes the {{yarn.scheduler.minimum-allocation-mb}} as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} was: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM. This happened when you configure {{yarn.scheduler.minimum-allocation-mb}} to zero. The problem is in the code used by both Capacity Scheduler and Fair Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if > {{yarn.scheduler.minimum-allocation-mb}} is 0. 
> > > Key: YARN-5774 > URL: https://issues.apache.org/jira/browse/YARN-5774 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler > because there is no resource request for the AM. This happened when you > configure {{yarn.scheduler.minimum-allocation-mb}} to zero. > The problem is in the code used by both Capacity Scheduler and Fair > Scheduler. {{scheduler.increment-allocation-mb}} is a concept in FS, but not > CS. So the common code in class RMAppManager passes the > {{yarn.scheduler.minimum-allocation-mb}} as incremental one because there is > no incremental one for CS when it tried to normalize the resource requests. > {code} > SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), > scheduler.getClusterResource(), > scheduler.getMinimumResourceCapability(), > scheduler.getMaximumResourceCapability(), > scheduler.getMinimumResourceCapability()); --> incrementResource > should be passed here. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
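The arithmetic behind the bug can be illustrated with a simplified round-up, assuming the normalizer steps each request up to a multiple of the increment it is given. This is a sketch of the failure mode only, not Hadoop's actual resource-calculator code; the zero-step branch is our own stand-in for the collapse.

```java
// Hypothetical arithmetic behind the bug: normalization rounds a request up
// to a multiple of the *increment*. Because the shared code passes the
// scheduler minimum as that increment, a minimum of 0 collapses every
// request, and the AM container asks for no resources at all.
public class NormalizeSketch {
    // Simplified round-up in the divide-and-multiply style of resource
    // normalization: requests are stepped up to a multiple of stepFactor.
    static long roundUp(long requested, long stepFactor) {
        if (stepFactor <= 0) {
            // Degenerate step: everything collapses to zero (our stand-in
            // for what happens when the 0 MB minimum is used as increment).
            return 0;
        }
        return ((requested + stepFactor - 1) / stepFactor) * stepFactor;
    }

    public static void main(String[] args) {
        // Buggy call pattern: the minimum (0) is reused as the increment.
        System.out.println(roundUp(1536, 0));    // 0 -> AM requests nothing
        // Fixed call pattern: the scheduler's real increment is passed.
        System.out.println(roundUp(1536, 1024)); // 2048
    }
}
```

This is why the proposed fix passes the FS increment (`scheduler.increment-allocation-mb`) rather than the minimum as the last argument to `SchedulerUtils.normalizeRequest`.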
[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603469#comment-15603469 ] Hadoop QA commented on YARN-5767: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 159 unchanged - 27 fixed = 159 total (was 186) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 58s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 30m 4s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12835025/YARN-5767-trunk-v3.patch | | JIRA Issue | YARN-5767 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 815705b05614 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9d17585 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13493/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13493/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0,
[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5767: --- Attachment: YARN-5767-trunk-v3.patch V3 attached. Fixed whitespace. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, > YARN-5767-trunk-v3.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
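The ordering problem quoted above can be illustrated with a small, self-contained simulation. This is a hedged sketch, not the YARN classes themselves: the {{Rsrc}} class, the {{evict}} method, and the sizes are all hypothetical, and it only models the one property at issue, namely that resources are considered in the order their trackers were added to the retention set.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged model of the eviction order described above (NOT the YARN code):
// resources are visited in the order their trackers were added, so entries
// from the public cache, added first, are deleted before any private entry.
public class RetentionOrderDemo {

    static final class Rsrc {
        final String name; final long size; final String cache;
        Rsrc(String name, long size, String cache) {
            this.name = name; this.size = size; this.cache = cache;
        }
    }

    // Delete resources in addition order until the cache fits the target.
    static List<String> evict(List<Rsrc> inAdditionOrder, long targetSize) {
        long current = inAdditionOrder.stream().mapToLong(r -> r.size).sum();
        List<String> deleted = new ArrayList<>();
        for (Rsrc r : inAdditionOrder) {
            if (current <= targetSize) {
                break;                 // target met; later entries survive
            }
            current -= r.size;         // oldest-added tracker pays first
            deleted.add(r.cache + ":" + r.name);
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Rsrc> rs = List.of(
            new Rsrc("pub1", 40, "public"),
            new Rsrc("pub2", 40, "public"),
            new Rsrc("user1-a", 40, "private"));
        // Total 120, target 50: both public entries go before any private one.
        System.out.println(evict(rs, 50));  // [public:pub1, public:pub2]
    }
}
```

With two 40MB public entries and one 40MB private entry against a 50MB target, both public entries are evicted before the private one is even considered, matching the behavior described in the issue.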
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603373#comment-15603373 ] Varun Saxena commented on YARN-5773: [~sunilg], we still need to add the apps to pendingOrderingPolicy. It's just that there is no need to run over all the pending apps on recovery of each unfinished app, as NMs have not yet registered (they won't until recovery finishes). Iterating over all the apps on recovery of each unfinished app is, I feel, unnecessary, as it will hit the same condition time and again and will be unable to activate any application. > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit 10K applications to the default queue. > # All applications are in the ACCEPTED state. > # Now restart the ResourceManager. > For each application recovered, {{LeafQueue#activateApplications()}} is > invoked, resulting in the AM limit check being done even before NodeManagers > have registered. > The total number of iterations for N applications is about {{N(N+1)/2}}; for {{10K}} > applications that is {{50005000}} iterations, causing the time taken for the RM to become active to be > more than 10 min. > Since NM resources are not yet added during recovery, we should skip > {{activateApplications()}}
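The quadratic cost described in this report can be sketched with the triangular-number arithmetic it quotes. This is an illustrative calculation only (the class and method names are hypothetical), under the assumption stated above that each recovered app triggers a scan of all pending apps:

```java
// Hedged sketch of the arithmetic in the report above (illustrative only):
// if activateApplications() scans every pending app once per recovered app,
// recovering N apps costs roughly 1 + 2 + ... + N = N(N+1)/2 scans.
public class RecoveryCost {

    static long totalScans(long n) {
        return n * (n + 1) / 2;   // closed form of the triangular sum
    }

    public static void main(String[] args) {
        // For the 10K applications in the report: ~5e7 wasted scans,
        // all performed before any NodeManager has registered.
        System.out.println(totalScans(10_000));   // 50005000
    }
}
```

Skipping activation during recovery turns this into a single pass once NMs register, which is the optimization the patch proposes.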
[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603367#comment-15603367 ] Hadoop QA commented on YARN-5767: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 160 unchanged - 27 fixed = 160 total (was 187) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 53s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 42s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12835016/YARN-5767-trunk-v2.patch | | JIRA Issue | YARN-5767 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2f2b8c5e8486 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a1a0281 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/13492/artifact/patchprocess/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13492/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13492/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL:
[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if {{yarn.scheduler.minimum-allocation-mb}} is 0.
[ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5774: --- Summary: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if {{yarn.scheduler.minimum-allocation-mb}} is 0. (was: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler) > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if > {{yarn.scheduler.minimum-allocation-mb}} is 0. > > > Key: YARN-5774 > URL: https://issues.apache.org/jira/browse/YARN-5774 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > > An MR job gets stuck in the ACCEPTED state without any progress in the Fair > Scheduler because there is no resource request for the AM. This happens when > {{yarn.scheduler.minimum-allocation-mb}} is configured to zero. > The problem is in code shared by the Capacity Scheduler and the Fair > Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not in CS. > So the common code in class RMAppManager passes > yarn.scheduler.minimum-allocation-mb as the increment, because there is no > increment setting for CS, when it normalizes the resource requests. > {code} > SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), > scheduler.getClusterResource(), > scheduler.getMinimumResourceCapability(), > scheduler.getMaximumResourceCapability(), > scheduler.getMinimumResourceCapability()); --> incrementResource > should be passed here. > {code}
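To see why passing the minimum where the increment belongs matters, here is a hedged sketch of increment-based normalization. This is not the actual SchedulerUtils.normalizeRequest implementation; {{normalizeMb}} and its rounding are simplified stand-ins that only model round-up-to-increment followed by clamping to [min, max]:

```java
// Hedged sketch of the normalization arithmetic (NOT the real
// SchedulerUtils.normalizeRequest): round the ask up to a multiple of the
// increment, then clamp to [minimum, maximum].
public class NormalizeDemo {

    static long normalizeMb(long ask, long min, long max, long incr) {
        // Guard: with a non-positive increment there is nothing to round to.
        long stepped = incr <= 0 ? ask : ((ask + incr - 1) / incr) * incr;
        return Math.min(max, Math.max(min, stepped));
    }

    public static void main(String[] args) {
        // Intended FS behavior: increment 512 MB, so a 1 MB ask becomes 512 MB.
        System.out.println(normalizeMb(1, 0, 8192, 512));   // 512
        // The bug: minimum-allocation-mb (0) is passed where the increment
        // belongs, so a 0 MB AM ask is never rounded up and stays 0 MB.
        System.out.println(normalizeMb(0, 0, 8192, 0));     // 0
    }
}
```

With the minimum (0) standing in for the increment, the AM's request normalizes to an empty allocation, which is consistent with the job never leaving ACCEPTED.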
[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5774: --- Description: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM. This happened when you configure {{yarn.scheduler.minimum-allocation-mb}} to zero. The problem is in the code used by both Capacity Scheduler and Fair Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} was: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM. The problem is in the code used by both Capacity Scheduler and Fair Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. 
{code} > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler > -- > > Key: YARN-5774 > URL: https://issues.apache.org/jira/browse/YARN-5774 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler > because there is no resource request for the AM. This happened when you > configure {{yarn.scheduler.minimum-allocation-mb}} to zero. > The problem is in the code used by both Capacity Scheduler and Fair > Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. > So the common code in class RMAppManager passes the > yarn.scheduler.minimum-allocation-mb as incremental one because there is no > incremental one for CS when it tried to normalize the resource requests. > {code} > SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), > scheduler.getClusterResource(), > scheduler.getMinimumResourceCapability(), > scheduler.getMaximumResourceCapability(), > scheduler.getMinimumResourceCapability()); --> incrementResource > should be passed here. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-5765: - Summary: LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner. (was: LinuxContainerExecutor creates appcache/{appId} with wrong group owner.) > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5774: --- Summary: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler (was: A wrong parameter is passed when normalizing resource requests) > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler > -- > > Key: YARN-5774 > URL: https://issues.apache.org/jira/browse/YARN-5774 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler > because there is no resource request for the AM. > The problem is in the code used by both Capacity Scheduler and Fair > Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. > So the common code in class RMAppManager passes the > yarn.scheduler.minimum-allocation-mb as incremental one because there is no > incremental one for CS when it tried to normalize the resource requests. > {code} > SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), > scheduler.getClusterResource(), > scheduler.getMinimumResourceCapability(), > scheduler.getMaximumResourceCapability(), > scheduler.getMinimumResourceCapability()); --> incrementResource > should be passed here. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5774) A wrong parameter is passed when normalizing resource requests
[ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5774: --- Description: MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM. The problem is in the code used by both Capacity Scheduler and Fair Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} was: The problem is in the code used by both Capacity Scheduler and Fair Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} > A wrong parameter is passed when normalizing resource requests > -- > > Key: YARN-5774 > URL: https://issues.apache.org/jira/browse/YARN-5774 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > > MR Job stuck in ACCEPTED status without any progress in Fair Scheduler > because there is no resource request for the AM. 
> The problem is in the code used by both Capacity Scheduler and Fair > Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. > So the common code in class RMAppManager passes the > yarn.scheduler.minimum-allocation-mb as incremental one because there is no > incremental one for CS when it tried to normalize the resource requests. > {code} > SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), > scheduler.getClusterResource(), > scheduler.getMinimumResourceCapability(), > scheduler.getMaximumResourceCapability(), > scheduler.getMinimumResourceCapability()); --> incrementResource > should be passed here. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5774) A wrong parameter is passed when normalizing resource requests
[ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5774: --- Description: The problem is in the code used by both Capacity Scheduler and Fair Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as incremental one because there is no incremental one for CS when it tried to normalize the resource requests. {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} was: {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} > A wrong parameter is passed when normalizing resource requests > -- > > Key: YARN-5774 > URL: https://issues.apache.org/jira/browse/YARN-5774 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > > The problem is in the code used by both Capacity Scheduler and Fair > Scheduler. scheduler.increment-allocation-mb is a concept in FS, but not CS. > So the common code in class RMAppManager passes the > yarn.scheduler.minimum-allocation-mb as incremental one because there is no > incremental one for CS when it tried to normalize the resource requests. 
> {code} > SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), > scheduler.getClusterResource(), > scheduler.getMinimumResourceCapability(), > scheduler.getMaximumResourceCapability(), > scheduler.getMinimumResourceCapability()); --> incrementResource > should be passed here. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache/{appId} with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603320#comment-15603320 ] Haibo Chen commented on YARN-5765: -- I believe this was broken by YARN-5287. According to the Linux man page, "chmod clears the set-group-ID bit of a regular file if the file's group ID does not match the user's effective group ID or one of the user's supplementary group IDs, unless the user has appropriate privileges." This is in line with the reproduction setup I had. Walking through the container-executor.c code, {nm_root}/usercache/{userName} is created with the correct permissions, with the group owner being that of the NM process and the setgid bit set. However, in create_validate_dir(), the mkdir(npath, perm) call fails on directory {nm_root}/usercache/{userName}/appcache because the directory already exists, so chmod(npath, perm) is executed on it, clearing the setgid bit. Consequently, all directories/files created under the appcache directory have the wrong group owner. The container working directory is created by the same code and therefore has the wrong group owner as well. > LinuxContainerExecutor creates appcache/{appId} with wrong group owner. > --- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Blocker > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > the wrong group owner, causing log aggregation and the ShuffleHandler to fail because > the node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting an MR example > job.
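The chmod-clears-setgid mechanism described in this comment can be modeled in a few lines. The following is a hedged, in-memory sketch (not container-executor.c itself; {{DirModel}} and its methods are hypothetical) showing how falling through to chmod on an already-existing directory strips the inherited setgid bit:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged in-memory model of the control flow the comment above describes
// (NOT container-executor.c): a child created under a setgid parent inherits
// the bit, but chmod on an already-existing directory replaces the mode
// outright and so strips setgid.
public class DirModel {

    static final int SETGID = 02000;       // set-group-ID bit, octal

    final Map<String, Integer> modeByPath = new HashMap<>();

    // mkdir: fails (returns false) if the path exists, like EEXIST.
    boolean mkdir(String path, int perm) {
        if (modeByPath.containsKey(path)) {
            return false;
        }
        modeByPath.put(path, perm | SETGID);  // inherited from setgid parent
        return true;
    }

    // chmod sets the mode exactly as given: the setgid bit is not preserved.
    void chmod(String path, int perm) {
        modeByPath.put(path, perm);
    }

    // Mirrors the described create_validate_dir flow: if the directory
    // already exists, fall through to chmod, which drops setgid.
    void createValidateDir(String path, int perm) {
        if (!mkdir(path, perm)) {
            chmod(path, perm);
        }
    }

    public static void main(String[] args) {
        DirModel fs = new DirModel();
        fs.mkdir("/usercache/alice/appcache", 0750);   // created with setgid
        fs.createValidateDir("/usercache/alice/appcache", 0750);
        // setgid is now gone, so children no longer get the NM's group.
        System.out.println(Integer.toOctalString(
            fs.modeByPath.get("/usercache/alice/appcache")));  // 750
    }
}
```

Once the bit is gone, newly created subdirectories and files no longer inherit the NM's group, which is why log aggregation and the ShuffleHandler lose read access.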
[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-5767: --- Attachment: YARN-5767-trunk-v2.patch Attaching a v2 patch for trunk. This new version simply fixes checkstyles and findbugs. Here is a summary: # Add javadoc comments and fix spacing. # Add a hashcode method and serializable interface to {{LocalCacheCleaner#LRUComparator}}. > Fix the order that resources are cleaned up from the local Public/Private > caches > > > Key: YARN-5767 > URL: https://issues.apache.org/jira/browse/YARN-5767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1 >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch > > > If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can > see that public resources are added to the {{ResourceRetentionSet}} first > followed by private resources: > {code:java} > private void handleCacheCleanup(LocalizationEvent event) { > ResourceRetentionSet retain = > new ResourceRetentionSet(delService, cacheTargetSize); > retain.addResources(publicRsrc); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup (public) " + retain); > } > for (LocalResourcesTracker t : privateRsrc.values()) { > retain.addResources(t); > if (LOG.isDebugEnabled()) { > LOG.debug("Resource cleanup " + t.getUser() + ":" + retain); > } > } > //TODO Check if appRsrcs should also be added to the retention set. 
> } > {code} > Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see > that this means public resources are deleted first until the target cache > size is met: > {code:java} > public void addResources(LocalResourcesTracker newTracker) { > for (LocalizedResource resource : newTracker) { > currentSize += resource.getSize(); > if (resource.getRefCount() > 0) { > // always retain resources in use > continue; > } > retain.put(resource, newTracker); > } > for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i = > retain.entrySet().iterator(); > currentSize - delSize > targetSize && i.hasNext();) { > Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next(); > LocalizedResource resource = rsrc.getKey(); > LocalResourcesTracker tracker = rsrc.getValue(); > if (tracker.remove(resource, delService)) { > delSize += resource.getSize(); > i.remove(); > } > } > } > {code} > The result of this is that resources in the private cache are only deleted in > the cases where: > # The cache size is larger than the target cache size and the public cache is > empty. > # The cache size is larger than the target cache size and everything in the > public cache is being used by a running container. > For clusters that primarily use the public cache (i.e. make use of the shared > cache), this means that the most commonly used resources can be deleted > before old resources in the private cache. Furthermore, the private cache can > continue to grow over time causing more and more churn in the public cache. > Additionally, the same problem exists within the private cache. Since > resources are added to the retention set on a user by user basis, resources > will get cleaned up one user at a time in the order that privateRsrc.values() > returns the LocalResourcesTracker. So if user1 has 10MB in their cache and > user2 has 100MB in their cache and the target size of the cache is 50MB, > user1 could potentially have their entire cache removed before anything is > deleted from the user2 cache. 
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603244#comment-15603244 ] Wangda Tan commented on YARN-2009: -- Thanks for update [~sunilg], I have one doubt: should we deduct sum of all am-used for each user from user-limit? Behavior in the patch is deducting sum of all am-used across users in the queue. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch, > YARN-2009.0012.patch, YARN-2009.0013.patch, YARN-2009.0014.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
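The doubt raised above (deduct each user's own am-used vs. the queue-wide am-used sum from the user-limit) can be made concrete with a small numeric sketch. All numbers and method names here are hypothetical illustrations, not taken from the patch:

```java
// Hypothetical numbers illustrating the two deduction choices discussed
// in the comment above; none of these values come from the actual patch.
public class AmHeadroomSketch {
  // Deduct only this user's AM usage from the user-limit.
  static long headroomPerUser(long userLimit, long amUsedByThisUser) {
    return userLimit - amUsedByThisUser;
  }

  // Deduct the sum of all users' AM usage in the queue.
  static long headroomQueueWide(long userLimit, long amUsedAllUsers) {
    return userLimit - amUsedAllUsers;
  }

  public static void main(String[] args) {
    long userLimit = 40;            // GB, illustrative
    long amUsedA = 3, amUsedB = 5;  // AM usage of users A and B
    // Per-user deduction: user A keeps 37GB of headroom.
    System.out.println(headroomPerUser(userLimit, amUsedA));
    // Queue-wide deduction: user A is also charged for B's AMs -> 32GB.
    System.out.println(headroomQueueWide(userLimit, amUsedA + amUsedB));
  }
}
```

The queue-wide variant shrinks every user's headroom by every other user's AMs, which is exactly the behavior the comment questions.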
[jira] [Created] (YARN-5774) A wrong parameter is passed when normalizing resource requests
Yufei Gu created YARN-5774: -- Summary: A wrong parameter is passed when normalizing resource requests Key: YARN-5774 URL: https://issues.apache.org/jira/browse/YARN-5774 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0-alpha1 Reporter: Yufei Gu Assignee: Yufei Gu {code} SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(), scheduler.getClusterResource(), scheduler.getMinimumResourceCapability(), scheduler.getMaximumResourceCapability(), scheduler.getMinimumResourceCapability()); --> incrementResource should be passed here. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
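Why passing the minimum where the increment belongs changes the result can be shown with plain arithmetic. The following is a toy stand-in for what resource normalization does (round up to a multiple of the increment, then clamp into [min, max]), not the real SchedulerUtils implementation:

```java
// Toy normalization: round the request up to a multiple of `increment`,
// then clamp into [min, max]. Passing `min` where `increment` belongs
// (as in the call quoted above) silently rounds to multiples of the
// minimum instead.
public class NormalizeSketch {
  static long normalize(long requested, long min, long max, long increment) {
    long rounded = ((requested + increment - 1) / increment) * increment;
    return Math.min(Math.max(rounded, min), max);
  }

  public static void main(String[] args) {
    // 1500MB request, 1024MB minimum, 512MB increment -> 1536MB.
    System.out.println(normalize(1500, 1024, 8192, 512));
    // Same request, but with the minimum mistakenly used as the
    // increment -> 2048MB: memory is over-allocated.
    System.out.println(normalize(1500, 1024, 8192, 1024));
  }
}
```

With a large minimum and a small increment the bug over-allocates every request that is not already a multiple of the minimum.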
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603166#comment-15603166 ] Wangda Tan commented on YARN-5773: -- I feel we may need a overhaul to existing activateApplication: If we describe what activateApplications target to solve: A set of pending applications in a queue, each application belongs to one user, different application has different AM request, each user has a quota, and queue has a total quota, get which application will be activated. There's an additional questions: If a given app's AM resource amount > AM headroom, should we skip the AM and activate following app which AM resource amount <= AM headroom? If answer to the above question yes, we can maintain a map: Map, when doing application activation, we don't need to check all the apps, instead we only need to check each user once in most cases. > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
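The per-user map idea sketched in the comment above could look roughly like the following. This is an illustrative sketch only (the names, the FIFO-per-user structure, and the headroom check are assumptions, not CapacityScheduler code), and it answers the comment's open question with "yes": a user whose head AM does not fit is skipped while other users' apps keep activating.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the idea above: each user maps to a FIFO of
// pending AM sizes, and activation only inspects the head of each
// user's queue, so each user is checked once per pass instead of
// scanning every pending app.
public class PerUserActivationSketch {
  static List<String> activate(Map<String, Deque<Long>> pendingByUser,
                               long amHeadroom) {
    List<String> activated = new ArrayList<>();
    boolean progress = true;
    while (progress) {
      progress = false;
      for (Map.Entry<String, Deque<Long>> e : pendingByUser.entrySet()) {
        Deque<Long> q = e.getValue();
        // Only the head app of each user is examined; a user whose head
        // AM exceeds the remaining headroom is skipped this pass.
        if (!q.isEmpty() && q.peekFirst() <= amHeadroom) {
          amHeadroom -= q.pollFirst();
          activated.add(e.getKey());
          progress = true;
        }
      }
    }
    return activated;
  }

  public static void main(String[] args) {
    Map<String, Deque<Long>> pending = new LinkedHashMap<>();
    pending.put("alice", new ArrayDeque<>(List.of(4L, 2L)));
    pending.put("bob", new ArrayDeque<>(List.of(8L)));
    // Headroom 10: both of alice's apps activate; bob's 8-unit AM never
    // fits and is skipped rather than blocking alice.
    System.out.println(activate(pending, 10));
    // prints [alice, alice]
  }
}
```

Per-user FIFO order is preserved (a user's second app never jumps its first), while a too-large head in one user's queue no longer blocks activation for everyone else.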
[jira] [Commented] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.
[ https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603128#comment-15603128 ] Hadoop QA commented on YARN-5716: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 13 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | 
{color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 58s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 139 new + 1470 unchanged - 164 fixed = 1609 total (was 1634) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s {color} | {color:green} hadoop-yarn-project_hadoop-yarn generated 0 new + 6484 unchanged - 10 fixed = 6484 total (was 6494) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 928 unchanged - 10 fixed = 928 total (was 938) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 30s {color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 35m 30s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 137m 56s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization | | | hadoop.yarn.server.TestContainerManagerSecurity | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA
[jira] [Comment Edited] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603108#comment-15603108 ] Wangda Tan edited comment on YARN-4734 at 10/24/16 8:28 PM: Thanks [~aw], bq. A question. Given the ... circumstances... lately of patches going into YARN, what's the security status of this branch? Offline double confirmed with [~sunilg] / [~hsreenath] about your question and following answers: a. Existing security status: Current the new UI on the same HttpServer2 instance which hosts REST service / old UI, so we should be able to get security support from existing logics. However, before we can do sufficient tests for security support, I would prefer to suggest user do not expect security support for the UI for now. b. Any possible vulnerabilities? - This feature can be completely disabled, new added code are all packaged inside a war file. When this feature disabled, we are not even placing the WAR file in class path where jersey server will extract it. - As you know ours new UI is not a conventional web application, its an SPA (Single Page application). In conventional apps there were server side code that had to consider security. Our app just uses REST APIs to get data from the server. In other words, every hack that an user could possible do with the UI, he would be able to do it using other tools like Postman. The user can also inject code from the console and tweek the UI functionality. What is basically implies is that its not worth to worry about security at the UI side :) Instead we just need to ensure that the REST end points are secure. bq. Has anyone done an audit? (Web security is outside my area of expertise, so I'd prefer another set of eyes on this one.) Many folks have looked at new added code and we believe it is safe. It is more than welcome that if you or any other folks want to do this check, just let us know if you have any questions/concerns. was (Author: leftnoteasy): Thanks [~aw], bq. 
A question. Given the ... circumstances... lately of patches going into YARN, what's the security status of this branch? Offline double confirmed with [~sunilg] / [~hsreenath] about your question and following answers: a. Existing security status: Current the new UI on the same HttpServer2 instance which hosts REST service / old UI, so we should be able to get security support from existing logics. However, before we can do sufficient tests for security support, I would prefer to suggest user do not expect security support for the UI for now. b. Any possible vulnerabilities? 1) This feature can be completely disabled, new added code are all packaged inside a war file. When this feature disabled, we are not even placing the WAR file in class path where jersey server will extract it. 2) As you know ours new UI is not a conventional web application, its an SPA (Single Page application). In conventional apps there were server side code that had to consider security. Our app just uses REST APIs to get data from the server. In other words, every hack that an user could possible do with the UI, he would be able to do it using other tools like Postman. The user can also inject code from the console and tweek the UI functionality. What is basically implies is that its not worth to worry about security at the UI side :) Instead we just need to ensure that the REST end points are secure. bq. Has anyone done an audit? (Web security is outside my area of expertise, so I'd prefer another set of eyes on this one.) Many folks have looked at new added code and we believe it is safe. It is more than welcome that if you or any other folks want to do this check, just let us know if you have any questions/concerns. 
> Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, > YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, > YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, > YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, > YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, > YARN-4734.9-NOT_READY.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603108#comment-15603108 ] Wangda Tan commented on YARN-4734: -- Thanks [~aw], bq. A question. Given the ... circumstances... lately of patches going into YARN, what's the security status of this branch? Offline double confirmed with [~sunilg] / [~hsreenath] about your question and following answers: a. Existing security status: Current the new UI on the same HttpServer2 instance which hosts REST service / old UI, so we should be able to get security support from existing logics. However, before we can do sufficient tests for security support, I would prefer to suggest user do not expect security support for the UI for now. b. Any possible vulnerabilities? 1) This feature can be completely disabled, new added code are all packaged inside a war file. When this feature disabled, we are not even placing the WAR file in class path where jersey server will extract it. 2) As you know ours new UI is not a conventional web application, its an SPA (Single Page application). In conventional apps there were server side code that had to consider security. Our app just uses REST APIs to get data from the server. In other words, every hack that an user could possible do with the UI, he would be able to do it using other tools like Postman. The user can also inject code from the console and tweek the UI functionality. What is basically implies is that its not worth to worry about security at the UI side :) Instead we just need to ensure that the REST end points are secure. bq. Has anyone done an audit? (Web security is outside my area of expertise, so I'd prefer another set of eyes on this one.) Many folks have looked at new added code and we believe it is safe. It is more than welcome that if you or any other folks want to do this check, just let us know if you have any questions/concerns. 
> Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, > YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, > YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, > YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, > YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, > YARN-4734.9-NOT_READY.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602941#comment-15602941 ] Bibin A Chundatt commented on YARN-5773: [~sunilg] Till then, only one app will activated and rest all apps will be in pending state. - So for N-1 application the AM check happens about (N-1)(N-2)/2 rt? Which we are sure that will not be satisfied since registration happens later. Correct me if i am wrong. So all those apps its not required to check for AM limit rt? > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602907#comment-15602907 ] Sunil G commented on YARN-5773: --- bq.1.If cluster resource is zero don't check AM limit. 2. Skip all apps if queue's AM limit is reached. I am not so sure about this. {{recover}} happens first for all apps and {{Recover}} event will be fired for all apps. {{serviceStart}} happens later, so NMs will be connected to RM later. Till then, only one app will activated and rest all apps will be in pending state. As NMs are up/registered, remaining apps will become activated from {{pendingOrderingPolicy}}. > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602871#comment-15602871 ] Bibin A Chundatt commented on YARN-5773: Thank you [~leftnoteasy] for review comment. {quote} I'm not sure if this is safe: activeApplication is majorly to avoid too many applications are running inside one queue. if we skip the AM limit check for recovering apps, it looks like some problem may occur. apps, {quote} Yes.we should not skip activate application. RM restart issue with too many pending apps was the main intention of this jira. If too many pending apps in leaf queue and RM is restarted for each app attempt submit event the Leaf#activateApplication() gets invoked and for each pending apps the am limit is checked. Restart time increases as the number of apps increases consuming too much time on restart. Will handle following two # If cluster resource is zero don't check AM limit. # Skip all apps if queue's AM limit is reached. Will upload a patch soon > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch > > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. 
> Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
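The {{N(N+1)/2}} figure in the description works out as follows for 10K recovered apps. This is a back-of-the-envelope count of AM-limit checks, not a measurement of the scheduler itself:

```java
// Back-of-the-envelope cost of calling activateApplications() once per
// recovered app: the k-th call scans up to k pending apps, so the total
// is 1 + 2 + ... + N = N(N+1)/2 AM-limit checks.
public class RecoveryCostSketch {
  static long totalChecks(long n) {
    return n * (n + 1) / 2;
  }

  public static void main(String[] args) {
    System.out.println(totalChecks(10_000)); // 50005000, i.e. ~5e7 checks
  }
}
```

Roughly 5×10^7 checks, all performed before any NodeManager has registered, which is why recovery stalls for minutes.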
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602841#comment-15602841 ] Hadoop QA commented on YARN-4597: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 19 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 2m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 7m 17s {color} | {color:red} root generated 2 new + 701 unchanged - 2 fixed = 703 total (was 703) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 40s {color} | {color:red} root: The patch generated 7 new + 1018 unchanged - 15 fixed = 1025 total (was 1033) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 236 unchanged - 1 fixed = 236 total (was 237) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} hadoop-yarn-server-tests in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | |
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602831#comment-15602831 ] Hadoop QA commented on YARN-5773: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 209 unchanged - 0 fixed = 210 total (was 209) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 3 new + 938 unchanged - 0 fixed = 941 total (was 938) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 51s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 40s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12834978/YARN-5773.0002.patch | | JIRA Issue | YARN-5773 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e6af7d98acab 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b18f35f | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13490/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | javadoc |
[jira] [Resolved] (YARN-5750) YARN-4126 broke Oozie on unsecure cluster
[ https://issues.apache.org/jira/browse/YARN-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter resolved YARN-5750.
Resolution: Duplicate

YARN-4126 has been reverted from branch-2 and branch-2.8. It now exists only in trunk (3.x), where this incompatible change is acceptable.

> YARN-4126 broke Oozie on unsecure cluster
> ----
>
> Key: YARN-5750
> URL: https://issues.apache.org/jira/browse/YARN-5750
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.8.0
> Reporter: Peter Cseh
>
> Oozie uses a DummyRenewer on insecure clusters and, after YARN-4126, can no longer submit workflows on an insecure cluster.
> {noformat}
> org.apache.oozie.action.ActionExecutorException: JA009: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: Delegation Token can be issued only with kerberos authentication
> at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1092)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:335)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:515)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:663)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2423)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2419)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2419)
> Caused by: java.io.IOException: Delegation Token can be issued only with kerberos authentication
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1065)
> ... 10 more
> at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:457)
> at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:437)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1128)
> at org.apache.oozie.action.hadoop.TestJavaActionExecutor.submitAction(TestJavaActionExecutor.java:343)
> at org.apache.oozie.action.hadoop.TestJavaActionExecutor.submitAction(TestJavaActionExecutor.java:363)
> at org.apache.oozie.action.hadoop.TestJavaActionExecutor.testKill(TestJavaActionExecutor.java:602)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:232)
> at junit.framework.TestSuite.run(TestSuite.java:227)
> at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:24)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: Delegation Token can be issued only with kerberos authentication
> at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1092)
> at
[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-4126: Hadoop Flags: Incompatible change,Reviewed (was: Reviewed) Release Note: Yarn now only issues and allows delegation tokens in secure clusters. Clients should no longer request delegation tokens in a non-secure cluster, or they'll receive an IOException. I've also marked this as incompatible and put something in the Release Note field. I'll also close YARN-5750. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
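The client-side consequence of this release note can be sketched with a small toy model. This is not Hadoop code: the class, method names, and the plain `securityEnabled` flag are illustrative stand-ins (a real client would consult something like `UserGroupInformation.isSecurityEnabled()` before calling the RM), chosen so the sketch is self-contained:

```java
import java.io.IOException;

// Toy model of the YARN-4126 behavior change: the RM now refuses to issue
// delegation tokens on a non-secure cluster, so clients should check the
// security mode and skip the request entirely instead of relying on a
// dummy token/renewer. All names here are hypothetical.
public class TokenRequestGuard {

    /** Models the RM side: ClientRMService#getDelegationToken after YARN-4126. */
    static String getDelegationToken(boolean securityEnabled) throws IOException {
        if (!securityEnabled) {
            // Same message that appears in the stack traces above.
            throw new IOException(
                "Delegation Token can be issued only with kerberos authentication");
        }
        return "token"; // placeholder for a real delegation token
    }

    /** Client-side pattern: only request a token when security is on. */
    static String tokenOrNull(boolean securityEnabled) {
        if (!securityEnabled) {
            return null; // no request is made at all on an insecure cluster
        }
        try {
            return getDelegationToken(true);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenOrNull(true));  // token
        System.out.println(tokenOrNull(false)); // null
    }
}
```

Clients that unconditionally called `getDelegationToken` (as Oozie did via its DummyRenewer, per YARN-5750) hit the IOException path above.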
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602796#comment-15602796 ] Robert Kanter commented on YARN-4126: - Thanks! > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode.
[jira] [Commented] (YARN-5369) Improve Yarn logs command to get container logs based on Node Id
[ https://issues.apache.org/jira/browse/YARN-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602794#comment-15602794 ] Hadoop QA commented on YARN-5369:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 53s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 0s | trunk passed |
| +1 | compile | 2m 24s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 58s | trunk passed |
| +1 | mvneclipse | 0m 28s | trunk passed |
| +1 | findbugs | 1m 31s | trunk passed |
| +1 | javadoc | 0m 43s | trunk passed |
| 0 | mvndep | 0m 9s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 2m 25s | the patch passed |
| +1 | javac | 2m 25s | the patch passed |
| -1 | checkstyle | 0m 40s | hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 100 unchanged - 1 fixed = 101 total (was 101) |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 38s | the patch passed |
| +1 | javadoc | 0m 43s | the patch passed |
| +1 | unit | 2m 30s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 16m 31s | hadoop-yarn-client in the patch failed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 42m 29s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12834980/YARN-5369.5.patch |
| JIRA Issue | YARN-5369 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 51f67dd2227c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b18f35f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13489/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/13489/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt |
| unit test logs |
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602776#comment-15602776 ] Wangda Tan commented on YARN-5773:

Thanks [~bibinchundatt] for reporting and working on this issue. I'm not sure this is safe: activateApplications mainly exists to prevent too many applications from running inside one queue, so skipping the AM limit check for recovering apps could cause problems. For example, if a 4K-node cluster restarts and only 2K nodes come back, shouldn't we activate only some of the originally submitted apps? In my mind we need to optimize the activateApplications method instead; today it scans through all pending apps in the queue under all conditions. We should be able to optimize this, for example by skipping the remaining apps once the queue's AM limit is reached.

> RM recovery too slow due to LeafQueue#activateApplication()
> ----
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is invoked, resulting in the AM limit check being done even before NodeManagers have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is roughly {{5*10^7}} iterations, causing the RM to take more than 10 minutes to become active.
> Since NM resources are not yet added back during recovery, we should skip {{activateApplications()}}.
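The quadratic recovery cost being discussed, and the effect of skipping activation for recovering attempts, can be sketched with a toy model. The class and method names below are illustrative stand-ins for the LeafQueue logic, not actual Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of LeafQueue#activateApplications() cost during RM recovery.
// Each recovered attempt triggers a rescan of every pending app in the
// queue, so recovering N apps costs about N(N+1)/2 AM-limit checks
// unless activation is skipped for recovering attempts.
public class ActivationCostModel {
    final List<String> pendingApps = new ArrayList<>();
    long amLimitChecks = 0;

    void submitAttempt(String app, boolean isRecovering, boolean skipOnRecovery) {
        pendingApps.add(app);
        if (isRecovering && skipOnRecovery) {
            return; // proposed optimization: don't activate while NMs are unregistered
        }
        // activateApplications(): scan all pending apps, AM-limit check each one
        for (String ignored : pendingApps) {
            amLimitChecks++;
        }
    }

    static long recover(int n, boolean skipOnRecovery) {
        ActivationCostModel queue = new ActivationCostModel();
        for (int i = 0; i < n; i++) {
            queue.submitAttempt("application_" + i, true, skipOnRecovery);
        }
        return queue.amLimitChecks;
    }

    public static void main(String[] args) {
        System.out.println(recover(10_000, false)); // 50005000 checks (~5*10^7)
        System.out.println(recover(10_000, true));  // 0 checks
    }
}
```

The model makes the trade-off in the comment concrete: skipping activation removes the quadratic scan entirely, while Wangda's alternative (breaking out of the scan once the AM limit is hit) would reduce but not eliminate it.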
[jira] [Updated] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.
[ https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-5716:
Attachment: YARN-5716.008.patch

[~sunilg], makes sense. I just uploaded the ver.8 patch, which removes all setter interfaces from the newly added APIs. I also added a comment to the ResourceCommitRequest constructor to explain the to-release resource behavior.

> Add global scheduler interface definition and update CapacityScheduler to use it.
> ----
>
> Key: YARN-5716
> URL: https://issues.apache.org/jira/browse/YARN-5716
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5716.001.patch, YARN-5716.002.patch, YARN-5716.003.patch, YARN-5716.004.patch, YARN-5716.005.patch, YARN-5716.006.patch, YARN-5716.007.patch, YARN-5716.008.patch
>
> Target of this JIRA:
> - Definition of interfaces/objects which will be used by global scheduling; this will be shared by different schedulers.
> - Modify CapacityScheduler to use it.
[jira] [Updated] (YARN-5369) Improve Yarn logs command to get container logs based on Node Id
[ https://issues.apache.org/jira/browse/YARN-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-5369:
Attachment: YARN-5369.5.patch

> Improve Yarn logs command to get container logs based on Node Id
> ----
>
> Key: YARN-5369
> URL: https://issues.apache.org/jira/browse/YARN-5369
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-5369.1.patch, YARN-5369.2.patch, YARN-5369.3.patch, YARN-5369.4.patch, YARN-5369.5.patch
>
> It would be helpful if we could run {{yarn logs --applicationId appId --nodeAddress ${nodeId}}} to get all the container logs which ran on the specified NM.
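The selection step this feature implies can be sketched as follows. This is a hypothetical helper, not the actual LogsCLI code; it assumes the common layout where aggregated logs are stored one file per NodeManager, named after the node address (e.g. "host_port"), so serving {{--nodeAddress}} amounts to filtering the application's log-file names:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a --nodeAddress filter (not the real LogsCLI code).
// Assumes aggregated log files are named "host_port" per NodeManager.
public class NodeLogFilter {

    /** Keep files matching "host_port" exactly, or "host" alone for any port. */
    static List<String> selectByNode(List<String> nodeLogFiles, String nodeAddress) {
        return nodeLogFiles.stream()
            .filter(f -> f.equals(nodeAddress) || f.startsWith(nodeAddress + "_"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of("nm1_8041", "nm2_8041", "nm1_8042");
        System.out.println(selectByNode(files, "nm1_8041")); // [nm1_8041]
        System.out.println(selectByNode(files, "nm1"));      // [nm1_8041, nm1_8042]
    }
}
```

Allowing a bare hostname (second call above) is a design choice worth considering, since users often don't know the NM's port.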
[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602665#comment-15602665 ] Hadoop QA commented on YARN-5773:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 21s | trunk passed |
| +1 | compile | 0m 35s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 40s | trunk passed |
| +1 | mvneclipse | 0m 17s | trunk passed |
| +1 | findbugs | 1m 1s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
| +1 | mvninstall | 0m 30s | the patch passed |
| +1 | compile | 0m 29s | the patch passed |
| +1 | javac | 0m 29s | the patch passed |
| -1 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 210 unchanged - 0 fixed = 211 total (was 210) |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 1s | the patch passed |
| -1 | javadoc | 0m 18s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 3 new + 938 unchanged - 0 fixed = 941 total (was 938) |
| +1 | unit | 35m 22s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 50m 39s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12834967/YARN-5773.0001.patch |
| JIRA Issue | YARN-5773 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 03b6c1d73bc8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b18f35f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13488/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/13488/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13488/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5773:
Attachment: YARN-5773.0002.patch

> RM recovery too slow due to LeafQueue#activateApplication()
> ----
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is invoked, resulting in the AM limit check being done even before NodeManagers have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is roughly {{5*10^7}} iterations, causing the RM to take more than 10 minutes to become active.
> Since NM resources are not yet added back during recovery, we should skip {{activateApplications()}}.
[jira] [Commented] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602622#comment-15602622 ] Akira Ajisaka commented on YARN-5772: - LGTM, +1. > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png > > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old.
[jira] [Commented] (YARN-5587) Add support for resource profiles
[ https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602519#comment-15602519 ] Varun Vasudev commented on YARN-5587: - [~leftnoteasy], [~asuresh] - can you take a look at the latest patch? It adds the core resource profile functionality. Thanks! > Add support for resource profiles > - > > Key: YARN-5587 > URL: https://issues.apache.org/jira/browse/YARN-5587 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-5587-YARN-3926.001.patch, > YARN-5587-YARN-3926.002.patch, YARN-5587-YARN-3926.003.patch, > YARN-5587-YARN-3926.004.patch, YARN-5587-YARN-3926.005.patch > > > Add support for resource profiles on the RM side to allow users to use > shorthands to specify resource requirements.
[jira] [Commented] (YARN-5770) Performance improvement of native-services REST API service
[ https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602512#comment-15602512 ] Hadoop QA commented on YARN-5770:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 2m 52s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 15s | yarn-native-services passed |
| +1 | compile | 0m 38s | yarn-native-services passed |
| +1 | checkstyle | 0m 28s | yarn-native-services passed |
| -1 | mvnsite | 0m 10s | hadoop-yarn-services-api in yarn-native-services failed. |
| +1 | mvneclipse | 0m 29s | yarn-native-services passed |
| -1 | findbugs | 0m 58s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core in yarn-native-services has 314 extant Findbugs warnings. |
| -1 | findbugs | 0m 29s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api in yarn-native-services has 5 extant Findbugs warnings. |
| -1 | javadoc | 0m 26s | hadoop-yarn-slider-core in yarn-native-services failed. |
| 0 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 36s | the patch passed |
| +1 | javac | 0m 36s | the patch passed |
| -1 | checkstyle | 0m 26s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications: The patch generated 3 new + 460 unchanged - 6 fixed = 463 total (was 466) |
| -1 | mvnsite | 0m 6s | hadoop-yarn-services-api in the patch failed. |
| +1 | mvneclipse | 0m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 5s | hadoop-yarn-slider-core in the patch passed. |
| +1 | findbugs | 0m 35s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api generated 0 new + 1 unchanged - 4 fixed = 1 total (was 5) |
| -1 | javadoc | 1m 1s | hadoop-yarn-slider-core in the patch failed. |
| -1 | unit | 2m 52s | hadoop-yarn-slider-core in the patch failed. |
| +1 | unit | 0m 47s | hadoop-yarn-services-api in the patch passed. |
| -1 | asflicense | 0m 45s | The patch generated 10 ASF License warnings. |
| | | 27m 15s | |

|| Reason || Tests ||
| Failed junit tests | slider.core.registry.docstore.TestPublishedConfigurationOutputter |

|| Subsystem ||
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602497#comment-15602497 ] Allen Wittenauer commented on YARN-4734: Just got back from a trip. I'll try to take a look at this over the next few days. A question. Given the ... circumstances... lately of patches going into YARN, what's the security status of this branch? Has anyone done an audit? (Web security is outside my area of expertise, so I'd prefer another set of eyes on this one.) > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, > YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, > YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, > YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, > YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, > YARN-4734.9-NOT_READY.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task.
[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5773:
Attachment: YARN-5773.0001.patch

Attaching a patch. On recovery, the CapacityScheduler now indicates whether an attempt is being recovered, and LeafQueue#activateApplications() is skipped for recovering attempts.

> RM recovery too slow due to LeafQueue#activateApplication()
> ----
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-5773.0001.patch
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is invoked, resulting in the AM limit check being done even before NodeManagers have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is roughly {{5*10^7}} iterations, causing the RM to take more than 10 minutes to become active.
> Since NM resources are not yet added back during recovery, we should skip {{activateApplications()}}.
[jira] [Updated] (YARN-5770) Performance improvement of native-services REST API service
[ https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-5770: Attachment: YARN-5770-yarn-native-services.phase1.002.patch > Performance improvement of native-services REST API service > --- > > Key: YARN-5770 > URL: https://issues.apache.org/jira/browse/YARN-5770 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Fix For: yarn-native-services > > Attachments: YARN-5770-yarn-native-services.phase1.001.patch, > YARN-5770-yarn-native-services.phase1.002.patch > > > Make enhancements and bug-fixes to eliminate frequent full GC of the REST API > Service. Dependent on few Slider fixes like SLIDER-1168 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5770) Performance improvement of native-services REST API service
[ https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602408#comment-15602408 ] Gour Saha edited comment on YARN-5770 at 10/24/16 3:59 PM: --- Uploading 002 patch with 2 findbugs fixes. was (Author: gsaha): Uploading 002 patch with 2 findbug fixes. > Performance improvement of native-services REST API service > --- > > Key: YARN-5770 > URL: https://issues.apache.org/jira/browse/YARN-5770 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Fix For: yarn-native-services > > Attachments: YARN-5770-yarn-native-services.phase1.001.patch, > YARN-5770-yarn-native-services.phase1.002.patch > > > Make enhancements and bug-fixes to eliminate frequent full GC of the REST API > Service. Dependent on few Slider fixes like SLIDER-1168 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5770) Performance improvement of native-services REST API service
[ https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602408#comment-15602408 ] Gour Saha commented on YARN-5770: - Uploading 002 patch with 2 findbug fixes. > Performance improvement of native-services REST API service > --- > > Key: YARN-5770 > URL: https://issues.apache.org/jira/browse/YARN-5770 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Fix For: yarn-native-services > > Attachments: YARN-5770-yarn-native-services.phase1.001.patch, > YARN-5770-yarn-native-services.phase1.002.patch > > > Make enhancements and bug-fixes to eliminate frequent full GC of the REST API > Service. Dependent on few Slider fixes like SLIDER-1168 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.
[ https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602390#comment-15602390 ] Sunil G commented on YARN-5716: --- Thanks [~leftnoteasy]. bq.do you want to add comments to indicate it should be a read-only class or you want to remove writing APIs from these classes? I was expecting to remove setter api's from this interface. Thoughts? bq.continuous-reservation-looking I think the code is slightly complicated, but functionality seems fine. I am checking lazy-preemption now. > Add global scheduler interface definition and update CapacityScheduler to use > it. > - > > Key: YARN-5716 > URL: https://issues.apache.org/jira/browse/YARN-5716 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5716.001.patch, YARN-5716.002.patch, > YARN-5716.003.patch, YARN-5716.004.patch, YARN-5716.005.patch, > YARN-5716.006.patch, YARN-5716.007.patch > > > Target of this JIRA: > - Definition of interfaces / objects which will be used by global scheduling, > this will be shared by different schedulers. > - Modify CapacityScheduler to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
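[Editorial note] The read-only-interface idea discussed above can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual YARN-5716 interfaces: consumers receive only a getter-bearing view, while mutation stays on the implementation.

```java
// Sketch: expose only a read-only view of scheduler state to consumers,
// keeping mutation package-private on the implementation (hypothetical
// names; the real YARN-5716 interfaces differ).
interface SchedulerNodeView {
    String getNodeId();
    long getAvailableMemoryMB();   // no setters in the read-only view
}

public class MutableSchedulerNode implements SchedulerNodeView {
    private final String nodeId;
    private long availableMemoryMB;

    public MutableSchedulerNode(String nodeId, long availableMemoryMB) {
        this.nodeId = nodeId;
        this.availableMemoryMB = availableMemoryMB;
    }

    public String getNodeId() { return nodeId; }
    public long getAvailableMemoryMB() { return availableMemoryMB; }

    // Mutation stays off the interface, so code holding a
    // SchedulerNodeView cannot write through it.
    void setAvailableMemoryMB(long mem) { availableMemoryMB = mem; }

    public static void main(String[] args) {
        SchedulerNodeView view = new MutableSchedulerNode("node-1", 1024);
        System.out.println(view.getNodeId() + ": " + view.getAvailableMemoryMB());
    }
}
```

Removing setters from the shared interface, as Sunil suggests, makes this contract explicit at compile time rather than relying on comments.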
[jira] [Commented] (YARN-5690) Integrate native services modules into maven build
[ https://issues.apache.org/jira/browse/YARN-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602350#comment-15602350 ] Billie Rinaldi commented on YARN-5690: -- This could be due to log4j settings; the slider command uses log4j for output. Are you using the default log4j.properties? > Integrate native services modules into maven build > -- > > Key: YARN-5690 > URL: https://issues.apache.org/jira/browse/YARN-5690 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-5690-yarn-native-services.001.patch, > YARN-5690-yarn-native-services.002.patch, > YARN-5690-yarn-native-services.003.patch > > > The yarn dist assembly should include jars for the new modules as well as > their new dependencies. We may want to create new lib directories in the > tarball for the dependencies of the slider-core and services API modules, to > avoid adding these dependencies into the general YARN classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5773: --- Summary: RM recovery too slow due to LeafQueue#activateApplication() (was: Skip LeafQueue#activateApplication for running application on recovery) > RM recovery too slow due to LeafQueue#activateApplication() > --- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-4597: -- Attachment: YARN-4597.005.patch Updated patch v005 (also updated the Pull Request). * Added testcase to verify the situation noted by [~jianhe] is correctly handled and does not happen: bq. The logic to select opportunisitic container: we may kill more opportunistic containers than required. e.g... * Added some more javadocs and fixed some checkstyles. * Rebased with trunk. > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601974#comment-15601974 ] Varun Saxena edited comment on YARN-5773 at 10/24/16 1:30 PM: -- Thanks [~bibinchundatt] for filing the JIRA. Agree that we do not need to iterate over all the pending apps on recovery as NMs' are not yet registered. If there are large number of running apps, RM unnecessarily spends quite a bit of time in this loop. Applications can be activated as and when NMs' register again. was (Author: varun_saxena): Thanks [~bibinchundatt] for filing the JIRA. Agree that we do not need to iterate over all the pending apps on recovery as NMs' are not yet registered. If there are large number of running apps, RM unnecessarily spends quite a bit of time in this loop. Applications can be activated as and when nodes are added. > Skip LeafQueue#activateApplication for running application on recovery > -- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601974#comment-15601974 ] Varun Saxena edited comment on YARN-5773 at 10/24/16 1:30 PM: -- Thanks [~bibinchundatt] for filing the JIRA. Agree that we do not need to iterate over all the pending apps on recovery as NMs' are not yet registered. If there are large number of running apps, RM unnecessarily spends quite a bit of time in this loop. Applications can be activated as and when NMs' register again. was (Author: varun_saxena): Thanks [~bibinchundatt] for filing the JIRA. Agree that we do not need to iterate over all the pending apps on recovery as NMs' are not yet registered. If there are large number of running apps, RM unnecessarily spends quite a bit of time in this loop. Applications can be activated as and when NMs' register again. > Skip LeafQueue#activateApplication for running application on recovery > -- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601974#comment-15601974 ] Varun Saxena commented on YARN-5773: Thanks [~bibinchundatt] for filing the JIRA. Agree that we do not need to iterate over all the pending apps on recovery as NMs' are not yet registered. If there are large number of running apps, RM unnecessarily spends quite a bit of time in this loop. Applications can be activated as and when nodes are added. > Skip LeafQueue#activateApplication for running application on recovery > -- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery
[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601944#comment-15601944 ] Bibin A Chundatt commented on YARN-5773: *Solution* The following code to skip {{activateApplication()}} on recovery solved the problem. {noformat} private synchronized void activateApplications() { if (!Resources.greaterThan(resourceCalculator, lastClusterResource, lastClusterResource, Resources.none())) { return; } ... {noformat} Thoughts ??? > Skip LeafQueue#activateApplication for running application on recovery > -- > > Key: YARN-5773 > URL: https://issues.apache.org/jira/browse/YARN-5773 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > # Submit application 10K application to default queue. > # All applications are in accepted state > # Now restart resourcemanager > For each application recovery {{LeafQueue#activateApplications()}} is > invoked.Resulting in AM limit check to be done even before Node managers are > getting registered. > Total iteration for N application is about {{N(N+1)/2}} for {{10K}} > application {{5000}} iterations causing time take for Rm to be active > more than 10 min. > Since NM resources are not yet added to during recovery we should skip > {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
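[Editorial note] The guard in the snippet above can be sketched in isolation. This is a simplified stand-in with hypothetical names; the real check uses Resources.greaterThan with the configured ResourceCalculator, but the idea is the same: during recovery no NodeManager has registered, so the tracked cluster resource is still zero and activation is pointless.

```java
public class ActivationGuard {
    // During recovery no NodeManager has registered yet, so the tracked
    // cluster resource is still zero; activating applications would only
    // run AM-limit checks against an empty cluster.
    static boolean shouldActivate(long clusterMemoryMB, long clusterVcores) {
        return clusterMemoryMB > 0 || clusterVcores > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldActivate(0, 0));      // false: recovery, skip
        System.out.println(shouldActivate(8_192, 8));  // true: NMs registered
    }
}
```

Once NMs re-register and the cluster resource becomes non-zero, activateApplications() runs normally, so pending apps are still activated as nodes come back.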
[jira] [Created] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery
Bibin A Chundatt created YARN-5773: -- Summary: Skip LeafQueue#activateApplication for running application on recovery Key: YARN-5773 URL: https://issues.apache.org/jira/browse/YARN-5773 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical # Submit application 10K application to default queue. # All applications are in accepted state # Now restart resourcemanager For each application recovery {{LeafQueue#activateApplications()}} is invoked.Resulting in AM limit check to be done even before Node managers are getting registered. Total iteration for N application is about {{N(N+1)/2}} for {{10K}} application {{5000}} iterations causing time take for Rm to be active more than 10 min. Since NM resources are not yet added to during recovery we should skip {{activateApplicaiton()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
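[Editorial note] The quadratic blow-up described in the issue is easy to quantify. Each of the N recovered applications triggers activateApplications(), which scans the pending list; the i-th call scans up to i entries, giving 1 + 2 + … + N = N(N+1)/2 total iterations:

```java
public class ActivationCost {
    // Total pending-list scans when each of n recovered apps triggers a
    // fresh activateApplications() pass: 1 + 2 + ... + n = n(n+1)/2.
    static long totalIterations(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        // For 10K recovered applications the AM-limit check runs about
        // 50 million times before any NodeManager has registered.
        System.out.println(totalIterations(10_000)); // 50005000
    }
}
```

So for the 10K-application scenario the check runs roughly 5×10^7 times, which matches the reported multi-minute recovery delay.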
[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys
[ https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-5547: -- Attachment: YARN-5547.02.patch Please review > NMLeveldbStateStore should be more tolerant of unknown keys > --- > > Key: YARN-5547 > URL: https://issues.apache.org/jira/browse/YARN-5547 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Ajith S > Attachments: YARN-5547.01.patch, YARN-5547.02.patch > > > Whenever new keys are added to the NM state store it will break rolling > downgrades because the code will throw if it encounters an unrecognized key. > If instead it skipped unrecognized keys it could be simpler to continue > supporting rolling downgrades. We need to define the semantics of > unrecognized keys when containers and apps are cleaned up, e.g.: we may want > to delete all keys underneath an app or container directory when it is being > removed from the state store to prevent leaking unrecognized keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
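[Editorial note] A tolerant recovery loop of the kind the issue describes might look like the sketch below. It iterates a plain map rather than LevelDB, and the key prefixes are illustrative assumptions, but it shows the proposed behavior: skip-and-log unknown keys instead of throwing, so a rolling downgrade that left newer keys in the store can still start.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TolerantRecovery {
    // Recover known keys and skip (rather than throw on) unknown ones.
    static Map<String, String> recover(Map<String, String> store) {
        Map<String, String> recovered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : store.entrySet()) {
            String key = e.getKey();
            if (key.startsWith("ContainerManager/containers/")
                || key.startsWith("ContainerManager/applications/")) {
                recovered.put(key, e.getValue());
            } else {
                // Unknown key: log and continue instead of failing recovery.
                System.err.println("Skipping unrecognized state key: " + key);
            }
        }
        return recovered;
    }
}
```

As the description notes, the remaining design question is cleanup: unrecognized keys under an app or container directory should still be deleted with it, or they leak.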
[jira] [Comment Edited] (YARN-5771) Provide option to send env to be whitelisted in ContainerLaunchContext
[ https://issues.apache.org/jira/browse/YARN-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601851#comment-15601851 ] Bibin A Chundatt edited comment on YARN-5771 at 10/24/16 12:32 PM: --- In addition to above implementation the following changes also need to be done. # Configuration in nodemanager side to enable this feature. # Add configuration for ENV properties of NM which should never get whitelisted even if send as part of ContainerLaunchContext. Thoughts? was (Author: bibinchundatt): Additional above implementation the following changes also need to be done. # Configuration in nodemanager side to enable this feature. # Add configuration for ENV properties of NM which should never get whitelisted even if send as part of ContainerLaunchContext. Thoughts? > Provide option to send env to be whitelisted in ContainerLaunchContext > --- > > Key: YARN-5771 > URL: https://issues.apache.org/jira/browse/YARN-5771 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: container-whitelist-env-wip.patch > > > As per current implementation ENV to be white listed for container launch is > are configured as part of {{yarn.nodemanager.env-whitelist}} > Specific to container we cannot specify additional ENV properties to be > whitelisted. As part of this jira we are providing an option to provide > additional whitelist ENV. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5771) Provide option to send env to be whitelisted in ContainerLaunchContext
[ https://issues.apache.org/jira/browse/YARN-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601851#comment-15601851 ] Bibin A Chundatt commented on YARN-5771: In addition to the above implementation, the following changes also need to be done. # Configuration on the nodemanager side to enable this feature. # Add configuration for ENV properties of the NM which should never get whitelisted even if sent as part of ContainerLaunchContext. Thoughts? > Provide option to send env to be whitelisted in ContainerLaunchContext > --- > > Key: YARN-5771 > URL: https://issues.apache.org/jira/browse/YARN-5771 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: container-whitelist-env-wip.patch > > > As per current implementation ENV to be white listed for container launch is > are configured as part of {{yarn.nodemanager.env-whitelist}} > Specific to container we cannot specify additional ENV properties to be > whitelisted. As part of this jira we are providing an option to provide > additional whitelist ENV. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
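[Editorial note] The whitelist composition being discussed could be sketched as a simple set computation (hypothetical names; a sketch of the proposal, not actual NM code): take the NM's default whitelist, add the container-requested variables, then subtract the never-whitelist deny list proposed in the comment above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class WhitelistEnv {
    // Effective whitelist = (NM default whitelist ∪ per-container additions)
    // minus env vars the NM is configured to never whitelist.
    static Set<String> effectiveWhitelist(Set<String> nmDefault,
                                          Set<String> containerRequested,
                                          Set<String> neverWhitelist) {
        Set<String> result = new LinkedHashSet<>(nmDefault);
        result.addAll(containerRequested);
        result.removeAll(neverWhitelist);
        return result;
    }
}
```

The deny list is what makes the feature safe: a container cannot whitelist its way into NM-internal variables even if it names them in its ContainerLaunchContext.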
[jira] [Commented] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601694#comment-15601694 ] Hadoop QA commented on YARN-5772: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 43s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 4m 21s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5a4801a | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12834918/YARN-5772-YARN-3368.0001.patch | | JIRA Issue | YARN-5772 | | Optional Tests | asflicense | | uname | Linux d47170ae53d6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-3368 / 9690f29 | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13485/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. 
> Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png > > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601658#comment-15601658 ] Sunil G edited comment on YARN-5772 at 10/24/16 11:09 AM: -- Thanks [~akhilpb]. It looks fine for me.. Also attached the screen shot. [~ajisakaa] and [~leftnoteasy]/[~Sreenath]. pls take a look. If its fine, i ll commit the change after jenkins is run. was (Author: sunilg): Thanks [~akhilpb]. It looks fine for me.. Also attached the screen shot. [~ajisakaa] and [~leftnoteasy]/[~Sreenath]. pls take a look. If its fine, i ll commit the change. > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png > > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-5772: -- Attachment: YARN-5772-YARN-3368.0001.patch > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png > > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601658#comment-15601658 ] Sunil G commented on YARN-5772: --- Thanks [~akhilpb]. It looks fine for me.. Also attached the screen shot. [~ajisakaa] and [~leftnoteasy]/[~Sreenath]. pls take a look. If its fine, i ll commit the change. > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > Attachments: ui2-with-newlogo.png > > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-5772: -- Attachment: ui2-with-newlogo.png > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > Attachments: ui2-with-newlogo.png > > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB reassigned YARN-5772: -- Assignee: Akhil PB > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka >Assignee: Akhil PB > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5705) [YARN-3368] Add support for Timeline V2 to new web UI
[ https://issues.apache.org/jira/browse/YARN-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-5705: --- Attachment: YARN-5705.010.patch > [YARN-3368] Add support for Timeline V2 to new web UI > - > > Key: YARN-5705 > URL: https://issues.apache.org/jira/browse/YARN-5705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil G >Assignee: Akhil PB > Attachments: YARN-5705.001.patch, YARN-5705.002.patch, > YARN-5705.003.patch, YARN-5705.004.patch, YARN-5705.005.patch, > YARN-5705.006.patch, YARN-5705.007.patch, YARN-5705.008.patch, > YARN-5705.009.patch, YARN-5705.010.patch > > > Integrate timeline v2 to YARN-3368. This is a clone JIRA for YARN-4097 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5611) Provide an API to update lifetime of an application.
[ https://issues.apache.org/jira/browse/YARN-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601420#comment-15601420 ] Varun Vasudev commented on YARN-5611: - Instead of using long as part of the API and expecting clients to convert time to and from UTC epoch, it's much cleaner to use an ISO-8601 formatted string. You can also avoid writing the utility functions, since there are plenty of libraries that handle ISO-8601 dates. > Provide an API to update lifetime of an application. > > > Key: YARN-5611 > URL: https://issues.apache.org/jira/browse/YARN-5611 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-5611.patch, 0002-YARN-5611.patch, > 0003-YARN-5611.patch, YARN-5611.v0.patch > > > YARN-4205 monitors an Lifetime of an applications is monitored if required. > Add an client api to update lifetime of an application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
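[Editorial note] The suggestion above is directly supported by java.time, standard since JDK 8, so no hand-written conversion utilities are needed. The timestamp below is just an example value:

```java
import java.time.Instant;

public class Iso8601Demo {
    public static void main(String[] args) {
        // Parse an ISO-8601 string instead of passing a raw epoch long;
        // Instant.parse accepts the ISO-8601 instant format directly.
        Instant t = Instant.parse("2016-10-24T13:30:00Z");
        System.out.println(t.toEpochMilli());   // 1477315800000

        // Formatting back also needs no utility code: Instant.toString
        // emits the ISO-8601 representation.
        System.out.println(Instant.ofEpochMilli(1477315800000L)); // 2016-10-24T13:30:00Z
    }
}
```

An ISO-8601 string in the REST/API payload is also self-describing and timezone-unambiguous, which a bare long is not.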
[jira] [Updated] (YARN-5772) Replace old Hadoop logo with new one
[ https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-5772: Description: YARN-5161 added Apache Hadoop logo in the UI but the logo is old. > Replace old Hadoop logo with new one > > > Key: YARN-5772 > URL: https://issues.apache.org/jira/browse/YARN-5772 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: YARN-3368 >Reporter: Akira Ajisaka > > YARN-5161 added Apache Hadoop logo in the UI but the logo is old. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5772) Replace old Hadoop logo with new one
Akira Ajisaka created YARN-5772: --- Summary: Replace old Hadoop logo with new one Key: YARN-5772 URL: https://issues.apache.org/jira/browse/YARN-5772 Project: Hadoop YARN Issue Type: Sub-task Components: yarn-ui-v2 Affects Versions: YARN-3368 Reporter: Akira Ajisaka -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601328#comment-15601328 ] Yufei Gu commented on YARN-4743: Hi [~gzh1992n], thanks for working on this. The patch v2 looks generally good to me. Some nits: 1. If you want to use if-else statements, it is better to use {{weight1 == 0}} instead of {{weight1 != 0}} for readability. Or we can use this to avoid if-else statements {code} useToWeightRatio1 = -weight1; useToWeightRatio2 = -weight2; {code} 2. Please describe the change in the doc of {{FairShareComparator}}. 3. Please fix all the style issues in Hadoop QA's comment. 4. Can we put {{TestFairShareComparator}} into {{TestSchedulingPolicy}}, and add doc for the function in the unit test? 5. Not sure why {{startTimeColloection}} and {{nameCollection}} are needed. Can you explain a little bit? > ResourceManager crash because TimSort > - > > Key: YARN-4743 > URL: https://issues.apache.org/jira/browse/YARN-4743 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-alpha1 >Reporter: Zephyr Guo >Assignee: Zephyr Guo > Attachments: YARN-4743-v1.patch, YARN-4743-v2.patch, timsort.log > > > {code} > 2016-02-26 14:08:50,821 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract!
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeCollapse(TimSort.java:410) > at java.util.TimSort.sort(TimSort.java:214) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2016-02-26 14:08:50,822 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} > Actually, this bug was found in 2.6.0-cdh. {{FairShareComparator}} is not > transitive. > We get NaN when memorySize=0 and weight=0. > {code:title=FairSharePolicy.java} > useToWeightRatio1 = s1.getResourceUsage().getMemorySize() / > s1.getWeights().getWeight(ResourceType.MEMORY) > {code}
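The non-transitivity described above is easy to reproduce in isolation: 0.0 / 0.0 yields NaN, and every ordered comparison against NaN is false, so a ratio-based comparator reports NaN as "equal" to everything while the remaining elements still order normally. This is a standalone sketch of the failure mode, not the FairSharePolicy code itself:

```java
public class NanComparatorDemo {
    public static void main(String[] args) {
        // memorySize = 0 and weight = 0 produce 0.0 / 0.0 == NaN.
        double ratio = 0.0 / 0.0;
        System.out.println(Double.isNaN(ratio)); // true

        // A comparator built directly on < and > treats NaN as equal to
        // every value, because both comparisons are false for NaN.
        java.util.Comparator<Double> byRatio =
            (a, b) -> a < b ? -1 : (a > b ? 1 : 0);
        System.out.println(byRatio.compare(ratio, 0.5)); // 0
        System.out.println(byRatio.compare(ratio, 1.0)); // 0
        System.out.println(byRatio.compare(0.5, 1.0));   // -1
        // NaN "equals" both 0.5 and 1.0, yet 0.5 < 1.0: equality is no
        // longer transitive, which is exactly the contract violation
        // TimSort detects and reports.
    }
}
```

With enough such elements in a sort, TimSort's merge step observes inconsistent orderings and throws the IllegalArgumentException shown in the stack trace.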
[jira] [Commented] (YARN-5375) invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures
[ https://issues.apache.org/jira/browse/YARN-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601173#comment-15601173 ] sandflee commented on YARN-5375: sorry for the delay, will do this in the next few days > invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures > -- > > Key: YARN-5375 > URL: https://issues.apache.org/jira/browse/YARN-5375 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-5375.01.patch, YARN-5375.03.patch, > YARN-5375.04.patch, YARN-5375.05.patch, YARN-5375.06.patch, > YARN-5375.07-drain-statestore.patch, YARN-5375.07-sync-statestore.patch > > > Seen many test failures where RMApp/RMAppAttempt reaches some state but > some events are not yet processed in the RM event queue or scheduler event > queue, causing test failures. It seems we could implicitly invoke > drainEvents (which should also drain scheduler events) in some MockRM > methods like waitForState
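The general technique behind drainEvents is to make the test wait for the asynchronous dispatcher to catch up rather than asserting immediately after firing an event. A generic poll-until-true helper illustrates the idea (a sketch only; MockRM's actual drainEvents/waitForState implementations live in the Hadoop test code, and the timing values here are arbitrary):

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class WaitForState {
    // Poll until the condition holds, instead of asserting right after
    // dispatching an event to an asynchronous queue.
    static void waitFor(Supplier<Boolean> check, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.get()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Condition becomes true ~50 ms after start, simulating an
        // event that an async dispatcher processes with some delay.
        waitFor(() -> System.currentTimeMillis() - start > 50, 1000);
        System.out.println("state reached");
    }
}
```

Baking such a wait into the state-transition helpers themselves, as the JIRA proposes, removes the race from every test that calls them instead of patching tests one by one.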
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601105#comment-15601105 ] Sunil G commented on YARN-2009: --- Test case failures are not related. YARN-5362 is handling the same. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, > YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, > YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, > YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch, > YARN-2009.0012.patch, YARN-2009.0013.patch, YARN-2009.0014.patch > > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first.
[jira] [Commented] (YARN-5375) invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures
[ https://issues.apache.org/jira/browse/YARN-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601101#comment-15601101 ] Sunil G commented on YARN-5375: --- I think we need to get this in, as many tests are failing randomly. [~sandflee], it seems like we have a consensus for the state-store patch approach. In that case, could you please make this a proper patch here. > invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures > -- > > Key: YARN-5375 > URL: https://issues.apache.org/jira/browse/YARN-5375 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-5375.01.patch, YARN-5375.03.patch, > YARN-5375.04.patch, YARN-5375.05.patch, YARN-5375.06.patch, > YARN-5375.07-drain-statestore.patch, YARN-5375.07-sync-statestore.patch > > > Seen many test failures where RMApp/RMAppAttempt reaches some state but > some events are not yet processed in the RM event queue or scheduler event > queue, causing test failures. It seems we could implicitly invoke > drainEvents (which should also drain scheduler events) in some MockRM > methods like waitForState
[jira] [Commented] (YARN-5362) TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail
[ https://issues.apache.org/jira/browse/YARN-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601096#comment-15601096 ] Sunil G commented on YARN-5362: --- +1. I am still getting the same failure as [~naganarasimha...@apache.org]. https://builds.apache.org/job/PreCommit-YARN-Build/13472/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/ I think the events coming from the StateStore are not fully drained here. YARN-5375 would have been a clean solution for this. I think we can make progress there with review. > TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail > --- > > Key: YARN-5362 > URL: https://issues.apache.org/jira/browse/YARN-5362 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jason Lowe >Assignee: sandflee > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: YARN-5362.01.patch > > > Saw the following in a precommit build that only changed an unrelated unit > test: > {noformat} > Tests run: 29, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 101.265 sec > <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > testFinishedAppRemovalAfterRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 0.411 sec <<< FAILURE! > java.lang.AssertionError: expected null, but > was:> at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1653) > {noformat}