[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021810#comment-17021810 ] Brahma Reddy Battula commented on YARN-10089:
----------------------------------------------

Looks like the test failure is related; I will look into it.

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch, YARN-10089-002.patch, YARN-10089-003.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
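The fix implied by the summary (and by the empty setPhysicalResource() implementations mentioned in the review comments below) is to refresh the existing RMNode when a node re-registers instead of leaving the stale one untouched. A minimal sketch of that re-registration path, assuming a setPhysicalResource() hook on the RMNode; this is an illustration, not the actual patch:

{code:java}
// Sketch only: in ResourceTrackerService#registerNodeManager, when the nodeId is already
// known and the httpPort is unchanged, the node is re-registering (e.g. after an NM
// upgrade). Carry over the newly reported physical resource instead of dropping it.
RMNode oldNode = this.rmContext.getRMNodes().get(nodeId);
if (oldNode != null && oldNode.getHttpPort() == request.getHttpPort()) {
  Resource physicalResource = request.getPhysicalResource();
  if (physicalResource != null) {
    oldNode.setPhysicalResource(physicalResource); // assumed setter added by the patch
  }
}
{code}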
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021722#comment-17021722 ] Hadoop QA commented on YARN-10089: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 42s{color} | {color:orange} root: The patch generated 1 new + 105 unchanged - 0 fixed = 106 total (was 105) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 36s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 20s{color} | {color:green} hadoop-sls in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 52s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}263m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | | hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10089 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991583/YARN-10089-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux faf93df9c17f 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021671#comment-17021671 ] Hadoop QA commented on YARN-9768: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 45s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 3s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}181m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-9768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991528/YARN-9768.009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux b928eb10f94c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021659#comment-17021659 ] Hadoop QA commented on YARN-10084: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}162m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10084 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991581/YARN-10084.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a1aaabc6fd54 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021651#comment-17021651 ] Wangda Tan commented on YARN-9879:
----------------------------------

Thanks [~shuzirra] and [~wilfreds] for sharing your thoughts!

1) Regarding changing the semantics of getQueueName() to return the fully qualified queue name vs. using getQueuePath(): if we go the first route, we need to remove the usages of AbstractCSQueue.getQueuePath() (which has 128 usages) and add a getShortQueueName() in some places. So to me there is no significant difference compared with simply changing the internal CS usages to call getQueuePath().

2) Whichever way we decide to go, I think we should make sure of the following:
* API compatibility. This is critical, since I assume there are lots of monitoring frameworks, JMX metrics, etc. based on it; if an existing CS-based cluster is upgraded, users should see the same results. Please refer to the API compatibility guide: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html]
* Internal usage of getQueuePath() (or getShortQueueName(), if we choose the proposed approach). Externally, we should make sure a queue can be retrieved by either the short name or the full name. I want to make sure we only check short name / full name on external calls (like submitting an app to a specified queue); in all other places we operate on the full queue path.

I think introducing a new CSQueueStore sounds good, but I recommend adding a separate method to CSQueueStore that checks both short and full names and having it used by external callers only (in contrast, internal CS methods should check only one HashMap instead of two). We can review the details of CSQueueStore separately.

> Allow multiple leaf queues with the same name in CS
> ---------------------------------------------------
>
>                 Key: YARN-9879
>                 URL: https://issues.apache.org/jira/browse/YARN-9879
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Gergely Pollak
>            Priority: Major
>         Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch
>
> Currently the leaf queue's name must be unique regardless of its position in the queue hierarchy.
> A design doc and first proposal are being made; I'll attach them as soon as they are done.
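A minimal sketch of the CSQueueStore split described above: one map keyed by full path for internal lookups and one keyed by short name, with the combined lookup reserved for external callers. The class shape and method names are assumptions for illustration, not the YARN-9879 implementation:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: illustrates the two-map idea, not the committed CSQueueStore API.
public class CSQueueStore {
  private final Map<String, CSQueue> byFullPath = new ConcurrentHashMap<>();
  private final Map<String, CSQueue> byShortName = new ConcurrentHashMap<>();

  public void add(CSQueue queue) {
    byFullPath.put(queue.getQueuePath(), queue);
    // Ambiguous short names (the point of YARN-9879) would need extra handling here,
    // e.g. dropping the short-name mapping once two queues share it.
    byShortName.put(queue.getQueueName(), queue);
  }

  // Internal CS code resolves queues by full path only: a single map lookup.
  public CSQueue getByFullName(String fullPath) {
    return byFullPath.get(fullPath);
  }

  // External entry points (e.g. app submission) may pass either form.
  public CSQueue getByShortOrFullName(String name) {
    CSQueue queue = byFullPath.get(name);
    return queue != null ? queue : byShortName.get(name);
  }
}
{code}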
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021613#comment-17021613 ] Eric Badger commented on YARN-10084: Hey [~epayne], the patch looks good, but I have a comment about the tests. You've added a test to make sure that leaf queues correctly inherit their parent's default and max lifetimes. Could you also add a test to check that the leaf queue is able to override the parent's default and max lifetimes? > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch, YARN-10084.004.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
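A hedged sketch of the override test being asked for in YARN-10084 above: set lifetimes at the root, override them on one leaf queue, and assert both the inherited and the overridden values. The property names come from the issue description; getEffectiveMaxLifetime() is a hypothetical helper standing in for whatever accessor the patch actually exposes:

{code:java}
// Sketch only: getEffectiveMaxLifetime() is hypothetical, not the patch's real API.
CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
conf.setLong("yarn.scheduler.capacity.root.maximum-application-lifetime", 7200L);
conf.setLong("yarn.scheduler.capacity.root.default-application-lifetime", 3600L);
// "override" defines its own limits, "inherit" defines none.
conf.setLong("yarn.scheduler.capacity.root.override.maximum-application-lifetime", 600L);
conf.setLong("yarn.scheduler.capacity.root.override.default-application-lifetime", 300L);

// A leaf queue without its own setting inherits the root value...
assertEquals(7200L, getEffectiveMaxLifetime(conf, "root.inherit"));
// ...while a leaf queue with its own setting overrides the parent.
assertEquals(600L, getEffectiveMaxLifetime(conf, "root.override"));
{code}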
[jira] [Commented] (YARN-10094) Add configuration to support NM overuse in RM
[ https://issues.apache.org/jira/browse/YARN-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021610#comment-17021610 ] Eric Payne commented on YARN-10094: --- [~cane], I feel that this JIRA may have the same goal as YARN-291. Several pieces of the overcommit feature are already in YARN. > Add configuration to support NM overuse in RM > - > > Key: YARN-10094 > URL: https://issues.apache.org/jira/browse/YARN-10094 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10094.001.patch > > > In a large cluster , upgrade NM will cost too much time. > Some times we want to support memory or cpu overuse from RM view. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10090) ApplicationNotFoundException will cause a UndeclaredThrowableException
[ https://issues.apache.org/jira/browse/YARN-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021605#comment-17021605 ] Eric Payne commented on YARN-10090: --- [~yzzjjyy], can you please tell me where you are seeing this exception? When I try this (in 2.8 and 3.3), I don't see any exception either in the UI or in the RM log. If you are seeing it in the RM log, that may be okay, in my opinion. > ApplicationNotFoundException will cause a UndeclaredThrowableException > -- > > Key: YARN-10090 > URL: https://issues.apache.org/jira/browse/YARN-10090 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.2 > Environment: Hadoop 2.9.2 >Reporter: qiwei huang >Priority: Minor > > while entering a non-exist application page(e.g. > RM:8088/cluster/app/application_1234), the getApplicationReport will throw an > ApplicationNotFoundException and would cause UndeclaredThrowableException in > the UserGroupInformation. the log is like: > 2020-01-15 15:10:13,056 [6224200281] - ERROR > [90425890@qtp-1302725372-97757:AppBlock@124] - Failed to read the application > application_1572848307818_1234.2020-01-15 15:10:13,056 [6224200281] - ERROR > [90425890@qtp-1302725372-97757:AppBlock@124] - Failed to read the application > application_1572848307818_2006587.java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911) > at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:70) > at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) > at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54) > at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source) at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:173) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1440) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at
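If the goal for YARN-10090 is simply a cleaner message, the block that renders the app page could unwrap the reflective wrapper before logging. UserGroupInformation.doAs wraps checked exceptions that are neither IOException nor InterruptedException in UndeclaredThrowableException, so the original ApplicationNotFoundException is available as the cause. A rough sketch; renderApplicationPage() is a hypothetical stand-in for the AppBlock#render path shown in the trace above:

{code:java}
import java.lang.reflect.UndeclaredThrowableException;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Sketch only: report a missing application as "not found" instead of dumping
// the full UndeclaredThrowableException stack trace at ERROR level.
try {
  renderApplicationPage(appId);   // hypothetical stand-in for the ugi.doAs(...) render call
} catch (UndeclaredThrowableException e) {
  if (e.getCause() instanceof ApplicationNotFoundException) {
    LOG.info("Application {} not found.", appId);
  } else {
    throw e;
  }
}
{code}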
[jira] [Commented] (YARN-10091) Support clean up orphan app's log in LogAggService
[ https://issues.apache.org/jira/browse/YARN-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021587#comment-17021587 ] Eric Payne commented on YARN-10091: --- [~cane], can you please be more specific? What is an orphan app and where is the directory? Are you talking about /user//.staging? > Support clean up orphan app's log in LogAggService > -- > > Key: YARN-10091 > URL: https://issues.apache.org/jira/browse/YARN-10091 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > In a large cluster, there will exist orphan app log directory which will > cause disk leak.We should support cleanup app log directory for this kind of > app -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021579#comment-17021579 ] Jim Brennan commented on YARN-10084: +1 (non-binding) on patch 004. I built it locally and ran the unit test again. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch, YARN-10084.004.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-003.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch, YARN-10089-002.patch, > YARN-10089-003.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9790) Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero
[ https://issues.apache.org/jira/browse/YARN-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021574#comment-17021574 ] Eric Payne commented on YARN-9790: -- If there are no objections, I'd like to backport this all the way back to branch-2.10. > Failed to set default-application-lifetime if maximum-application-lifetime is > less than or equal to zero > > > Key: YARN-9790 > URL: https://issues.apache.org/jira/browse/YARN-9790 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9790.001.patch, YARN-9790.002.patch, > YARN-9790.003.patch, YARN-9790.004.patch > > > capacity-scheduler > {code} > ... > yarn.scheduler.capacity.root.dev.maximum-application-lifetime=-1 > yarn.scheduler.capacity.root.dev.default-application-lifetime=604800 > {code} > refreshQueue was failed as follows > {code} > 2019-08-28 15:21:57,423 WARN resourcemanager.AdminService > (AdminService.java:logAndWrapException(910)) - Exception refresh queues. > java.io.IOException: Failed to re-init queues : Default lifetime604800 can't > exceed maximum lifetime -1 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:477) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:423) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:394) > at > org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshQueues(ResourceManagerAdministrationProtocolPBServiceImpl.java:114) > at > org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:271) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Default > lifetime604800 can't exceed maximum lifetime -1 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:268) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:162) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:141) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:726) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:472) > ... 12 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
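The failure above comes from treating -1 (unlimited) as a hard upper bound when validating the default lifetime. A sketch of the validation the fix implies; the accessor names are assumptions rather than the actual YARN-9790 patch:

{code:java}
// Sketch only: LeafQueue-style lifetime validation with a non-positive max treated as unlimited.
long maxLifetime = getMaximumLifetimePerQueue(getQueuePath());      // assumed accessor
long defaultLifetime = getDefaultLifetimePerQueue(getQueuePath());  // assumed accessor

// A non-positive maximum means "no limit", so it must not be used as an upper bound.
if (maxLifetime > 0 && defaultLifetime > maxLifetime) {
  throw new YarnRuntimeException("Default lifetime " + defaultLifetime
      + " can't exceed maximum lifetime " + maxLifetime);
}
if (defaultLifetime <= 0) {
  defaultLifetime = maxLifetime;  // unset/unlimited default falls back to the maximum
}
{code}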
[jira] [Updated] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10084: -- Attachment: YARN-10084.004.patch > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch, YARN-10084.004.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-002.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch, YARN-10089-002.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8472) YARN Container Phase 2
[ https://issues.apache.org/jira/browse/YARN-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8472. - Fix Version/s: 3.3.0 Release Note: - Improved debugging Docker container on YARN - Improved security for running Docker containers - Improved cgroup management for docker container. Resolution: Fixed > YARN Container Phase 2 > -- > > Key: YARN-8472 > URL: https://issues.apache.org/jira/browse/YARN-8472 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > > In YARN-3611, we have implemented basic Docker container support for YARN. > This story is the next phase to improve container usability. > Several area for improvements are: > # Software defined network support > # Interactive shell to container > # User management sss/nscd integration > # Runc/containerd support > # Metrics/Logs integration with Timeline service v2 > # Docker container profiles > # Docker cgroup management -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021554#comment-17021554 ] Eric Payne commented on YARN-10084: --- I clicked on the "compile" link above and it says: {noformat} [ERROR] warning Error running install script for optional dependency: "/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node_modules/phantomjs-prebuilt: Command failed. [ERROR] Exit code: 1 [ERROR] Command: node install.js [ERROR] Arguments: [ERROR] Directory: /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node_modules/phantomjs-prebuilt [ERROR] Output: [ERROR] PhantomJS not found on PATH [INFO] info This module is OPTIONAL, you can safely ignore this error [ERROR] Downloading https://github.com/Medium/phantomjs/releases/download/v2.1.1/phantomjs-2.1.1-linux-x86_64.tar.bz2 [ERROR] Saving to /tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2 [ERROR] Receiving... [ERROR] [ERROR] Error making request. [ERROR] Error: socket hang up [ERROR] at createHangUpError (_http_client.js:342:15) [ERROR] at TLSSocket.socketOnEnd (_http_client.js:437:23) [ERROR] at emitNone (events.js:111:20) [ERROR] at TLSSocket.emit (events.js:208:7) [ERROR] at endReadableNT (_stream_readable.js:1064:12) [ERROR] at _combinedTickCallback (internal/process/next_tick.js:139:11) [ERROR] at process._tickCallback (internal/process/next_tick.js:181:9) [ERROR] [ERROR] Please report this full log at https://github.com/Medium/phantomjs; {noformat} I don't think this is related to the code in the patch. Hopefully, it's a transient build environment issue. I'm uploading version 004 to address the checkstyle issues. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: (was: YARN-10089-002.patch) > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021545#comment-17021545 ] Eric Yang commented on YARN-9292:
---------------------------------

From today's YARN Docker community meeting, we have decided to abandon this patch. There is a possibility that the AM fails over to a node that has a different latest tag than the previous node. The frame of reference for the latest tag is relative to the node where the AM is running. If there are inconsistencies in the cluster, this patch will not solve the consistency problem: the newly spawned AM will use a different sha id mapped to the latest tag, which leads to inconsistent sha ids being used by the same application.

The ideal design is to have the YARN client discover what the latest tag references and then propagate that information to the rest of the job. Unfortunately, there is no connection between YARN and wherever the Docker registry might be running, so it is not possible to implement this properly for the YARN and Docker integration. The community settled on documenting this wrinkle and recommending, as a best practice, to avoid the latest tag.

For runC containers, it will be possible to use HDFS as the source of truth to look up the global hash designation of a runC image. The YARN client can query HDFS for the latest tag, and it will be consistent on all nodes. This will add some extra protocol interactions between the YARN client and the RM to solve this problem according to the ideal design.

> Implement logic to keep docker image consistent in application that uses :latest tag
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-9292
>                 URL: https://issues.apache.org/jira/browse/YARN-9292
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-9292.001.patch, YARN-9292.002.patch, YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, YARN-9292.006.patch, YARN-9292.007.patch, YARN-9292.008.patch
>
> A Docker image with the latest tag can run in a YARN cluster without any validation in the node managers. If an image with the latest tag changes during container launch, it might produce inconsistent results between nodes. This surfaced toward the end of development for YARN-9184, which keeps the docker image consistent within a job. One of the ideas to keep the :latest tag consistent for a job is to use the docker image command to figure out the image id and propagate that image id to the rest of the container requests. There are some challenges to overcome:
> # The latest tag does not exist on the node where the first container starts. The first container will need to download the latest image and find the image id. This can introduce lag time for other containers to start.
> # If the image id is used to start other containers, container-executor may have problems checking whether the image comes from a trusted source. Both the image name and the id must be supplied through the .cmd file to container-executor. However, an attacker could supply an incorrect image id and defeat the container-executor security checks.
> If we can overcome those challenges, it may be possible to keep the docker image consistent within one application.
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021537#comment-17021537 ] Brahma Reddy Battula commented on YARN-10089:
----------------------------------------------

[~elgoiri], thanks for taking a look. I had just cloned the code onto my new office laptop, hence the formatting issues. Apart from the following item, everything is addressed now. Since the rest of the logs in this file use the same format, I feel we can change them all together in another JIRA. Thoughts?
* Let's use the logger format {} for NodeStatusUpdaterImpl#219.

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch, YARN-10089-002.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
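For reference, the change being deferred here is the SLF4J parameterized logging style; the message text below is illustrative rather than the actual line at NodeStatusUpdaterImpl#219:

{code:java}
// String concatenation builds the message even when the log level is disabled:
LOG.info("Registered with ResourceManager as " + nodeId + " with resource " + physicalResource);

// The {} placeholder form defers formatting to the logging framework:
LOG.info("Registered with ResourceManager as {} with resource {}", nodeId, physicalResource);
{code}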
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-002.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch, YARN-10089-002.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021510#comment-17021510 ] Íñigo Goiri commented on YARN-10089:
------------------------------------

Thanks [~brahmareddy] for fixing this. Minor comments:
* Add a comment to the empty {{setPhysicalResource()}} implementations saying that we do this for backwards compatibility or similar.
* Let's use the logger format {} for NodeStatusUpdaterImpl#219.
* Is it correct to compare with != at ResourceTrackerService#469? Should this be !equals()? (See the snippet after this list.)
* Add an extra space in RMNode#137.
* There will probably be a longer-than-80-chars error in TestNMReconnect.

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
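On the != question: for Resource objects, != only compares object identity, so two separately constructed Resource instances with identical values still look "different". A small illustrative snippet; the variable names are assumptions, not the actual code at ResourceTrackerService#469:

{code:java}
// Identity comparison: true whenever the instances differ, even if the values match,
// so it would fire on every re-registration.
if (newPhysicalResource != oldNode.getPhysicalResource()) {
  // ...
}

// Value comparison: true only when the reported capability actually changed
// (a null check is still needed for nodes that never reported a physical resource).
if (newPhysicalResource != null
    && !newPhysicalResource.equals(oldNode.getPhysicalResource())) {
  // update the stored physical resource
}
{code}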
[jira] [Assigned] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-10089: --- Assignee: Brahma Reddy Battula > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: (was: YARN-10089-001.patch) > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-001.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021504#comment-17021504 ] Brahma Reddy Battula commented on YARN-10089:
----------------------------------------------

Uploaded the initial patch. Kindly review. [~elgoiri], could you please review, as you worked on YARN-5356?

*Test case before the fix:*
{noformat}
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestNMReconnect.testReconnect:155 expected:<> but was:
[ERROR]   TestNMReconnect.testReconnect:155 expected:<> but was:
[INFO]
[ERROR] Tests run: 8, Failures: 2, Errors: 0, Skipped: 0
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 10.730 s
[INFO] Finished at: 2020-01-23T02:27:11+05:30
{noformat}

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-001.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021463#comment-17021463 ] Hadoop QA commented on YARN-10084: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 4s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 27s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 142 unchanged - 0 fixed = 147 total (was 142) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}183m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10084 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991550/YARN-10084.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021445#comment-17021445 ] Jim Brennan commented on YARN-10084: Thanks [~epayne]! I am +1 (non-binding) on patch 003. cc: [~ebadger] > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021440#comment-17021440 ] Eric Payne commented on YARN-10084: --- Thanks [~Jim_Brennan]. I uploaded version 003. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10084: -- Attachment: YARN-10084.003.patch > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6592) [Umbrella] Rich placement constraints in YARN
[ https://issues.apache.org/jira/browse/YARN-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-6592: - Attachment: (was: [YARN-7812] Improvements to Rich Placement Constraints in YARN - ASF JIRA.pdf) > [Umbrella] Rich placement constraints in YARN > - > > Key: YARN-6592 > URL: https://issues.apache.org/jira/browse/YARN-6592 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-6592-Rich-Placement-Constraints-Design-V1.pdf > > > This JIRA consolidates the efforts of YARN-5468 and YARN-4902. > It adds support for rich placement constraints to YARN, such as affinity and > anti-affinity between allocations within the same or across applications. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6592) [Umbrella] Rich placement constraints in YARN
[ https://issues.apache.org/jira/browse/YARN-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-6592: - Attachment: (was: [YARN-5468] Scheduling of long-running applications - ASF JIRA.pdf) > [Umbrella] Rich placement constraints in YARN > - > > Key: YARN-6592 > URL: https://issues.apache.org/jira/browse/YARN-6592 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-6592-Rich-Placement-Constraints-Design-V1.pdf > > > This JIRA consolidates the efforts of YARN-5468 and YARN-4902. > It adds support for rich placement constraints to YARN, such as affinity and > anti-affinity between allocations within the same or across applications. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021401#comment-17021401 ] Jim Brennan commented on YARN-10084: Thanks for the update [~epayne]! The code looks good to me. One comment on the documentation: {quote}`yarn.scheduler.capacity.root..default-application-lifetime` | Default lifetime (in seconds) of an application which is submitted to a queue. Any value less than or equal to zero will be considered as disabled. If the user has not submitted application with lifetime value then this value will be taken. It is point-in-time configuration. This feature can be set at any level in the queue hierarchy. Child queues will inherit their parent's value unless overridden at the child level. Child queues can set this property to a value less than or equal to their parent's value. {quote} This sentence is inaccurate. Maybe just remove it or change to something like: If a child queue inherits this from the parent and the parent value is greater than the child's max value, the child's max value will be used for the default. {quote}If set to 0, all the queue's max value must also be unlimited. Note : Default lifetime (if set at this level) can't exceed maximum lifetime. {quote} > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
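For readers following the configuration discussion, a small hedged example of how the inherited and overridden values could be expressed. Only the root-level key is quoted in this issue; the per-queue key for the hypothetical child queue "root.longjobs" simply follows the same naming pattern and is an assumption, as is the use of a bare Configuration object instead of capacity-scheduler.xml.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged example: the root-level property comes from this issue's description;
// the key for the hypothetical child queue "root.longjobs" is assumed to follow
// the same naming pattern and is not quoted from the documentation.
public class LifetimeConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Cluster-wide maximum lifetime: one day, inherited by every child queue
    // that does not set its own value.
    conf.set("yarn.scheduler.capacity.root.maximum-application-lifetime", "86400");
    // One queue overrides the inherited value (a negative or unset value would
    // mean "inherit from the parent" per the discussion above).
    conf.set("yarn.scheduler.capacity.root.longjobs.maximum-application-lifetime", "604800");
    System.out.println(conf.get(
        "yarn.scheduler.capacity.root.longjobs.maximum-application-lifetime"));
  }
}
{code}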
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021368#comment-17021368 ] Íñigo Goiri commented on YARN-9768: --- Let's see what Yetus says. > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, > YARN-9768.009.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
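The description above asks for bounding a renew call that can hang indefinitely; a minimal, self-contained sketch of that idea follows. It is not the committed patch: renewToken() is a stand-in for the real DelegationTokenRenewer call, and the timeout and retry values are arbitrary.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the timeout-and-retry idea: run the potentially hanging renew call
// on a separate thread, bound it with Future.get(timeout), and retry a limited
// number of times before giving up.
public class BoundedRenewSketch {
  static long renewToken() throws Exception {
    Thread.sleep(100);                       // simulate the remote NN/Router call
    return System.currentTimeMillis() + 24L * 3600 * 1000;
  }

  static long renewWithTimeout(ExecutorService pool, long timeoutMs, int maxAttempts)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      Callable<Long> task = BoundedRenewSketch::renewToken;
      Future<Long> future = pool.submit(task);
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true);                 // interrupt the stuck call
        if (attempt >= maxAttempts) {
          throw e;                           // give up after the configured retries
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    System.out.println("new expiration: " + renewWithTimeout(pool, 1000, 3));
    pool.shutdownNow();
  }
}
{code}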
[jira] [Comment Edited] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021280#comment-17021280 ] Gergely Pollak edited comment on YARN-9879 at 1/22/20 6:10 PM: --- Thank you for your feedback [~leftnoteasy] and [~wilfreds]. Originally I tried to keep getQueueName's behavior, but as I started to investigate its behavior I realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, and what do we want to use it for? (Ok, these are actually 3 questions.) As I see it in the code, the queue name's main purpose is to IDENTIFY a queue, not just to be some nice display string. This means the name MUST uniquely identify the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed its behavior to return a unique identifier (the queue's path). Obviously I must check whether it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact by changing the references internally to the full name everywhere (as you both suggested earlier). About the API breaking: if we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they rely on the fact that it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as the reference and have multiple queues with the same name; it's just impossible. This patch will already introduce some changes which can cause issues in already working systems, and it might be better to do all invasive changes at once. I could use getQueuePath (almost) everywhere we currently use getQueueName, but the result would be the same, with some severe inconsistencies: using short names would result in you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintainable in my opinion. The queueManager.get(queue.getQueueName()) call can result in NULL or an error (when the queue name is not unique), which is not good practice in my opinion. We need the ambiguous queue list because we provide a remove method, which can result in a previously ambiguous name becoming unambiguous, and it's much faster to get it from a hashmap O(1) and then check the size of the Set O(1), instead of looking through all queues to see if the collision has been resolved O(n). The short name map has been introduced for the very same reason: when we look up a queue, we just look it up in 2 HashMaps, 2 x O(1), instead of iterating through all queue names and splitting off the last part for the short name, O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. 
O( n ) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. was (Author: shuzirra): Thank you for your feedback [~leftnoteasy] and [~wilfreds]. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021294#comment-17021294 ] Eric Payne commented on YARN-10084: --- Version 002 attached. I did not change the logic since it already worked as you described. The changes are as follows: - AbstractCSQueue: I added a comment and took out extra parenthesis. - CapacityScheduler.md: I updated the descriptions of maximum-application-lifetime and default-application-lifetime. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021287#comment-17021287 ] Eric Payne commented on YARN-10084: --- Okay. Thanks for your further analysis, [~Jim_Brennan]. bq. a child queue should not have a max lifetime longer than its parent's max lifetime After thinking about it more, there is no reason a child queue can't have a larger max lifetime than a parent queue. {quote} What if you want only one queue to have no max? How would you configure that? Would be nice if you could specify the max at the root once, and only specify zero on the long job queue to specify that it has no max. {quote} So, >= 0 means that the max lifetime was set in the config and it should be used. < 0 means use the parent's max lifetime value. If root queue, then < 0 means no lifetime value, and that will be inherited by child queues unless overridden. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
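A minimal sketch of the resolution rule spelled out in this comment: a value >= 0 set on a queue is used as-is, a negative (or unset) value means "inherit the parent's effective lifetime", and a negative value at the root means no limit. The method and class names are illustrative, not the ones used in the patch.

{code:java}
// Illustrative only: walk the configured values from root to leaf, letting any
// explicitly set (>= 0) value override the inherited one.
public class LifetimeInheritance {
  static long effectiveMaxLifetime(long[] configuredFromRootToLeaf) {
    long effective = -1;                       // root default: no limit
    for (long configured : configuredFromRootToLeaf) {
      if (configured >= 0) {
        effective = configured;                // explicitly set: override
      }
      // configured < 0: keep the inherited value
    }
    return effective;                          // -1 still means "no limit"
  }

  public static void main(String[] args) {
    // root = 86400s, intermediate queue inherits (-1), leaf overrides with 3600s
    System.out.println(effectiveMaxLifetime(new long[] {86400, -1, 3600})); // 3600
    // nothing set anywhere -> unlimited
    System.out.println(effectiveMaxLifetime(new long[] {-1, -1}));          // -1
  }
}
{code}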
[jira] [Updated] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10084: -- Attachment: YARN-10084.002.patch > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021280#comment-17021280 ] Gergely Pollak edited comment on YARN-9879 at 1/22/20 5:01 PM: --- Thank you for your feedback [~leftnoteasy] and [~wilfreds]. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference to the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they build on the fact it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as reference and have multiple queues with the same name, it's just impossible. This patch will already introduce some changes which can cause issues in already working systems and it might be better to do all invasive changes at once. I can use the getQueuePath (almost) everywhere where we currently using getQueueName, but the result would be the same, with some severe inconsistencies: Using short names would result you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintenable in my opinion. The quemanager.get(queue.getQueueName()) call can result in NULL or error! (when the queue name is not unique) This is not good practice in my opinion. We need the ambiguous queue list, because we provide a remove method, which can result in a previously ambiguous name becoming ambiguous, and it's much faster to get it from a hashmap O(1), and then check the size of the Set O(1), instead of looking through all queues to see if the collision have been resolved O(n). The short name map has been introduced for the very same reason, when we look up a queue, we just look it up in 2 HashMaps 2 x O(1), instead of iterating through all queue names and splicing the last part for short name O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. 
O(n) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. was (Author: shuzirra): Thank you for your feedback Wilfred Spiegelenburg and Wangda Tan. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021280#comment-17021280 ] Gergely Pollak commented on YARN-9879: -- Thank you for your feedback Wilfred Spiegelenburg and Wangda Tan. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference to the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they build on the fact it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as reference and have multiple queues with the same name, it's just impossible. This patch will already introduce some changes which can cause issues in already working systems and it might be better to do all invasive changes at once. I can use the getQueuePath (almost) everywhere where we currently using getQueueName, but the result would be the same, with some severe inconsistencies: Using short names would result you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintenable in my opinion. The quemanager.get(queue.getQueueName()) call can result in NULL or error! (when the queue name is not unique) This is not good practice in my opinion. We need the ambiguous queue list, because we provide a remove method, which can result in a previously ambiguous name becoming ambiguous, and it's much faster to get it from a hashmap O(1), and then check the size of the Set O(1), instead of looking through all queues to see if the collision have been resolved O(n). The short name map has been introduced for the very same reason, when we look up a queue, we just look it up in 2 HashMaps 2 x O(1), instead of iterating through all queue names and splicing the last part for short name O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. 
O(n) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
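Following up on the data-structure argument in the comment above, here is a self-contained sketch of the described lookup structure: a full-path map, a short-name map whose values are the sets of full paths sharing that leaf name, and ambiguity decided by the size of that set, so both get() and the ambiguity check stay O(1). All class and method names are illustrative; they are not necessarily those used in the POC patch.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative two-map queue store: full-path map for exact lookups, short-name
// map holding the set of full paths sharing that leaf name; a short name is
// ambiguous exactly when its set has more than one element.
public class QueueStoreSketch {
  private final Map<String, String> byFullPath = new HashMap<>();       // path -> queue data
  private final Map<String, Set<String>> byShortName = new HashMap<>(); // leaf name -> paths

  public void add(String fullPath) {
    byFullPath.put(fullPath, fullPath);
    byShortName.computeIfAbsent(shortNameOf(fullPath), k -> new HashSet<>()).add(fullPath);
  }

  public void remove(String fullPath) {
    byFullPath.remove(fullPath);
    Set<String> paths = byShortName.get(shortNameOf(fullPath));
    if (paths != null) {
      paths.remove(fullPath);            // a collision may get resolved here, in O(1)
      if (paths.isEmpty()) {
        byShortName.remove(shortNameOf(fullPath));
      }
    }
  }

  /** Two O(1) lookups; returns null for unknown or ambiguous short names. */
  public String get(String name) {
    String byPath = byFullPath.get(name);
    if (byPath != null) {
      return byPath;
    }
    Set<String> paths = byShortName.get(name);
    return (paths != null && paths.size() == 1)
        ? byFullPath.get(paths.iterator().next()) : null;
  }

  private static String shortNameOf(String fullPath) {
    return fullPath.substring(fullPath.lastIndexOf('.') + 1);
  }

  public static void main(String[] args) {
    QueueStoreSketch store = new QueueStoreSketch();
    store.add("root.users.alpha");
    store.add("root.projects.alpha");        // "alpha" becomes ambiguous
    System.out.println(store.get("alpha"));  // null: ambiguous short name
    store.remove("root.projects.alpha");     // ambiguity resolved
    System.out.println(store.get("alpha"));  // root.users.alpha
  }
}
{code}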
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021239#comment-17021239 ] Hadoop QA commented on YARN-10085: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10085 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991512/YARN-10085-005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 8801c3b30216 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d40d7cc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/25420/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25420/testReport/ | | Max. process+thread
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021228#comment-17021228 ] Hadoop QA commented on YARN-7913: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 4s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 14s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}137m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:70a0ef5d4a6 | | JIRA Issue | YARN-7913 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991516/YARN-7913-branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 53efb2016a9f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 96c653d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25418/testReport/ | | Max. process+thread count | 763 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25418/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. >
[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021226#comment-17021226 ] Adam Antal commented on YARN-10083: --- Thanks for the commit [~snemeth]! > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > } > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
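The description quotes the duplicated helper verbatim; the Hudson commit message further down in this digest lists Apps.java among the touched files, which suggests the shared helper landed in an existing utility class, but the exact method name and location are not given here. The sketch below therefore uses a hypothetical holder class and an EnumSet-based variant of the same check, rather than guessing the final signature.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Hedged sketch: "FinalStates" is a hypothetical holder class, not the class
// the patch actually modifies; the set of terminal states matches the snippet
// quoted in the issue description.
public final class FinalStates {
  private static final EnumSet<YarnApplicationState> FINAL_STATES = EnumSet.of(
      YarnApplicationState.FINISHED,
      YarnApplicationState.FAILED,
      YarnApplicationState.KILLED);

  private FinalStates() {
  }

  public static boolean isApplicationFinalState(YarnApplicationState state) {
    return FINAL_STATES.contains(state);
  }
}
{code}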
[jira] [Updated] (YARN-10098) Add interface to get node iterators by scheduler key for AppPlacementAllocator
[ https://issues.apache.org/jira/browse/YARN-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt updated YARN-10098: -- Summary: Add interface to get node iterators by scheduler key for AppPlacementAllocator (was: AppPlacementAllocator getPreferredNodeIterator based on scheduler key) > Add interface to get node iterators by scheduler key for AppPlacementAllocator > -- > > Key: YARN-10098 > URL: https://issues.apache.org/jira/browse/YARN-10098 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
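The summary only names the idea, so the following is a purely illustrative guess at the interface shape it describes: letting an AppPlacementAllocator hand out a node iterator per scheduler key instead of one global ordering. All names are hypothetical.

{code:java}
import java.util.Iterator;

// Hypothetical sketch of the interface change named in the summary; the real
// AppPlacementAllocator API may differ in name, generics and key type.
public interface PreferredNodeIteratorProvider<N> {
  /** Nodes to try, in preference order, for the given scheduler request key. */
  Iterator<N> getPreferredNodeIterator(Object schedulerRequestKey);
}
{code}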
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021222#comment-17021222 ] Szilard Nemeth commented on YARN-7913: -- Thanks [~wilfreds] for other patches, committed them to their respective branches. Closing this jira. > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-7913: - Fix Version/s: 3.1.4 3.2.2 > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
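A minimal sketch of the general idea behind this improvement: recover each application in isolation so that one failure (for example the leaf/parent queue mismatch described above) is logged and rejected instead of aborting the whole recovery and forcing repeated passive-to-active transition attempts. It is not the committed patch; Application here is a stand-in type.

{code:java}
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hedged sketch: catch per-application recovery failures and continue, rather
// than letting the first exception abort recovery for every remaining app.
public class RecoverySketch {
  private static final Logger LOG = LoggerFactory.getLogger(RecoverySketch.class);

  interface Application {
    String getId();
    void recover() throws Exception;
  }

  static void recoverAll(List<Application> apps) {
    for (Application app : apps) {
      try {
        app.recover();
      } catch (Exception e) {
        // Reject just this application and keep going with the rest.
        LOG.error("Failed to recover application " + app.getId()
            + ", rejecting it and continuing", e);
      }
    }
  }
}
{code}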
[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021213#comment-17021213 ] Hudson commented on YARN-10083: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17892 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17892/]) YARN-10083. Provide utility to ask whether an application is in final (snemeth: rev 9520b2ad790bd8527033a03e7ee50da71a85df1d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogToolUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityGroupFSTimelineStore.java * (edit) hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra/src/main/java/org/apache/hadoop/tools/dynamometer/Client.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/LogServlet.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/LogWebServiceUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > 
} > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10099) FS-CS converter: handle allow-undeclared-pools and user-as-default queue properly
[ https://issues.apache.org/jira/browse/YARN-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10099: Description: Based on the latest documentation, there are two important properties that are ignored if we have placement rules: ||Property||Explanation|| |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can be created at application submission time, whether because they are specified as the application’s queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the “default” queue instead. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| |yarn.scheduler.fair.user-as-default-queue|Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to “false” or unset, all jobs have a shared default queue, named “default”. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| Right now these settings affects the conversion regardless of the placement rules. was: Based on the latest documentation, there are two important properties that are ignored if we have placement rules: ||Property||Explanation|| |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can be created at application submission time, whether because they are specified as the application’s queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the “default” queue instead. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| |yarn.scheduler.fair.user-as-default-queue|Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to “false” or unset, all jobs have a shared default queue, named “default”. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| | | | Right now these settings affects the conversion regardless of the placement rules. > FS-CS converter: handle allow-undeclared-pools and user-as-default queue > properly > - > > Key: YARN-10099 > URL: https://issues.apache.org/jira/browse/YARN-10099 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Based on the latest documentation, there are two important properties that > are ignored if we have placement rules: > ||Property||Explanation|| > |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can > be created at application submission time, whether because they are specified > as the application’s queue by the submitter or because they are placed there > by the user-as-default-queue property. If this is false, any time an app > would be placed in a queue that is not specified in the allocations file, it > is placed in the “default” queue instead. Defaults to true. 
*If a queue > placement policy is given in the allocations file, this property is ignored.*| > |yarn.scheduler.fair.user-as-default-queue|Whether to use the username > associated with the allocation as the default queue name, in the event that a > queue name is not specified. If this is set to “false” or unset, all jobs > have a shared default queue, named “default”. Defaults to true. *If a queue > placement policy is given in the allocations file, this property is ignored.*| > Right now these settings affects the conversion regardless of the placement > rules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10099) FS-CS converter: handle allow-undeclared-pools and user-as-default queue properly
Peter Bacsko created YARN-10099: --- Summary: FS-CS converter: handle allow-undeclared-pools and user-as-default queue properly Key: YARN-10099 URL: https://issues.apache.org/jira/browse/YARN-10099 Project: Hadoop YARN Issue Type: Sub-task Reporter: Peter Bacsko Assignee: Peter Bacsko Based on the latest documentation, there are two important properties that are ignored if we have placement rules: ||Property||Explanation|| |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can be created at application submission time, whether because they are specified as the application’s queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the “default” queue instead. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| |yarn.scheduler.fair.user-as-default-queue|Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to “false” or unset, all jobs have a shared default queue, named “default”. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| | | | Right now these settings affects the conversion regardless of the placement rules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
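A small hedged sketch of the conversion rule this issue describes: the two FairScheduler properties should only influence the converted placement behaviour when no explicit queue placement policy exists in the allocations file, since FS itself ignores them in that case. The property names and their defaults come from the description above; the method and the combination rule are illustrative, not the converter's real API.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: gate the two properties on the presence of a placement
// policy, mirroring how FairScheduler documents them as ignored in that case.
public class FsConverterSketch {
  static boolean shouldApplyImplicitPlacement(Configuration conf,
      boolean hasPlacementPolicyInAllocFile) {
    if (hasPlacementPolicyInAllocFile) {
      return false; // FS ignores both properties here, so the converter should too
    }
    boolean allowUndeclared =
        conf.getBoolean("yarn.scheduler.fair.allow-undeclared-pools", true);
    boolean userAsDefault =
        conf.getBoolean("yarn.scheduler.fair.user-as-default-queue", true);
    return allowUndeclared || userAsDefault;
  }
}
{code}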
[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021183#comment-17021183 ] Szilard Nemeth commented on YARN-10083: --- Hi [~adam.antal], Latest patch LGTM, committed to trunk and branch-3.2. Closing this jira as well. > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > } > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
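The duplicated check quoted above is what the committed patch centralizes. A minimal sketch of such a shared helper, assuming a utility class of this shape (the class name is illustrative; the actual class and location introduced by YARN-10083 may differ):
{code:java}
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Illustrative utility class; not necessarily the name used in the patch.
public final class ApplicationStateUtils {

  private ApplicationStateUtils() {
    // utility class, no instances
  }

  /** @return true if the application can no longer change state. */
  public static boolean isApplicationFinalState(YarnApplicationState appState) {
    return appState == YarnApplicationState.FINISHED
        || appState == YarnApplicationState.FAILED
        || appState == YarnApplicationState.KILLED;
  }
}
{code}
Callers in log aggregation and elsewhere would then delegate to this single method instead of repeating the three-way comparison.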
[jira] [Updated] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10083: -- Fix Version/s: 3.2.2 3.3.0 > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > } > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021138#comment-17021138 ] Szilard Nemeth commented on YARN-9462: -- Thanks [~prabhujoseph], Pushed 3.2 patch to branch-3.2 as well. > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > --- > > Key: YARN-9462 > URL: https://issues.apache.org/jira/browse/YARN-9462 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, test >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0 > > Attachments: > TestResourceTrackerService.testNodeRemovalGracefully.txt, > YARN-9462-001.patch, YARN-9462-branch-3.2.001.patch > > > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > {code} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
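The failure above is a race: the assertion runs before the asynchronous node-removal event has been processed. A common way to stabilize such an assertion is to poll instead of asserting immediately; this is only a sketch of that pattern, assuming the test keeps reading the shutdown-NM count from ClusterMetrics (it is not the actual YARN-9462 patch):
{code:java}
import org.apache.hadoop.test.GenericTestUtils;

// Inside the test method: wait up to 10 seconds for the async node-removal
// event to be processed instead of asserting the counter right away.
GenericTestUtils.waitFor(
    () -> ClusterMetrics.getMetrics().getNumShutdownNMs() == 0,
    100 /* check every 100 ms */,
    10000 /* give up after 10 s */);
{code}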
[jira] [Updated] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9462: - Fix Version/s: 3.2.2 > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > --- > > Key: YARN-9462 > URL: https://issues.apache.org/jira/browse/YARN-9462 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, test >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: > TestResourceTrackerService.testNodeRemovalGracefully.txt, > YARN-9462-001.patch, YARN-9462-branch-3.2.001.patch > > > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > {code} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9768: --- Attachment: YARN-9768.009.patch > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, > YARN-9768.009.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021106#comment-17021106 ] Manikandan R commented on YARN-9768: Rebased the patch. Can you please take it forward? > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, > YARN-9768.009.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
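The gist of the YARN-9768 request is to bound the blocking renew call so that one unresponsive NameNode or Router cannot hang the renewer thread. A rough sketch of that timeout-and-retry pattern, assuming an executor owned by the renewer (method name, timeout and retry count are illustrative, not the actual patch):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

// Run the potentially-stuck RPC on a separate thread and bound it with a
// timeout; retry a limited number of times before giving up.
long renewWithTimeout(Token<?> token, Configuration conf,
    ExecutorService executor, long timeoutMs, int maxRetries) throws Exception {
  for (int attempt = 1; ; attempt++) {
    Future<Long> future = executor.submit(() -> token.renew(conf));
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck renew call
      if (attempt >= maxRetries) {
        throw e;
      }
      // otherwise retry against the (possibly recovered) service
    }
  }
}
{code}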
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021104#comment-17021104 ] Hadoop QA commented on YARN-7913: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 29s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 20s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:70a0ef5d4a6 | | JIRA Issue | YARN-7913 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991516/YARN-7913-branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 053237d4c20a 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 96c653d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25417/testReport/ | | Max. process+thread count | 810 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25417/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. >
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021098#comment-17021098 ] Hadoop QA commented on YARN-10085: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 14s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10085 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991512/YARN-10085-005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux d4b3f013bf68 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d40d7cc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25416/testReport/ | | Max. process+thread count | 820 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25416/console | |
[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource
[ https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021066#comment-17021066 ] Hadoop QA commented on YARN-4575: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 12s{color} | {color:red} YARN-4575 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-4575 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782086/0002-YARN-4575.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25419/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > ApplicationResourceUsageReport should return ALL reserved resource > --- > > Key: YARN-4575 > URL: https://issues.apache.org/jira/browse/YARN-4575 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch > > > ApplicationResourceUsageReport reserved resource report covers only the default > partition; it should cover all partitions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
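For YARN-4575 above, the fix amounts to aggregating reserved resources over every partition instead of reporting only the default one. A sketch of that aggregation, assuming a hypothetical per-partition map (the accessor shown is not an existing API):
{code:java}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical helper: sum the reserved Resource of every partition
// (partition label -> reserved Resource) rather than just the default one.
Resource totalReserved(Map<String, Resource> reservedByPartition) {
  Resource total = Resources.createResource(0, 0);
  for (Resource reserved : reservedByPartition.values()) {
    Resources.addTo(total, reserved);
  }
  return total;
}
{code}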
[jira] [Updated] (YARN-10098) AppPlacementAllocator getPreferredNodeIterator based on scheduler key
[ https://issues.apache.org/jira/browse/YARN-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt updated YARN-10098: -- Summary: AppPlacementAllocator getPreferredNodeIterator based on scheduler key (was: AppPlacementAllocator get getPreferredNodeIterator based on scheduler key) > AppPlacementAllocator getPreferredNodeIterator based on scheduler key > -- > > Key: YARN-10098 > URL: https://issues.apache.org/jira/browse/YARN-10098 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10098) AppPlacementAllocator get getPreferredNodeIterator based on scheduler key
Bibin Chundatt created YARN-10098: - Summary: AppPlacementAllocator get getPreferredNodeIterator based on scheduler key Key: YARN-10098 URL: https://issues.apache.org/jira/browse/YARN-10098 Project: Hadoop YARN Issue Type: Improvement Reporter: Bibin Chundatt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource
[ https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt reassigned YARN-4575: Assignee: (was: Bibin Chundatt) > ApplicationResourceUsageReport should return ALL reserved resource > --- > > Key: YARN-4575 > URL: https://issues.apache.org/jira/browse/YARN-4575 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch > > > ApplicationResourceUsageReport reserved resource report covers only the default > partition; it should cover all partitions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020988#comment-17020988 ] Szilard Nemeth commented on YARN-7913: -- Thanks [~wilfreds], Makes sense. Triggered build for branch-3.1 patch, also reuploaded the patch so Jenkins will pick that up instead of 3.2 patch. > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-7913: - Attachment: YARN-7913-branch-3.1.001.patch > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
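The NullPointerException described in YARN-7913 occurs because the attempt-recovery code assumes the application is still known to the scheduler even though it was just rejected. A defensive sketch of the check that avoids aborting the whole recovery, assuming FairScheduler's applications map (variable names and log message are illustrative; the committed fix may differ):
{code:java}
// Fragment of the attempt-recovery path; 'applications' is the scheduler's
// map of SchedulerApplication instances and 'applicationAttemptId' comes
// from the recovery event.
SchedulerApplication<FSAppAttempt> application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  // The app was rejected earlier in recovery (e.g. its queue became a parent
  // queue), so there is nothing to attach the attempt to. Skip it instead of
  // letting a NullPointerException abort recovery for all remaining apps.
  LOG.warn("Skipping attempt recovery for " + applicationAttemptId
      + ": application not found, likely rejected during recovery");
  return;
}
{code}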
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020965#comment-17020965 ] Peter Bacsko commented on YARN-10085: - Fixed checkstyle in patch v5. > FS-CS converter: remove mixed ordering policy check > --- > > Key: YARN-10085 > URL: https://issues.apache.org/jira/browse/YARN-10085 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Attachments: YARN-10085-001.patch, YARN-10085-002.patch, > YARN-10085-003.patch, YARN-10085-004.patch, YARN-10085-004.patch, > YARN-10085-005.patch > > > In the converter, this part is very strict and probably unnecessary: > {noformat} > // Validate ordering policy > if (queueConverter.isDrfPolicyUsedOnQueueLevel()) { > if (queueConverter.isFifoOrFairSharePolicyUsed()) { > throw new ConversionException( > "DRF ordering policy cannot be used together with fifo/fair"); > } else { > capacitySchedulerConfig.set( > CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS, > DominantResourceCalculator.class.getCanonicalName()); > } > } > {noformat} > It's also misleading, because Fair policy can be used under DRF, so the error > message is incorrect. > Let's remove these checks and rewrite the converter in a way that it > generates a valid config even if fair/drf is somehow mixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10085: Attachment: YARN-10085-005.patch > FS-CS converter: remove mixed ordering policy check > --- > > Key: YARN-10085 > URL: https://issues.apache.org/jira/browse/YARN-10085 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Attachments: YARN-10085-001.patch, YARN-10085-002.patch, > YARN-10085-003.patch, YARN-10085-004.patch, YARN-10085-004.patch, > YARN-10085-005.patch > > > In the converter, this part is very strict and probably unnecessary: > {noformat} > // Validate ordering policy > if (queueConverter.isDrfPolicyUsedOnQueueLevel()) { > if (queueConverter.isFifoOrFairSharePolicyUsed()) { > throw new ConversionException( > "DRF ordering policy cannot be used together with fifo/fair"); > } else { > capacitySchedulerConfig.set( > CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS, > DominantResourceCalculator.class.getCanonicalName()); > } > } > {noformat} > It's also misleading, because Fair policy can be used under DRF, so the error > message is incorrect. > Let's remove these checks and rewrite the converter in a way that it > generates a valid config even if fair/drf is somehow mixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
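Concretely, the relaxed conversion suggested in YARN-10085 could keep only the calculator switch and drop the exception, so a mixed fifo/fair/drf configuration still produces a valid result (sketch only, built from the snippet quoted in the description; the committed patch may differ):
{code:java}
// If DRF appears anywhere on queue level, switch the resource calculator;
// never reject the configuration because other queues use fifo/fair.
if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
  capacitySchedulerConfig.set(
      CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
      DominantResourceCalculator.class.getCanonicalName());
}
{code}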
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020944#comment-17020944 ] Hadoop QA commented on YARN-10085: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 17s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10085 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991473/YARN-10085-004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux d8a857572002 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d40d7cc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25414/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25414/testReport/ | | Max.
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020931#comment-17020931 ] Wilfred Spiegelenburg commented on YARN-9879: - I agree, {{getQueueName()}} should stay as is. We have a {{getQueuePath()}} already. Every CSQueue can already return both. We should change all non-external-facing calls that get the name of a queue to the path version. The only calls that can stay are the ones that provide their data in an externally viewable form (REST, UI or IPC) so as not to break compatibility. I also do not see why we would need the ambiguous queue list. The queue is always unique when a path is used. It does not matter if the current leaf queue name uniqueness is enforced or not. Everything can always be found by its path. If I do not have a path, I expect leaf queue uniqueness and can find the queue by just checking the part after the last _dot_ in the path. i.e. * queue paths defined as: root.parent.child1 child queue unique flag is set find a queue with name: *child1* (no dots, expect leaf queue uniqueness) -> returns the queue correctly * add a queue defined as: root.otherparent.child1 child queue unique flag is not set, allowed find a queue with name: *child1* (no dots, expect leaf queue uniqueness) -> returns an error Internally we would just store everything using the path; that would remove the need to keep things in sync and make the code consistent when combined with using the path everywhere internally > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal are being made; I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
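The lookup rule described in the comment above can be sketched as follows: a dotted name is always a full path, a short name relies on leaf-queue uniqueness and is matched against the last path segment, and multiple matches are an error. This is an illustration of the idea only, not CapacityScheduler code (the map and method are hypothetical):
{code:java}
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;

// 'queuesByPath' maps full paths such as "root.parent.child1" to queues.
CSQueue findQueue(Map<String, CSQueue> queuesByPath, String name) {
  if (name.contains(".")) {
    // A dotted name is treated as a full, unambiguous path.
    return queuesByPath.get(name);
  }
  // Short name: expect leaf-queue uniqueness and match the last path segment.
  List<CSQueue> matches = queuesByPath.entrySet().stream()
      .filter(e -> e.getKey().substring(e.getKey().lastIndexOf('.') + 1).equals(name))
      .map(Map.Entry::getValue)
      .collect(Collectors.toList());
  if (matches.size() > 1) {
    throw new IllegalArgumentException("Ambiguous queue short name: " + name);
  }
  return matches.isEmpty() ? null : matches.get(0);
}
{code}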