[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313424#comment-17313424 ]

Matthew Sharp commented on YARN-10493:
--------------------------------------

[~ebadger] The latest PR contains the namespace support that we had discussed. I also updated the design doc to outline that a bit more.

> RunC container repository v2
> ----------------------------
>
>                 Key: YARN-10493
>                 URL: https://issues.apache.org/jira/browse/YARN-10493
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, yarn
>    Affects Versions: 3.3.0
>            Reporter: Craig Condit
>            Assignee: Matthew Sharp
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: runc-container-repository-v2-design.pdf,
>                      runc-container-repository-v2-design_updated.pdf
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The current runc container repository design has scalability and usability
> issues which will likely limit widespread adoption. We should address this
> with a new, V2 layout.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313418#comment-17313418 ]

Hadoop QA commented on YARN-10493:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 40s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| 0 | codespell | 0m 1s | | codespell was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 14m 53s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 9s | | trunk passed |
| +1 | compile | 9m 5s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 7m 54s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 47s | | trunk passed |
| +1 | mvnsite | 2m 51s | | trunk passed |
| +1 | javadoc | 2m 28s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 39s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | spotbugs | 5m 32s | | trunk passed |
| +1 | shadedclient | 14m 44s | | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 50s | | the patch passed |
| +1 | compile | 8m 29s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 8m 29s | | the patch passed |
| +1 | compile | 7m 46s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 7m 46s | | the patch passed |
| +1 | blanks | 0m 1s | | The patch has no blanks issues. |
| +1 | checkstyle | 1m 40s | | the patch passed |
| +1 | mvnsite | 2m 35s | | the patch passed |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | javadoc | 2m 12s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 25s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | spotbugs | 5m 52s | | the patch passed |
| +1 | shadedclient |
[jira] [Comment Edited] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313363#comment-17313363 ]

Anup Agarwal edited comment on YARN-10724 at 4/1/21, 6:20 PM:
--------------------------------------------------------------

completedContainer getting called multiple times may or may not be an issue, but logging the same event multiple times might be. SchedulerApplicationAttempt maintains a liveContainers collection and uses it to deduplicate container completion (incl. preemption) events, while LeafQueue does no such thing. That is why the patch moved the preemption logging to the AppAttempt rather than LeafQueue, similar to FSAppAttempt.

was (Author: 108anup):
completedContainer getting called multiple times may or may not be an issue, but logging the same event multiple times might be. SchedulerApplicationAttempt maintains a liveContainers collection and uses it to deduplicate preemption events; while leafQueue does no such thing, that's why the patch moved the preemption logging to AppAttempt rather than leafQueue, similar to FSAppAttempt.

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> --------------------------------------------------------------------
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Assignee: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
>
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both these calls
> can create a ContainerPreemptEvent for the same container (as the RM is
> waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.
[jira] [Commented] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313363#comment-17313363 ]

Anup Agarwal commented on YARN-10724:
-------------------------------------

completedContainer getting called multiple times may or may not be an issue, but logging the same event multiple times might be. SchedulerApplicationAttempt maintains a liveContainers collection and uses it to deduplicate preemption events, while LeafQueue does no such thing. That is why the patch moved the preemption logging to the AppAttempt rather than LeafQueue, similar to FSAppAttempt.

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> --------------------------------------------------------------------
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Assignee: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
>
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both these calls
> can create a ContainerPreemptEvent for the same container (as the RM is
> waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.
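The liveContainers-based deduplication described in the comment above can be illustrated with a small, self-contained sketch. The class and method names below are purely illustrative (not the actual SchedulerApplicationAttempt code): the idea is that a set of live container IDs guarantees a preemption is counted at most once, no matter how many ContainerPreemptEvents the scheduler emits for the same container.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of attempt-level deduplication of container
// completion/preemption events. Names are hypothetical, not YARN code.
class PreemptionMetricsSketch {
  private final Set<String> liveContainers = new HashSet<>();
  private int preemptedCount = 0;

  void launch(String containerId) {
    liveContainers.add(containerId);
  }

  // Returns true only the first time a given container is preempted;
  // duplicate events for the same container leave the metric untouched.
  boolean containerPreempted(String containerId) {
    if (!liveContainers.remove(containerId)) {
      return false; // already counted, skip the metric update
    }
    preemptedCount++;
    return true;
  }

  int getPreemptedCount() {
    return preemptedCount;
  }
}
```

With this guard in place, a second ContainerPreemptEvent for the same container is a no-op, which is the behavior the patch moves from LeafQueue up to the application attempt.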
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313322#comment-17313322 ]

Hadoop QA commented on YARN-10493:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 37s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| 0 | codespell | 0m 1s | | codespell was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 14m 39s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 10s | | trunk passed |
| +1 | compile | 9m 7s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 7m 53s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 43s | | trunk passed |
| +1 | mvnsite | 2m 53s | | trunk passed |
| +1 | javadoc | 2m 29s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 38s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | spotbugs | 5m 36s | | trunk passed |
| +1 | shadedclient | 15m 1s | | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 25s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 51s | | the patch passed |
| +1 | compile | 8m 33s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 8m 33s | | the patch passed |
| +1 | compile | 7m 48s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 7m 48s | | the patch passed |
| -1 | blanks | 0m 0s | [/blanks-eol.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2848/1/artifact/out/blanks-eol.txt] | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | checkstyle | 1m 40s | | the patch passed |
| +1 | mvnsite | 2m 36s | | the patch passed |
| +1 | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 | javadoc | 2m 16s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 24s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
[jira] [Commented] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313317#comment-17313317 ]

Brahma Reddy Battula commented on YARN-10465:
---------------------------------------------

[~dmmkr] Thanks for reporting and working on this. At first glance, the patch LGTM. [~subru] Any chance to review this?

> Support getClusterNodes, getNodeToLabels, getLabelsToNodes,
> getClusterNodeLabels API's for Federation
> ------------------------------------------------------------
>
>                 Key: YARN-10465
>                 URL: https://issues.apache.org/jira/browse/YARN-10465
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: federation
>            Reporter: D M Murali Krishna Reddy
>            Assignee: D M Murali Krishna Reddy
>            Priority: Major
>         Attachments: YARN-10465.001.patch
[jira] [Commented] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313255#comment-17313255 ]

Zhengbo Li commented on YARN-10724:
-----------------------------------

Hi, I may be hitting the same issue if it's as described, so I'm trying to understand it better. Do you mean that LeafQueue's `completedContainer` method is incorrectly invoked multiple times? Thanks

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> --------------------------------------------------------------------
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Assignee: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
>
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both these calls
> can create a ContainerPreemptEvent for the same container (as the RM is
> waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.
[jira] [Comment Edited] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313243#comment-17313243 ]

Qi Zhu edited comment on YARN-10726 at 4/1/21, 3:36 PM:
--------------------------------------------------------

[~pbacsko] I tested locally just now, and it passed. I marked it as fixed. Thanks.

was (Author: zhuqi):
[~pbacsko] I tested locally just now, it passed. Thanks.

> Log the size of DelegationTokenRenewer event queue in case of too many
> pending events
> ----------------------------------------------------------------------
>
>                 Key: YARN-10726
>                 URL: https://issues.apache.org/jira/browse/YARN-10726
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10726.001.patch, YARN-10726.002.patch
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313243#comment-17313243 ]

Qi Zhu commented on YARN-10726:
-------------------------------

[~pbacsko] I tested locally just now, it passed. Thanks.

> Log the size of DelegationTokenRenewer event queue in case of too many
> pending events
> ----------------------------------------------------------------------
>
>                 Key: YARN-10726
>                 URL: https://issues.apache.org/jira/browse/YARN-10726
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10726.001.patch, YARN-10726.002.patch
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313241#comment-17313241 ]

Peter Bacsko commented on YARN-10726:
-------------------------------------

Ok, I strongly believe that the failing tests are flaky. [~zhuqi], could you verify that by running them locally a couple of times?

> Log the size of DelegationTokenRenewer event queue in case of too many
> pending events
> ----------------------------------------------------------------------
>
>                 Key: YARN-10726
>                 URL: https://issues.apache.org/jira/browse/YARN-10726
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10726.001.patch, YARN-10726.002.patch
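The improvement tracked by YARN-10726 — logging the DelegationTokenRenewer event queue size once too many events are pending — can be sketched as follows. All names and the threshold value here are hypothetical illustrations, not the actual patch:

```java
// Illustrative sketch of threshold-based queue-size logging for a
// renewer event queue. Class name, method names, and the threshold
// are assumptions for illustration, not the YARN-10726 patch itself.
class RenewerQueueMonitor {
  static final int PENDING_EVENT_THRESHOLD = 1000;

  // Decide whether the current queue depth warrants a warning log.
  static boolean shouldLogQueueSize(int queueSize) {
    return queueSize > PENDING_EVENT_THRESHOLD;
  }

  // Build the message that would be handed to the logger.
  static String buildLogMessage(int queueSize) {
    return "DelegationTokenRenewer event queue has " + queueSize
        + " pending events (threshold " + PENDING_EVENT_THRESHOLD + ")";
  }
}
```

The point of gating the log on a threshold is to surface a backlog early without spamming the RM log on every event.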
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313240#comment-17313240 ]

Hadoop QA commented on YARN-10726:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 18s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 9s | | trunk passed |
| +1 | compile | 1m 1s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 51s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 44s | | trunk passed |
| +1 | mvnsite | 0m 53s | | trunk passed |
| +1 | shadedclient | 16m 40s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 19m 46s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 49s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | compile | 0m 44s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 44s | | the patch passed |
| +1 | checkstyle | 0m 39s | | the patch passed |
| +1 | mvnsite | 0m 48s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 4s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312219#comment-17312219 ]

Qi Zhu edited comment on YARN-10503 at 4/1/21, 3:10 PM:
--------------------------------------------------------

[~pbacsko] [~gandras] [~ebadger] Sorry for the many issues that need review. Do you have any other advice about this, when you are free? :D Thanks.

was (Author: zhuqi):
[~pbacsko] [~gandras] [~ebadger] Sorry for too many issues needed for review. If you any other advice about this? :D Thanks.

> Support queue capacity in terms of absolute resources with custom
> resourceType.
> ------------------------------------------------------------------
>
>                 Key: YARN-10503
>                 URL: https://issues.apache.org/jira/browse/YARN-10503
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-10503.001.patch, YARN-10503.002.patch,
>                      YARN-10503.003.patch, YARN-10503.004.patch,
>                      YARN-10503.005.patch, YARN-10503.006.patch,
>                      YARN-10503.007.patch, YARN-10503.008.patch,
>                      YARN-10503.009.patch
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }
> {code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling with different resourceType absolute
> demands.
>
> This Jira will handle GPU first.
[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312219#comment-17312219 ]

Qi Zhu edited comment on YARN-10503 at 4/1/21, 3:05 PM:
--------------------------------------------------------

[~pbacsko] [~gandras] [~ebadger] Sorry for the many issues that need review. Do you have any other advice about this? :D Thanks.

was (Author: zhuqi):
[~pbacsko] [~gandras] [~ebadger] If you any other advice? :D Thanks.

> Support queue capacity in terms of absolute resources with custom
> resourceType.
> ------------------------------------------------------------------
>
>                 Key: YARN-10503
>                 URL: https://issues.apache.org/jira/browse/YARN-10503
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-10503.001.patch, YARN-10503.002.patch,
>                      YARN-10503.003.patch, YARN-10503.004.patch,
>                      YARN-10503.005.patch, YARN-10503.006.patch,
>                      YARN-10503.007.patch, YARN-10503.008.patch,
>                      YARN-10503.009.patch
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }
> {code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling with different resourceType absolute
> demands.
>
> This Jira will handle GPU first.
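The change this issue asks for — moving beyond the fixed two-value AbsoluteResourceType enum quoted in the description toward arbitrary resource types such as GPUs — can be sketched roughly like this. This is a simplified illustration under assumed names, not the actual YARN-10503 patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration: absolute queue capacities keyed by arbitrary
// resource-type names instead of a fixed MEMORY/VCORES enum.
// Class and method names are assumptions for this sketch.
class AbsoluteCapacitySketch {
  private final Map<String, Long> capacities = new LinkedHashMap<>();

  // Accepts any resource type name, e.g. "memory-mb", "vcores",
  // or a custom type such as "yarn.io/gpu".
  void setCapacity(String resourceType, long amount) {
    capacities.put(resourceType, amount);
  }

  // Unset types default to zero capacity.
  long getCapacity(String resourceType) {
    return capacities.getOrDefault(resourceType, 0L);
  }
}
```

Keying by name rather than by enum constant is what lets a cluster declare absolute demands for custom resource types without code changes for each new type.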
[jira] [Commented] (YARN-10693) Add document for YARN-10623 auto refresh queue conf in cs.
[ https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313219#comment-17313219 ]

Peter Bacsko commented on YARN-10693:
-------------------------------------

I'll review this as soon as I have some spare cycles.

> Add document for YARN-10623 auto refresh queue conf in cs.
> ----------------------------------------------------------
>
>                 Key: YARN-10693
>                 URL: https://issues.apache.org/jira/browse/YARN-10693
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10693.001.patch, YARN-10693.002.patch,
>                      YARN-10693.003.patch
[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.
[ https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313218#comment-17313218 ]

Peter Bacsko commented on YARN-10637:
-------------------------------------

Thanks [~zhuqi], I think it's good then. [~gandras] do you have any comments?

> We should support fs to cs support for auto refresh queues when conf changed,
> after YARN-10623 finished.
> -----------------------------------------------------------------------------
>
>                 Key: YARN-10637
>                 URL: https://issues.apache.org/jira/browse/YARN-10637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10637.001.patch, YARN-10637.002.patch,
>                      YARN-10637.003.patch, YARN-10637.004.patch
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when YARN-10623 finished.
[jira] [Commented] (YARN-10693) Add document for YARN-10623 auto refresh queue conf in cs.
[ https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313199#comment-17313199 ]

Qi Zhu commented on YARN-10693:
-------------------------------

[~pbacsko] This is the corresponding document. :D

> Add document for YARN-10623 auto refresh queue conf in cs.
> ----------------------------------------------------------
>
>                 Key: YARN-10693
>                 URL: https://issues.apache.org/jira/browse/YARN-10693
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10693.001.patch, YARN-10693.002.patch,
>                      YARN-10693.003.patch
[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.
[ https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313196#comment-17313196 ] Qi Zhu commented on YARN-10637: --- Thanks [~pbacsko] for the review. Actually, FS always has this enabled, so we don't need to add an extra check on the FS side:
{code:java}
@Override
public void serviceInit(Configuration conf) throws Exception {
  this.allocFile = getAllocationFile(conf);
  if (this.allocFile != null) {
    this.fs = allocFile.getFileSystem(conf);
    reloadThread = new Thread(() -> {
      while (running) {
        try {
          synchronized (this) {
            reloadListener.onCheck();
          }
          long time = clock.getTime();
          long lastModified =
              fs.getFileStatus(allocFile).getModificationTime();
          if (lastModified > lastSuccessfulReload &&
              time > lastModified + ALLOC_RELOAD_WAIT_MS) {
            try {
              reloadAllocations();
            } catch (Exception ex) {
              if (!lastReloadAttemptFailed) {
                LOG.error("Failed to reload fair scheduler config file - " +
                    "will use existing allocations.", ex);
              }
              lastReloadAttemptFailed = true;
            }
          } else if (lastModified == 0L) {
            if (!lastReloadAttemptFailed) {
              LOG.warn("Failed to reload fair scheduler config file because" +
                  " last modified returned 0. File exists: " +
                  fs.exists(allocFile));
            }
            lastReloadAttemptFailed = true;
          }
        } catch (IOException e) {
          LOG.error("Exception while loading allocation file: " + e);
        }
        try {
          Thread.sleep(reloadIntervalMs);
        } catch (InterruptedException ex) {
          LOG.info("Interrupted while waiting to reload alloc configuration");
        }
      }
    });
    reloadThread.setName("AllocationFileReloader");
    reloadThread.setDaemon(true);
  }
  super.serviceInit(conf);
}
{code}
> We should support fs to cs support for auto refresh queues when conf changed, > after YARN-10623 finished. 
> > > Key: YARN-10637 > URL: https://issues.apache.org/jira/browse/YARN-10637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10637.001.patch, YARN-10637.002.patch, > YARN-10637.003.patch, YARN-10637.004.patch > > > cc [~pbacsko] [~gandras] [~bteke] > We should also fill this, when YARN-10623 finished. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313195#comment-17313195 ] Qi Zhu commented on YARN-10726: --- Thanks [~pbacsko] for the commit. :D > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch, YARN-10726.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313192#comment-17313192 ] Peter Bacsko commented on YARN-10726: - Ah, I already committed the change. Let's hope Jenkins comes back green :) +1 > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch, YARN-10726.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313189#comment-17313189 ] Peter Bacsko commented on YARN-10726: - "hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer" - this is unrelated I believe. This test case has been failing for a long time. > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch, YARN-10726.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.
[ https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313184#comment-17313184 ] Peter Bacsko commented on YARN-10637: - Thanks [~zhuqi] this makes sense. Is this always enabled in Fair Scheduler? Because we should only add this policy if auto-refresh is enabled on the FS-side. > We should support fs to cs support for auto refresh queues when conf changed, > after YARN-10623 finished. > > > Key: YARN-10637 > URL: https://issues.apache.org/jira/browse/YARN-10637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10637.001.patch, YARN-10637.002.patch, > YARN-10637.003.patch, YARN-10637.004.patch > > > cc [~pbacsko] [~gandras] [~bteke] > We should also fill this, when YARN-10623 finished. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.
[ https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313141#comment-17313141 ] Qi Zhu commented on YARN-10637: --- [~pbacsko] [~gandras] Do you have any advice about this? We could handle it just as we did for auto queue deletion. :D > We should support fs to cs support for auto refresh queues when conf changed, > after YARN-10623 finished. > > > Key: YARN-10637 > URL: https://issues.apache.org/jira/browse/YARN-10637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10637.001.patch, YARN-10637.002.patch, > YARN-10637.003.patch, YARN-10637.004.patch > > > cc [~pbacsko] [~gandras] [~bteke] > We should also fill this, when YARN-10623 finished. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313139#comment-17313139 ] Qi Zhu commented on YARN-10726: --- Thanks [~pbacsko] for your review and suggestion. Your concern is valid, and I agree with you. I have updated it in the latest patch, just as we did in the async dispatcher. :D Thanks. > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch, YARN-10726.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10726: -- Attachment: YARN-10726.002.patch > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch, YARN-10726.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313138#comment-17313138 ] Peter Bacsko commented on YARN-10726: - This is from {{AsyncDispatcher}}:
{noformat}
if (qSize != 0 && qSize % 1000 == 0
    && lastEventQueueSizeLogged != qSize) {
  lastEventQueueSizeLogged = qSize;
  LOG.info("Size of event-queue is " + qSize);
}
{noformat}
Please update the code to track {{lastEventQueueSizeLogged}} the same way. > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
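The same throttling pattern can be factored out for reuse. A minimal sketch (the class and method names here are hypothetical, not from {{AsyncDispatcher}} or any patch on this issue):

```java
// Hypothetical sketch of AsyncDispatcher-style throttled queue-size logging.
// Logging fires only at non-zero multiples of 1000, and only when the size
// differs from the last size logged, so a queue hovering around a threshold
// does not spam the log.
class QueueSizeLogger {
    private long lastEventQueueSizeLogged = 0;

    // Returns true when the caller should emit a queue-size log line.
    boolean shouldLog(long qSize) {
        if (qSize != 0 && qSize % 1000 == 0 && lastEventQueueSizeLogged != qSize) {
            lastEventQueueSizeLogged = qSize;
            return true;
        }
        return false;
    }
}
```

A caller would check {{shouldLog(queue.size())}} each time an event is enqueued; repeated checks at the same size stay silent until the size changes to another multiple of 1000.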
[jira] [Commented] (YARN-10714) Remove dangling dynamic queues on reinitialization
[ https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313125#comment-17313125 ] Hadoop QA commented on YARN-10714: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 21s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 16s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 24s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 35s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m 53s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/888/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 2 unchanged - 2 fixed = 3 total (was 4) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 27s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} |
[jira] [Comment Edited] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313123#comment-17313123 ] Peter Bacsko edited comment on YARN-10726 at 4/1/21, 12:01 PM: --- Thanks [~zhuqi]. I think it's a good idea. My only concern (which might not be valid) is that if we have too many events, this code can possibly run too frequently. For example, if you go 998, 998, 999, 1000, 1001, 1002, then it prints at 1000, then it starts to consume events, size goes back from 1000 to 990, then it prints the size again. I think we should limit how often we print this message. We shouldn't log it too often; I'm not sure how we do this in other parts of the code. I'll check what can be the best solution. was (Author: pbacsko): Thanks [~zhuqi]. I think it's a good idea. My only "concern" is that we have too many events, this code can possibly run too frequently. For example, if you go 998, 998, 999, 1000, 1001, 1002, then it prints at 1000, then it starts to consume events, size goes back from 1000 to 990, then it prints the size again. I think we should limit how often we print this message. We should log it too often, I'm not sure how we do this in other parts of the code. I'll check what can be the best solution. > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313123#comment-17313123 ] Peter Bacsko commented on YARN-10726: - Thanks [~zhuqi]. I think it's a good idea. My only "concern" is that if we have too many events, this code can possibly run too frequently. For example, if you go 998, 998, 999, 1000, 1001, 1002, then it prints at 1000, then it starts to consume events, size goes back from 1000 to 990, then it prints the size again. I think we should limit how often we print this message. We shouldn't log it too often; I'm not sure how we do this in other parts of the code. I'll check what can be the best solution. > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10726: Summary: Log the size of DelegationTokenRenewer event queue in case of too many pending events (was: We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.) > Log the size of DelegationTokenRenewer event queue in case of too many > pending events > - > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10726.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9618) NodesListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313105#comment-17313105 ] Peter Bacsko commented on YARN-9618: Thanks for the patch [~zhuqi] and [~gandras] for the review, I committed this to trunk. > NodesListManager event improvement > -- > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Fix For: 3.4.0 > > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, > YARN-9618.006.patch, YARN-9618.007.patch > > > Current implementation nodelistmanager event blocks async dispacher and can > cause RM crash and slowing down event processing. > # Cluster restart with 1K running apps . Each usable event will create 1K > events over all events could be 5k*1k events for 5K cluster > # Event processing is blocked till new events are added to queue. > Solution : > # Add another async Event handler similar to scheduler. > # Instead of adding events to dispatcher directly call RMApp event handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
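The solution outlined in the issue description (a dedicated async event handler so the per-application event fan-out never blocks the central dispatcher) could be sketched roughly as follows. All names here are invented for illustration; this is not the committed patch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative sketch only: a dedicated daemon thread drains node-list
// events on its own queue, so a burst of O(running apps) events does not
// stall the central async dispatcher.
class SeparateEventHandler<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    SeparateEventHandler(Consumer<E> handler) {
        worker = new Thread(() -> {
            // Keep draining until stopped AND the backlog is empty.
            while (running || !queue.isEmpty()) {
                try {
                    E event = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        handler.accept(event); // processed off the main dispatcher
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "SeparateEventHandler");
        worker.setDaemon(true);
        worker.start();
    }

    // Enqueue is cheap and non-blocking for the caller.
    void handle(E event) {
        queue.add(event);
    }

    // Signal shutdown and wait for the backlog to drain.
    void stop() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The design point is the second half of the description: instead of routing each per-app event back through the shared dispatcher queue, the handler invokes the app-level event handler directly from its own thread.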
[jira] [Commented] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313103#comment-17313103 ] Hadoop QA commented on YARN-10726: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 20s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 22s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 2s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 27s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m 0s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 13s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Updated] (YARN-9618) NodesListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9618: --- Summary: NodesListManager event improvement (was: NodeListManager event improvement) > NodesListManager event improvement > -- > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, > YARN-9618.006.patch, YARN-9618.007.patch > > > Current implementation nodelistmanager event blocks async dispacher and can > cause RM crash and slowing down event processing. > # Cluster restart with 1K running apps . Each usable event will create 1K > events over all events could be 5k*1k events for 5K cluster > # Event processing is blocked till new events are added to queue. > Solution : > # Add another async Event handler similar to scheduler. > # Instead of adding events to dispatcher directly call RMApp event handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10714) Remove dangling dynamic queues on reinitialization
[ https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313029#comment-17313029 ] Qi Zhu commented on YARN-10714: --- Thanks [~gandras] for the patch. The latest patch LGTM, +1. Waiting for Jenkins. > Remove dangling dynamic queues on reinitialization > -- > > Key: YARN-10714 > URL: https://issues.apache.org/jira/browse/YARN-10714 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10714.001.patch, YARN-10714.002.patch > > > Current logic does not handle orphaned auto created child queues. The > following example steps show a scenario in which it is possible to submit > applications to an orphaned queue, that has an invalid (already removed) > ParentQueue. > # Auto create a queue root.a.a-auto > # Remove root.a from the config > # Reinitialize CS without restarting it (possible via mutation API) > # Submit application to root.a.a-auto, while root.a is a non-existent queue -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
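The scenario in the issue description amounts to pruning dynamic queues whose parent disappeared from the freshly loaded configuration. A hypothetical sketch of that pruning step over dotted queue paths (class and method names invented for illustration, not the actual CapacityScheduler code):

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: on reinitialization, keep a dynamically
// created queue only if its parent path still exists in the new config;
// otherwise it is a dangling queue and must be dropped.
class DanglingQueuePruner {
    static Set<String> prune(Set<String> dynamicQueues, Set<String> configuredQueues) {
        Set<String> kept = new TreeSet<>();
        for (String queue : dynamicQueues) {
            int lastDot = queue.lastIndexOf('.');
            // "root.a.a-auto" has parent "root.a"; a path with no dot has no parent.
            String parent = lastDot < 0 ? "" : queue.substring(0, lastDot);
            if (configuredQueues.contains(parent)) {
                kept.add(queue);
            }
        }
        return kept;
    }
}
```

With the steps above, {{root.a.a-auto}} would be pruned after {{root.a}} is removed from the config, so no application can be submitted to the orphaned queue.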
[jira] [Updated] (YARN-10714) Remove dangling dynamic queues on reinitialization
[ https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori updated YARN-10714: Attachment: YARN-10714.002.patch > Remove dangling dynamic queues on reinitialization > -- > > Key: YARN-10714 > URL: https://issues.apache.org/jira/browse/YARN-10714 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10714.001.patch, YARN-10714.002.patch > > > Current logic does not handle orphaned auto created child queues. The > following example steps show a scenario in which it is possible to submit > applications to an orphaned queue, that has an invalid (already removed) > ParentQueue. > # Auto create a queue root.a.a-auto > # Remove root.a from the config > # Reinitialize CS without restarting it (possible via mutation API) > # Submit application to root.a.a-auto, while root.a is a non-existent queue -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313009#comment-17313009 ] Qi Zhu commented on YARN-10726: --- cc [~pbacsko] [~gandras] Actually, there is no monitoring of the pending DelegationTokenRenewerEvent queue. I think it is important for us to know when too many DelegationTokenRenewerEvents are pending. It's helpful for big clusters. > We should log size of pending DelegationTokenRenewerEvent queue, when pending > too many events. > -- > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313009#comment-17313009 ] Qi Zhu edited comment on YARN-10726 at 4/1/21, 8:46 AM: cc [~pbacsko] [~gandras] Actually, there is no monitoring of the pending DelegationTokenRenewerEvent queue. I think it is important for us to know when too many DelegationTokenRenewerEvents are pending. It's helpful for big clusters. Updated a patch for review. Thanks. was (Author: zhuqi): cc [~pbacsko] [~gandras] Actually there are no any monitor for the pending DelegationTokenRenewerEvent queue, i think it is important for us to know if the pending DelegationTokenRenewerEvents are too many. It's helpful for big cluster. > We should log size of pending DelegationTokenRenewerEvent queue, when pending > too many events. > -- > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313006#comment-17313006 ] Qi Zhu commented on YARN-9618: -- Thanks [~pbacsko] [~gandras] for confirming. > NodeListManager event improvement > - > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, > YARN-9618.006.patch, YARN-9618.007.patch > > > In the current implementation, NodeListManager events block the async dispatcher, which can cause an RM crash and slow down event processing. > # Cluster restart with 1K running apps: each usable event will create 1K events, so overall there could be 5K*1K events for a 5K-node cluster. > # Event processing is blocked until new events are added to the queue. > Solution: > # Add another async event handler, similar to the scheduler. > # Instead of adding events to the dispatcher, directly call the RMApp event handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
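The solution sketched in the YARN-9618 description (a separate async handler so NodeListManager events do not block the main dispatcher) follows a standard pattern: a dedicated worker thread draining its own queue. The code below illustrates only that general pattern under assumed names; it is not the actual patch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Sketch: a dedicated dispatcher thread so that enqueuing node events
// returns immediately instead of blocking the caller's dispatcher.
public class SeparateEventDispatcher<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    public SeparateEventDispatcher(Consumer<E> handler) {
        worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // take() blocks only this worker, never the producer.
                    handler.accept(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit cleanly on stop()
            }
        }, "node-event-dispatcher");
        worker.setDaemon(true);
        worker.start();
    }

    // Non-blocking from the caller's point of view: just an enqueue.
    public void handle(E event) {
        queue.add(event);
    }

    public void stop() {
        worker.interrupt();
    }
}
```

With this shape, a burst of node events (e.g. the 5K*1K case described above) only grows the queue of the dedicated handler; the main async dispatcher keeps draining its own events.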
[jira] [Updated] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.
[ https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10726: -- Issue Type: Improvement (was: Bug) > We should log size of pending DelegationTokenRenewerEvent queue, when pending > too many events. > -- > > Key: YARN-10726 > URL: https://issues.apache.org/jira/browse/YARN-10726 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.
Qi Zhu created YARN-10726: - Summary: We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events. Key: YARN-10726 URL: https://issues.apache.org/jira/browse/YARN-10726 Project: Hadoop YARN Issue Type: Bug Reporter: Qi Zhu Assignee: Qi Zhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312997#comment-17312997 ] Andras Gyori commented on YARN-9618: I think it is well done now, +1. > NodeListManager event improvement > - > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, > YARN-9618.006.patch, YARN-9618.007.patch > > > In the current implementation, NodeListManager events block the async dispatcher, which can cause an RM crash and slow down event processing. > # Cluster restart with 1K running apps: each usable event will create 1K events, so overall there could be 5K*1K events for a 5K-node cluster. > # Event processing is blocked until new events are added to the queue. > Solution: > # Add another async event handler, similar to the scheduler. > # Instead of adding events to the dispatcher, directly call the RMApp event handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9618) NodeListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312989#comment-17312989 ] Peter Bacsko commented on YARN-9618: +1 LGTM [~gandras] are you OK with the patch? > NodeListManager event improvement > - > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, > YARN-9618.006.patch, YARN-9618.007.patch > > > In the current implementation, NodeListManager events block the async dispatcher, which can cause an RM crash and slow down event processing. > # Cluster restart with 1K running apps: each usable event will create 1K events, so overall there could be 5K*1K events for a 5K-node cluster. > # Event processing is blocked until new events are added to the queue. > Solution: > # Add another async event handler, similar to the scheduler. > # Instead of adding events to the dispatcher, directly call the RMApp event handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging
[ https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312945#comment-17312945 ] Peter Bacsko commented on YARN-10720: - +1 thanks [~zhuqi] for the patch, committed to trunk. > YARN WebAppProxyServlet should support connection timeout to prevent proxy > server from hanging > -- > > Key: YARN-10720 > URL: https://issues.apache.org/jira/browse/YARN-10720 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10720.001.patch, YARN-10720.002.patch, > YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, > YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, > image-2021-03-29-14-05-32-708.png > > > The following shows the proxy server with {color:#de350b}too many connections from one client{color}; this caused the proxy server to hang, and the YARN web UI could not jump to the web proxy. > !image-2021-03-29-14-04-33-776.png|width=632,height=57! > The following shows the abnormal AM. The proxy server does not know it is already abnormal, so the connections cannot be closed; we should add timeout support to the proxy server to prevent this. One abnormal AM may cause hundreds or even thousands of connections, which is very heavy. > !image-2021-03-29-14-05-32-708.png|width=669,height=101! > > After I killed the abnormal AM, the proxy server became healthy. This case has happened many times in our production clusters; our clusters are huge, and abnormal AMs occur regularly. > > I will add timeout support to the web proxy server in this jira. > > cc [~pbacsko] [~ebadger] [~Jim_Brennan] [~ztang] [~epayne] [~gandras] > [~bteke] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
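The timeout the YARN-10720 description calls for can be illustrated with plain `java.net.HttpURLConnection`. This is a generic sketch of the idea only: the actual patch changes WebAppProxyServlet internals, and the class name and timeout values below are arbitrary assumptions:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: bound both connect and read time so one unresponsive AM
// cannot pin a proxy thread (and its connection) forever.
public class TimeoutFetch {
    public static int fetchStatus(String target, int connectMs, int readMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.setConnectTimeout(connectMs); // fail fast if the AM never accepts the connection
        conn.setReadTimeout(readMs);       // fail fast if the AM accepts but never responds
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();             // release the connection promptly either way
        }
    }
}
```

Without the two `set*Timeout` calls, the defaults are effectively unbounded, which matches the hang described above: each stuck AM connection occupies a proxy worker indefinitely until the AM is killed.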
[jira] [Updated] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging
[ https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10720: Summary: YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging (was: YARN WebAppProxyServlet should support connection timeout to prevent proxy server hang.) > YARN WebAppProxyServlet should support connection timeout to prevent proxy > server from hanging > -- > > Key: YARN-10720 > URL: https://issues.apache.org/jira/browse/YARN-10720 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10720.001.patch, YARN-10720.002.patch, > YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, > YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, > image-2021-03-29-14-05-32-708.png > > > The following shows the proxy server with {color:#de350b}too many connections from one client{color}; this caused the proxy server to hang, and the YARN web UI could not jump to the web proxy. > !image-2021-03-29-14-04-33-776.png|width=632,height=57! > The following shows the abnormal AM. The proxy server does not know it is already abnormal, so the connections cannot be closed; we should add timeout support to the proxy server to prevent this. One abnormal AM may cause hundreds or even thousands of connections, which is very heavy. > !image-2021-03-29-14-05-32-708.png|width=669,height=101! > > After I killed the abnormal AM, the proxy server became healthy. This case has happened many times in our production clusters; our clusters are huge, and abnormal AMs occur regularly. > > I will add timeout support to the web proxy server in this jira. > > cc [~pbacsko] [~ebadger] [~Jim_Brennan] [~ztang] [~epayne] [~gandras] > [~bteke] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery
[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312904#comment-17312904 ] Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM: --- cc [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi] I think YARN-7962 has already fixed this case: we change isServiceStarted to false under the write lock. {code:java} serviceStateLock.writeLock().lock(); try { isServiceStarted = false; this.renewerService.shutdown(); } finally { serviceStateLock.writeLock().unlock(); } {code} And the processDelegationTokenRenewerEvent race condition could happen before YARN-7962: {code:java} private void processDelegationTokenRenewerEvent( DelegationTokenRenewerEvent evt) { serviceStateLock.readLock().lock(); try { if (isServiceStarted) { Future future = renewerService.submit(new DelegationTokenRenewerRunnable(evt)); futures.put(evt, future); } else { pendingEventQueue.add(evt); } } finally { serviceStateLock.readLock().unlock(); } } @Override public void run() { if (evt instanceof DelegationTokenRenewerAppSubmitEvent) { DelegationTokenRenewerAppSubmitEvent appSubmitEvt = (DelegationTokenRenewerAppSubmitEvent) evt; handleDTRenewerAppSubmitEvent(appSubmitEvt); } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) { DelegationTokenRenewerAppRecoverEvent appRecoverEvt = (DelegationTokenRenewerAppRecoverEvent) evt; handleDTRenewerAppRecoverEvent(appRecoverEvt); } else if (evt.getType().equals( DelegationTokenRenewerEventType.FINISH_APPLICATION)) { DelegationTokenRenewer.this.handleAppFinishEvent(evt); } } @SuppressWarnings("unchecked") private void handleDTRenewerAppRecoverEvent( DelegationTokenRenewerAppRecoverEvent event) { try { // Setup tokens for renewal during recovery DelegationTokenRenewer.this.handleAppSubmitEvent(event); } catch (Throwable t) { LOG.warn("Unable to add the application to the delegation token" + " renewer on recovery.", t); } } {code} Now the race condition no longer happens, including the null pointer error, which also occurred on my cluster. I think we can close this now. Thanks. was (Author: zhuqi): [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi] I think YARN-7962 already fixed this case: We change isServiceStarted to false in write lock. {code:java} serviceStateLock.writeLock().lock(); try { isServiceStarted = false; this.renewerService.shutdown(); } finally { serviceStateLock.writeLock().unlock(); } {code} And processDelegationTokenRenewerEvent race condition may happen before YARN-7962 {code:java} private void processDelegationTokenRenewerEvent( DelegationTokenRenewerEvent evt) { serviceStateLock.readLock().lock(); try { if (isServiceStarted) { Future future = renewerService.submit(new DelegationTokenRenewerRunnable(evt)); futures.put(evt, future); } else { pendingEventQueue.add(evt); } } finally { serviceStateLock.readLock().unlock(); } } @Override public void run() { if (evt instanceof DelegationTokenRenewerAppSubmitEvent) { DelegationTokenRenewerAppSubmitEvent appSubmitEvt = (DelegationTokenRenewerAppSubmitEvent) evt; handleDTRenewerAppSubmitEvent(appSubmitEvt); } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) { DelegationTokenRenewerAppRecoverEvent appRecoverEvt = (DelegationTokenRenewerAppRecoverEvent) evt; handleDTRenewerAppRecoverEvent(appRecoverEvt); } else if (evt.getType().equals( DelegationTokenRenewerEventType.FINISH_APPLICATION)) { DelegationTokenRenewer.this.handleAppFinishEvent(evt); } } @SuppressWarnings("unchecked") private void handleDTRenewerAppRecoverEvent( DelegationTokenRenewerAppRecoverEvent event) { try { // Setup tokens for renewal during recovery DelegationTokenRenewer.this.handleAppSubmitEvent(event); } catch (Throwable t) { LOG.warn("Unable to add the application to the delegation token" + " renewer on recovery.", t); } } {code} Now the race condition not happened, including the null pointer error, my cluster happened also. I think we can close this now. Thanks. 
> YARN RM fails to add the application to the delegation token renewer on > recovery > > > Key: YARN-8631 > URL: https://issues.apache.org/jira/browse/YARN-8631 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.1.0 >Reporter: Sanjay Divgi >Assignee: Umesh Mittal >Priority: Blocker > Attachments: YARN-8631.001.patch, > hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log > > > On HA
[jira] [Commented] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery
[ https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312904#comment-17312904 ] Qi Zhu commented on YARN-8631: -- [~snemeth] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi] I think YARN-7962 has already fixed this case: we change isServiceStarted to false under the write lock. {code:java} serviceStateLock.writeLock().lock(); try { isServiceStarted = false; this.renewerService.shutdown(); } finally { serviceStateLock.writeLock().unlock(); } {code} And the processDelegationTokenRenewerEvent race condition could happen before YARN-7962: {code:java} private void processDelegationTokenRenewerEvent( DelegationTokenRenewerEvent evt) { serviceStateLock.readLock().lock(); try { if (isServiceStarted) { Future future = renewerService.submit(new DelegationTokenRenewerRunnable(evt)); futures.put(evt, future); } else { pendingEventQueue.add(evt); } } finally { serviceStateLock.readLock().unlock(); } } @Override public void run() { if (evt instanceof DelegationTokenRenewerAppSubmitEvent) { DelegationTokenRenewerAppSubmitEvent appSubmitEvt = (DelegationTokenRenewerAppSubmitEvent) evt; handleDTRenewerAppSubmitEvent(appSubmitEvt); } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) { DelegationTokenRenewerAppRecoverEvent appRecoverEvt = (DelegationTokenRenewerAppRecoverEvent) evt; handleDTRenewerAppRecoverEvent(appRecoverEvt); } else if (evt.getType().equals( DelegationTokenRenewerEventType.FINISH_APPLICATION)) { DelegationTokenRenewer.this.handleAppFinishEvent(evt); } } @SuppressWarnings("unchecked") private void handleDTRenewerAppRecoverEvent( DelegationTokenRenewerAppRecoverEvent event) { try { // Setup tokens for renewal during recovery DelegationTokenRenewer.this.handleAppSubmitEvent(event); } catch (Throwable t) { LOG.warn("Unable to add the application to the delegation token" + " renewer on recovery.", t); } } {code} Now the race condition no longer happens, including the null pointer error, which also occurred on my cluster. I think we can close this now. Thanks. > YARN RM fails to add the application to the delegation token renewer on > recovery > > > Key: YARN-8631 > URL: https://issues.apache.org/jira/browse/YARN-8631 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.1.0 >Reporter: Sanjay Divgi >Assignee: Umesh Mittal >Priority: Blocker > Attachments: YARN-8631.001.patch, > hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log > > > On an HA cluster we have observed that the YARN ResourceManager fails to add the > application to the delegation token renewer on recovery. > Below is the error: > {code:java} > 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer > (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= > [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: > (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, > issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, > masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]] > 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer > (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to > add the application to the delegation token renewer on recovery. 
> java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
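The locking scheme quoted in the comment above (event submission gates on isServiceStarted under a read lock, while shutdown flips the flag under the write lock) follows a general read/write-lock gate pattern. The sketch below demonstrates only that pattern with hypothetical names; it is a stripped-down illustration, not YARN code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the gate: submitters hold the read lock, so many can proceed
// concurrently; shutdown holds the write lock, so no submitter can observe
// isServiceStarted == true once the flag has been flipped.
public class ServiceGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean isServiceStarted = true;           // only mutated under the write lock
    private final Queue<Runnable> pendingEventQueue = new ConcurrentLinkedQueue<>();

    public void process(Runnable evt) {
        lock.readLock().lock();
        try {
            if (isServiceStarted) {
                evt.run();                             // stands in for renewerService.submit(...)
            } else {
                pendingEventQueue.add(evt);            // parked instead of hitting a dead executor
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    public void shutdown() {
        lock.writeLock().lock();
        try {
            isServiceStarted = false;                  // no reader can race this flip
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int pendingCount() {
        return pendingEventQueue.size();
    }
}
```

The key property is that the write lock cannot be acquired while any submitter holds the read lock, so an event is either fully submitted before shutdown or deterministically queued after it, which is what removes the race described in this thread.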