[jira] [Resolved] (YARN-10545) Improve the readability of diagnostics log in yarn-ui2 web page.
[ https://issues.apache.org/jira/browse/YARN-10545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu resolved YARN-10545. --- Fix Version/s: 3.4.0 Resolution: Fixed > Improve the readability of diagnostics log in yarn-ui2 web page. > > > Key: YARN-10545 > URL: https://issues.apache.org/jira/browse/YARN-10545 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-ui-v2 >Reporter: akiyamaneko >Assignee: akiyamaneko >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Diagnostics shows unreadble.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > If the diagnostic log in yarn-ui2 has multiple lines, line breaks and spaces > will not be displayed, which is hard to read. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10545) Improve the readability of diagnostics log in yarn-ui2 web page.
[ https://issues.apache.org/jira/browse/YARN-10545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu reassigned YARN-10545: - Assignee: akiyamaneko > Improve the readability of diagnostics log in yarn-ui2 web page. > > > Key: YARN-10545 > URL: https://issues.apache.org/jira/browse/YARN-10545 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-ui-v2 >Reporter: akiyamaneko >Assignee: akiyamaneko >Priority: Minor > Labels: pull-request-available > Attachments: Diagnostics shows unreadble.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > If the diagnostic log in yarn-ui2 has multiple lines, line breaks and spaces > will not be displayed, which is hard to read. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344630#comment-17344630 ] Qi Zhu commented on YARN-9698: -- Thanks [~pbacsko] for the reminder. I agree with you that we can create a new "Phase II" JIRA and move the current subtasks under that. > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > > > Key: YARN-9698 > URL: https://issues.apache.org/jira/browse/YARN-9698 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Weiwei Yang >Priority: Major > Labels: fs2cs > Attachments: FS-CS Migration.pdf > > > We see some users want to migrate from Fair Scheduler to Capacity Scheduler, > this Jira is created as an umbrella to track all related efforts for the > migration, the scope contains > * Bug fixes > * Add missing features > * Migration tools that help to generate CS configs based on FS, validate > configs etc > * Documents > this is part of CS component, the purpose is to make the migration process > smooth. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344629#comment-17344629 ] Andras Gyori commented on YARN-9698: [~pbacsko] I agree. Even more so, because the most of open jiras are not related to FS to CS tools. Lets move every viable Jira to a new umbrella. > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > > > Key: YARN-9698 > URL: https://issues.apache.org/jira/browse/YARN-9698 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Weiwei Yang >Priority: Major > Labels: fs2cs > Attachments: FS-CS Migration.pdf > > > We see some users want to migrate from Fair Scheduler to Capacity Scheduler, > this Jira is created as an umbrella to track all related efforts for the > migration, the scope contains > * Bug fixes > * Add missing features > * Migration tools that help to generate CS configs based on FS, validate > configs etc > * Documents > this is part of CS component, the purpose is to make the migration process > smooth. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344626#comment-17344626 ] Peter Bacsko commented on YARN-9698: The number of subtasks under this JIRA just keeps growing. The converter called fs2cs is mostly complete. It's not perfect, but it's working. Although new additions are constantly coming, I don't see the point of keeping this particular ticket open, otherwise it will never be closed. I suggest creating a new , "Phase II" JIRA and move the current subtasks under that. Then we can mark this as Fix Version = 3.4.0. [~gandras], [~snemeth], [~zhuqi] opinions? > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > > > Key: YARN-9698 > URL: https://issues.apache.org/jira/browse/YARN-9698 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Weiwei Yang >Priority: Major > Labels: fs2cs > Attachments: FS-CS Migration.pdf > > > We see some users want to migrate from Fair Scheduler to Capacity Scheduler, > this Jira is created as an umbrella to track all related efforts for the > migration, the scope contains > * Bug fixes > * Add missing features > * Migration tools that help to generate CS configs based on FS, validate > configs etc > * Documents > this is part of CS component, the purpose is to make the migration process > smooth. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10759) Encapsulate queue config modes
[ https://issues.apache.org/jira/browse/YARN-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344573#comment-17344573 ]

Peter Bacsko commented on YARN-10759:
-------------------------------------

Thanks [~gandras] for the patch. I just have one to note.

I can see that {{allowZeroCapacitySum}} has been moved to {{AbstractCSQueue}}, although it's really something which is meant for {{ParentQueue}}. I assume this is because the new code is easier to read and no type checks and casts are necessary. Is that correct?

I'm wondering if this can cause problems. Because right now, this logic only runs inside {{ParentQueue}}:
{noformat}
// We also allow children's percent sum = 0 under the following
// conditions
// - Parent uses weight mode
// - Parent uses percent mode, and parent has
//   (capacity=0 OR allowZero)
if (parentCapacityType == QueueCapacityType.PERCENT) {
  if ((Math.abs(queueCapacities.getCapacity(nodeLabel))
      > PRECISION) && (!allowZeroCapacitySum)) {
    throw new IOException(
        "Illegal" + " capacity sum of " + childrenPctSum
            + " for children of queue " + queueName
            + " for label=" + nodeLabel
            + ". It is set to 0, but parent percent != 0, and "
            + "doesn't allow children capacity to set to 0");
  }
}
}
{noformat}

But after this refactor, leaf queues will have this property too with it being set to "false". Although there are no unit test failures, we need to double check if this extra boolean flag on leafs can have any impact on the existing code.

> Encapsulate queue config modes
> ------------------------------
>
> Key: YARN-10759
> URL: https://issues.apache.org/jira/browse/YARN-10759
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10759.001.patch, YARN-10759.002.patch,
> YARN-10759.003.patch, YARN-10759.004.patch
>
>
> Capacity Scheduler queues have three modes:
> * relative/percentage
> * weight
> * absolute
> Most of them have their own:
> * validation logic
> * config setting logic
> * effective capacity calculation logic
> These logics can be easily extracted and encapsulated in separate config mode
> classes.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
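For readers following the YARN-10759 discussion, the idea of one class per config mode, together with the zero-sum check quoted in the comment, might look roughly like the sketch below. This is not the attached patch: `QueueConfigModeSketch`, `PercentModeSketch`, `WeightModeSketch` and `allowsZeroChildrenSum` are hypothetical names, and the `PRECISION` value is an assumption. The percent-mode condition mirrors the quoted {{ParentQueue}} snippet; absolute mode is omitted because the thread does not say how it treats a zero sum.

```java
// Hypothetical sketch of per-mode validation; names are illustrative only
// and do not come from the YARN-10759 patch.
interface QueueConfigModeSketch {
  /**
   * Returns true if a children percent sum of 0 is acceptable for a parent
   * with the given capacity and allowZeroCapacitySum setting.
   */
  boolean allowsZeroChildrenSum(float parentCapacity,
      boolean allowZeroCapacitySum);
}

final class PercentModeSketch implements QueueConfigModeSketch {
  // Assumed tolerance for comparing capacities against zero.
  private static final float PRECISION = 0.0005f;

  @Override
  public boolean allowsZeroChildrenSum(float parentCapacity,
      boolean allowZeroCapacitySum) {
    // Same condition as the quoted snippet, inverted: the sum is rejected
    // only when the parent capacity is non-zero AND the flag is false.
    boolean rejected =
        Math.abs(parentCapacity) > PRECISION && !allowZeroCapacitySum;
    return !rejected;
  }
}

final class WeightModeSketch implements QueueConfigModeSketch {
  @Override
  public boolean allowsZeroChildrenSum(float parentCapacity,
      boolean allowZeroCapacitySum) {
    // Per the quoted code comment, a zero percent sum is allowed when the
    // parent uses weight mode, regardless of the flag.
    return true;
  }
}
```

In this shape the flag is an argument supplied by whoever runs the parent-level validation, so a leaf queue that never calls the check would not be affected by its own default of "false". Whether the real refactor behaves that way still needs to be confirmed against the patch.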
[jira] [Comment Edited] (YARN-10759) Encapsulate queue config modes
[ https://issues.apache.org/jira/browse/YARN-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344573#comment-17344573 ] Peter Bacsko edited comment on YARN-10759 at 5/14/21, 12:54 PM: Thanks [~gandras] for the patch. I just have one thing to note. I can see that {{allowZeroCapacitySum}} has been moved to {{AbstractCSQueue}}, although it's really something which is meant for {{ParentQueue}}. I assume this is because the new code is easier to read and no type checks and casts are necessary. Is that correct? I'm wondering if this can cause problems. Because right now, this logic only runs inside {{ParentQueue}}: {noformat} // We also allow children's percent sum = 0 under the following // conditions // - Parent uses weight mode // - Parent uses percent mode, and parent has // (capacity=0 OR allowZero) if (parentCapacityType == QueueCapacityType.PERCENT) { if ((Math.abs(queueCapacities.getCapacity(nodeLabel)) > PRECISION) && (!allowZeroCapacitySum)) { throw new IOException( "Illegal" + " capacity sum of " + childrenPctSum + " for children of queue " + queueName + " for label=" + nodeLabel + ". It is set to 0, but parent percent != 0, and " + "doesn't allow children capacity to set to 0"); } } } {noformat} But after this refactor, leaf queues will have this property too with it being set to "false". Although there are no unit test failures, we need to double check if this extra boolean flag on leafs can have any impact on the existing code. was (Author: pbacsko): Thanks [~gandras] for the patch. I just have one to note. I can see that {{allowZeroCapacitySum}} has been moved to {{AbstractCSQueue}}, although it's really something which is meant for {{ParentQueue}}. I assume this is because the new code is easier to read and no type checks and casts are necessary. Is that correct? I'm wondering if this can cause problems. 
Because right now, this logic only runs inside {{ParentQueue}}: {noformat} // We also allow children's percent sum = 0 under the following // conditions // - Parent uses weight mode // - Parent uses percent mode, and parent has // (capacity=0 OR allowZero) if (parentCapacityType == QueueCapacityType.PERCENT) { if ((Math.abs(queueCapacities.getCapacity(nodeLabel)) > PRECISION) && (!allowZeroCapacitySum)) { throw new IOException( "Illegal" + " capacity sum of " + childrenPctSum + " for children of queue " + queueName + " for label=" + nodeLabel + ". It is set to 0, but parent percent != 0, and " + "doesn't allow children capacity to set to 0"); } } } {noformat} But after this refactor, leaf queues will have this property too with it being set to "false". Although there are no unit test failures, we need to double check if this extra boolean flag on leafs can have any impact on the existing code. > Encapsulate queue config modes > -- > > Key: YARN-10759 > URL: https://issues.apache.org/jira/browse/YARN-10759 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10759.001.patch, YARN-10759.002.patch, > YARN-10759.003.patch, YARN-10759.004.patch > > > Capacity Scheduler queues have three modes: > * relative/percentage > * weight > * absolute > Most of them have their own: > * validation logic > * config setting logic > * effective capacity calculation logic > These logics can be easily extracted and encapsulated in separate config mode > classes. -- This
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344531#comment-17344531 ] Qi Zhu commented on YARN-9615: -- [~chaosju] Sure.:D > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-9615-branch-3.3-001.patch, YARN-9615.001.patch, > YARN-9615.002.patch, YARN-9615.003.patch, YARN-9615.004.patch, > YARN-9615.005.patch, YARN-9615.006.patch, YARN-9615.007.patch, > YARN-9615.008.patch, YARN-9615.009.patch, YARN-9615.010.patch, > YARN-9615.011.patch, YARN-9615.011.patch, YARN-9615.poc.patch, > image-2021-03-04-10-35-10-626.png, image-2021-03-04-10-36-12-441.png, > screenshot-1.png > > > It'd be good to have counts/processing times for each event type in RM async > dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
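The request in YARN-9615 (counts and processing times for each event type) can be illustrated with a small stand-alone sketch. It is not the committed implementation and does not use the Hadoop metrics2 classes; `EventTypeMetricsSketch` and `SketchEventType` are hypothetical names chosen only for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical event types, standing in for a real dispatcher's enum.
enum SketchEventType { NODE_UPDATE, APP_ADDED }

// Hypothetical per-event-type counter and timer; not the YARN-9615 patch.
final class EventTypeMetricsSketch<T extends Enum<T>> {
  private final Map<T, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<T, LongAdder> totalNanos = new ConcurrentHashMap<>();

  /** Records one handled event of the given type and how long it took. */
  void record(T type, long elapsedNanos) {
    counts.computeIfAbsent(type, k -> new LongAdder()).increment();
    totalNanos.computeIfAbsent(type, k -> new LongAdder()).add(elapsedNanos);
  }

  /** Runs the handler and records its wall-clock duration. */
  void timed(T type, Runnable handler) {
    long start = System.nanoTime();
    try {
      handler.run();
    } finally {
      record(type, System.nanoTime() - start);
    }
  }

  long count(T type) {
    LongAdder adder = counts.get(type);
    return adder == null ? 0L : adder.sum();
  }

  /** Average processing time in nanoseconds, or 0 if nothing was recorded. */
  double averageNanos(T type) {
    long n = count(type);
    LongAdder total = totalNanos.get(type);
    return (n == 0 || total == null) ? 0.0 : (double) total.sum() / n;
  }
}
```

A dispatcher loop would wrap each handler call in `timed(event.getType(), ...)`; `LongAdder` is used here because many threads may record at once while reads are rare.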
[jira] [Commented] (YARN-10763) add the speed of containers assigned metrics to ClusterMetrics
[ https://issues.apache.org/jira/browse/YARN-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344518#comment-17344518 ]

Peter Bacsko commented on YARN-10763:
-------------------------------------

Thanks [~chaosju] just a final update, very minor things:

1. "Containers assigned in last second" --> missing "the": "Containers assigned in *the* last second"

2. Comment is not necessary, purpose of the executor is trivial:
{noformat}
/**
 * The executor service that count containers assigned in last second.
 *
 */
{noformat}

3. Nit: space after if
{noformat}
if(INSTANCE != null && INSTANCE.getAssignCounterExecutor() != null) {
  INSTANCE.getAssignCounterExecutor().shutdownNow();
}
{noformat}

I have no further comments.

> add the speed of containers assigned metrics to ClusterMetrics
> --------------------------------------------------------------
>
> Key: YARN-10763
> URL: https://issues.apache.org/jira/browse/YARN-10763
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: chaosju
> Assignee: chaosju
> Priority: Minor
> Attachments: YARN-10763.001.patch, YARN-10763.002.patch,
> YARN-10763.003.patch, YARN-10763.004.patch, YARN-10763.005.patch,
> YARN-10763.006.patch, YARN-10763.007.patch, screenshot-1.png
>
>
> It'd be good to have ContainerAssignedNum/Second in ClusterMetrics for
> measuring cluster throughput.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
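To make the "containers assigned in the last second" metric from YARN-10763 concrete, here is a minimal sketch of one way such a counter could work: increments go into a window that a scheduled executor rolls over once per second. This is an assumption about the general approach, not the code in the attached patches; `AssignRateSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-second assignment counter; not the YARN-10763 patch.
final class AssignRateSketch {
  private final LongAdder currentWindow = new LongAdder();
  private volatile long assignedInLastSecond;
  private ScheduledExecutorService executor;

  /** Called by the scheduler each time a container is assigned. */
  void containerAssigned() {
    currentWindow.increment();
  }

  /** Publishes the finished window and starts a new one. */
  void rollOver() {
    assignedInLastSecond = currentWindow.sumThenReset();
  }

  long getAssignedInLastSecond() {
    return assignedInLastSecond;
  }

  /** Starts a daemon thread that rolls the window over once per second. */
  synchronized void start() {
    if (executor == null) {
      executor = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "assign-counter-sketch");
        t.setDaemon(true);
        return t;
      });
      executor.scheduleAtFixedRate(this::rollOver, 1, 1, TimeUnit.SECONDS);
    }
  }

  /** Stops the background thread, as the reviewed shutdown code does. */
  synchronized void stop() {
    if (executor != null) {
      executor.shutdownNow();
      executor = null;
    }
  }
}
```

Keeping `rollOver()` separate from the executor lets a unit test drive the window by hand instead of sleeping for a second.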
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344505#comment-17344505 ] chaosju commented on YARN-9615: --- [~zhuqi] Could I friend you on _Wechat_?ChaosJu is _Wechat_ nickname. > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-9615-branch-3.3-001.patch, YARN-9615.001.patch, > YARN-9615.002.patch, YARN-9615.003.patch, YARN-9615.004.patch, > YARN-9615.005.patch, YARN-9615.006.patch, YARN-9615.007.patch, > YARN-9615.008.patch, YARN-9615.009.patch, YARN-9615.010.patch, > YARN-9615.011.patch, YARN-9615.011.patch, YARN-9615.poc.patch, > image-2021-03-04-10-35-10-626.png, image-2021-03-04-10-36-12-441.png, > screenshot-1.png > > > It'd be good to have counts/processing times for each event type in RM async > dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2497) Fair scheduler should support strict node labels
[ https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344498#comment-17344498 ] Hadoop QA commented on YARN-2497: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 14s{color} | {color:red}{color} | {color:red} YARN-2497 does not apply to branch-3.0. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-2497 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895017/YARN-2497.branch-3.0.001.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/986/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > Fair scheduler should support strict node labels > > > Key: YARN-2497 > URL: https://issues.apache.org/jira/browse/YARN-2497 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Wangda Tan >Assignee: Daniel Templeton >Priority: Major > Attachments: YARN-2497.001.patch, YARN-2497.002.patch, > YARN-2497.003.patch, YARN-2497.004.patch, YARN-2497.005.patch, > YARN-2497.006.patch, YARN-2497.007.patch, YARN-2497.008.patch, > YARN-2497.009.patch, YARN-2497.010.patch, YARN-2497.011.patch, > YARN-2497.branch-3.0.001.patch, YARN-2499.WIP01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2497) Fair scheduler should support strict node labels
[ https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344491#comment-17344491 ] zhangzhanchang commented on YARN-2497: -- [~templedf] There are multiple patch files; which one should I use? What version is this patch based on? :) > Fair scheduler should support strict node labels > > > Key: YARN-2497 > URL: https://issues.apache.org/jira/browse/YARN-2497 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Wangda Tan >Assignee: Daniel Templeton >Priority: Major > Attachments: YARN-2497.001.patch, YARN-2497.002.patch, > YARN-2497.003.patch, YARN-2497.004.patch, YARN-2497.005.patch, > YARN-2497.006.patch, YARN-2497.007.patch, YARN-2497.008.patch, > YARN-2497.009.patch, YARN-2497.010.patch, YARN-2497.011.patch, > YARN-2497.branch-3.0.001.patch, YARN-2499.WIP01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10764) Add rm dispatcher event metrics in SLS
[ https://issues.apache.org/jira/browse/YARN-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344490#comment-17344490 ] Qi Zhu commented on YARN-10764: --- I think we should add event-related metrics to SLS, such as: # The event queue size. # The average processing time of each event type, etc. cc [~snemeth] You are the SLS expert; what's your opinion on this? Thanks a lot. > Add rm dispatcher event metrics in SLS > --- > > Key: YARN-10764 > URL: https://issues.apache.org/jira/browse/YARN-10764 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler-load-simulator >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > > We should use SLS to confirm if we can get performance improvement of event > consume time etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10761) Add more event type to RM Dispatcher event metrics.
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344483#comment-17344483 ] Qi Zhu edited comment on YARN-10761 at 5/14/21, 9:26 AM: - Thanks [~snemeth] for reminder. [~snemeth] [~ebadger] Sorry for the commit, and it's the first time i start to commit after to be the committer. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit when (more than 2 +1) next time, i will study from you, to be a strict committer. Thanks again. was (Author: zhuqi): Thanks [~snemeth] for reminder. Sorry for the commit, and it's the first time i start to commit after to be the committer. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit when (more than 2 +1) next time, i will study from you, to be a strict committer. Thanks again. > Add more event type to RM Dispatcher event metrics. > --- > > Key: YARN-10761 > URL: https://issues.apache.org/jira/browse/YARN-10761 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10761.001.patch, YARN-10761.002.patch, > YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, > image-2021-05-06-16-39-28-362.png > > > Since YARN-9615 add NodesListManagerEventType to event metrics. > And we'd better add total 4 busy event type to the metrics according to > YARN-9927. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10761) Add more event type to RM Dispatcher event metrics.
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344483#comment-17344483 ] Qi Zhu edited comment on YARN-10761 at 5/14/21, 9:19 AM: - Thanks [~snemeth] for reminder. Sorry for the commit, and it's the first time i start to commit after to be the committer. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit when (more than 2 +1) next time, i will study from you, to be a strict committer. Thanks again. was (Author: zhuqi): Thanks [~snemeth] for reminder. Sorry for the commit. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit when (more than 2 +1) next time. > Add more event type to RM Dispatcher event metrics. > --- > > Key: YARN-10761 > URL: https://issues.apache.org/jira/browse/YARN-10761 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10761.001.patch, YARN-10761.002.patch, > YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, > image-2021-05-06-16-39-28-362.png > > > Since YARN-9615 add NodesListManagerEventType to event metrics. > And we'd better add total 4 busy event type to the metrics according to > YARN-9927. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10761) Add more event type to RM Dispatcher event metrics.
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344483#comment-17344483 ] Qi Zhu edited comment on YARN-10761 at 5/14/21, 9:16 AM: - Thanks [~snemeth] for reminder. Sorry for the commit. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit when (more than 2 +1) next time. was (Author: zhuqi): Thanks [~snemeth] for reminder. Sorry for the commit. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit next time. > Add more event type to RM Dispatcher event metrics. > --- > > Key: YARN-10761 > URL: https://issues.apache.org/jira/browse/YARN-10761 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10761.001.patch, YARN-10761.002.patch, > YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, > image-2021-05-06-16-39-28-362.png > > > Since YARN-9615 add NodesListManagerEventType to event metrics. > And we'd better add total 4 busy event type to the metrics according to > YARN-9927. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344483#comment-17344483 ] Qi Zhu commented on YARN-10761: --- Thanks [~snemeth] for reminder. Sorry for the commit. The YARN-9615 is contributed by me, so i commit this related small change. I will wait other committers to check and commit next time. > Add more event type to RM Dispatcher event metrics. > --- > > Key: YARN-10761 > URL: https://issues.apache.org/jira/browse/YARN-10761 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10761.001.patch, YARN-10761.002.patch, > YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, > image-2021-05-06-16-39-28-362.png > > > Since YARN-9615 add NodesListManagerEventType to event metrics. > And we'd better add total 4 busy event type to the metrics according to > YARN-9927. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344478#comment-17344478 ] Qi Zhu edited comment on YARN-10324 at 5/14/21, 9:01 AM: - Hi [~yaoguangdong] Thanks for this work. I have added you to the contributor list and assigned this to you. You can submit latest patch to trigger the jenkins. was (Author: zhuqi): Hi [~yaoguangdong] Thanks for this work. I have added you to the contributor list. You can submit latest patch to trigger the jenkins. > Fetch data from NodeManager may case read timeout when disk is busy > --- > > Key: YARN-10324 > URL: https://issues.apache.org/jira/browse/YARN-10324 > Project: Hadoop YARN > Issue Type: Improvement > Components: auxservices >Affects Versions: 2.7.0, 3.2.1 >Reporter: Yao Guangdong >Assignee: Yao Guangdong >Priority: Minor > Labels: patch > Attachments: YARN-10324.001.patch, YARN-10324.002.patch > > > With the cluster size become more and more big.The cost time on Reduce > fetch Map's result from NodeManager become more and more long.We often see > the WARN logs in the reduce's logs as follow. 
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to > TX-196-168-211.com:13562 with 5 map outputs > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) > at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198) > {quote} > We check the NodeManager server find that the disk IO util and connections > became very high when the read timeout happened.We analyze that if we have > 20,000 maps and 1,000 reduces which will make NodeManager generate 20 million > times IO stream operate in the shuffle phase.If the reduce fetch data size is > very small from map output files.Which make the disk IO util become very high > in big cluster.Then read timeout happened frequently.The application 
finished > time become longer. > We find ShuffleHandler have IndexCache for cache file.out.index file.Then we > want to change the small IO to big IO which can reduce the small disk IO > times. So we try to cache all the small file data(file.out) in memory when > the first fetch request come.Then the others fetch request only need read > data from memory avoid disk IO operation.After we cache data to memory we > find the read timeout disappeared. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
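The YARN-10324 description has two parts that a short sketch may clarify: the arithmetic behind "20 million" (each of 1,000 reduces fetches from each of 20,000 maps) and the idea of keeping small map outputs in memory after the first fetch. The code below is only an illustration under stated assumptions, not the attached patch and not the real ShuffleHandler API: `SmallOutputCacheSketch`, the injected `diskReader` function and the size threshold are all hypothetical, and eviction and memory limits are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical in-memory cache for small map outputs; not the YARN-10324 patch.
final class SmallOutputCacheSketch {
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
  private final Function<String, byte[]> diskReader; // stands in for file I/O
  private final int maxCachedBytes;
  private final AtomicLong diskReads = new AtomicLong();

  SmallOutputCacheSketch(Function<String, byte[]> diskReader,
      int maxCachedBytes) {
    this.diskReader = diskReader;
    this.maxCachedBytes = maxCachedBytes;
  }

  /** Number of fetch requests in a shuffle: one per (map, reduce) pair. */
  static long shuffleFetches(long maps, long reduces) {
    return Math.multiplyExact(maps, reduces);
  }

  /** Serves a fetch from memory when possible, otherwise reads from disk. */
  byte[] fetch(String mapOutputPath) {
    byte[] cached = cache.get(mapOutputPath);
    if (cached != null) {
      return cached;
    }
    // Concurrent first requests may each read once; that is accepted here
    // to keep the sketch simple.
    byte[] data = diskReader.apply(mapOutputPath);
    diskReads.incrementAndGet();
    if (data.length <= maxCachedBytes) {
      cache.putIfAbsent(mapOutputPath, data); // only small files are cached
    }
    return data;
  }

  long diskReads() {
    return diskReads.get();
  }
}
```

With 20,000 maps and 1,000 reduces, `shuffleFetches` gives 20,000,000 requests; if every map output is small enough to cache, the disk is touched roughly once per file instead of once per request.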
[jira] [Commented] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344478#comment-17344478 ]

Qi Zhu commented on YARN-10324:
---

Hi [~yaoguangdong]
Thanks for this work. I have added you to the contributor list.
You can submit the latest patch to trigger Jenkins.

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: auxservices
> Affects Versions: 2.7.0, 3.2.1
> Reporter: Yao Guangdong
> Priority: Minor
> Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch
>
>
> As clusters grow larger, the time reduces spend fetching map results from
> the NodeManager grows longer. We often see WARN logs like the following in
> the reduce logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
> We checked the NodeManager server and found that disk IO utilization and
> the number of connections became very high when the read timeouts happened.
> By our analysis, with 20,000 maps and 1,000 reduces, the NodeManager performs
> 20 million IO stream operations in the shuffle phase. If the amount of data
> each reduce fetches from the map output files is very small, disk IO
> utilization becomes very high in a big cluster. Read timeouts then happen
> frequently, and applications take longer to finish.
> We found that ShuffleHandler has an IndexCache for caching the
> file.out.index file. We want to turn the small IOs into big IOs, which
> reduces the number of small disk IOs. So we tried caching all of the small
> file data (file.out) in memory when the first fetch request arrives. The
> other fetch requests then only need to read data from memory, avoiding disk
> IO. After caching the data in memory, the read timeouts disappeared.
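The caching idea in the YARN-10324 description above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class name `SmallMapOutputCache`, the size threshold, and the read counter are all hypothetical, and a real ShuffleHandler change would also need eviction and memory limits. The first request for a small file loads it with one large read; later requests are served from memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the caching idea; not code from the attached patches.
final class SmallMapOutputCache {
    // Assumed threshold below which a map output file counts as "small".
    private final long maxCachedFileBytes;
    private final ConcurrentHashMap<Path, byte[]> cache = new ConcurrentHashMap<>();
    private final AtomicLong diskReads = new AtomicLong();

    SmallMapOutputCache(long maxCachedFileBytes) {
        this.maxCachedFileBytes = maxCachedFileBytes;
    }

    // Every reduce fetches from every map, e.g. 20,000 x 1,000 = 20 million fetches.
    static long shuffleFetches(long maps, long reduces) {
        return maps * reduces;
    }

    // Returns up to `length` bytes of `mapOutput` starting at `offset` (offset >= 0).
    byte[] read(Path mapOutput, long offset, int length) throws IOException {
        byte[] data = cache.get(mapOutput);
        if (data == null && Files.size(mapOutput) <= maxCachedFileBytes) {
            try {
                // First fetch of a small file: one big read, then keep it in memory.
                data = cache.computeIfAbsent(mapOutput, p -> {
                    try {
                        diskReads.incrementAndGet();
                        return Files.readAllBytes(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            } catch (UncheckedIOException e) {
                throw e.getCause();
            }
        }
        if (data != null) {
            // Cached file: slice the requested range without touching the disk.
            int from = (int) Math.min(offset, data.length);
            int to = (int) Math.min(offset + length, data.length);
            return Arrays.copyOfRange(data, from, to);
        }
        // Large file: fall back to a positional read from disk on every request.
        diskReads.incrementAndGet();
        try (FileChannel ch = FileChannel.open(mapOutput, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    break; // reached end of file
                }
                pos += n;
            }
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    long diskReadCount() {
        return diskReads.get();
    }
}
```

With this shape, many reduces fetching small partitions of the same map output cost a single disk read per file instead of one per fetch, at the price of holding the file contents in NodeManager memory.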
[jira] [Assigned] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu reassigned YARN-10324:
---

Assignee: Yao Guangdong

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: auxservices
> Affects Versions: 2.7.0, 3.2.1
> Reporter: Yao Guangdong
> Assignee: Yao Guangdong
> Priority: Minor
> Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch
[jira] [Commented] (YARN-10766) [UI2] Bump moment-timezone to 0.5.33
[ https://issues.apache.org/jira/browse/YARN-10766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344462#comment-17344462 ]

Qi Zhu commented on YARN-10766:
---

Thanks [~gandras] for the patch.
LGTM +1

> [UI2] Bump moment-timezone to 0.5.33
> ---
>
> Key: YARN-10766
> URL: https://issues.apache.org/jira/browse/YARN-10766
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn, yarn-ui-v2
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: UI2_Correct_Timezone_After_Bump.png, UI2_Wrong_Timezone_Before_Bump.png, YARN-10766.001.patch
>
>
> A handful of timezone-related fixes were added in the 0.5.33 release of
> moment-timezone. One scenario in which the current UI2 behaviour is not
> correct is a user from Australia, for whom the submission time shown on UI2
> is one hour ahead of the actual time.
> Unfortunately, the moment-timezone data range files have been renamed, which
> is a breaking change from the point of view of emberjs. Including all
> timezones will increase the overall size of UI2 by an additional ~6 kbs.
[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344390#comment-17344390 ]

Szilard Nemeth commented on YARN-10761:
---

[~ebadger], [~chaosju]: Just asking this as a general community question.
Isn't it an unspoken rule to wait for a +1 from others, and that one of the
committer reviewers should commit one's patches?
TBH, I don't often see people committing their own patches themselves.

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10761.001.patch, YARN-10761.002.patch, YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We had better also add a total of 4 busy event types to the metrics,
> according to YARN-9927.
[jira] [Commented] (YARN-10241) [UI2] Yarn web ui2 can't display Beijing time correctly
[ https://issues.apache.org/jira/browse/YARN-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344382#comment-17344382 ]

Andras Gyori commented on YARN-10241:
---

[~jevic] We have encountered the same problem. I think YARN-10766 will solve
your case as well.

> [UI2] Yarn web ui2 can't display Beijing time correctly
> ---
>
> Key: YARN-10241
> URL: https://issues.apache.org/jira/browse/YARN-10241
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp, yarn-ui-v2
> Affects Versions: 3.1.1
> Environment: *Now the time is: April 21, 2020 11:54*
> *But the screenshot shows another time period*
>
> *versions:*
> *HDP-3.1.4.0*
> *HDFS 3.1.1*
> *yarn 3.1.1*
> Reporter: jevic
> Priority: Blocker
> Attachments: image-2020-04-21-11-53-27-811.png
>
>
> !image-2020-04-21-11-53-27-811.png!
[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.
[ https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344372#comment-17344372 ]

Qi Zhu commented on YARN-10761:
---

Thanks [~ebadger] [~gandras] [~chaosju] for the review.
Merged to trunk.

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10761.001.patch, YARN-10761.002.patch, YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We had better also add a total of 4 busy event types to the metrics,
> according to YARN-9927.