[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019278#comment-17019278 ]

Tao Yang commented on YARN-9538:
--------------------------------

Thanks [~cheersyang] for the review. Attached v4 patch to fix failures in Jenkins.

> Document scheduler/app activities and REST APIs
> -----------------------------------------------
>
>                 Key: YARN-9538
>                 URL: https://issues.apache.org/jira/browse/YARN-9538
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9538.001.patch, YARN-9538.002.patch, YARN-9538.003.patch, YARN-9538.004.patch
>
>
> Add documentation for scheduler/app activities in CapacityScheduler.md and ResourceManagerRest.md.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-9538:
---------------------------
    Attachment: YARN-9538.004.patch

> Document scheduler/app activities and REST APIs
> -----------------------------------------------
>
>                 Key: YARN-9538
>                 URL: https://issues.apache.org/jira/browse/YARN-9538
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9538.001.patch, YARN-9538.002.patch, YARN-9538.003.patch, YARN-9538.004.patch
>
>
> Add documentation for scheduler/app activities in CapacityScheduler.md and ResourceManagerRest.md.
[jira] [Commented] (YARN-10095) Fix help message for yarn rmadmin
[ https://issues.apache.org/jira/browse/YARN-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019272#comment-17019272 ]

Akira Ajisaka commented on YARN-10095:
--------------------------------------

Thank you for filing the jira. Moved the issue from Common to YARN.

> Fix help message for yarn rmadmin
> ---------------------------------
>
>                 Key: YARN-10095
>                 URL: https://issues.apache.org/jira/browse/YARN-10095
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Xieming Li
>            Assignee: Xieming Li
>            Priority: Minor
>
> (This issue is identified by [~aajisaka] in https://issues.apache.org/jira/browse/HADOOP-16753)
> The help message of yarn rmadmin seems broken.
> Current:
> {code:java}
> $ yarn rmadmin -help refreshNodes 2>/dev/null
> $
> $ yarn rmadmin -help refreshNodes
> Usage: yarn rmadmin
>    -refreshQueues
>    -refreshNodes [-g|graceful [timeout in seconds] -client|server]
>    -refreshNodesResources
>    -refreshSuperUserGroupsConfiguration
>    -refreshUserToGroupsMappings
>    -refreshAdminAcls
>    -refreshServiceAcl
>    -getGroups [username]
>    -addToClusterNodeLabels <"label1(exclusive=true),label2(exclusive=false),label3">
>    -removeFromClusterNodeLabels (label splitted by ",")
>    -replaceLabelsOnNode <"node1[:port]=label1,label2 node2[:port]=label1,label2"> [-failOnUnknownNodes]
>    -directlyAccessNodeLabelStore
>    -refreshClusterMaxPriority
>    -updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) or [NodeID] [resourcetypes] ([OvercommitTimeout]).
>    -help [cmd]
>
> Generic options supported are:
> -conf        specify an application configuration file
> -D           define a value for a given property
> -fs          specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
> -jt          specify a ResourceManager
> -files       specify a comma-separated list of files to be copied to the map reduce cluster
> -libjars     specify a comma-separated list of jar files to be included in the classpath
> -archives    specify a comma-separated list of archives to be unarchived on the compute machines
>
> The general command line syntax is:
> command [genericOptions] [commandOptions]
> {code}
>
> Expected:
> {code:java}
> $ yarn rmadmin -help refreshNodes 2>/dev/null
> -refreshNodes [-g|graceful [timeout in seconds] -client|server]
> $ yarn rmadmin -help refreshNodes
> -refreshNodes [-g|graceful [timeout in seconds] -client|server]
> {code}
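The fix the reporter expects amounts to two behavior changes: `-help <cmd>` should print only that command's usage line, and it should go to stdout so `2>/dev/null` no longer swallows it. A minimal sketch of that lookup pattern follows; the class, method, and map contents are hypothetical, not the actual RMAdminCLI code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: per-command usage lookup. Names are hypothetical and this is
// not the real Hadoop RMAdminCLI implementation.
public class RmAdminHelpSketch {
    private static final Map<String, String> USAGE = new LinkedHashMap<>();
    static {
        USAGE.put("-refreshQueues", "-refreshQueues");
        USAGE.put("-refreshNodes",
            "-refreshNodes [-g|graceful [timeout in seconds] -client|server]");
        USAGE.put("-getGroups", "-getGroups [username]");
    }

    /** Usage for one command; falls back to the full usage when unknown. */
    public static String helpFor(String cmd) {
        String key = cmd.startsWith("-") ? cmd : "-" + cmd;
        String line = USAGE.get(key);
        return line != null ? line : String.join("\n", USAGE.values());
    }

    public static void main(String[] args) {
        // Print to stdout, not stderr, so that
        // `yarn rmadmin -help refreshNodes 2>/dev/null` still shows the line.
        System.out.println(helpFor(args.length > 0 ? args[0] : ""));
    }
}
```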
[jira] [Moved] (YARN-10095) Fix help message for yarn rmadmin
[ https://issues.apache.org/jira/browse/YARN-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka moved HADOOP-16815 to YARN-10095:
-----------------------------------------------
                Key: YARN-10095  (was: HADOOP-16815)
            Project: Hadoop YARN  (was: Hadoop Common)

> Fix help message for yarn rmadmin
> ---------------------------------
>
>                 Key: YARN-10095
>                 URL: https://issues.apache.org/jira/browse/YARN-10095
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Xieming Li
>            Assignee: Xieming Li
>            Priority: Minor
>
> (This issue is identified by [~aajisaka] in https://issues.apache.org/jira/browse/HADOOP-16753)
> The help message of yarn rmadmin seems broken.
> Current:
> {code:java}
> $ yarn rmadmin -help refreshNodes 2>/dev/null
> $
> $ yarn rmadmin -help refreshNodes
> Usage: yarn rmadmin
>    -refreshQueues
>    -refreshNodes [-g|graceful [timeout in seconds] -client|server]
>    -refreshNodesResources
>    -refreshSuperUserGroupsConfiguration
>    -refreshUserToGroupsMappings
>    -refreshAdminAcls
>    -refreshServiceAcl
>    -getGroups [username]
>    -addToClusterNodeLabels <"label1(exclusive=true),label2(exclusive=false),label3">
>    -removeFromClusterNodeLabels (label splitted by ",")
>    -replaceLabelsOnNode <"node1[:port]=label1,label2 node2[:port]=label1,label2"> [-failOnUnknownNodes]
>    -directlyAccessNodeLabelStore
>    -refreshClusterMaxPriority
>    -updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) or [NodeID] [resourcetypes] ([OvercommitTimeout]).
>    -help [cmd]
>
> Generic options supported are:
> -conf        specify an application configuration file
> -D           define a value for a given property
> -fs          specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
> -jt          specify a ResourceManager
> -files       specify a comma-separated list of files to be copied to the map reduce cluster
> -libjars     specify a comma-separated list of jar files to be included in the classpath
> -archives    specify a comma-separated list of archives to be unarchived on the compute machines
>
> The general command line syntax is:
> command [genericOptions] [commandOptions]
> {code}
>
> Expected:
> {code:java}
> $ yarn rmadmin -help refreshNodes 2>/dev/null
> -refreshNodes [-g|graceful [timeout in seconds] -client|server]
> $ yarn rmadmin -help refreshNodes
> -refreshNodes [-g|graceful [timeout in seconds] -client|server]
> {code}
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Minni Mittal updated YARN-7898:
-------------------------------
    Attachment: YARN-7898.v7.patch

> [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-7898
>                 URL: https://issues.apache.org/jira/browse/YARN-7898
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Giovanni Matteo Fumarola
>            Assignee: Minni Mittal
>            Priority: Major
>         Attachments: StateStoreProxy StressTest.jpg, YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, YARN-7898-YARN-7402.v6.patch, YARN-7898.v7.patch
>
>
> As detailed in the proposal in the umbrella JIRA, we are introducing a new component that routes client requests to the appropriate FederationStateStore. This JIRA tracks the creation of a proxy for FederationStateStore in the Router.
[jira] [Updated] (YARN-10094) Add a configuration to support NM overuse in RM
[ https://issues.apache.org/jira/browse/YARN-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10094:
----------------------------
    Attachment: YARN-10094.001.patch

> Add a configuration to support NM overuse in RM
> -----------------------------------------------
>
>                 Key: YARN-10094
>                 URL: https://issues.apache.org/jira/browse/YARN-10094
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>         Attachments: YARN-10094.001.patch
>
>
> In a large cluster, upgrading NodeManagers costs too much time. Sometimes we want to allow memory or CPU overuse from the RM's point of view.
[jira] [Updated] (YARN-10094) Add configuration to support NM overuse in RM
[ https://issues.apache.org/jira/browse/YARN-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10094:
----------------------------
    Summary: Add configuration to support NM overuse in RM  (was: Add a configuration to support NM overuse in RM)

> Add configuration to support NM overuse in RM
> ---------------------------------------------
>
>                 Key: YARN-10094
>                 URL: https://issues.apache.org/jira/browse/YARN-10094
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>         Attachments: YARN-10094.001.patch
>
>
> In a large cluster, upgrading NodeManagers costs too much time. Sometimes we want to allow memory or CPU overuse from the RM's point of view.
[jira] [Updated] (YARN-10094) Add a configuration to support NM overuse in RM
[ https://issues.apache.org/jira/browse/YARN-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10094:
----------------------------
    Description: 
In a large cluster, upgrading NodeManagers costs too much time. Sometimes we want to allow memory or CPU overuse from the RM's point of view.

> Add a configuration to support NM overuse in RM
> -----------------------------------------------
>
>                 Key: YARN-10094
>                 URL: https://issues.apache.org/jira/browse/YARN-10094
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>
> In a large cluster, upgrading NodeManagers costs too much time. Sometimes we want to allow memory or CPU overuse from the RM's point of view.
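One way to read the proposal: rather than restarting every NM with larger resource settings, the RM could scale each NM's registered resources by a configured overcommit ratio. A sketch of that scaling, under entirely hypothetical config semantics (this is not an existing YARN property or class):

```java
// Sketch only: scale NM-reported resources by an overcommit ratio that the
// RM reads from its own configuration. All names here are hypothetical.
public class NmOvercommitSketch {
    public static long effectiveMemoryMb(long reportedMb, double ratio) {
        if (ratio < 1.0) {
            throw new IllegalArgumentException("overcommit ratio must be >= 1.0");
        }
        return (long) (reportedMb * ratio);
    }

    public static int effectiveVcores(int reportedVcores, double ratio) {
        if (ratio < 1.0) {
            throw new IllegalArgumentException("overcommit ratio must be >= 1.0");
        }
        return (int) (reportedVcores * ratio);
    }
}
```

Applying the ratio on the RM side is what makes the change cheap: no NM restart is needed to take a new ratio into account.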
[jira] [Created] (YARN-10094) Add a configuration to support NM overuse in RM
zhoukang created YARN-10094:
-------------------------------

             Summary: Add a configuration to support NM overuse in RM
                 Key: YARN-10094
                 URL: https://issues.apache.org/jira/browse/YARN-10094
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: resourcemanager
            Reporter: zhoukang
            Assignee: zhoukang
[jira] [Updated] (YARN-10093) Support list applications by queue name
[ https://issues.apache.org/jira/browse/YARN-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10093:
----------------------------
    Summary: Support list applications by queue name  (was: Support get applications by queue)

> Support list applications by queue name
> ---------------------------------------
>
>                 Key: YARN-10093
>                 URL: https://issues.apache.org/jira/browse/YARN-10093
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
[jira] [Created] (YARN-10093) Support get applications by queue
zhoukang created YARN-10093:
-------------------------------

             Summary: Support get applications by queue
                 Key: YARN-10093
                 URL: https://issues.apache.org/jira/browse/YARN-10093
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: zhoukang
            Assignee: zhoukang
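The requested behavior, listing applications filtered by queue name, can be sketched client-side as below. `AppInfo` is a stand-in record for illustration, not the real `ApplicationReport` API:

```java
import java.util.List;

// Sketch only: filter application summaries by queue name.
// AppInfo is a hypothetical stand-in, not YARN's ApplicationReport.
public class QueueFilterSketch {
    public record AppInfo(String appId, String queue) {}

    public static List<AppInfo> byQueue(List<AppInfo> apps, String queue) {
        return apps.stream()
                   .filter(a -> a.queue().equals(queue))
                   .toList();
    }
}
```

Doing the filtering server-side (in the RM) would avoid shipping every application report to the client first, which matters on large clusters.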
[jira] [Updated] (YARN-10092) Support config special log retain time for given user
[ https://issues.apache.org/jira/browse/YARN-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10092:
----------------------------
    Summary: Support config special log retain time for given user  (was: Support log retain time for give user)

> Support config special log retain time for given user
> -----------------------------------------------------
>
>                 Key: YARN-10092
>                 URL: https://issues.apache.org/jira/browse/YARN-10092
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
[jira] [Created] (YARN-10092) Support log retain time for give user
zhoukang created YARN-10092:
-------------------------------

             Summary: Support log retain time for give user
                 Key: YARN-10092
                 URL: https://issues.apache.org/jira/browse/YARN-10092
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
            Reporter: zhoukang
            Assignee: zhoukang
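The feature amounts to a per-user override layered on top of the cluster-wide retention setting. A sketch of the lookup is below; the `log-aggregation.retain-seconds.<user>` key pattern is an assumption for illustration, not an existing YARN property:

```java
import java.util.Map;

// Sketch only: resolve a per-user log retention override, falling back to
// the cluster default. The key pattern below is hypothetical.
public class LogRetentionSketch {
    public static long retainSeconds(Map<String, String> conf,
                                     String user, long defaultSeconds) {
        String override = conf.get("log-aggregation.retain-seconds." + user);
        return override != null ? Long.parseLong(override) : defaultSeconds;
    }
}
```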
[jira] [Created] (YARN-10091) Support clean up orphan app's log in LogAggService
zhoukang created YARN-10091:
-------------------------------

             Summary: Support clean up orphan app's log in LogAggService
                 Key: YARN-10091
                 URL: https://issues.apache.org/jira/browse/YARN-10091
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
            Reporter: zhoukang
            Assignee: zhoukang

In a large cluster, orphan application log directories can accumulate and leak disk space. We should support cleaning up the log directories of such apps.
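The cleanup described above boils down to comparing the per-app log directories on disk against the set of applications the cluster still knows about, and deleting the difference. A sketch under an assumed one-directory-per-app layout (hypothetical class, not the actual log aggregation service):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch only: find per-app log directories whose app id is no longer live.
// The one-directory-per-app layout is an assumption for illustration.
public class OrphanLogSketch {
    public static List<Path> findOrphans(Path logRoot, Set<String> liveAppIds)
            throws IOException {
        List<Path> orphans = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(logRoot)) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir)
                        && !liveAppIds.contains(dir.getFileName().toString())) {
                    orphans.add(dir); // caller would delete these recursively
                }
            }
        }
        return orphans;
    }
}
```

Separating detection from deletion keeps the risky operation (recursive delete) reviewable and easy to dry-run.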
[jira] [Updated] (YARN-10062) Support deploy multiple historyserver in case of sp
[ https://issues.apache.org/jira/browse/YARN-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10062:
----------------------------
    Attachment: YARN-10062.001.patch

> Support deploy multiple historyserver in case of sp
> ---------------------------------------------------
>
>                 Key: YARN-10062
>                 URL: https://issues.apache.org/jira/browse/YARN-10062
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>         Attachments: YARN-10062.001.patch
>
>
> In this jira, I want to implement a patch to support HistoryServer HA.
> We can deploy two HistoryServers and use a load balancer such as LVS to support HA.
> But errors like the following occur in our production cluster:
> {code:java}
> 19/12/13/00 does not exist.
> 2019-12-21,13:25:06,822 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:25:07,530 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:25:09,910 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:44:29,044 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:47:08,154 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> {code}
[jira] [Updated] (YARN-10062) Support deploy multiple historyserver in case of sp
[ https://issues.apache.org/jira/browse/YARN-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-10062:
----------------------------
    Description: 
In this jira, I want to implement a patch to support HistoryServer HA.
We can deploy two HistoryServers and use a load balancer such as LVS to support HA.
But errors like the following occur in our production cluster:
{code:java}
19/12/13/00 does not exist.
2019-12-21,13:25:06,822 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
2019-12-21,13:25:07,530 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
2019-12-21,13:25:09,910 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
2019-12-21,13:44:29,044 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
2019-12-21,13:47:08,154 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
{code}

  was: In this jira, i want to implement a patch to support history ha

> Support deploy multiple historyserver in case of sp
> ---------------------------------------------------
>
>                 Key: YARN-10062
>                 URL: https://issues.apache.org/jira/browse/YARN-10062
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>
> In this jira, I want to implement a patch to support HistoryServer HA.
> We can deploy two HistoryServers and use a load balancer such as LVS to support HA.
> But errors like the following occur in our production cluster:
> {code:java}
> 19/12/13/00 does not exist.
> 2019-12-21,13:25:06,822 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:25:07,530 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:25:09,910 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:44:29,044 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> 2019-12-21,13:47:08,154 DEBUG org.apache.hadoop.yarn.webapp.Controller: text/plain; charset=UTF-8: java.io.FileNotFoundException: File /yarn/xxx/staging/history/done/2019/12/13/00 does not exist.
> {code}
[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019208#comment-17019208 ]

Tao Yang commented on YARN-9567:
--------------------------------

Thanks [~cheersyang] for the review. I have attached the v3 patch with these updates:
* Enable showing activities info only when CS is enabled.
* Support pagination for the activities table.

Examples:
Showing app diagnostics:
!app-activities-example.png!
Showing scheduler activities (when app diagnostics are not found):
!scheduler-activities-example.png!

> Add diagnostics for outstanding resource requests on app attempts page
> ----------------------------------------------------------------------
>
>                 Key: YARN-9567
>                 URL: https://issues.apache.org/jira/browse/YARN-9567
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9567.001.patch, YARN-9567.002.patch, YARN-9567.003.patch, app-activities-example.png, image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, no_diagnostic_at_first.png, scheduler-activities-example.png, show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; it will be helpful for users to know why they are outstanding if we can join this app's diagnostics with them.
> Discussed with [~cheersyang]: we can passively load diagnostics from the cache of completed app activities instead of actively triggering collection, which may bring uncontrollable risks.
> For example:
> (1) At first we see no diagnostic below the outstanding requests if app activities have not been triggered.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019209#comment-17019209 ]

zhoukang commented on YARN-9605:
--------------------------------

ping~ [~prabhujoseph] [~subru] [~tangzhankun] Could you help push this jira? Thanks.

> Add ZkConfiguredFailoverProxyProvider for RM HA
> -----------------------------------------------
>
>                 Key: YARN-9605
>                 URL: https://issues.apache.org/jira/browse/YARN-9605
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-9605.001.patch, YARN-9605.002.patch, YARN-9605.003.patch, YARN-9605.004.patch, YARN-9605.005.patch, YARN-9605.006.patch
>
>
> In this issue, I will track a new feature to support ZkConfiguredFailoverProxyProvider for RM HA.
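The idea behind a ZK-based provider, as opposed to the default config-driven round-robin failover, is to ask a coordination service which RM is currently active instead of cycling through configured rm-ids. A sketch of just that shape, where the `Supplier` stands in for a ZooKeeper read (this is not YARN's actual FailoverProxyProvider interface):

```java
import java.util.function.Supplier;

// Sketch only: pick the proxy target by asking an external source (e.g. a
// znode written by the active RM) instead of round-robin over rm-ids.
public class ZkFailoverSketch {
    private final Supplier<String> activeAddressSource;
    private String current;

    public ZkFailoverSketch(Supplier<String> activeAddressSource) {
        this.activeAddressSource = activeAddressSource;
    }

    /** Called on first connect and again whenever failover is triggered. */
    public String currentProxyAddress() {
        current = activeAddressSource.get();
        return current;
    }
}
```

The benefit over round-robin is that a client lands on the active RM directly after a failover rather than probing standbys first.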
[jira] [Updated] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-9567:
---------------------------
    Attachment: scheduler-activities-example.png

> Add diagnostics for outstanding resource requests on app attempts page
> ----------------------------------------------------------------------
>
>                 Key: YARN-9567
>                 URL: https://issues.apache.org/jira/browse/YARN-9567
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9567.001.patch, YARN-9567.002.patch, YARN-9567.003.patch, app-activities-example.png, image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, no_diagnostic_at_first.png, scheduler-activities-example.png, show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; it will be helpful for users to know why they are outstanding if we can join this app's diagnostics with them.
> Discussed with [~cheersyang]: we can passively load diagnostics from the cache of completed app activities instead of actively triggering collection, which may bring uncontrollable risks.
> For example:
> (1) At first we see no diagnostic below the outstanding requests if app activities have not been triggered.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
[jira] [Updated] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-9567:
---------------------------
    Attachment: app-activities-example.png

> Add diagnostics for outstanding resource requests on app attempts page
> ----------------------------------------------------------------------
>
>                 Key: YARN-9567
>                 URL: https://issues.apache.org/jira/browse/YARN-9567
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9567.001.patch, YARN-9567.002.patch, YARN-9567.003.patch, app-activities-example.png, image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, no_diagnostic_at_first.png, show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; it will be helpful for users to know why they are outstanding if we can join this app's diagnostics with them.
> Discussed with [~cheersyang]: we can passively load diagnostics from the cache of completed app activities instead of actively triggering collection, which may bring uncontrollable risks.
> For example:
> (1) At first we see no diagnostic below the outstanding requests if app activities have not been triggered.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
[jira] [Commented] (YARN-10069) Showing jstack on UI for containers
[ https://issues.apache.org/jira/browse/YARN-10069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019207#comment-17019207 ]

zhoukang commented on YARN-10069:
---------------------------------

[~akhilpb] Showing jstack for a running container.

> Showing jstack on UI for containers
> -----------------------------------
>
>                 Key: YARN-10069
>                 URL: https://issues.apache.org/jira/browse/YARN-10069
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>
> In this jira, I want to post a patch to support showing jstack on the container UI.
[jira] [Updated] (YARN-9979) When a app expired with many containers , scheduler event size will be huge
[ https://issues.apache.org/jira/browse/YARN-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-9979:
---------------------------
    Attachment: YARN-9979.001.patch

> When a app expired with many containers , scheduler event size will be huge
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9979
>                 URL: https://issues.apache.org/jira/browse/YARN-9979
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>         Attachments: YARN-9979.001.patch
>
>
> When an app with many containers expires, the scheduler event queue grows huge:
> {code:java}
> 2019-11-11,21:39:49,690 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 9000
> 2019-11-11,21:39:49,695 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 1
> 2019-11-11,21:39:49,700 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 11000
> 2019-11-11,21:39:49,705 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 12000
> 2019-11-11,21:39:49,710 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 13000
> 2019-11-11,21:39:49,715 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 14000
> 2019-11-11,21:39:49,720 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Discarded 1 messages due to full event buffer including: Size of scheduler event-queue is 15000
> 2019-11-11,21:39:49,724 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 16000
> 2019-11-11,21:39:49,729 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 17000
> 2019-11-11,21:39:49,733 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 18000
> 2019-11-11,21:40:14,953 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 19000
> 2019-11-11,21:43:09,743 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 19000
> 2019-11-11,21:43:09,750 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 2
> 2019-11-11,21:43:09,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 21000
> 2019-11-11,21:43:09,766 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 22000
> 2019-11-11,21:43:09,775 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 23000
> 2019-11-11,21:43:09,783 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 24000
> 2019-11-11,21:43:09,792 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 25000
> 2019-11-11,21:43:09,800 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 26000
> 2019-11-11,21:43:09,807 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 27000
> 2019-11-11,21:43:09,814 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 28000
> 2019-11-11,21:46:29,830 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 29000
> 2019-11-11,21:46:29,841 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 3
> 2019-11-11,21:46:29,850 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 31000
> 2019-11-11,21:46:29,862 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 32000
> 2019-11-11,21:49:49,875 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 33000
> 2019-11-11,21:49:49,875 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 34000
> 2019-11-11,21:49:49,876 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 35000
> 2019-11-11,21:49:49,882 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of scheduler event-queue is 36000
> 2019-11-11,21:49:49,887 INFO org.apache.hadoop.yarn.server.r
[jira] [Updated] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9567: --- Attachment: YARN-9567.003.patch > Add diagnostics for outstanding resource requests on app attempts page > -- > > Key: YARN-9567 > URL: https://issues.apache.org/jira/browse/YARN-9567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9567.001.patch, YARN-9567.002.patch, > YARN-9567.003.patch, image-2019-06-04-17-29-29-368.png, > image-2019-06-04-17-31-31-820.png, image-2019-06-04-17-58-11-886.png, > image-2019-06-14-11-21-41-066.png, no_diagnostic_at_first.png, > show_diagnostics_after_requesting_app_activities_REST_API.png > > > Currently we can see outstanding resource requests on the app attempt page; it > would be helpful for users to know why they are outstanding if we can join this > app's diagnostics with them. > As discussed with [~cheersyang], we can passively load diagnostics from the > cache of completed app activities instead of actively triggering them, which > may bring uncontrollable risks. > For example: > (1) At first, no diagnostics are shown below the outstanding requests if app > activities have not been triggered. > !no_diagnostic_at_first.png|width=793,height=248! > (2) After requesting the application activities REST API, the diagnostics are > shown. > !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10010) NM log upload costs too much time
[ https://issues.apache.org/jira/browse/YARN-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019205#comment-17019205 ] zhoukang commented on YARN-10010: - I posted a patch in YARN-10056. [~wilfreds], I will close this as a duplicate. > NM log upload costs too much time > > > Key: YARN-10010 > URL: https://issues.apache.org/jira/browse/YARN-10010 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: notfound.png > > > Since the thread pool size of the log service is 100, the log uploading > service is sometimes delayed for some apps, like below > !notfound.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10010) NM log upload costs too much time
[ https://issues.apache.org/jira/browse/YARN-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang resolved YARN-10010. - Resolution: Duplicate > NM log upload costs too much time > > > Key: YARN-10010 > URL: https://issues.apache.org/jira/browse/YARN-10010 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: notfound.png > > > Since the thread pool size of the log service is 100, the log uploading > service is sometimes delayed for some apps, like below > !notfound.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10011) Catch all exceptions during app init in LogAggregationService
[ https://issues.apache.org/jira/browse/YARN-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-10011: Attachment: YARN-10011.001.patch > Catch all exceptions during app init in LogAggregationService > -- > > Key: YARN-10011 > URL: https://issues.apache.org/jira/browse/YARN-10011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10011.001.patch > > > We should catch all exceptions during app init in LogAggregationService to > prevent an NM exit > {code:java} > 2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:129) > at > org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115) > at > org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300) > at > org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
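The stack trace above shows an unhandled RuntimeException escaping the dispatcher thread. A minimal sketch of the idea behind the patch — all names here are illustrative, not the actual Hadoop classes — is to wrap each app's initialization in a catch-all so one bad app cannot take the dispatcher (and the NM) down:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-app init is wrapped so a RuntimeException from one
// app is logged and recorded instead of propagating out of the dispatcher.
public class SafeInitDispatcher {
    public interface AppInitializer { void initApp(String appId); }

    private final List<String> failedApps = new ArrayList<>();

    /** Returns true if init succeeded; logs and records the failure otherwise. */
    public boolean handleInit(AppInitializer initializer, String appId) {
        try {
            initializer.initApp(appId);
            return true;
        } catch (Exception e) { // catch everything, not only checked exceptions
            failedApps.add(appId);
            System.err.println("App init failed for " + appId + ": " + e);
            return false;
        }
    }

    public List<String> getFailedApps() { return failedApps; }

    public static void main(String[] args) {
        SafeInitDispatcher d = new SafeInitDispatcher();
        d.handleInit(id -> { throw new IllegalStateException("HDFS not ready"); }, "app_1");
        d.handleInit(id -> { /* succeeds */ }, "app_2");
        System.out.println(d.getFailedApps()); // only app_1
    }
}
```

The design point is that the dispatcher loop stays alive: failures are contained per app rather than being fatal to the whole service.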
[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019204#comment-17019204 ] zhoukang commented on YARN-9930: Sorry for the late reply, I will post a patch later. Thanks [~sunilg] > Support max running app logic for CapacityScheduler > --- > > Key: YARN-9930 > URL: https://issues.apache.org/jira/browse/YARN-9930 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.1.0, 3.1.1 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > FairScheduler has a max-running-apps limit that keeps excess applications > pending, but CapacityScheduler has no such feature. It only has a max-apps > limit, and jobs beyond it are rejected directly on the client. In this jira I > want to implement the same semantics for CapacityScheduler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
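The distinction proposed above — beyond a running limit an app is accepted but kept pending (FairScheduler behavior), while only beyond a separate max-apps limit is it rejected at submission — can be sketched as follows. Class and limit names are hypothetical, not the actual YARN configuration keys:

```java
// Illustrative sketch of the proposed admission semantics for CapacityScheduler.
public class MaxRunningAppsGate {
    public enum Decision { RUN, PEND, REJECT }

    private final int maxApps;        // hard cap: beyond this, reject at submit
    private final int maxRunningApps; // soft cap: beyond this, queue as pending

    public MaxRunningAppsGate(int maxApps, int maxRunningApps) {
        this.maxApps = maxApps;
        this.maxRunningApps = maxRunningApps;
    }

    public Decision decide(int runningApps, int pendingApps) {
        if (runningApps + pendingApps >= maxApps) {
            return Decision.REJECT; // existing max-apps behavior in CS
        }
        if (runningApps >= maxRunningApps) {
            return Decision.PEND;   // proposed: hold pending, as FairScheduler does
        }
        return Decision.RUN;
    }

    public static void main(String[] args) {
        MaxRunningAppsGate gate = new MaxRunningAppsGate(100, 10);
        System.out.println(gate.decide(5, 0));   // RUN
        System.out.println(gate.decide(10, 3));  // PEND
        System.out.println(gate.decide(10, 90)); // REJECT
    }
}
```

The key user-visible difference is that a PEND decision lets the job start automatically once running apps drain, whereas today's REJECT forces the client to resubmit.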
[jira] [Updated] (YARN-10056) Log service may encounter NM FGC since the filesystem is only closed when the app finishes
[ https://issues.apache.org/jira/browse/YARN-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-10056: Attachment: YARN-10056.001.patch > Log service may encounter NM FGC since the filesystem is only closed when the > app finishes > -- > > Key: YARN-10056 > URL: https://issues.apache.org/jira/browse/YARN-10056 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10056.001.patch > > > Currently, the filesystem is only closed when the app finishes, which may > cause memory overhead -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10056) Log service may encounter NM FGC since the filesystem is only closed when the app finishes
[ https://issues.apache.org/jira/browse/YARN-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-10056: Description: Currently, the filesystem is only closed when the app finishes, which may cause memory overhead > Log service may encounter NM FGC since the filesystem is only closed when the > app finishes > -- > > Key: YARN-10056 > URL: https://issues.apache.org/jira/browse/YARN-10056 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > Currently, the filesystem is only closed when the app finishes, which may > cause memory overhead -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
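The lifecycle concern in YARN-10056 — filesystem handles living as long as their app and piling up on a busy NM — can be sketched with a small tracker that closes each handle as soon as its app completes. All names here are hypothetical; the real code deals with Hadoop `FileSystem` instances rather than plain `Closeable`s:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: one filesystem handle per running app, dropped and closed the
// moment the app finishes, so handles never accumulate for the NM's lifetime.
public class PerAppFsTracker {
    private final Map<String, Closeable> fsByApp = new HashMap<>();

    public void register(String appId, Closeable fs) {
        fsByApp.put(appId, fs);
    }

    /** Close and forget the handle as soon as the app completes. */
    public void onAppFinished(String appId) {
        Closeable fs = fsByApp.remove(appId);
        if (fs != null) {
            try {
                fs.close();
            } catch (IOException e) {
                System.err.println("close failed for " + appId + ": " + e);
            }
        }
    }

    public int openHandles() { return fsByApp.size(); }
}
```

Closing eagerly bounds the number of live handles by the number of *running* apps, which is what keeps heap pressure (and hence full GC) under control.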
[jira] [Updated] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9567: --- Attachment: (was: YARN-9567.003.patch) > Add diagnostics for outstanding resource requests on app attempts page > -- > > Key: YARN-9567 > URL: https://issues.apache.org/jira/browse/YARN-9567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9567.001.patch, YARN-9567.002.patch, > image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, > image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, > no_diagnostic_at_first.png, > show_diagnostics_after_requesting_app_activities_REST_API.png > > > Currently we can see outstanding resource requests on the app attempt page; it > would be helpful for users to know why they are outstanding if we can join this > app's diagnostics with them. > As discussed with [~cheersyang], we can passively load diagnostics from the > cache of completed app activities instead of actively triggering them, which > may bring uncontrollable risks. > For example: > (1) At first, no diagnostics are shown below the outstanding requests if app > activities have not been triggered. > !no_diagnostic_at_first.png|width=793,height=248! > (2) After requesting the application activities REST API, the diagnostics are > shown. > !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9567: --- Attachment: YARN-9567.003.patch > Add diagnostics for outstanding resource requests on app attempts page > -- > > Key: YARN-9567 > URL: https://issues.apache.org/jira/browse/YARN-9567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9567.001.patch, YARN-9567.002.patch, > YARN-9567.003.patch, image-2019-06-04-17-29-29-368.png, > image-2019-06-04-17-31-31-820.png, image-2019-06-04-17-58-11-886.png, > image-2019-06-14-11-21-41-066.png, no_diagnostic_at_first.png, > show_diagnostics_after_requesting_app_activities_REST_API.png > > > Currently we can see outstanding resource requests on the app attempt page; it > would be helpful for users to know why they are outstanding if we can join this > app's diagnostics with them. > As discussed with [~cheersyang], we can passively load diagnostics from the > cache of completed app activities instead of actively triggering them, which > may bring uncontrollable risks. > For example: > (1) At first, no diagnostics are shown below the outstanding requests if app > activities have not been triggered. > !no_diagnostic_at_first.png|width=793,height=248! > (2) After requesting the application activities REST API, the diagnostics are > shown. > !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10060) HistoryServer may recover too slowly since JobHistory init is slow when there exist too many jobs
[ https://issues.apache.org/jira/browse/YARN-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-10060: Attachment: YARN-10060.001.patch > HistoryServer may recover too slowly since JobHistory init is slow when there > exist too many jobs > --- > > Key: YARN-10060 > URL: https://issues.apache.org/jira/browse/YARN-10060 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10060.001.patch > > > As shown below, it took more than 7 minutes before the service port started > listening > {code:java} > 2019-12-24,20:01:37,272 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2019-12-24,20:01:47,354 INFO > org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Initializing Existing > Jobs... > 2019-12-24,20:08:29,589 INFO org.apache.zookeeper.ClientCnxn: Opening socket > connection to server xxx. Will not attempt to authenticate using SASL > (unknown error) > 2019-12-24,20:08:29,589 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to xxx, initiating session > 2019-12-24,20:08:29,590 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server xxx, sessionid = 0x66d1a13e596ddc9, > negotiated timeout = 5000 > 2019-12-24,20:08:29,593 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x66d1a13e596ddc9 closed > 2019-12-24,20:08:29,593 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2019-12-24,20:08:29,655 INFO > org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage: CachedHistoryStorage > Init > 2019-12-24,20:08:29,681 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:29,715 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:29,800 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: > loaded properties from hadoop-metrics2.properties > 
2019-12-24,20:08:29,943 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period > at 10 second(s). > 2019-12-24,20:08:29,943 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JobHistoryServer metrics > system started > 2019-12-24,20:08:29,950 INFO > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > Updating the current master key for generating delegation tokens > 2019-12-24,20:08:29,951 INFO > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > Starting expired delegation token remover thread, > tokenRemoverScanInterval=60 min(s) > 2019-12-24,20:08:29,952 INFO > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > Updating the current master key for generating delegation tokens > 2019-12-24,20:08:30,015 INFO org.apache.hadoop.http.HttpRequestLog: Http > request log for http.requests.jobhistory is not defined > 2019-12-24,20:08:30,025 INFO org.apache.hadoop.http.HttpServer2: Added global > filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) > 2019-12-24,20:08:30,027 INFO org.apache.hadoop.http.HttpServer2: Added filter > static_user_filter > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to > context jobhistory > 2019-12-24,20:08:30,027 INFO org.apache.hadoop.http.HttpServer2: Added filter > static_user_filter > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to > context static > 2019-12-24,20:08:30,030 INFO org.apache.hadoop.http.HttpServer2: adding path > spec: /jobhistory/* > 2019-12-24,20:08:30,030 INFO org.apache.hadoop.http.HttpServer2: adding path > spec: /ws/* > 2019-12-24,20:08:30,057 INFO org.apache.hadoop.http.HttpServer2: Jetty bound > to port 20901 > 2019-12-24,20:08:30,939 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app > /jobhistory started at 20901 > 2019-12-24,20:08:31,177 INFO org.apache.hadoop.yarn.webapp.WebApps: > Registered webapp 
guice modules > 2019-12-24,20:08:31,187 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:31,187 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:31,189 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol org.apache.hadoop.mapreduce.v2.api.HSClientProtocolPB to the server > 2019-12-24,20:08:31,216 INFO > org.apache.hadoop.mapreduce.v2.hs.HistoryClientService: Instantiated > HistoryClientService at xxx > 2019-12-24,20:08:31,344 INFO > org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: > aggregated log deletion started
[jira] [Commented] (YARN-10060) HistoryServer may recover too slowly since JobHistory init is slow when there exist too many jobs
[ https://issues.apache.org/jira/browse/YARN-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019193#comment-17019193 ] zhoukang commented on YARN-10060: - I will submit a patch to skip loading files older than the max history age > HistoryServer may recover too slowly since JobHistory init is slow when there > exist too many jobs > --- > > Key: YARN-10060 > URL: https://issues.apache.org/jira/browse/YARN-10060 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > As shown below, it took more than 7 minutes before the service port started > listening > {code:java} > 2019-12-24,20:01:37,272 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2019-12-24,20:01:47,354 INFO > org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Initializing Existing > Jobs... > 2019-12-24,20:08:29,589 INFO org.apache.zookeeper.ClientCnxn: Opening socket > connection to server xxx. Will not attempt to authenticate using SASL > (unknown error) > 2019-12-24,20:08:29,589 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to xxx, initiating session > 2019-12-24,20:08:29,590 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server xxx, sessionid = 0x66d1a13e596ddc9, > negotiated timeout = 5000 > 2019-12-24,20:08:29,593 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x66d1a13e596ddc9 closed > 2019-12-24,20:08:29,593 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > 2019-12-24,20:08:29,655 INFO > org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage: CachedHistoryStorage > Init > 2019-12-24,20:08:29,681 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:29,715 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:29,800 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: > loaded properties from 
hadoop-metrics2.properties > 2019-12-24,20:08:29,943 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period > at 10 second(s). > 2019-12-24,20:08:29,943 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JobHistoryServer metrics > system started > 2019-12-24,20:08:29,950 INFO > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > Updating the current master key for generating delegation tokens > 2019-12-24,20:08:29,951 INFO > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > Starting expired delegation token remover thread, > tokenRemoverScanInterval=60 min(s) > 2019-12-24,20:08:29,952 INFO > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > Updating the current master key for generating delegation tokens > 2019-12-24,20:08:30,015 INFO org.apache.hadoop.http.HttpRequestLog: Http > request log for http.requests.jobhistory is not defined > 2019-12-24,20:08:30,025 INFO org.apache.hadoop.http.HttpServer2: Added global > filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) > 2019-12-24,20:08:30,027 INFO org.apache.hadoop.http.HttpServer2: Added filter > static_user_filter > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to > context jobhistory > 2019-12-24,20:08:30,027 INFO org.apache.hadoop.http.HttpServer2: Added filter > static_user_filter > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to > context static > 2019-12-24,20:08:30,030 INFO org.apache.hadoop.http.HttpServer2: adding path > spec: /jobhistory/* > 2019-12-24,20:08:30,030 INFO org.apache.hadoop.http.HttpServer2: adding path > spec: /ws/* > 2019-12-24,20:08:30,057 INFO org.apache.hadoop.http.HttpServer2: Jetty bound > to port 20901 > 2019-12-24,20:08:30,939 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app > /jobhistory started at 20901 > 2019-12-24,20:08:31,177 INFO 
org.apache.hadoop.yarn.webapp.WebApps: > Registered webapp guice modules > 2019-12-24,20:08:31,187 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:31,187 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2019-12-24,20:08:31,189 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol org.apache.hadoop.mapreduce.v2.api.HSClientProtocolPB to the server > 2019-12-24,20:08:31,216 INFO > org.apache.hadoop.mapreduce.v2.hs.HistoryClientService: Instantiated > HistoryClientService at xxx > 2019-12-24,20:08:31,344 INFO > org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionS
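The fix proposed in the comment above — skipping history files older than the max history age during the "Initializing Existing Jobs" scan — can be sketched as a simple age filter. The types and names below are illustrative stand-ins (the real code iterates HDFS `FileStatus` objects in `HistoryFileManager`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: drop files past the max history age before loading them,
// so startup cost scales with recent jobs rather than all jobs ever retained.
public class HistoryFileFilter {
    /** Minimal stand-in for a FileStatus: path plus modification time (ms). */
    public static final class HistFile {
        final String path;
        final long modTimeMs;
        public HistFile(String path, long modTimeMs) {
            this.path = path;
            this.modTimeMs = modTimeMs;
        }
    }

    /** Keep only files young enough to be worth loading into the job cache. */
    public static List<String> selectLoadable(List<HistFile> files, long nowMs, long maxAgeMs) {
        return files.stream()
            .filter(f -> nowMs - f.modTimeMs <= maxAgeMs)
            .map(f -> f.path)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        List<HistFile> files = Arrays.asList(
            new HistFile("job_old.jhist", now - 500_000L),
            new HistFile("job_new.jhist", now - 10_000L));
        System.out.println(selectLoadable(files, now, 100_000L)); // [job_new.jhist]
    }
}
```

Files beyond the age cutoff would be deleted by retention anyway, so skipping them at init trades nothing for a much shorter recovery window.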
[jira] [Commented] (YARN-10080) Support showing app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019188#comment-17019188 ] zhoukang commented on YARN-10080: - [~abmodi] Thanks for the review, I submitted a new patch that also shows the container id > Support showing app id on localizer thread pool > > > Key: YARN-10080 > URL: https://issues.apache.org/jira/browse/YARN-10080 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10080-001.patch, YARN-10080.002.patch > > > Currently, when troubleshooting a container localizer issue, if we want to > analyze a jstack with thread details, we cannot figure out which thread is > processing a given container. So I want to add the app id to the thread name -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10080) Support showing app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-10080: Attachment: YARN-10080.002.patch > Support showing app id on localizer thread pool > > > Key: YARN-10080 > URL: https://issues.apache.org/jira/browse/YARN-10080 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10080-001.patch, YARN-10080.002.patch > > > Currently, when troubleshooting a container localizer issue, if we want to > analyze a jstack with thread details, we cannot figure out which thread is > processing a given container. So I want to add the app id to the thread name -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
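The mechanism described in YARN-10080 — embedding an app/container id in worker thread names so a jstack is self-explanatory — is exactly what a custom `ThreadFactory` gives you. A hedged sketch (not the actual NM code; the thread-name prefix is an assumption):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a ThreadFactory that names each worker thread after the container it
// serves, so jstack output maps threads to containers directly.
public class NamedLocalizerThreads {
    public static ThreadFactory factoryFor(String containerId) {
        final AtomicInteger n = new AtomicInteger(0);
        return r -> {
            Thread t = new Thread(r);
            t.setName("ContainerLocalizer-" + containerId + "-" + n.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
            2, factoryFor("container_1572848307818_0001_01_000002"));
        // Each worker thread now carries the container id in its name.
        pool.submit(() -> System.out.println(Thread.currentThread().getName())).get();
        pool.shutdown();
    }
}
```

With this in place, `jstack <nm-pid> | grep ContainerLocalizer` immediately shows which localizer threads belong to which container.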
[jira] [Created] (YARN-10090) ApplicationNotFoundException will cause an UndeclaredThrowableException
qiwei huang created YARN-10090: -- Summary: ApplicationNotFoundException will cause an UndeclaredThrowableException Key: YARN-10090 URL: https://issues.apache.org/jira/browse/YARN-10090 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.9.2 Environment: Hadoop 2.9.2 Reporter: qiwei huang When entering a non-existent application page (e.g. RM:8088/cluster/app/application_1234), getApplicationReport will throw an ApplicationNotFoundException, which causes an UndeclaredThrowableException in UserGroupInformation. The log looks like: 2020-01-15 15:10:13,056 [6224200281] - ERROR [90425890@qtp-1302725372-97757:AppBlock@124] - Failed to read the application application_1572848307818_1234.2020-01-15 15:10:13,056 [6224200281] - ERROR [90425890@qtp-1302725372-97757:AppBlock@124] - Failed to read the application application_1572848307818_2006587.java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911) at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:114) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:70) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54) at 
sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:173) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1440) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortba
[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements
[ https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018890#comment-17018890 ] Manikandan R commented on YARN-10049: - Thanks [~sunilg] [~leftnoteasy] Attaching .001.patch based on earlier discussions. {{FIFOComparator}} has been used in FifoOrderingPolicy, FifoOrderingPolicyForPendingApps, FifoOrderingPolicyWithExclusivePartitions (through another class) and FairOrderingPolicy, but in a different order, so changes made in the comparator apply to all of the above policies. > FIFOOrderingPolicy Improvements > --- > > Key: YARN-10049 > URL: https://issues.apache.org/jira/browse/YARN-10049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-10049.001.patch > > > FIFOPolicy of FS does the following comparisons in addition to the app > priority comparison: > 1. Using start time > 2. Using name > The scope of this jira is to achieve the same comparisons in > FIFOOrderingPolicy of CS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements
[ https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-10049: Attachment: YARN-10049.001.patch > FIFOOrderingPolicy Improvements > --- > > Key: YARN-10049 > URL: https://issues.apache.org/jira/browse/YARN-10049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-10049.001.patch > > > FIFOPolicy of FS does the following comparisons in addition to the app > priority comparison: > 1. Using start time > 2. Using name > The scope of this jira is to achieve the same comparisons in > FIFOOrderingPolicy of CS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
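The comparison order discussed in YARN-10049 — app priority first, then start time, then name as the final tie-break — can be sketched with a composed `Comparator`. The `App` type and field names below are illustrative, not the actual `SchedulableEntity` API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of FairScheduler-style FIFO tie-breaking, as proposed for the CS policy.
public class FifoAppComparator {
    public static final class App {
        final int priority;   // higher priority runs first
        final long startTime; // earlier submission runs first
        final String name;    // lexicographic final tie-break
        public App(int priority, long startTime, String name) {
            this.priority = priority;
            this.startTime = startTime;
            this.name = name;
        }
    }

    // Priority descending, then start time ascending, then name ascending.
    public static final Comparator<App> FIFO =
        Comparator.comparingInt((App a) -> a.priority).reversed()
            .thenComparingLong(a -> a.startTime)
            .thenComparing(a -> a.name);

    public static List<String> order(List<App> apps) {
        return apps.stream().sorted(FIFO).map(a -> a.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App(1, 200L, "b"), new App(2, 300L, "c"),
            new App(1, 200L, "a"), new App(1, 100L, "d"));
        System.out.println(order(apps)); // [c, d, a, b]
    }
}
```

Because the extra comparisons only run on priority ties, the change preserves existing ordering wherever priorities already differ, which is what makes it safe to share one comparator across the policies listed in the comment.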