[jira] [Commented] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155970#comment-15155970 ] Lin Yiqun commented on YARN-4710: - In my opinion, this reserved-flag record should only be printed when a container is successfully assigned, together with the other detailed assignment info. > Reduce logging application reserved debug info in FSAppAttempt#assignContainer > -- > > Key: YARN-4710 > URL: https://issues.apache.org/jira/browse/YARN-4710 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Priority: Minor > Attachments: YARN-4710.001.patch, yarn-debug.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
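For illustration, a minimal sketch of the idea in the comment above: emit the reserved flag only when a container is actually assigned, instead of on every node offer. The surrounding method and the tryAssign helper are simplified assumptions, not the actual YARN-4710 patch.
{code}
// Hypothetical sketch, not the real FSAppAttempt code.
private Resource assignContainer(FSSchedulerNode node, boolean reserved) {
  Resource assigned = tryAssign(node, reserved);   // illustrative helper standing in for the existing logic
  if (!assigned.equals(Resources.none()) && LOG.isDebugEnabled()) {
    // Log only on a successful assignment, together with the concrete result.
    LOG.debug("Node offered to app: " + getApplicationId()
        + " reserved: " + reserved + " assigned: " + assigned);
  }
  return assigned;
}
{code}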
[jira] [Updated] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4710: Attachment: YARN-4710.001.patch Attaching an initial patch. The patch changes this log level from {{DEBUG}} to {{TRACE}}; kindly review, thanks. > Reduce logging application reserved debug info in FSAppAttempt#assignContainer > -- > > Key: YARN-4710 > URL: https://issues.apache.org/jira/browse/YARN-4710 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Priority: Minor > Attachments: YARN-4710.001.patch, yarn-debug.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
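As a rough sketch of what a DEBUG-to-TRACE demotion of this statement could look like (illustrative only; the exact change is in the attached YARN-4710.001.patch):
{code}
// Before (illustrative): emitted for every node offered to the app at DEBUG level,
// which floods the ResourceManager log during normal debugging.
if (LOG.isDebugEnabled()) {
  LOG.debug("Node offered to app: " + getApplicationId() + " reserved: " + reserved);
}

// After (illustrative): demoted to TRACE, so it only appears when trace logging is
// explicitly enabled for FSAppAttempt, keeping DEBUG output readable.
if (LOG.isTraceEnabled()) {
  LOG.trace("Node offered to app: " + getApplicationId() + " reserved: " + reserved);
}
{code}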
[jira] [Updated] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4710: Attachment: yarn-debug.log > Reduce logging application reserved debug info in FSAppAttempt#assignContainer > -- > > Key: YARN-4710 > URL: https://issues.apache.org/jira/browse/YARN-4710 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Priority: Minor > Attachments: yarn-debug.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer
Lin Yiqun created YARN-4710: --- Summary: Reduce logging application reserved debug info in FSAppAttempt#assignContainer Key: YARN-4710 URL: https://issues.apache.org/jira/browse/YARN-4710 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun Priority: Minor
While preparing to debug a container-assignment problem, I found a lot of unimportant records being written during container assignment. There are so many records like the following in yarn-resourcemanager.log that it is difficult to find the important information directly.
{code}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,971 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,976 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,981 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,986 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,991 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,996 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,001 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,007 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,012 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,017 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,022 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,027 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,032 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,038 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,050 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,057 DEBUG
{code}
The reason there are so many records is that this line is always printed first during container assignment, whether the assignment succeeds or fails. The complete YARN log is in the attached yarn-debug.log, and it shows how many of these records there are. In addition, logging this information so often slows down the container-assignment process. Maybe we should change this log level to another level, such as {{trace}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4668) Reuse objectMapper instance in Yarn
Lin Yiqun created YARN-4668: --- Summary: Reuse objectMapper instance in Yarn Key: YARN-4668 URL: https://issues.apache.org/jira/browse/YARN-4668 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn
[ https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4668: Attachment: YARN.001.patch > Reuse objectMapper instance in Yarn > --- > > Key: YARN-4668 > URL: https://issues.apache.org/jira/browse/YARN-4668 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn
[ https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4668: Attachment: (was: YARN.001.patch) > Reuse objectMapper instance in Yarn > --- > > Key: YARN-4668 > URL: https://issues.apache.org/jira/browse/YARN-4668 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn
[ https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4668: Attachment: YARN-4668.001.patch > Reuse objectMapper instance in Yarn > --- > > Key: YARN-4668 > URL: https://issues.apache.org/jira/browse/YARN-4668 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4668.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn
[ https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4668: Description: This jira is similar to MAPREDUCE-6626; see that issue for details about the problem. > Reuse objectMapper instance in Yarn > --- > > Key: YARN-4668 > URL: https://issues.apache.org/jira/browse/YARN-4668 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4668.001.patch > > > This jira is similar to MAPREDUCE-6626; see that issue for details about the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
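For background on the pattern being applied here, a minimal sketch of reusing a single shared ObjectMapper instead of constructing a new one per call. The class name and the Jackson package below are illustrative assumptions (whichever Jackson package the touched module already uses applies), not the contents of the attached patch.
{code}
import java.io.IOException;
import org.codehaus.jackson.map.ObjectMapper;

public final class JsonSerDeser {
  // ObjectMapper is thread-safe once configured and is relatively expensive to build,
  // so one shared instance is reused instead of calling "new ObjectMapper()" per request.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  private JsonSerDeser() {
  }

  public static String toJson(Object value) throws IOException {
    return MAPPER.writeValueAsString(value);
  }
}
{code}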
[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn
[ https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4668: Attachment: YARN-4668.002.patch Fix checkstyle errors. > Reuse objectMapper instance in Yarn > --- > > Key: YARN-4668 > URL: https://issues.apache.org/jira/browse/YARN-4668 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4668.001.patch, YARN-4668.002.patch > > > This jira is similar to MAPREDUCE-6626, we can see detail info about this > problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg
[ https://issues.apache.org/jira/browse/YARN-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097936#comment-15097936 ] Lin Yiqun commented on YARN-4534: - Thanks [~ajisakaa] for review and commit! > Remove the redundant symbol in yarn rmadmin help msg > > > Key: YARN-4534 > URL: https://issues.apache.org/jira/browse/YARN-4534 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4534.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg
[ https://issues.apache.org/jira/browse/YARN-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087013#comment-15087013 ] Lin Yiqun commented on YARN-4534: - The failed test seems unrelated. Kindly review. > Remove the redundant symbol in yarn rmadmin help msg > > > Key: YARN-4534 > URL: https://issues.apache.org/jira/browse/YARN-4534 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4534.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4533) Killing applications by user in Yarn RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4533: Attachment: YARN-4533.001.patch > Killing applications by user in Yarn RMAdmin CLI > > > Key: YARN-4533 > URL: https://issues.apache.org/jira/browse/YARN-4533 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4533.001.patch > > > The cmd likes > {code} > [-killApplicationsForUser [username]] Kill the applications of specific user. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4533) Killing applications by user in Yarn RMAdmin CLI
Lin Yiqun created YARN-4533: --- Summary: Killing applications by user in Yarn RMAdmin CLI Key: YARN-4533 URL: https://issues.apache.org/jira/browse/YARN-4533 Project: Hadoop YARN Issue Type: Sub-task Reporter: Lin Yiqun Assignee: Lin Yiqun The command would look like: {code} [-killApplicationsForUser [username]] Kill the applications of specific user. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4532) Killing applications by appStates and queue in Yarn Application CLI
Lin Yiqun created YARN-4532: --- Summary: Killing applications by appStates and queue in Yarn Application CLI Key: YARN-4532 URL: https://issues.apache.org/jira/browse/YARN-4532 Project: Hadoop YARN Issue Type: Sub-task Reporter: Lin Yiqun Assignee: Lin Yiqun -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg
[ https://issues.apache.org/jira/browse/YARN-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4534: Attachment: YARN-4534.001.patch > Remove the redundant symbol in yarn rmadmin help msg > > > Key: YARN-4534 > URL: https://issues.apache.org/jira/browse/YARN-4534 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4534.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4532) Killing applications by appStates and queue in Yarn Application CLI
[ https://issues.apache.org/jira/browse/YARN-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4532: Description: The commands would look like: {code} -killByAppStates The states of application that will be killed. -killOfQueue Kill the applications of specific queue. {code} > Killing applications by appStates and queue in Yarn Application CLI > --- > > Key: YARN-4532 > URL: https://issues.apache.org/jira/browse/YARN-4532 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4532.001.patch > > > The commands would look like: > {code} > -killByAppStates The states of application that will be killed. > -killOfQueue Kill the applications of specific queue. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4532) Killing applications by appStates and queue in Yarn Application CLI
[ https://issues.apache.org/jira/browse/YARN-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4532: Attachment: YARN-4532.001.patch > Killing applications by appStates and queue in Yarn Application CLI > --- > > Key: YARN-4532 > URL: https://issues.apache.org/jira/browse/YARN-4532 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4532.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4529) Yarn CLI killing applications in batch
[ https://issues.apache.org/jira/browse/YARN-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080645#comment-15080645 ] Lin Yiqun commented on YARN-4529: - Thanks [~Naganarasimha] for the comments. I have updated the patch according to your queries; what else should I do with this patch? > Yarn CLI killing applications in batch > -- > > Key: YARN-4529 > URL: https://issues.apache.org/jira/browse/YARN-4529 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, client > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4529.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4529) Yarn CLI killing applications in batch
[ https://issues.apache.org/jira/browse/YARN-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4529: Issue Type: New Feature (was: Improvement) > Yarn CLI killing applications in batch > -- > > Key: YARN-4529 > URL: https://issues.apache.org/jira/browse/YARN-4529 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications, client > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4529.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg
Lin Yiqun created YARN-4534: --- Summary: Remove the redundant symbol in yarn rmadmin help msg Key: YARN-4534 URL: https://issues.apache.org/jira/browse/YARN-4534 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun In the rmadmin help, there is a redundant ']' symbol in the {{-directlyAccessNodeLabelStore}} command. The message is as follows:
{code}
bin/yarn rmadmin -help
rmadmin is the command to execute YARN administrative commands.
The full syntax is:
yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] [[-addToClusterNodeLabels [label1,label2,label3]] [-removeFromClusterNodeLabels [label1,label2,label3]] [-replaceLabelsOnNode [node1[:port]=label1,label2 node2[:port]=label1] [-directlyAccessNodeLabelStore]] [-help [cmd]]
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4529) Yarn CLI killing applications in batch
Lin Yiqun created YARN-4529: --- Summary: Yarn CLI killing applications in batch Key: YARN-4529 URL: https://issues.apache.org/jira/browse/YARN-4529 Project: Hadoop YARN Issue Type: Improvement Components: applications, client Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun We do not have a good way to kill applications conveniently when some apps are started unexpectedly. At present, we have to kill them one by one. We could add kill commands that kill apps in batch, like these:
{code}
-killByAppStates    The states of application that will be killed.
-killByUser         Kill running-state applications of specific user.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
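As a rough sketch of how such a batch kill could sit on top of the existing client API (illustrative only; the option names above and the attached patch may differ, and targetUser is a hypothetical variable):
{code}
// Illustrative: kill all RUNNING applications submitted by a given user.
YarnClient client = YarnClient.createYarnClient();
client.init(new YarnConfiguration());
client.start();
try {
  for (ApplicationReport report :
      client.getApplications(EnumSet.of(YarnApplicationState.RUNNING))) {
    if (report.getUser().equals(targetUser)) {
      client.killApplication(report.getApplicationId());
    }
  }
} finally {
  client.stop();
}
{code}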
[jira] [Updated] (YARN-4529) Yarn CLI killing applications in batch
[ https://issues.apache.org/jira/browse/YARN-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4529: Attachment: YARN-4529.001.patch > Yarn CLI killing applications in batch > -- > > Key: YARN-4529 > URL: https://issues.apache.org/jira/browse/YARN-4529 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, client > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4529.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059268#comment-15059268 ] Lin Yiqun commented on YARN-4440: - Thanks [~zxu] for review and commit. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Fix For: 2.8.0 > > Attachments: YARN-4440.001.patch, YARN-4440.002.patch, YARN-4440.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4381) Optimize container metrics in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4381: Attachment: YARN-4381.003.patch Updated the patch and fixed the checkstyle warnings. > Optimize container metrics in NodeManagerMetrics > > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch, YARN-4381.002.patch, YARN-4381.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.003.patch Fix the error. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch, YARN-4440.002.patch, YARN-4440.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4396) Log the trace information on FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052552#comment-15052552 ] Lin Yiqun commented on YARN-4396: - This patch shows detailed info about container allocation. YARN-4399 and YARN-4440 can use this info for testing. The Jenkins test failure is not related. Thanks for reviewing! > Log the trace information on FSAppAttempt#assignContainer > -- > > Key: YARN-4396 > URL: https://issues.apache.org/jira/browse/YARN-4396 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, scheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4396.001.patch, YARN-4396.002.patch, YARN-4396.003.patch, YARN-4396.004.patch > > > When I configure yarn.scheduler.fair.locality.threshold.node and yarn.scheduler.fair.locality.threshold.rack to enable this function, I get no detailed info about the locality of the assigned containers. This matters because it can lead to delay scheduling and affect my cluster. If I had this info, I could tune the parameters for the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
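For anyone reproducing this, the FairScheduler delay-scheduling thresholds mentioned above are set in yarn-site.xml roughly as follows; the 0.5 values are only an example, not a recommendation:
{code}
<property>
  <!-- Fraction of the cluster's nodes to pass up (as scheduling opportunities)
       before accepting a rack-local instead of a node-local assignment. -->
  <name>yarn.scheduler.fair.locality.threshold.node</name>
  <value>0.5</value>
</property>
<property>
  <!-- Fraction of the cluster's nodes to pass up before accepting an off-rack assignment. -->
  <name>yarn.scheduler.fair.locality.threshold.rack</name>
  <value>0.5</value>
</property>
{code}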
[jira] [Updated] (YARN-4381) Optimize container metrics in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4381: Summary: Optimize container metrics in NodeManagerMetrics (was: Add container launchEvent and container localizeFailed metrics in container) > Optimize container metrics in NodeManagerMetrics > > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch, YARN-4381.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052026#comment-15052026 ] Lin Yiqun commented on YARN-4399: - I ran the failed unit tests and the errors are not related to this patch. > FairScheduler allocated container should resetSchedulingOpportunities count of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch, YARN-4399.003.patch > >
> There is a bug in FairScheduler container allocation when the locality configs are set. When the scheduler attempts to assign a container, it invokes
> {{FSAppAttempt#addSchedulingOpportunity}} whether or not the assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and
> yarn.scheduler.fair.locality.threshold.rack are configured, the schedulingOpportunity value influences the locality of containers: when one container is
> assigned successfully, the schedulingOpportunity count of its priority is increased, and it is increased again for the second container. This may let the
> allowed locality of that priority degrade, so the container ends up being handled by the rack-level request. So I think that when FairScheduler allocates a
> container, if the previous container was handled, the scheduling-opportunity count of its priority should be reset to 0, so that its value does not influence
> the allocation in the next iteration; this would improve the locality of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
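A hedged sketch of the proposed behaviour — resetting the priority's scheduling-opportunity count once a container has actually been assigned. The surrounding variables are simplified and this is not the literal patch:
{code}
// Illustrative sketch of the FairScheduler per-application assignment path.
Resource assigned = sched.assignContainer(node);      // sched: the FSAppAttempt being scheduled
if (!assigned.equals(Resources.none())) {
  // A container was allocated for this priority, so the accumulated scheduling
  // opportunities should not keep degrading the locality of the next container.
  sched.resetSchedulingOpportunities(priority);       // priority: the satisfied request's priority
}
{code}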
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.002.patch Resolve the remaining error. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch, YARN-4440.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
Lin Yiqun created YARN-4440: --- Summary: FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time Key: YARN-4440 URL: https://issues.apache.org/jira/browse/YARN-4440 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun
It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} method:
{code}
// default level is NODE_LOCAL
if (! allowedLocalityLevel.containsKey(priority)) {
  allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
  return NodeType.NODE_LOCAL;
}
{code}
On the first invocation for a priority, this method does not initialize the time in lastScheduledContainer, which causes the following code to be executed on the next invocation:
{code}
// check waiting time
long waitTime = currentTimeMs;
if (lastScheduledContainer.containsKey(priority)) {
  waitTime -= lastScheduledContainer.get(priority);
} else {
  waitTime -= getStartTime();
}
{code}
The waitTime is then computed against the FSAppAttempt start time, and since the app start time is earlier than currentTimeMs, the wait easily exceeds the configured delay and the allowed locality degrades. So we should record an initial time for the priority, to avoid comparing against the app start time and degrading the allowedLocalityLevel. This problem has a bigger negative impact on small jobs. YARN-4399 also discusses a related locality problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
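A minimal sketch of the fix being proposed — seeding the last-scheduled time the first time a priority is seen — with simplified surrounding code (not the literal attached patch):
{code}
// Inside FSAppAttempt#getAllowedLocalityLevelByTime (illustrative):
if (!allowedLocalityLevel.containsKey(priority)) {
  // Seed the timestamp so the first wait-time check measures from "now" rather than
  // from the application's start time, which would immediately degrade locality.
  lastScheduledContainer.put(priority, currentTimeMs);
  allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
  return NodeType.NODE_LOCAL;
}
{code}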
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.001.patch > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.002.patch Resolve the compile error. > FairScheduler allocated container should resetSchedulingOpportunities count of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.003.patch Resolve the remaining compile error. > FairScheduler allocated container should resetSchedulingOpportunities count of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch, YARN-4399.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4381: Attachment: YARN-4381.002.patch Thanks [~djp] for the review. I have updated the patch to make the container metrics more fine-grained. As you pointed out, a container can fail for reasons other than a localization failure, so it is not suitable to hang the metric on the launch event. Instead, I add a {{containerLaunchedSuccess}} metric when the container transitions to the running state and {{wasLaunched}} is set to true. Besides this, I add two more metrics for the container-failure cases: * one for containerFailedBeforeLaunched * the other for containerKilledAfterLaunched I think these metrics will give us a more concrete picture of a container's lifecycle (a rough sketch of the idea follows below the quoted description). > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch, YARN-4381.002.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > MapappAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
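For illustration only (this is not the content of YARN-4381.002.patch, and every metric and method name below is an assumption), a minimal sketch of how the fine-grained counters described above could be wired into {{NodeManagerMetrics}} and incremented from the container state machine:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

// Hypothetical additions to NodeManagerMetrics; names are illustrative only.
@Metric("# of containers that reached the RUNNING state")
MutableCounterInt containersLaunchedSuccess;
@Metric("# of containers that failed before they were launched")
MutableCounterInt containersFailedBeforeLaunched;
@Metric("# of containers that were killed after they were launched")
MutableCounterInt containersKilledAfterLaunched;

public void launchedContainerSuccess()      { containersLaunchedSuccess.incr(); }
public void failedContainerBeforeLaunched() { containersFailedBeforeLaunched.incr(); }
public void killedContainerAfterLaunched()  { containersKilledAfterLaunched.incr(); }

// In ContainerImpl, at the transition where the container becomes RUNNING
// (the point where wasLaunched is set to true):
metrics.launchedContainerSuccess();

// In the exit transitions for a container that ends up failed or killed:
if (wasLaunched) {
  metrics.killedContainerAfterLaunched();   // killed after a successful launch
} else {
  metrics.failedContainerBeforeLaunched();  // never reached RUNNING
}
{code}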
[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047742#comment-15047742 ] Lin Yiqun commented on YARN-4381: - [~djp], the Jenkins report shows that the checkstyle warnings do not need to be fixed, and the license warnings look unrelated. Could you review my patch again? > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch, YARN-4381.002.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > MapappAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044261#comment-15044261 ] Lin Yiqun commented on YARN-4381: - [~djp], do you have some time to review my patch or what else can I do for this jira ? > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > MapappAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
Lin Yiqun created YARN-4399: --- Summary: FairScheduler allocated container should resetSchedulingOpportunities count of its priority Key: YARN-4399 URL: https://issues.apache.org/jira/browse/YARN-4399 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun There is a bug in FairScheduler container allocation when the locality configs are set. Every attempt to assign a container invokes {{FSAppAttempt#addSchedulingOpportunity}}, whether or not the assignment succeeds. When yarn.scheduler.fair.locality.threshold.node and yarn.scheduler.fair.locality.threshold.rack are configured, this schedulingOpportunity value influences container locality: after one container is assigned successfully, the scheduling-opportunity count for its priority is increased, and it is increased again for the next container. This may degrade the allowed locality for that priority, so the next container ends up being handled as a rack-local request. So I think that when the FairScheduler successfully allocates a container, the scheduling-opportunity count for that priority should be reset to 0, so that it does not influence the allocation in the next iteration; this will improve container locality. A rough sketch of the idea follows below. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
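A minimal sketch of the proposed change (this is not the attached patch; the exact placement inside {{FSAppAttempt#assignContainer}} and the surrounding variable names are assumptions, and it relies on the existing {{resetSchedulingOpportunities}} helper in SchedulerApplicationAttempt):
{code}
// Sketch only: after a successful assignment at this priority, reset the
// scheduling-opportunity count so it does not drag the allowed locality
// (NODE_LOCAL -> RACK_LOCAL -> OFF_SWITCH) down for the next request.
Resource assigned = assignContainer(node, request, nodeType, reserved);
if (!assigned.equals(Resources.none())) {
  // Clear the opportunities accumulated while waiting for this allocation.
  resetSchedulingOpportunities(priority);
  return assigned;
}
{code}
Whether the reset should also apply to allocations that were ultimately satisfied off-switch is a design question the patch would still need to settle.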
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.001.patch > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch > > > There is a bug on fairScheduler allocating containers when you configurate > the locality configs.When you attempt to assigned a container,it will invoke > {{FSAppAttempt#addSchedulingOpportunity}} whenever it can be assigned > successfully or not. And if you configurate the > yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack, the schedulingOpportunity value > will influence the locality of containers.Because if one container is > assigned successfully and its priority schedulingOpportunity count will be > increased, and second container will be increased again.This will may be let > their priority of allowedLocality degrade. And this will let this container > dealt by rackRequest. So I think in fairScheduler allocating container, if > the previous container was dealt, its priority of schedulerCount should be > reset to 0, and don't let its value influence container's allocating in next > iteration and this will increased the locality of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4396) Log the trance information on FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4396: Attachment: YARN-4396.004.patch Simplified the debug infos. > Log the trance information on FSAppAttempt#assignContainer > -- > > Key: YARN-4396 > URL: https://issues.apache.org/jira/browse/YARN-4396 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4396.001.patch, YARN-4396.002.patch, > YARN-4396.003.patch, YARN-4396.004.patch > > > When I configure the yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack to open this function, I have no > detail info of assigning container's locality. And it's important because it > will lead some delay scheduling and will have an influence on my cluster. If > I know these info, I can adjust param in cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4396) Log the trance information on FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4396: Attachment: YARN-4396.003.patch Add the debug infos in {{getAllowedLocalityLevelByTime}} and {{getAllowedLocalityLevel}} methods.A sample info: * getAllowedLocalityLevelByTime: {code} 2015-11-27 09:54:21,553 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Waiting time is 32570 ms, more than nodeLocalityDelay time 1000 ms, change allowedLocality from NODE_LOCAL to RACK_LOCAL, priority:10, app attempt id:appattempt_1448589183973_0001_01 2015-11-27 09:54:21,553 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Assign container on qihe129040 node, requestType:OFF_SWITCH, allowedLocality:RACK_LOCAL, schedulingOpportunities:0, priority:10, app attempt id:appattempt_1448589183973_0001_01 {code} * getAllowedLocalityLevel: {code} 2015-11-27 09:42:47,132 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: SchedulingOpportunities count is 2, more than nodeLocalityThreshold num 0.899761581421, change allowedLocality from NODE_LOCAL to RACK_LOCAL, priority:10, app attempt id:appattempt_1448588357362_0001_01 2015-11-27 09:42:47,132 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Assign container on qihe129040 node, requestType:OFF_SWITCH, allowedLocality:RACK_LOCAL, schedulingOpportunities:0, priority:10, app attempt id:appattempt_1448588357362_0001_01 {code} > Log the trance information on FSAppAttempt#assignContainer > -- > > Key: YARN-4396 > URL: https://issues.apache.org/jira/browse/YARN-4396 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4396.001.patch, YARN-4396.002.patch, > YARN-4396.003.patch > > > When I configure the yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack to open this function, I have no > detail info of assigning container's locality. And it's important because it > will lead some delay scheduling and will have an influence on my cluster. If > I know these info, I can adjust param in cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
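For reference, a guarded debug statement of the kind that produces the sample lines above might look roughly like the following (a sketch only; the local variable names assumed for {{getAllowedLocalityLevelByTime}} are illustrative, not the exact patch):
{code}
if (LOG.isDebugEnabled()) {
  // Mirrors the sample output: waiting time exceeded the node-locality delay,
  // so the allowed locality is relaxed from NODE_LOCAL to RACK_LOCAL.
  LOG.debug("Waiting time is " + waitTime
      + " ms, more than nodeLocalityDelay time " + nodeLocalityDelayMs
      + " ms, change allowedLocality from " + NodeType.NODE_LOCAL
      + " to " + NodeType.RACK_LOCAL
      + ", priority: " + priority
      + ", app attempt id: " + getApplicationAttemptId());
}
{code}
Wrapping the message in {{LOG.isDebugEnabled()}} keeps the string concatenation off the hot scheduling path when DEBUG logging is disabled.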
[jira] [Created] (YARN-4396) Log the trance information on FSAppAttempt#assignContainer
Lin Yiqun created YARN-4396: --- Summary: Log the trance information on FSAppAttempt#assignContainer Key: YARN-4396 URL: https://issues.apache.org/jira/browse/YARN-4396 Project: Hadoop YARN Issue Type: Improvement Components: applications, scheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun When I configure yarn.scheduler.fair.locality.threshold.node and yarn.scheduler.fair.locality.threshold.rack to enable this feature, there is no detailed information about the locality of the containers being assigned. This information is important because delay scheduling can affect the cluster; with it, I can tune these parameters for my cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4396) Log the trance information on FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4396: Attachment: YARN-4396.001.patch > Log the trance information on FSAppAttempt#assignContainer > -- > > Key: YARN-4396 > URL: https://issues.apache.org/jira/browse/YARN-4396 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4396.001.patch > > > When I configure the yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack to open this function, I have no > detail info of assigning container's locality. And it's important because it > will lead some delay scheduling and will have an influence on my cluster. If > I know these info, I can adjust param in cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4396) Log the trance information on FSAppAttempt#assignContainer
[ https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4396: Attachment: YARN-4396.002.patch Modify some debug infos. > Log the trance information on FSAppAttempt#assignContainer > -- > > Key: YARN-4396 > URL: https://issues.apache.org/jira/browse/YARN-4396 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4396.001.patch, YARN-4396.002.patch > > > When I configure the yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack to open this function, I have no > detail info of assigning container's locality. And it's important because it > will lead some delay scheduling and will have an influence on my cluster. If > I know these info, I can adjust param in cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023415#comment-15023415 ] Lin Yiqun commented on YARN-4381: - Thanks [~djp]! > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > MapappAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4381: Attachment: YARN-4381.001.patch I attach a init patch and add two new metrics in {{NodeManagerMetrics}} * containerLocalizeFailed * containersLaunchEventOperation > Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun > Attachments: YARN-4381.001.patch > > > Recently, I found a issue on nodemanager metrics.That's > {{NodeManagerMetrics#containersLaunched}} is not actually means the container > succeed launched times.Because in some time, it will be failed when receiving > the killing command or happening container-localizationFailed.This will lead > to a failed container.But now,this counter value will be increased in these > code whenever the container is started successfully or failed. > {code} > Credentials credentials = parseCredentials(launchContext); > Container container = > new ContainerImpl(getConfig(), this.dispatcher, > context.getNMStateStore(), launchContext, > credentials, metrics, containerTokenIdentifier); > ApplicationId applicationID = > containerId.getApplicationAttemptId().getApplicationId(); > if (context.getContainers().putIfAbsent(containerId, container) != null) { > NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER, > "ContainerManagerImpl", "Container already running on this node!", > applicationID, containerId); > throw RPCUtil.getRemoteException("Container " + containerIdStr > + " already is running on this node!!"); > } > this.readLock.lock(); > try { > if (!serviceStopped) { > // Create the application > Application application = > new ApplicationImpl(dispatcher, user, applicationID, credentials, > context); > if (null == context.getApplications().putIfAbsent(applicationID, > application)) { > LOG.info("Creating a new application reference for app " + > applicationID); > LogAggregationContext logAggregationContext = > containerTokenIdentifier.getLogAggregationContext(); > MapappAcls = > container.getLaunchContext().getApplicationACLs(); > context.getNMStateStore().storeApplication(applicationID, > buildAppProto(applicationID, user, credentials, appAcls, > logAggregationContext)); > dispatcher.getEventHandler().handle( > new ApplicationInitEvent(applicationID, appAcls, > logAggregationContext)); > } > this.context.getNMStateStore().storeContainer(containerId, request); > dispatcher.getEventHandler().handle( > new ApplicationContainerInitEvent(container)); > > this.context.getContainerTokenSecretManager().startContainerSuccessful( > containerTokenIdentifier); > NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER, > "ContainerManageImpl", applicationID, containerId); > // TODO launchedContainer misplaced -> doesn't necessarily mean a > container > // launch. A finished Application will not launch containers. > metrics.launchedContainer(); > metrics.allocateContainer(containerTokenIdentifier.getResource()); > } else { > throw new YarnException( > "Container start failed as the NodeManager is " + > "in the process of shutting down"); > } > {code} > In addition, we are lack of localzationFailed metric in container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
Lin Yiqun created YARN-4381: --- Summary: Add container launchEvent and container localizeFailed metrics in container Key: YARN-4381 URL: https://issues.apache.org/jira/browse/YARN-4381 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Lin Yiqun
Recently, I found an issue with the NodeManager metrics: {{NodeManagerMetrics#containersLaunched}} does not actually count how many containers were launched successfully. A container start can still fail afterwards, for example when a kill command is received or when container localization fails, which leaves a failed container. Yet this counter is increased in the following code regardless of whether the container eventually starts successfully or fails.
{code}
Credentials credentials = parseCredentials(launchContext);

Container container =
    new ContainerImpl(getConfig(), this.dispatcher,
        context.getNMStateStore(), launchContext,
        credentials, metrics, containerTokenIdentifier);
ApplicationId applicationID =
    containerId.getApplicationAttemptId().getApplicationId();
if (context.getContainers().putIfAbsent(containerId, container) != null) {
  NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
      "ContainerManagerImpl", "Container already running on this node!",
      applicationID, containerId);
  throw RPCUtil.getRemoteException("Container " + containerIdStr
      + " already is running on this node!!");
}

this.readLock.lock();
try {
  if (!serviceStopped) {
    // Create the application
    Application application =
        new ApplicationImpl(dispatcher, user, applicationID, credentials,
            context);
    if (null == context.getApplications().putIfAbsent(applicationID,
        application)) {
      LOG.info("Creating a new application reference for app "
          + applicationID);
      LogAggregationContext logAggregationContext =
          containerTokenIdentifier.getLogAggregationContext();
      Map<ApplicationAccessType, String> appAcls =
          container.getLaunchContext().getApplicationACLs();
      context.getNMStateStore().storeApplication(applicationID,
          buildAppProto(applicationID, user, credentials, appAcls,
              logAggregationContext));
      dispatcher.getEventHandler().handle(
          new ApplicationInitEvent(applicationID, appAcls,
              logAggregationContext));
    }

    this.context.getNMStateStore().storeContainer(containerId, request);
    dispatcher.getEventHandler().handle(
        new ApplicationContainerInitEvent(container));

    this.context.getContainerTokenSecretManager().startContainerSuccessful(
        containerTokenIdentifier);
    NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
        "ContainerManageImpl", applicationID, containerId);
    // TODO launchedContainer misplaced -> doesn't necessarily mean a container
    // launch. A finished Application will not launch containers.
    metrics.launchedContainer();
    metrics.allocateContainer(containerTokenIdentifier.getResource());
  } else {
    throw new YarnException(
        "Container start failed as the NodeManager is "
            + "in the process of shutting down");
  }
{code}
In addition, we lack a localizationFailed metric for containers. A rough sketch of one possible fix follows below. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
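As a hedged sketch of one possible direction (not the attached patch; all metric and method names below are assumptions), the launch counter could be incremented only where the container actually reaches RUNNING, with a separate counter for localization failures:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

// Hypothetical counters in NodeManagerMetrics; names are illustrative only.
@Metric("# of containers that actually reached RUNNING")
MutableCounterInt containersLaunchedSuccessfully;
@Metric("# of containers whose localization failed")
MutableCounterInt containersLocalizationFailed;

public void launchedContainerSuccessfully() { containersLaunchedSuccessfully.incr(); }
public void localizationFailedContainer()   { containersLocalizationFailed.incr(); }

// In ContainerImpl, inside the LOCALIZED -> RUNNING transition
// (where wasLaunched is set), instead of counting at container start:
container.metrics.launchedContainerSuccessfully();

// In the transition that handles a RESOURCE_FAILED event
// (i.e. localization failed):
container.metrics.localizationFailedContainer();
{code}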