[jira] [Commented] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer

2016-02-21 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155970#comment-15155970
 ] 

Lin Yiqun commented on YARN-4710:
-

In my opinion, this reserved record should only be printed when a container is 
successfully assigned, together with the other concrete info logged at that point.
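For illustration only, a minimal sketch of that idea in {{FSAppAttempt#assignContainer}} 
(the variable names below are assumptions, not the actual patch):
{code}
// Sketch: emit the reservation detail only once a container has actually been
// assigned, together with the other details already logged at that point.
if (!Resources.none().equals(assignedResource) && LOG.isDebugEnabled()) {
  LOG.debug("Assigned container on node " + node.getNodeName()
      + " to app " + getName() + ", reserved: " + reserved);
}
{code}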

> Reduce logging application reserved debug info in FSAppAttempt#assignContainer
> --
>
> Key: YARN-4710
> URL: https://issues.apache.org/jira/browse/YARN-4710
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: YARN-4710.001.patch, yarn-debug.log
>
>
> I found lots of unimportant records about container assignment when I 
> prepared to debug a container-assignment problem. There are too many records 
> like this in yarn-resourcemanager.log, and it is difficult for me to directly 
> find the important info.
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,971 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,976 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,981 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,986 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,991 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,996 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,001 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,007 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,017 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,022 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,027 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,032 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,038 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,050 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,057 DEBUG
> {code}
> The reason there are so many records is that this info is always printed 
> first during container assignment, whether the assignment succeeds or fails.
> See the attached complete yarn log, and you can see how many records there are.
> In addition, logging so many of these records will slow down container 
> assignment. Maybe we should change this log level to another level, like 
> {{trace}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer

2016-02-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4710:

Attachment: YARN-4710.001.patch

Attach an initial patch. The patch changes this log level from {{DEBUG}} to 
{{TRACE}}. Kindly review, thanks.
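For context, the kind of change involved is roughly the following (a sketch, not 
necessarily the exact diff in the attached patch):
{code}
// Before: logged on every scheduling attempt at DEBUG
LOG.debug("Node offered to app: " + getName() + " reserved: " + reserved);

// After: demoted to TRACE and guarded, so DEBUG-level runs stay readable
if (LOG.isTraceEnabled()) {
  LOG.trace("Node offered to app: " + getName() + " reserved: " + reserved);
}
{code}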

> Reduce logging application reserved debug info in FSAppAttempt#assignContainer
> --
>
> Key: YARN-4710
> URL: https://issues.apache.org/jira/browse/YARN-4710
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: YARN-4710.001.patch, yarn-debug.log
>
>
> I found lots of unimportant records about container assignment when I 
> prepared to debug a container-assignment problem. There are too many records 
> like this in yarn-resourcemanager.log, and it is difficult for me to directly 
> find the important info.
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,971 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,976 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,981 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,986 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,991 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,996 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,001 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,007 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,017 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,022 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,027 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,032 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,038 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,050 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,057 DEBUG
> {code}
> The reason there are so many records is that this info is always printed 
> first during container assignment, whether the assignment succeeds or fails.
> See the attached complete yarn log, and you can see how many records there are.
> In addition, logging so many of these records will slow down container 
> assignment. Maybe we should change this log level to another level, like 
> {{trace}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer

2016-02-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4710:

Attachment: yarn-debug.log

> Reduce logging application reserved debug info in FSAppAttempt#assignContainer
> --
>
> Key: YARN-4710
> URL: https://issues.apache.org/jira/browse/YARN-4710
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: yarn-debug.log
>
>
> I found lots of unimportant records about container assignment when I 
> prepared to debug a container-assignment problem. There are too many records 
> like this in yarn-resourcemanager.log, and it is difficult for me to directly 
> find the important info.
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,971 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,976 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,981 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,986 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,991 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:52,996 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,001 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,007 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,017 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,022 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,027 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,032 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,038 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,050 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Node offered to app: application_1449458968698_0011 reserved: false
> 2016-02-21 16:31:53,057 DEBUG
> {code}
> The reason there are so many records is that this info is always printed 
> first during container assignment, whether the assignment succeeds or fails.
> See the attached complete yarn log, and you can see how many records there are.
> In addition, logging so many of these records will slow down container 
> assignment. Maybe we should change this log level to another level, like 
> {{trace}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4710) Reduce logging application reserved debug info in FSAppAttempt#assignContainer

2016-02-21 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4710:
---

 Summary: Reduce logging application reserved debug info in 
FSAppAttempt#assignContainer
 Key: YARN-4710
 URL: https://issues.apache.org/jira/browse/YARN-4710
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun
Priority: Minor


I found lots of unimportant records about container assignment when I prepared 
to debug a container-assignment problem. There are too many records like this in 
yarn-resourcemanager.log, and it is difficult for me to directly find the 
important info.
{code}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,971 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,976 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,981 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,986 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,991 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:52,996 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,001 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,007 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,012 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,017 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,022 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,027 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,032 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,038 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,050 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Node 
offered to app: application_1449458968698_0011 reserved: false
2016-02-21 16:31:53,057 DEBUG
{code}
The reason there are so many records is that this info is always printed first 
during container assignment, whether the assignment succeeds or fails.
See the attached complete yarn log, and you can see how many records there are.
In addition, logging so many of these records will slow down container 
assignment. Maybe we should change this log level to another level, like 
{{trace}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4668) Reuse objectMapper instance in Yarn

2016-02-02 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4668:
---

 Summary: Reuse objectMapper instance in Yarn
 Key: YARN-4668
 URL: https://issues.apache.org/jira/browse/YARN-4668
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn

2016-02-02 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4668:

Attachment: YARN.001.patch

> Reuse objectMapper instance in Yarn
> ---
>
> Key: YARN-4668
> URL: https://issues.apache.org/jira/browse/YARN-4668
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn

2016-02-02 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4668:

Attachment: (was: YARN.001.patch)

> Reuse objectMapper instance in Yarn
> ---
>
> Key: YARN-4668
> URL: https://issues.apache.org/jira/browse/YARN-4668
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn

2016-02-02 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4668:

Attachment: YARN-4668.001.patch

> Reuse objectMapper instance in Yarn
> ---
>
> Key: YARN-4668
> URL: https://issues.apache.org/jira/browse/YARN-4668
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4668.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn

2016-02-02 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4668:

Description: This JIRA is similar to MAPREDUCE-6626; see that issue for detailed 
info about this problem.
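For illustration, the general pattern being proposed (a sketch only; the class 
and method names below are made up, and the Jackson package may differ in the 
affected code):
{code}
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical utility class showing the reuse pattern.
public class JsonUtil {
  // One shared, thread-safe ObjectMapper instead of constructing a new
  // instance on every serialization call, which is comparatively expensive.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static String toJson(Object value) throws Exception {
    return MAPPER.writeValueAsString(value);
  }
}
{code}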

> Reuse objectMapper instance in Yarn
> ---
>
> Key: YARN-4668
> URL: https://issues.apache.org/jira/browse/YARN-4668
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4668.001.patch
>
>
> This JIRA is similar to MAPREDUCE-6626; see that issue for detailed info 
> about this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4668) Reuse objectMapper instance in Yarn

2016-02-02 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4668:

Attachment: YARN-4668.002.patch

Fix checkstyle errors.

> Reuse objectMapper instance in Yarn
> ---
>
> Key: YARN-4668
> URL: https://issues.apache.org/jira/browse/YARN-4668
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4668.001.patch, YARN-4668.002.patch
>
>
> This JIRA is similar to MAPREDUCE-6626; see that issue for detailed info 
> about this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg

2016-01-14 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097936#comment-15097936
 ] 

Lin Yiqun commented on YARN-4534:
-

Thanks [~ajisakaa] for review and commit!

> Remove the redundant symbol in yarn rmadmin help msg
> 
>
> Key: YARN-4534
> URL: https://issues.apache.org/jira/browse/YARN-4534
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4534.001.patch
>
>
> In the rmadmin help, there is a redundant ']' symbol in the 
> {{-directlyAccessNodeLabelStore}} command. The message is as follows:
> {code}
> bin/yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is: 
> yarn rmadmin [-refreshQueues] [-refreshNodes] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [[-addToClusterNodeLabels [label1,label2,label3]] 
> [-removeFromClusterNodeLabels [label1,label2,label3]] [-replaceLabelsOnNode 
> [node1[:port]=label1,label2 node2[:port]=label1] 
> [-directlyAccessNodeLabelStore]] [-help [cmd]]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg

2016-01-06 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087013#comment-15087013
 ] 

Lin Yiqun commented on YARN-4534:
-

The failed test seems unrelated. Kindly review.

> Remove the redundant symbol in yarn rmadmin help msg
> 
>
> Key: YARN-4534
> URL: https://issues.apache.org/jira/browse/YARN-4534
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4534.001.patch
>
>
> In the rmadmin help, there is a redundant ']' symbol in the 
> {{-directlyAccessNodeLabelStore}} command. The message is as follows:
> {code}
> bin/yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is: 
> yarn rmadmin [-refreshQueues] [-refreshNodes] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [[-addToClusterNodeLabels [label1,label2,label3]] 
> [-removeFromClusterNodeLabels [label1,label2,label3]] [-replaceLabelsOnNode 
> [node1[:port]=label1,label2 node2[:port]=label1] 
> [-directlyAccessNodeLabelStore]] [-help [cmd]]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4533) Killing applications by user in Yarn RMAdmin CLI

2016-01-03 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4533:

Attachment: YARN-4533.001.patch

> Killing applications by user in Yarn RMAdmin CLI
> 
>
> Key: YARN-4533
> URL: https://issues.apache.org/jira/browse/YARN-4533
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4533.001.patch
>
>
> The cmd looks like:
> {code}
> [-killApplicationsForUser [username]] Kill the applications of specific user.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4533) Killing applications by user in Yarn RMAdmin CLI

2016-01-03 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4533:
---

 Summary: Killing applications by user in Yarn RMAdmin CLI
 Key: YARN-4533
 URL: https://issues.apache.org/jira/browse/YARN-4533
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Lin Yiqun
Assignee: Lin Yiqun


The cmd looks like:
{code}
[-killApplicationsForUser [username]] Kill the applications of specific user.
{code}
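A hypothetical invocation of the proposed option (the exact syntax is an 
assumption and was never committed as shown):
{code}
# Kill all applications belonging to user 'alice' via the RM admin CLI (proposed)
yarn rmadmin -killApplicationsForUser alice
{code}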



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4532) Killing applications by appStates and queue in Yarn Application CLI

2016-01-03 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4532:
---

 Summary: Killing applications by appStates and queue in Yarn 
Application CLI
 Key: YARN-4532
 URL: https://issues.apache.org/jira/browse/YARN-4532
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Lin Yiqun
Assignee: Lin Yiqun






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg

2016-01-03 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4534:

Attachment: YARN-4534.001.patch

> Remove the redundant symbol in yarn rmadmin help msg
> 
>
> Key: YARN-4534
> URL: https://issues.apache.org/jira/browse/YARN-4534
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4534.001.patch
>
>
> In the rmadmin help, there is a redundant ']' symbol in the 
> {{-directlyAccessNodeLabelStore}} command. The message is as follows:
> {code}
> bin/yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is: 
> yarn rmadmin [-refreshQueues] [-refreshNodes] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [[-addToClusterNodeLabels [label1,label2,label3]] 
> [-removeFromClusterNodeLabels [label1,label2,label3]] [-replaceLabelsOnNode 
> [node1[:port]=label1,label2 node2[:port]=label1] 
> [-directlyAccessNodeLabelStore]] [-help [cmd]]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4532) Killing applications by appStates and queue in Yarn Application CLI

2016-01-03 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4532:

Description: 
The cmd looks like:
{code}
-killByAppStates  The states of application that will be killed.
-killOfQueue  Kill the applications of specific queue.
{code}

> Killing applications by appStates and queue in Yarn Application CLI
> ---
>
> Key: YARN-4532
> URL: https://issues.apache.org/jira/browse/YARN-4532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4532.001.patch
>
>
> The cmd looks like:
> {code}
> -killByAppStates  The states of application that will be killed.
> -killOfQueue  Kill the applications of specific queue.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4532) Killing applications by appStates and queue in Yarn Application CLI

2016-01-03 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4532:

Attachment: YARN-4532.001.patch

> Killing applications by appStates and queue in Yarn Application CLI
> ---
>
> Key: YARN-4532
> URL: https://issues.apache.org/jira/browse/YARN-4532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4532.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4529) Yarn CLI killing applications in batch

2016-01-03 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080645#comment-15080645
 ] 

Lin Yiqun commented on YARN-4529:
-

Thanks [~Naganarasimha] for the comments. I have updated the patch per your 
queries; what else should I do with this patch?

> Yarn CLI killing applications in batch
> --
>
> Key: YARN-4529
> URL: https://issues.apache.org/jira/browse/YARN-4529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4529.001.patch
>
>
> We do not have a good way to kill applications conveniently when some apps are 
> started unexpectedly. At present, we have to kill them one by one. We can add 
> some kill commands that kill apps in batch, like these:
> {code}
> -killByAppStates    The states of application that will be killed.
> -killByUser  Kill running-state applications of specific 
> user.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4529) Yarn CLI killing applications in batch

2016-01-03 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4529:

Issue Type: New Feature  (was: Improvement)

> Yarn CLI killing applications in batch
> --
>
> Key: YARN-4529
> URL: https://issues.apache.org/jira/browse/YARN-4529
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications, client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4529.001.patch
>
>
> We do not have a good way to kill applications conveniently when some apps are 
> started unexpectedly. At present, we have to kill them one by one. We can add 
> some kill commands that kill apps in batch, like these:
> {code}
> -killByAppStates    The states of application that will be killed.
> -killByUser  Kill running-state applications of specific 
> user.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4534) Remove the redundant symbol in yarn rmadmin help msg

2016-01-03 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4534:
---

 Summary: Remove the redundant symbol in yarn rmadmin help msg
 Key: YARN-4534
 URL: https://issues.apache.org/jira/browse/YARN-4534
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


In the rmadmin help, there is a redundant ']' symbol in the 
{{-directlyAccessNodeLabelStore}} command. The message is as follows:
{code}
bin/yarn rmadmin -help
rmadmin is the command to execute YARN administrative commands.
The full syntax is: 

yarn rmadmin [-refreshQueues] [-refreshNodes] 
[-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
[-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
[[-addToClusterNodeLabels [label1,label2,label3]] [-removeFromClusterNodeLabels 
[label1,label2,label3]] [-replaceLabelsOnNode [node1[:port]=label1,label2 
node2[:port]=label1] [-directlyAccessNodeLabelStore]] [-help [cmd]]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4529) Yarn CLI killing applications in batch

2015-12-30 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4529:
---

 Summary: Yarn CLI killing applications in batch
 Key: YARN-4529
 URL: https://issues.apache.org/jira/browse/YARN-4529
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications, client
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


We do not have a good way to kill applications conveniently when some apps are 
started unexpectedly. At present, we have to kill them one by one. We can add 
some kill commands that kill apps in batch, like these:
{code}
-killByAppStates    The states of application that will be killed.
-killByUser  Kill running-state applications of specific 
user.
{code}
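Hypothetical invocations of the proposed batch-kill options (the syntax is an 
assumption, purely illustrative):
{code}
# Kill every application currently in one of the given states (proposed)
yarn application -killByAppStates ACCEPTED,RUNNING
# Kill all running applications submitted by a specific user (proposed)
yarn application -killByUser alice
{code}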



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4529) Yarn CLI killing applications in batch

2015-12-30 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4529:

Attachment: YARN-4529.001.patch

> Yarn CLI killing applications in batch
> --
>
> Key: YARN-4529
> URL: https://issues.apache.org/jira/browse/YARN-4529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, client
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4529.001.patch
>
>
> We do not have a good way to kill applications conveniently when some apps are 
> started unexpectedly. At present, we have to kill them one by one. We can add 
> some kill commands that kill apps in batch, like these:
> {code}
> -killByAppStates    The states of application that will be killed.
> -killByUser  Kill running-state applications of specific 
> user.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-15 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059268#comment-15059268
 ] 

Lin Yiqun commented on YARN-4440:
-

Thanks [~zxu] for review and commit.

> FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
> -
>
> Key: YARN-4440
> URL: https://issues.apache.org/jira/browse/YARN-4440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: YARN-4440.001.patch, YARN-4440.002.patch, 
> YARN-4440.003.patch
>
>
> It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} 
> method:
> {code}
> // default level is NODE_LOCAL
> if (! allowedLocalityLevel.containsKey(priority)) {
>   allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
>   return NodeType.NODE_LOCAL;
> }
> {code}
> On the first invocation of this method, the time is not initialized in 
> lastScheduledContainer, which causes the following code to be executed on the 
> next invocation:
> {code}
> // check waiting time
> long waitTime = currentTimeMs;
> if (lastScheduledContainer.containsKey(priority)) {
>   waitTime -= lastScheduledContainer.get(priority);
> } else {
>   waitTime -= getStartTime();
> }
> {code}
> Here waitTime is computed against the FSAppAttempt start time, which easily 
> exceeds the delay threshold and degrades the allowed locality, because the 
> FSAppAttempt start time is much earlier than currentTimeMs. So we should record 
> an initial time for the priority to prevent the comparison against the 
> FSAppAttempt start time and the degradation of allowedLocalityLevel. This 
> problem has a bigger negative influence on small jobs. YARN-4399 also discusses 
> some locality-related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Optimize container metrics in NodeManagerMetrics

2015-12-12 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.003.patch

Updated the patch and fixed the checkstyle warnings.

> Optimize container metrics in NodeManagerMetrics
> 
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch, 
> YARN-4381.003.patch
>
>
> Recently, I found an issue with the nodemanager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually mean the number of 
> successfully launched containers, because a launch can fail after receiving a 
> kill command or after a container localization failure, which leads to a failed 
> container. Currently, this counter is increased in the code below whether the 
> container starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-12 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4440:

Attachment: YARN-4440.003.patch

Fixed the error.

> FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
> -
>
> Key: YARN-4440
> URL: https://issues.apache.org/jira/browse/YARN-4440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4440.001.patch, YARN-4440.002.patch, 
> YARN-4440.003.patch
>
>
> It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} 
> method:
> {code}
> // default level is NODE_LOCAL
> if (! allowedLocalityLevel.containsKey(priority)) {
>   allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
>   return NodeType.NODE_LOCAL;
> }
> {code}
> On the first invocation of this method, the time is not initialized in 
> lastScheduledContainer, which causes the following code to be executed on the 
> next invocation:
> {code}
> // check waiting time
> long waitTime = currentTimeMs;
> if (lastScheduledContainer.containsKey(priority)) {
>   waitTime -= lastScheduledContainer.get(priority);
> } else {
>   waitTime -= getStartTime();
> }
> {code}
> Here waitTime is computed against the FSAppAttempt start time, which easily 
> exceeds the delay threshold and degrades the allowed locality, because the 
> FSAppAttempt start time is much earlier than currentTimeMs. So we should record 
> an initial time for the priority to prevent the comparison against the 
> FSAppAttempt start time and the degradation of allowedLocalityLevel. This 
> problem has a bigger negative influence on small jobs. YARN-4399 also discusses 
> some locality-related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4396) Log the trace information on FSAppAttempt#assignContainer

2015-12-11 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052552#comment-15052552
 ] 

Lin Yiqun commented on YARN-4396:
-

This patch shows detailed info about container allocation; YARN-4399 and 
YARN-4440 can use this info for testing. The Jenkins test failure is not 
related. Thanks for reviewing!
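As an illustration of the kind of detail this adds (a sketch only; the actual 
patch's wording and fields may differ):
{code}
// Sketch: trace the locality decision during container assignment so that
// delay-scheduling behaviour (see YARN-4399 / YARN-4440) can be inspected in the log.
if (LOG.isTraceEnabled()) {
  LOG.trace("assignContainer: app=" + getName()
      + ", node=" + node.getNodeName()
      + ", allowedLocality=" + allowedLocality
      + ", priority=" + priority);
}
{code}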

> Log the trace information on FSAppAttempt#assignContainer
> --
>
> Key: YARN-4396
> URL: https://issues.apache.org/jira/browse/YARN-4396
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4396.001.patch, YARN-4396.002.patch, 
> YARN-4396.003.patch, YARN-4396.004.patch
>
>
> When I configure yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack to enable this feature, I get no 
> detailed info about the locality of assigned containers. This is important 
> because it leads to some delay scheduling and has an influence on my cluster. 
> If I had this info, I could adjust the parameters in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Optimize container metrics in NodeManagerMetrics

2015-12-11 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Summary: Optimize container metrics in NodeManagerMetrics  (was: Add 
container launchEvent and container localizeFailed metrics in container)

> Optimize container metrics in NodeManagerMetrics
> 
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently, I found an issue with the nodemanager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually mean the number of 
> successfully launched containers, because a launch can fail after receiving a 
> kill command or after a container localization failure, which leads to a failed 
> container. Currently, this counter is increased in the code below whether the 
> container starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority

2015-12-10 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052026#comment-15052026
 ] 

Lin Yiqun commented on YARN-4399:
-

I tested the failed unit tests and these errors are not related.

> FairScheduler allocated container should resetSchedulingOpportunities count 
> of its priority
> ---
>
> Key: YARN-4399
> URL: https://issues.apache.org/jira/browse/YARN-4399
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4399.001.patch, YARN-4399.002.patch, 
> YARN-4399.003.patch
>
>
> There is a bug in FairScheduler container allocation when the locality configs 
> are set. When a container assignment is attempted, 
> {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the 
> assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack are configured, the 
> schedulingOpportunity value influences container locality: when one container 
> is assigned successfully, the schedulingOpportunity count of its priority is 
> increased, and it is increased again for the next container. This may let the 
> allowed locality of that priority degrade, and the container then gets handled 
> as a rack request. So I think that when FairScheduler allocates a container, if 
> the previous container was handled, the scheduling-opportunity count of its 
> priority should be reset to 0, so that its value does not influence container 
> allocation in the next iteration; this will improve container locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4440:

Attachment: YARN-4440.002.patch

Resolved the remaining error.

> FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
> -
>
> Key: YARN-4440
> URL: https://issues.apache.org/jira/browse/YARN-4440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4440.001.patch, YARN-4440.002.patch
>
>
> It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} 
> method:
> {code}
> // default level is NODE_LOCAL
> if (! allowedLocalityLevel.containsKey(priority)) {
>   allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
>   return NodeType.NODE_LOCAL;
> }
> {code}
> On the first invocation of this method, the time is not initialized in 
> lastScheduledContainer, which causes the following code to be executed on the 
> next invocation:
> {code}
> // check waiting time
> long waitTime = currentTimeMs;
> if (lastScheduledContainer.containsKey(priority)) {
>   waitTime -= lastScheduledContainer.get(priority);
> } else {
>   waitTime -= getStartTime();
> }
> {code}
> Here waitTime is computed against the FSAppAttempt start time, which easily 
> exceeds the delay threshold and degrades the allowed locality, because the 
> FSAppAttempt start time is much earlier than currentTimeMs. So we should record 
> an initial time for the priority to prevent the comparison against the 
> FSAppAttempt start time and the degradation of allowedLocalityLevel. This 
> problem has a bigger negative influence on small jobs. YARN-4399 also discusses 
> some locality-related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-10 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4440:
---

 Summary: FSAppAttempt#getAllowedLocalityLevelByTime should init 
the lastScheduler time
 Key: YARN-4440
 URL: https://issues.apache.org/jira/browse/YARN-4440
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} method:
{code}
// default level is NODE_LOCAL
if (! allowedLocalityLevel.containsKey(priority)) {
  allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
  return NodeType.NODE_LOCAL;
}
{code}
On the first invocation of this method, the time is not initialized in 
lastScheduledContainer, which causes the following code to be executed on the 
next invocation:
{code}
// check waiting time
long waitTime = currentTimeMs;
if (lastScheduledContainer.containsKey(priority)) {
  waitTime -= lastScheduledContainer.get(priority);
} else {
  waitTime -= getStartTime();
}
{code}
Here waitTime is computed against the FSAppAttempt start time, which easily 
exceeds the delay threshold and degrades the allowed locality, because the 
FSAppAttempt start time is much earlier than currentTimeMs. So we should record 
an initial time for the priority to prevent the comparison against the 
FSAppAttempt start time and the degradation of allowedLocalityLevel. This 
problem has a bigger negative influence on small jobs. YARN-4399 also discusses 
some locality-related problems.
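A sketch of the fix this suggests (not necessarily identical to the attached 
patches): record the current time the first time a priority is seen, so later 
calls compare against it instead of the application start time.
{code}
// default level is NODE_LOCAL
if (!allowedLocalityLevel.containsKey(priority)) {
  // Sketch: remember when this priority was first scheduled so the next
  // waitTime computation does not fall back to the much earlier app start time.
  lastScheduledContainer.put(priority, currentTimeMs);
  allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
  return NodeType.NODE_LOCAL;
}
{code}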



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4440:

Attachment: YARN-4440.001.patch

> FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
> -
>
> Key: YARN-4440
> URL: https://issues.apache.org/jira/browse/YARN-4440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4440.001.patch
>
>
> It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} 
> method:
> {code}
> // default level is NODE_LOCAL
> if (! allowedLocalityLevel.containsKey(priority)) {
>   allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
>   return NodeType.NODE_LOCAL;
> }
> {code}
> On the first invocation of this method, the time is not initialized in 
> lastScheduledContainer, which causes the following code to be executed on the 
> next invocation:
> {code}
> // check waiting time
> long waitTime = currentTimeMs;
> if (lastScheduledContainer.containsKey(priority)) {
>   waitTime -= lastScheduledContainer.get(priority);
> } else {
>   waitTime -= getStartTime();
> }
> {code}
> Here waitTime is computed against the FSAppAttempt start time, which easily 
> exceeds the delay threshold and degrades the allowed locality, because the 
> FSAppAttempt start time is much earlier than currentTimeMs. So we should record 
> an initial time for the priority to prevent the comparison against the 
> FSAppAttempt start time and the degradation of allowedLocalityLevel. This 
> problem has a bigger negative influence on small jobs. YARN-4399 also discusses 
> some locality-related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority

2015-12-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4399:

Attachment: YARN-4399.002.patch

Resolved the compile error.

> FairScheduler allocated container should resetSchedulingOpportunities count 
> of its priority
> ---
>
> Key: YARN-4399
> URL: https://issues.apache.org/jira/browse/YARN-4399
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4399.001.patch, YARN-4399.002.patch
>
>
> There is a bug in FairScheduler container allocation when the locality configs 
> are set. When a container assignment is attempted, 
> {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the 
> assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack are configured, the 
> schedulingOpportunity value influences container locality: when one container 
> is assigned successfully, the schedulingOpportunity count of its priority is 
> increased, and it is increased again for the next container. This may let the 
> allowed locality of that priority degrade, and the container then gets handled 
> as a rack request. So I think that when FairScheduler allocates a container, if 
> the previous container was handled, the scheduling-opportunity count of its 
> priority should be reset to 0, so that its value does not influence container 
> allocation in the next iteration; this will improve container locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority

2015-12-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4399:

Attachment: YARN-4399.003.patch

Resolved the remaining compile error.

> FairScheduler allocated container should resetSchedulingOpportunities count 
> of its priority
> ---
>
> Key: YARN-4399
> URL: https://issues.apache.org/jira/browse/YARN-4399
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4399.001.patch, YARN-4399.002.patch, 
> YARN-4399.003.patch
>
>
> There is a bug in FairScheduler container allocation when the locality configs 
> are set. When a container assignment is attempted, 
> {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the 
> assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack are configured, the 
> schedulingOpportunity value influences container locality: when one container 
> is assigned successfully, the schedulingOpportunity count of its priority is 
> increased, and it is increased again for the next container. This may let the 
> allowed locality of that priority degrade, and the container then gets handled 
> as a rack request. So I think that when FairScheduler allocates a container, if 
> the previous container was handled, the scheduling-opportunity count of its 
> priority should be reset to 0, so that its value does not influence container 
> allocation in the next iteration; this will improve container locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-08 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.002.patch

Thanks [~djp] for the review. I have made the container metrics more 
fine-grained. As you said, a container can fail for reasons other than 
localization failure, so it is not suitable to add the metric on the launch 
event. Instead, I increment the new {{containerLaunchedSuccess}} metric when 
the container transitions to the running state and {{wasLaunched=true}} is set. 
Besides this, I add two more metrics for container-failure cases:
* one for containerFailedBeforeLaunched
* one for containerKilledAfterLaunched
I think these metrics will help us understand a container's lifecycle more 
concretely.
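
For illustration, counters of this kind could be declared in 
{{NodeManagerMetrics}} roughly as below. This is only a hedged sketch: the 
class layout and method names are mine for illustration and are not 
necessarily the ones used in the attached patch.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

// Illustrative sketch only; the counter fields are instantiated when the
// class is registered with the metrics system, as NodeManagerMetrics does.
@Metrics(about = "Fine-grained container metrics (sketch)", context = "yarn")
public class ContainerMetricsSketch {
  @Metric("# of containers that reached the RUNNING state")
  MutableCounterInt containerLaunchedSuccess;
  @Metric("# of containers that failed before they were launched")
  MutableCounterInt containerFailedBeforeLaunched;
  @Metric("# of containers that were killed after they were launched")
  MutableCounterInt containerKilledAfterLaunched;

  public void launchedContainerSuccess() { containerLaunchedSuccess.incr(); }
  public void failedBeforeLaunched() { containerFailedBeforeLaunched.incr(); }
  public void killedAfterLaunched() { containerKilledAfterLaunched.incr(); }
}
{code}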

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers. A container can still fail after this 
> point, for example when it receives a kill command or when container 
> localization fails, which leads to a failed container. But today this counter 
> is increased by the code below regardless of whether the container eventually 
> starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-08 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047742#comment-15047742
 ] 

Lin Yiqun commented on YARN-4381:
-

[~djp], the Jenkins report shows that the checkstyle warnings do not need to 
be fixed and the license warnings look unrelated. Could you review my patch 
again?

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers. A container can still fail after this 
> point, for example when it receives a kill command or when container 
> localization fails, which leads to a failed container. But today this counter 
> is increased by the code below regardless of whether the container eventually 
> starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-06 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044261#comment-15044261
 ] 

Lin Yiqun commented on YARN-4381:
-

[~djp], do you have some time to review my patch, or is there anything else I 
can do for this JIRA?

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers. A container can still fail after this 
> point, for example when it receives a kill command or when container 
> localization fails, which leads to a failed container. But today this counter 
> is increased by the code below regardless of whether the container eventually 
> starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority

2015-11-30 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4399:
---

 Summary: FairScheduler allocated container should 
resetSchedulingOpportunities count of its priority
 Key: YARN-4399
 URL: https://issues.apache.org/jira/browse/YARN-4399
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


There is a bug in FairScheduler container allocation when the locality configs 
are set. Every attempt to assign a container invokes 
{{FSAppAttempt#addSchedulingOpportunity}}, whether or not the assignment 
succeeds. If yarn.scheduler.fair.locality.threshold.node and 
yarn.scheduler.fair.locality.threshold.rack are configured, the 
schedulingOpportunity value influences container locality: when one container 
is assigned successfully, the schedulingOpportunity count of its priority is 
increased, and it is increased again for the next container. This may degrade 
the allowedLocality of that priority and cause the next container to be 
handled as a rack-local request. So I think that when the FairScheduler 
successfully allocates a container, the schedulingOpportunity count of its 
priority should be reset to 0 so that it does not influence allocation in the 
next iteration; this will improve container locality.
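
Roughly, the idea is to reset the count once a container for that priority has 
actually been assigned, so the count only reflects consecutive missed 
opportunities. A simplified sketch of the intended behaviour around 
{{FSAppAttempt#assignContainer}} follows (not the actual patch; the local 
variable names are only illustrative):
{code}
// Simplified sketch of the proposed behaviour (variable names illustrative).
Resource assigned = assignContainer(node, request, nodeType, reserved);
if (!assigned.equals(Resources.none())) {
  // A container was allocated (or reserved) for this priority, so the
  // opportunity count should no longer penalise its locality.
  resetSchedulingOpportunities(priority);
} else {
  // Nothing was assigned on this node: count it as a missed opportunity.
  addSchedulingOpportunity(priority);
}
{code}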



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority

2015-11-30 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4399:

Attachment: YARN-4399.001.patch

> FairScheduler allocated container should resetSchedulingOpportunities count 
> of its priority
> ---
>
> Key: YARN-4399
> URL: https://issues.apache.org/jira/browse/YARN-4399
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4399.001.patch
>
>
> There is a bug in FairScheduler container allocation when the locality 
> configs are set. Every attempt to assign a container invokes 
> {{FSAppAttempt#addSchedulingOpportunity}}, whether or not the assignment 
> succeeds. If yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack are configured, the 
> schedulingOpportunity value influences container locality: when one container 
> is assigned successfully, the schedulingOpportunity count of its priority is 
> increased, and it is increased again for the next container. This may degrade 
> the allowedLocality of that priority and cause the next container to be 
> handled as a rack-local request. So I think that when the FairScheduler 
> successfully allocates a container, the schedulingOpportunity count of its 
> priority should be reset to 0 so that it does not influence allocation in the 
> next iteration; this will improve container locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4396) Log the trace information on FSAppAttempt#assignContainer

2015-11-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4396:

Attachment: YARN-4396.004.patch

Simplified the debug messages.

> Log the trace information on FSAppAttempt#assignContainer
> --
>
> Key: YARN-4396
> URL: https://issues.apache.org/jira/browse/YARN-4396
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4396.001.patch, YARN-4396.002.patch, 
> YARN-4396.003.patch, YARN-4396.004.patch
>
>
> When I configure yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack to enable this feature, I get no 
> detailed information about the locality of the assigned containers. This 
> matters because the feature introduces delay scheduling and affects my 
> cluster. With this information I could tune these parameters for the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4396) Log the trance information on FSAppAttempt#assignContainer

2015-11-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4396:

Attachment: YARN-4396.003.patch

Added debug logging in the {{getAllowedLocalityLevelByTime}} and 
{{getAllowedLocalityLevel}} methods. Sample output:
* getAllowedLocalityLevelByTime:
{code}
2015-11-27 09:54:21,553 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Waiting time is 32570 ms, more than nodeLocalityDelay time 1000 ms, change 
allowedLocality from NODE_LOCAL to RACK_LOCAL, priority:10, app attempt 
id:appattempt_1448589183973_0001_01
2015-11-27 09:54:21,553 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Assign container on qihe129040 node, requestType:OFF_SWITCH, 
allowedLocality:RACK_LOCAL, schedulingOpportunities:0, priority:10, app attempt 
id:appattempt_1448589183973_0001_01
{code}

* getAllowedLocalityLevel:
{code}
2015-11-27 09:42:47,132 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
SchedulingOpportunities count is 2, more than nodeLocalityThreshold num 
0.899761581421, change allowedLocality from NODE_LOCAL to RACK_LOCAL, 
priority:10, app attempt id:appattempt_1448588357362_0001_01
2015-11-27 09:42:47,132 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Assign container on qihe129040 node, requestType:OFF_SWITCH, 
allowedLocality:RACK_LOCAL, schedulingOpportunities:0, priority:10, app attempt 
id:appattempt_1448588357362_0001_01
{code}
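
For reference, the statements producing this output are essentially guarded 
debug logs of the following shape (an illustrative sketch, not the exact patch 
text; the local variable names are assumptions):
{code}
// Illustrative sketch of the added trace logging (variable names assumed).
if (LOG.isDebugEnabled()) {
  LOG.debug("SchedulingOpportunities count is " + schedulingOpportunities
      + ", more than nodeLocalityThreshold num " + nodeLocalityThreshold
      + ", change allowedLocality from NODE_LOCAL to RACK_LOCAL"
      + ", priority:" + priority
      + ", app attempt id:" + getApplicationAttemptId());
}
{code}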

> Log the trace information on FSAppAttempt#assignContainer
> --
>
> Key: YARN-4396
> URL: https://issues.apache.org/jira/browse/YARN-4396
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4396.001.patch, YARN-4396.002.patch, 
> YARN-4396.003.patch
>
>
> When I configure yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack to enable this feature, I get no 
> detailed information about the locality of the assigned containers. This 
> matters because the feature introduces delay scheduling and affects my 
> cluster. With this information I could tune these parameters for the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4396) Log the trace information on FSAppAttempt#assignContainer

2015-11-26 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4396:
---

 Summary: Log the trace information on FSAppAttempt#assignContainer
 Key: YARN-4396
 URL: https://issues.apache.org/jira/browse/YARN-4396
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications, scheduler
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


When I configure yarn.scheduler.fair.locality.threshold.node and 
yarn.scheduler.fair.locality.threshold.rack to enable this feature, I get no 
detailed information about the locality of the assigned containers. This 
matters because the feature introduces delay scheduling and affects my 
cluster. With this information I could tune these parameters for the cluster.
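
For context, the check that these thresholds control is roughly of the 
following shape, which is why visibility into the scheduling-opportunity count 
matters. This is a simplified, self-contained sketch with example values; the 
real logic lives in {{FSAppAttempt#getAllowedLocalityLevel}}.
{code}
// Simplified sketch of the node-locality threshold check (example values).
public class LocalityThresholdDemo {
  public static void main(String[] args) {
    double nodeLocalityThreshold = 0.9; // yarn.scheduler.fair.locality.threshold.node
    int numClusterNodes = 100;          // size of the cluster
    int schedulingOpportunities = 95;   // missed opportunities for this priority

    // Relax from NODE_LOCAL to RACK_LOCAL once enough nodes have been passed up.
    if (schedulingOpportunities > nodeLocalityThreshold * numClusterNodes) {
      System.out.println("change allowedLocality from NODE_LOCAL to RACK_LOCAL");
    } else {
      System.out.println("keep waiting for a NODE_LOCAL assignment");
    }
  }
}
{code}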



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4396) Log the trace information on FSAppAttempt#assignContainer

2015-11-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4396:

Attachment: YARN-4396.001.patch

> Log the trace information on FSAppAttempt#assignContainer
> --
>
> Key: YARN-4396
> URL: https://issues.apache.org/jira/browse/YARN-4396
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4396.001.patch
>
>
> When I configure yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack to enable this feature, I get no 
> detailed information about the locality of the assigned containers. This 
> matters because the feature introduces delay scheduling and affects my 
> cluster. With this information I could tune these parameters for the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4396) Log the trace information on FSAppAttempt#assignContainer

2015-11-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4396:

Attachment: YARN-4396.002.patch

Modified some debug messages.

> Log the trace information on FSAppAttempt#assignContainer
> --
>
> Key: YARN-4396
> URL: https://issues.apache.org/jira/browse/YARN-4396
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, scheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4396.001.patch, YARN-4396.002.patch
>
>
> When I configure yarn.scheduler.fair.locality.threshold.node and 
> yarn.scheduler.fair.locality.threshold.rack to enable this feature, I get no 
> detailed information about the locality of the assigned containers. This 
> matters because the feature introduces delay scheduling and affects my 
> cluster. With this information I could tune these parameters for the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-23 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023415#comment-15023415
 ] 

Lin Yiqun commented on YARN-4381:
-

Thanks [~djp]!

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers. A container can still fail after this 
> point, for example when it receives a kill command or when container 
> localization fails, which leads to a failed container. But today this counter 
> is increased by the code below regardless of whether the container eventually 
> starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-22 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.001.patch

I attach an initial patch that adds two new metrics to {{NodeManagerMetrics}} 
(a rough sketch of the declarations follows the list):
* containerLocalizeFailed
* containersLaunchEventOperation
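
A hedged sketch of what declaring these counters might look like; how and 
where they are incremented (e.g. on localization failure and on the container 
launch event) is my assumption here, not necessarily what the attached patch 
does:
{code}
// Illustrative declarations only, following the usual NodeManagerMetrics style.
@Metric("# of containers whose resource localization failed")
MutableCounterInt containerLocalizeFailed;
@Metric("# of container launch events handled")
MutableCounterInt containersLaunchEventOperation;

public void localizeFailedContainer() { containerLocalizeFailed.incr(); }
public void launchEventOperation() { containersLaunchEventOperation.incr(); }
{code}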


> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers. A container can still fail after this 
> point, for example when it receives a kill command or when container 
> localization fails, which leads to a failed container. But today this counter 
> is increased by the code below regardless of whether the container eventually 
> starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-22 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4381:
---

 Summary: Add container launchEvent and container localizeFailed 
metrics in container
 Key: YARN-4381
 URL: https://issues.apache.org/jira/browse/YARN-4381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Lin Yiqun


Recently I found an issue with the NodeManager metrics: 
{{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
successfully launched containers. A container can still fail after this point, 
for example when it receives a kill command or when container localization 
fails, which leads to a failed container. But today this counter is increased 
by the code below regardless of whether the container eventually starts 
successfully or fails.
{code}
Credentials credentials = parseCredentials(launchContext);

Container container =
new ContainerImpl(getConfig(), this.dispatcher,
context.getNMStateStore(), launchContext,
  credentials, metrics, containerTokenIdentifier);
ApplicationId applicationID =
containerId.getApplicationAttemptId().getApplicationId();
if (context.getContainers().putIfAbsent(containerId, container) != null) {
  NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
"ContainerManagerImpl", "Container already running on this node!",
applicationID, containerId);
  throw RPCUtil.getRemoteException("Container " + containerIdStr
  + " already is running on this node!!");
}

this.readLock.lock();
try {
  if (!serviceStopped) {
// Create the application
Application application =
new ApplicationImpl(dispatcher, user, applicationID, credentials, 
context);
if (null == context.getApplications().putIfAbsent(applicationID,
  application)) {
  LOG.info("Creating a new application reference for app " + 
applicationID);
  LogAggregationContext logAggregationContext =
  containerTokenIdentifier.getLogAggregationContext();
  Map<ApplicationAccessType, String> appAcls =
  container.getLaunchContext().getApplicationACLs();
  context.getNMStateStore().storeApplication(applicationID,
  buildAppProto(applicationID, user, credentials, appAcls,
logAggregationContext));
  dispatcher.getEventHandler().handle(
new ApplicationInitEvent(applicationID, appAcls,
  logAggregationContext));
}

this.context.getNMStateStore().storeContainer(containerId, request);
dispatcher.getEventHandler().handle(
  new ApplicationContainerInitEvent(container));

this.context.getContainerTokenSecretManager().startContainerSuccessful(
  containerTokenIdentifier);
NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
  "ContainerManageImpl", applicationID, containerId);
// TODO launchedContainer misplaced -> doesn't necessarily mean a container
// launch. A finished Application will not launch containers.
metrics.launchedContainer();
metrics.allocateContainer(containerTokenIdentifier.getResource());
  } else {
throw new YarnException(
"Container start failed as the NodeManager is " +
"in the process of shutting down");
  }
{code}
In addition, we lack a localizationFailed metric for containers.
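
As a rough illustration of the alternative, the success counter would be bumped 
at the point where the container actually transitions to RUNNING, next to the 
existing {{wasLaunched}} flag, rather than at start-container time. This is a 
hedged sketch: {{launchedContainerSuccess()}} is a hypothetical new method, and 
the surrounding transition code is simplified.
{code}
// Sketch only: inside the ContainerImpl transition that moves the container
// to RUNNING, also increment a "launched successfully" counter instead of
// counting at start-container time in ContainerManagerImpl.
container.metrics.runningContainer();           // existing RUNNING bookkeeping
container.metrics.launchedContainerSuccess();   // hypothetical new counter
container.wasLaunched = true;                   // existing flag
{code}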



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)