[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-09-29 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.0005.patch

YARN-5995.0005.patch submitted.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.0004.patch, YARN-5995.0005.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-05-02 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992496#comment-15992496
 ] 

zhangyubiao commented on YARN-6521:
---

I find that YARN-4904 also solves this problem.

> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Comment Edited] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-28 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988453#comment-15988453
 ] 

zhangyubiao edited comment on YARN-6521 at 4/28/17 10:19 AM:
-

Thanks, all, and thanks [~rkanter] and [~djp]. I will try HadoopArchiveLogs.

By the way, I see YARN-431 (Stabilize YARN application log-handling), but none 
of those efforts fundamentally solves the problem of growing container counts 
creating too many small files and overloading the NN all at once.

If we don't enable log aggregation but can still provide "yarn logs" to fetch 
logs, the problem is gone for good.
The issue I face is that we can't get the relationship between the app and its 
ContainerReports once the app finishes (Spark).


was (Author: piaoyu zhang):
Thanks, all, and thanks [~rkanter] and [~djp]. I will try HadoopArchiveLogs.

By the way, I see YARN-431 (Stabilize YARN application log-handling), but none 
of those efforts fundamentally solves the problem of growing container counts 
creating too many small files and overloading the NN all at once.

If we don't enable log aggregation but can still provide "yarn logs" to fetch 
logs, the problem is gone for good.
The issue I face is that we can't get the relationship between the app and its 
ContainerReports once the app finishes.

> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Commented] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-28 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988453#comment-15988453
 ] 

zhangyubiao commented on YARN-6521:
---

Thanks, all, and thanks [~rkanter] and [~djp]. I will try HadoopArchiveLogs.

By the way, I see YARN-431 (Stabilize YARN application log-handling), but none 
of those efforts fundamentally solves the problem of growing container counts 
creating too many small files and overloading the NN all at once.

If we don't enable log aggregation but can still provide "yarn logs" to fetch 
logs, the problem is gone for good.
The issue I face is that we can't get the relationship between the app and its 
ContainerReports once the app finishes.

> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Commented] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-26 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984297#comment-15984297
 ] 

zhangyubiao commented on YARN-6521:
---

OK. "apps incr" means the number of apps increases, and "container incr" means 
the number of containers increases.
"Transform the log CLI service" means that, even without enabling log 
aggregation, we can still get the app logs with the command "yarn logs 
-applicationId applicationXXX".
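
The idea described above, keeping "yarn logs" usable even when aggregation is 
disabled, can be sketched as follows. This is a hypothetical illustration, not 
actual YARN code: the class, parameters, and directory layout are invented for 
the sketch.

```java
// Hypothetical sketch: where a "yarn logs"-style client could read logs from,
// depending on whether log aggregation is enabled. Not YARN's real API.
import java.nio.file.Paths;

class LogLocator {
    // With aggregation on, logs live under the aggregated remote dir
    // (e.g. /tmp/app-logs); with it off, they must be fetched from each
    // NodeManager's local log directory instead.
    static String resolveLogPath(boolean aggregationEnabled,
                                 String remoteAppLogDir,
                                 String nmLocalLogDir,
                                 String appId,
                                 String containerId) {
        if (aggregationEnabled) {
            return Paths.get(remoteAppLogDir, appId).toString();
        }
        return Paths.get(nmLocalLogDir, appId, containerId).toString();
    }
}
```

Under this sketch, flipping yarn.log-aggregation-enable to false only changes 
which branch the CLI takes; the user-facing command stays the same.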





> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Commented] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983305#comment-15983305
 ] 

zhangyubiao commented on YARN-6521:
---

[~sunilg], [~varun_saxena], how about this idea?

> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Comment Edited] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983305#comment-15983305
 ] 

zhangyubiao edited comment on YARN-6521 at 4/25/17 5:34 PM:


[~sunilg], [~varun_saxena], [~templedf], how about this idea?


was (Author: piaoyu zhang):
[~sunilg], [~varun_saxena], how about this idea?

> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Commented] (YARN-6516) FairScheduler:the algorithm of assignContainer is so slow for it only can assign a thousand containers per second

2017-04-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982465#comment-15982465
 ] 

zhangyubiao commented on YARN-6516:
---

Would you like to try the YARN-5483 and YARN-3139 patches? We had the same 
problem before.

> FairScheduler:the algorithm of assignContainer is so slow for it only can 
> assign a thousand containers per second
> -
>
> Key: YARN-6516
> URL: https://issues.apache.org/jira/browse/YARN-6516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: JackZhou
>







[jira] [Created] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-24 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-6521:
-

 Summary: Yarn Log-aggregation Transform Enable not to Spam the 
NameNode
 Key: YARN-6521
 URL: https://issues.apache.org/jira/browse/YARN-6521
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhangyubiao


Nowadays we have a large cluster; as the number of apps and containers 
increases, we split off a separate namespace to store app logs. But the growth 
of the log-aggregation directory /tmp/app-logs also makes the namespace 
respond slowly.
We want to change yarn.log-aggregation-enable from true to false, but 
transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Updated] (YARN-6521) Yarn Log-aggregation Transform Enable not to Spam the NameNode

2017-04-24 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-6521:
--
Description: 
Nowadays we have a large cluster; as the number of apps and containers 
increases, we split off a separate namespace to store app logs. 
But the growth of the log-aggregation directory /tmp/app-logs also makes the 
namespace respond slowly. 
We want to change yarn.log-aggregation-enable from true to false, but 
transform the yarn logs CLI service so that it can still retrieve the app logs.

  was:
Nowadays we have a large cluster; as the number of apps and containers 
increases, we split off a separate namespace to store app logs. But the growth 
of the log-aggregation directory /tmp/app-logs also makes the namespace 
respond slowly.
We want to change yarn.log-aggregation-enable from true to false, but 
transform the yarn logs CLI service so that it can still retrieve the app logs.


> Yarn Log-aggregation Transform Enable not to Spam the NameNode
> --
>
> Key: YARN-6521
> URL: https://issues.apache.org/jira/browse/YARN-6521
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhangyubiao
>
> Nowadays we have a large cluster; as the number of apps and containers 
> increases, we split off a separate namespace to store app logs. 
> But the growth of the log-aggregation directory /tmp/app-logs also makes the 
> namespace respond slowly. 
> We want to change yarn.log-aggregation-enable from true to false, but 
> transform the yarn logs CLI service so that it can still retrieve the app logs.






[jira] [Commented] (YARN-6516) FairScheduler:the algorithm of assignContainer is so slow for it only can assign a thousand containers per second

2017-04-24 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980777#comment-15980777
 ] 

zhangyubiao commented on YARN-6516:
---

Is a thousand containers per second the max value or the average value?

> FairScheduler:the algorithm of assignContainer is so slow for it only can 
> assign a thousand containers per second
> -
>
> Key: YARN-6516
> URL: https://issues.apache.org/jira/browse/YARN-6516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: GirlKiller
>







[jira] [Created] (YARN-6503) Make StartContainerLogAggregation Async

2017-04-20 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-6503:
-

 Summary:  Make StartContainerLogAggregation Async 
 Key: YARN-6503
 URL: https://issues.apache.org/jira/browse/YARN-6503
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.1
Reporter: zhangyubiao
Priority: Minor


A YARN container is slow to transition to success if log aggregation is slow.
We want to make log aggregation asynchronous.
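
The proposal can be sketched with a plain ExecutorService. This is an 
illustrative sketch under assumed names, not the NodeManager's actual 
implementation; the point is only that the upload is queued in the background 
so the container can be reported finished without waiting for it.

```java
// Sketch (hypothetical names, not NodeManager internals): decouple container
// completion from log upload by handing the upload to a background executor.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class AsyncLogAggregation {
    private final ExecutorService uploader = Executors.newSingleThreadExecutor();

    // Returns immediately; the (possibly slow) upload runs asynchronously.
    Future<?> onContainerFinished(Runnable uploadLogs) {
        return uploader.submit(uploadLogs);
    }

    // Drain pending uploads on shutdown so no logs are silently dropped.
    void shutdownAndWait() {
        uploader.shutdown();
        try {
            uploader.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that a crash before the queue drains can lose logs, which a 
real implementation would need to handle (e.g. by persisting pending uploads).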






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-03-31 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.0004.patch

Fixed findbugs warnings and some checkstyle issues; YARN-5995.0004.patch is 
submitted.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.0004.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Comment Edited] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-03-31 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951609#comment-15951609
 ] 

zhangyubiao edited comment on YARN-5995 at 3/31/17 8:49 PM:


Sorry for the late response. YARN-5995.0003.patch is submitted.


was (Author: piaoyu zhang):
Sorry for the late response.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-03-31 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951609#comment-15951609
 ] 

zhangyubiao commented on YARN-5995:
---

Sorry for the late response.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-03-31 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.0003.patch

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-03-31 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: (was: YARN-5995.003.patch)

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-03-31 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.003.patch

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.003.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-15 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823383#comment-15823383
 ] 

zhangyubiao commented on YARN-5995:
---

[~jianhe], OK. I will update the patch in the next few days.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Comment Edited] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814163#comment-15814163
 ] 

zhangyubiao edited comment on YARN-5995 at 1/10/17 7:17 AM:


Thanks, [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package 
while MutableCounter belongs to the org.apache.hadoop.metrics2.lib package; if 
we use both of these metric counters, it will mess up the metrics lib.
So I think it's better to write a MutableHistogram and MutableTimeHistogram 
modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you 
think, [~sunilg]?


was (Author: piaoyu zhang):
Thanks, [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package 
while MutableCounter belongs to the org.apache.hadoop.metrics2.lib package; if 
we use both of these metric counters, it will mess up the metrics lib.
So I think it's better to write a MutableHistogram and MutableTimeHistogram 
modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you 
think, [~sunilg]?

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814163#comment-15814163
 ] 

zhangyubiao commented on YARN-5995:
---

Thanks, [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package 
while MutableCounter belongs to the org.apache.hadoop.metrics2.lib package; if 
we use both of these metric counters, it will mess up the metrics lib.
So I think it's better to write a MutableHistogram and MutableTimeHistogram 
modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you 
think, [~sunilg]?
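
A minimal, self-contained sketch of what such a mutable histogram could track 
for state-store event timings (loosely modeled on the idea behind HBase's 
MutableHistogram; the class and method names here are illustrative, not Hadoop 
or HBase APIs):

```java
// Illustrative sketch, not a Hadoop metrics2 class: a thread-safe mutable
// metric that accumulates count, max, and mean of observed event durations.
import java.util.concurrent.atomic.AtomicLong;

class SimpleMutableHistogram {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Record one event duration (e.g. one RMStateStore transition), in millis.
    void add(long valueMillis) {
        count.incrementAndGet();
        sum.addAndGet(valueMillis);
        max.accumulateAndGet(valueMillis, Math::max);
    }

    long count() { return count.get(); }
    long max()   { return max.get(); }

    double mean() {
        long c = count.get();
        return c == 0 ? 0.0 : (double) sum.get() / c;
    }
}
```

A real MutableTimeHistogram would additionally bucket values into time ranges 
and expose them through the metrics2 snapshot mechanism.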

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812539#comment-15812539
 ] 

zhangyubiao commented on YARN-5995:
---

Thanks. [~sunilg]

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Comment Edited] (YARN-5188) FairScheduler performance bug

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812533#comment-15812533
 ] 

zhangyubiao edited comment on YARN-5188 at 1/9/17 6:56 PM:
---

[~chenfolin], is it OK for me to take over this JIRA?


was (Author: piaoyu zhang):
@ChenFolin, is it OK for me to take over this JIRA?

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
> My Hadoop cluster has recently encountered a performance problem. Details 
> follow.
> There are two points that can cause this performance issue:
> 1: Application sorting before container assignment in FSLeafQueue. TreeSet is 
> not the best choice. Why not keep the list ordered, and use binary search to 
> restore order when an application's resource usage changes?
> 2: Queue sorting and assignContainerPreCheck recompute every leaf queue's 
> resource usage. Why not store the leaf-queue usage in memory and update it 
> whenever a container is assigned or released?
>
> The efficiency of container assignment in the ResourceManager may fall as the 
> number of running and pending applications grows. In fact, the cluster has 
> too many pending MB or pending vcores, and current cluster utilization may be 
> below 20%.
> I checked the ResourceManager logs and found that each container assignment 
> may cost 5-10 ms, versus the usual 0-1 ms.
>
> I used TestFairScheduler to reproduce the scene:
>
> Just one queue: root.default, with 10240 apps.
>
> Assign-container average time: 6753.9 us (6.7539 ms)
> Apps sort time (FSLeafQueue: Collections.sort(runnableApps, comparator);): 
> 4657.01 us (4.657 ms)
> Compute leaf-queue resource usage: 905.171 us (0.905171 ms)
>
> With just root.default, one assign-container op involves (one apps-sort op) + 
> 2 * (compute leaf-queue usage op).
> Based on the above, I think the assign-container op has a performance problem.
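
Point 1 above (keep the list ordered and reposition only the one entry whose 
key changed, using binary search, instead of re-sorting everything) can be 
sketched as follows. This is an illustrative sketch, not the actual FSLeafQueue 
code; the method name is invented.

```java
// Sketch: maintain a sorted list by moving a single changed entry to its new
// binary-search position, avoiding a full O(n log n) re-sort per assignment.
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class OrderedApps {
    // Replace oldItem (whose sort key has changed) with newItem, keeping the
    // list sorted. Costs one O(n) removal plus an O(log n) search and an
    // O(n) insert shift, versus sorting the whole list.
    static <T> void update(List<T> sorted, T oldItem, T newItem,
                           Comparator<T> cmp) {
        sorted.remove(oldItem);
        int idx = Collections.binarySearch(sorted, newItem, cmp);
        if (idx < 0) {
            idx = -idx - 1; // binarySearch encodes the insertion point
        }
        sorted.add(idx, newItem);
    }
}
```

For the FairScheduler case, T would be the schedulable app and the comparator 
the fair-share comparator; only the app whose usage just changed is moved.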






[jira] [Commented] (YARN-5188) FairScheduler performance bug

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812533#comment-15812533
 ] 

zhangyubiao commented on YARN-5188:
---

@ChenFolin, is it OK for me to take over this JIRA?

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
>  My Hadoop cluster has recently encountered a performance problem. Details 
> as follows.
> There are two points which can cause this performance issue.
> 1: Applications are sorted before every container assignment in FSLeafQueue. 
> TreeSet is not the best choice here; why not keep the list ordered, and use 
> binary search to re-position an application when its resource usage changes?
> 2: Queue sorting and assignContainerPreCheck recompute the resource usage of 
> every leaf queue. Why not keep each leaf queue's usage in memory and update 
> it when a container is assigned or released?
>
>The efficiency of container assignment in the ResourceManager may fall 
> as the number of running and pending applications grows. In fact, the 
> cluster had a large amount of pending memory and pending vcores while its 
> current utilization rate was below 20%.
>I checked the ResourceManager logs and found that each container 
> assignment could cost 5~10 ms, versus just 0~1 ms at usual times.
>  
>I used TestFairScheduler to reproduce the scene:
>  
>Just one queue: root.default
>  10240 apps.
>  
>assign container avg time: 6753.9 us (6.7539 ms)
>  apps sort time (FSLeafQueue: Collections.sort(runnableApps, 
> comparator);): 4657.01 us (4.657 ms)
>  compute LeafQueue resource usage: 905.171 us (0.905171 ms)
>  
>  With just root.default, one assign-container op consists of (one apps 
> sort op) + 2 * (compute leaf queue usage op).
>According to the above, I think the assign-container op has a 
> performance problem.
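Point 1 of the description above can be illustrated with a minimal stand-alone sketch: keep the application list ordered and re-position a single app by binary search when its usage changes, instead of re-sorting the whole collection on every container assignment. `App` and the usage-based comparator below are simplified stand-ins for the real FSAppAttempt and scheduling-policy comparator, not the actual FairScheduler code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch: maintain an always-ordered app list; re-insert one app via
// binary search when its usage changes, instead of a full sort per
// assignment. Illustrative only, not the FairScheduler implementation.
public class OrderedApps {
    static class App {
        final String id;
        int usage; // simplified stand-in for memory/vcore usage
        App(String id, int usage) { this.id = id; this.usage = usage; }
    }

    private final Comparator<App> cmp = Comparator.comparingInt(a -> a.usage);
    private final List<App> apps = new ArrayList<>();

    // O(log n) search plus an O(n) shift, versus O(n log n) for a full sort.
    void insert(App app) {
        int idx = Collections.binarySearch(apps, app, cmp);
        if (idx < 0) idx = -idx - 1; // convert "not found" to insertion point
        apps.add(idx, app);
    }

    // Re-position a single app after its resource usage has changed.
    void updateUsage(App app, int newUsage) {
        apps.remove(app); // remove at old position
        app.usage = newUsage;
        insert(app);      // binary-search re-insert at new position
    }

    App first() { return apps.get(0); } // app with the least usage

    public static void main(String[] args) {
        OrderedApps q = new OrderedApps();
        App a = new App("a", 5), b = new App("b", 1), c = new App("c", 3);
        q.insert(a); q.insert(b); q.insert(c);
        System.out.println(q.first().id); // prints b (usage 1)
        q.updateUsage(b, 10);
        System.out.println(q.first().id); // prints c (usage 3)
    }
}
```

Re-positioning one app this way costs a binary search plus a list shift, which is the cheaper alternative the description contrasts with the measured Collections.sort() cost.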






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-03 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794758#comment-15794758
 ] 

zhangyubiao commented on YARN-5995:
---

[~sunilg], [~templedf], sorry for the late response. Should I use 
com.codahale.metrics.Meter, or write a MutableMeter for 
org.apache.hadoop.metrics2.lib?

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-21 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766538#comment-15766538
 ] 

zhangyubiao commented on YARN-5995:
---

[~sunilg], OK, I will go ahead in the next few days.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753519#comment-15753519
 ] 

zhangyubiao commented on YARN-5995:
---

Fixed the asflicense and FindBugs issues; YARN-5995.0002.patch is attached.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.0002.patch

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753327#comment-15753327
 ] 

zhangyubiao commented on YARN-5995:
---

Thanks [~sunilg], I updated YARN-5995.0001.patch.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.0001.patch

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: (was: YARN-5995-v1.patch)

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Fix Version/s: (was: 2.7.4)

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Target Version/s: 3.0.0-alpha2  (was: 3.0.0-beta1)
   Fix Version/s: (was: 3.0.0-beta1)

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Fix For: 2.7.4
>
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Comment Edited] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750769#comment-15750769
 ] 

zhangyubiao edited comment on YARN-5995 at 12/15/16 12:00 PM:
--

YARN-5995-v1.patch is attached.


was (Author: piaoyu zhang):
YARN-59950-v1.patch is commit 

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Fix For: 2.7.4, 3.0.0-beta1
>
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Reopened] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao reopened YARN-5995:
---

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Fix For: 2.7.4, 3.0.0-beta1
>
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Fix Version/s: (was: 2.7.1)
   2.7.4

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Fix For: 2.7.4, 3.0.0-beta1
>
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Target Version/s: 3.0.0-beta1
   Fix Version/s: 2.7.1
  3.0.0-beta1

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Fix For: 2.7.4, 3.0.0-beta1
>
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750769#comment-15750769
 ] 

zhangyubiao commented on YARN-5995:
---

YARN-59950-v1.patch is commit 

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-15 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995-v1.patch

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995-v1.patch, YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-14 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750176#comment-15750176
 ] 

zhangyubiao commented on YARN-5995:
---

OK, I will create a separate class for the metrics and update the patch soon.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-14 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5995:
--
Attachment: YARN-5995.patch

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-14 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747759#comment-15747759
 ] 

zhangyubiao commented on YARN-5995:
---

  "RemoveAppCallNumOps" : 3540,
"RemoveAppCallAvgTime" : 340.0,
"RemoveAppCallStdevTime" : 0.0,
"RemoveAppCallIMinTime" : 340.0,
"RemoveAppCallIMaxTime" : 340.0,
"RemoveAppCallMinTime" : 2.0,
"RemoveAppCallMaxTime" : 1407.0,
We find that RemoveAppCall sometimes takes a long time.
@Sunil G, is it OK to add these metrics to fairscheduler-op-durations, or 
should we add an rmstatestore-op-durations class for them?
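The NumOps/AvgTime/MinTime/MaxTime fields quoted above follow the shape of Hadoop metrics2 rate metrics. The following is only a hedged, self-contained sketch of how such an op-duration metric accumulates its values; it is not the actual metrics2 or RMStateStore implementation, and the class and method names are illustrative.

```java
// Sketch of a NumOps/Avg/Min/Max duration metric, in the spirit of the
// metrics quoted above. Illustrative only, not Hadoop's MutableRate.
public class OpDuration {
    private long numOps;
    private double totalMs;
    private double minMs = Double.MAX_VALUE;
    private double maxMs = -1;

    // Record one operation's duration in milliseconds.
    public synchronized void add(double durationMs) {
        numOps++;
        totalMs += durationMs;
        minMs = Math.min(minMs, durationMs);
        maxMs = Math.max(maxMs, durationMs);
    }

    public synchronized long numOps() { return numOps; }
    public synchronized double avgMs() { return numOps == 0 ? 0 : totalMs / numOps; }
    public synchronized double minMs() { return minMs; }
    public synchronized double maxMs() { return maxMs; }

    public static void main(String[] args) {
        OpDuration removeApp = new OpDuration();
        removeApp.add(2.0);     // fast call
        removeApp.add(1407.0);  // slow call, like the MaxTime seen above
        removeApp.add(340.0);
        System.out.println(removeApp.numOps() + " ops, avg " + removeApp.avgMs()
            + " ms, min " + removeApp.minMs() + " ms, max " + removeApp.maxMs() + " ms");
    }
}
```

Each RMStateStore event handler would time its call and feed the duration into such a metric, which is what makes an occasional slow RemoveAppCall visible in MaxTime while AvgTime stays moderate.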

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Comment Edited] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-14 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747759#comment-15747759
 ] 

zhangyubiao edited comment on YARN-5995 at 12/14/16 9:11 AM:
-

  "RemoveAppCallNumOps" : 3540,
"RemoveAppCallAvgTime" : 340.0,
"RemoveAppCallStdevTime" : 0.0,
"RemoveAppCallIMinTime" : 340.0,
"RemoveAppCallIMaxTime" : 340.0,
"RemoveAppCallMinTime" : 2.0,
"RemoveAppCallMaxTime" : 1407.0,
We find that RemoveAppCall sometimes takes a long time.
[~sunilg], is it OK to add these metrics to fairscheduler-op-durations, or 
should we add an rmstatestore-op-durations class for them?


was (Author: piaoyu zhang):
  "RemoveAppCallNumOps" : 3540,
"RemoveAppCallAvgTime" : 340.0,
"RemoveAppCallStdevTime" : 0.0,
"RemoveAppCallIMinTime" : 340.0,
"RemoveAppCallIMaxTime" : 340.0,
"RemoveAppCallMinTime" : 2.0,
"RemoveAppCallMaxTime" : 1407.0,
Find sometimes RemoveAppCall take long time.
@Sunil G, is it ok to add the metrics to fairscheduler-op-durations? Or add a 
rmstatestore-op-durations class to take the metrics.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-13 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744664#comment-15744664
 ] 

zhangyubiao commented on YARN-5995:
---

[~sunilg], thanks. I will update the patch tomorrow.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Created] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-13 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-5995:
-

 Summary: Add RMStateStore metrics to monitor all 
RMStateStoreEventTypeTransition performance
 Key: YARN-5995
 URL: https://issues.apache.org/jira/browse/YARN-5995
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: metrics, resourcemanager
Affects Versions: 2.7.1
 Environment: CentOS7.2 Hadoop-2.7.1 
Reporter: zhangyubiao


Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
performance






[jira] [Comment Edited] (YARN-5955) Use threadpool or multiple thread to recover app

2016-12-05 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721355#comment-15721355
 ] 

zhangyubiao edited comment on YARN-5955 at 12/5/16 8:53 AM:


[~templedf], is ZooKeeper the best way to store the app state? Can it provide 
good enough performance for recovering apps on a large cluster?
As we [~tangshangwen] [~zhengchenyu] see in 
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn,
it seems that ZooKeeper is not the best way to store the app state.
[~Naganarasimha] [~varun_saxena] [~rohithsharma]


was (Author: piaoyu zhang):
[~templedf], is it zookeeper is the best way to store the app state? is it can 
improve performance to the large cluster to recovery apps?
 as we [~tangshangwen] [~zhengchenyu] see in  
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn.
 it seem like that zookeeper is not the best way to store the app state.
[~Naganarasimha]  [~varun_saxena]  [~rohithsharma]

> Use threadpool or multiple thread to recover app
> 
>
> Key: YARN-5955
> URL: https://issues.apache.org/jira/browse/YARN-5955
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Zhaofei Meng
>Assignee: Ajith S
> Fix For: 2.7.1
>
>
> Current app recovery is done one app at a time; using a thread pool can make 
> recovery faster.
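The change proposed in the description (recover apps in parallel with a thread pool instead of one by one) could look roughly like the following. This is a sketch only: `recoverApp`, the app IDs, and the pool size of 4 are illustrative placeholders, not the actual RM recovery code.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: submit each app's recovery to a fixed thread pool so slow
// recoveries overlap, instead of a strictly sequential loop.
public class ParallelRecovery {
    static final AtomicInteger recovered = new AtomicInteger();

    // Placeholder for reading an app's stored state and rebuilding it.
    static void recoverApp(String appId) {
        recovered.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> appIds = List.of("app_1", "app_2", "app_3", "app_4");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String id : appIds) {
            pool.submit(() -> recoverApp(id)); // recoveries run concurrently
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for all recoveries
        System.out.println("recovered " + recovered.get() + " apps");
    }
}
```

In the real ResourceManager, each task would also have to publish its result back safely (the recovered RMApp state is shared), which is part of why the JIRA discussion also questions the state-store backend itself.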



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5955) Use threadpool or multiple thread to recover app

2016-12-05 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721355#comment-15721355
 ] 

zhangyubiao edited comment on YARN-5955 at 12/5/16 8:52 AM:


[~templedf], is it zookeeper is the best way to store the app state? is it can 
improve performance to the large cluster to recovery apps?
 as we [~tangshangwen] [~zhengchenyu] see in  
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn.
 it seem like that zookeeper is not the best way to store the app state.
[~Naganarasimha]  [~varun_saxena]  [~rohithsharma]


was (Author: piaoyu zhang):
[~templedf],is it zookeeper is the best way to store the app state? is it can 
improve performance to the large cluster to recovery apps?
 as we [~tangshangwen] [~zhengchenyu] see in  
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn.
 it seem like that zookeeper is not the best way to store the app state.
[~Naganarasimha]  [~varun_saxena]  [~rohithsharma]

> Use threadpool or multiple thread to recover app
> 
>
> Key: YARN-5955
> URL: https://issues.apache.org/jira/browse/YARN-5955
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Zhaofei Meng
>Assignee: Ajith S
> Fix For: 2.7.1
>
>
> Current app recovery is done one app at a time; using a thread pool can make 
> recovery faster.






[jira] [Comment Edited] (YARN-5955) Use threadpool or multiple thread to recover app

2016-12-04 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721355#comment-15721355
 ] 

zhangyubiao edited comment on YARN-5955 at 12/5/16 6:15 AM:


[~templedf],is it zookeeper is the best way to store the app state? is it can 
improve performance to the large cluster to recovery apps?
 as we [~tangshangwen] [~zhengchenyu] see in  
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn.
 it seem like that zookeeper is not the best way to store the app state.
[~Naganarasimha]  [~varun_saxena]  [~rohithsharma]


was (Author: piaoyu zhang):
[~templedf],is it zookeeper is the best way to store the app state? is it can 
improve performance to the large cluster to 
recovery apps?
 as we [~tangshangwen] [~zhengchenyu] see in  
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn.
 it seem like that zookeeper is not the best way to store the app state.
[~Naganarasimha]  [~varun_saxena]  [~rohithsharma]

> Use threadpool or multiple thread to recover app
> 
>
> Key: YARN-5955
> URL: https://issues.apache.org/jira/browse/YARN-5955
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Zhaofei Meng
>Assignee: Ajith S
> Fix For: 2.7.1
>
>
> Current app recovery is done one app at a time; using a thread pool can make 
> recovery faster.






[jira] [Commented] (YARN-5955) Use threadpool or multiple thread to recover app

2016-12-04 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721355#comment-15721355
 ] 

zhangyubiao commented on YARN-5955:
---

[~templedf],is it zookeeper is the best way to store the app state? is it can 
improve performance to the large cluster to 
recovery apps?
 as we [~tangshangwen] [~zhengchenyu] see in  
http://www.slideshare.net/arinto/next-generation-hadoop-high-availability-for-yarn.
 it seem like that zookeeper is not the best way to store the app state.
[~Naganarasimha]  [~varun_saxena]  [~rohithsharma]

> Use threadpool or multiple thread to recover app
> 
>
> Key: YARN-5955
> URL: https://issues.apache.org/jira/browse/YARN-5955
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Zhaofei Meng
>Assignee: Ajith S
> Fix For: 2.7.1
>
>
> Current app recovery is done one app at a time; using a thread pool can make 
> recovery faster.






[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-11-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700954#comment-15700954
 ] 

zhangyubiao commented on YARN-4090:
---

Hi [~He Tianyi], have you enabled the FairScheduler ACLs? In our tests, the 
issue does not appear when the ACLs are not enabled.

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, 
> sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.






[jira] [Commented] (YARN-5188) FairScheduler performance bug

2016-11-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697409#comment-15697409
 ] 

zhangyubiao commented on YARN-5188:
---

NI global references: 280


Found one Java-level deadlock:
=
"IPC Server handler 99 on 8032":
  waiting to lock monitor 0x7f9c6c3e1f58 (object 0x7f9113c08d80, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 27 on 8032"
"IPC Server handler 27 on 8032":
  waiting to lock monitor 0x01b42518 (object 0x7f9113c0aa08, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x7f9c6c3e1f58 (object 0x7f9113c08d80, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 27 on 8032"

Java stack information for the threads listed above:
===
"IPC Server handler 99 on 8032":
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:160)
- waiting to lock <0x7f9113c08d80> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1518)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"IPC Server handler 27 on 8032":
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:160)
- waiting to lock <0x7f9113c0aa08> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:167)
- locked <0x7f9113c08d80> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1518)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"ResourceManager Event Processor":
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:82)
- waiting to lock <0x7f9113c08d80> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:84)
- locked <0x7f9113c0aa08> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:84)
- locked 
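The cycle in the trace above — an IPC handler holding a child queue's monitor while waiting on the parent, and the event processor holding the parent while waiting on the child — is a classic lock-ordering deadlock. A toy illustration of the usual remedy, acquiring queue monitors in one global order (parent before child) on every path; this is a sketch of the principle, not the actual FSParentQueue code:

```java
public class OrderedLocking {
    static final Object PARENT = new Object();
    static final Object CHILD = new Object();

    // The "read ACLs" path takes the locks parent-then-child...
    public static int readAcls() {
        synchronized (PARENT) {
            synchronized (CHILD) {
                return 1; // stand-in for walking the queue hierarchy
            }
        }
    }

    // ...and so does the "release resource" path. Because no thread ever
    // takes child-then-parent, the wait-for graph cannot form a cycle.
    public static int releaseResource() {
        synchronized (PARENT) {
            synchronized (CHILD) {
                return 2;
            }
        }
    }
}
```

In the traced deadlock, `decResourceUsage` walks child-to-parent while `getQueueUserAclInfo` walks parent-to-child, which is exactly the opposite-order pattern this sketch avoids.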

[jira] [Commented] (YARN-5188) FairScheduler performance bug

2016-11-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697397#comment-15697397
 ] 

zhangyubiao commented on YARN-5188:
---

[~chenfolin], we also applied the YARN-5188 patch to 2.7.1 and found that the 
deadlock still occurs.

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
>  My Hadoop cluster has recently encountered a performance problem. Details as 
> follows.
> There are two points that can cause this performance issue:
> 1: Application sort before assigning a container in FSLeafQueue. TreeSet is 
> not the best choice. Why not keep the list ordered, so that binary search can 
> keep it ordered when an application's resource usage changes?
> 2: Queue sort and assignContainerPreCheck recompute every leaf queue's 
> resource usage. Why not store the leaf queue usage in memory and update it 
> whenever a container is assigned or released?
>
>The efficiency of container assignment in the ResourceManager may fall 
> as the number of running and pending applications grows. In fact, the 
> cluster has too many PendingMB or PendingVcore, while the cluster's 
> current utilization rate may be below 20%.
>I checked the ResourceManager logs and found that each container 
> assignment may cost 5~10 ms, versus only 0~1 ms at usual times.
>  
>I used TestFairScheduler to reproduce the scene:
>  
>Just one queue: root.default
>  10240 apps.
>  
>Assign container avg time: 6753.9 us (6.7539 ms)
>  Apps sort time (FSLeafQueue: Collections.sort(runnableApps, 
> comparator);): 4657.01 us (4.657 ms)
>  Compute LeafQueue resource usage: 905.171 us (0.905171 ms)
>  
>  With just root.default, one assign-container op contains: (one apps 
> sort op) + 2 * (compute leaf queue usage op).
>According to the above, I think the assign-container op has 
> a performance problem.
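Point 1 of the description — keeping the application list ordered and repositioning a single changed entry with binary search instead of re-sorting everything — can be sketched as follows (a toy list of usage values standing in for the FSLeafQueue app list; the real comparator is more involved):

```java
import java.util.Collections;
import java.util.List;

public class OrderedApps {
    // Re-insert one changed element with binary search: O(log n) to find
    // the slot plus an O(n) shift, versus O(n log n) for a full re-sort.
    public static void reposition(List<Integer> usages, int oldValue, int newValue) {
        usages.remove(Integer.valueOf(oldValue));
        int slot = Collections.binarySearch(usages, newValue);
        if (slot < 0) {
            slot = -slot - 1;  // binarySearch encodes the insertion point as -(point) - 1
        }
        usages.add(slot, newValue);
    }
}
```

Since only one application's usage changes per assignment or release, the rest of the list stays sorted and a full `Collections.sort(runnableApps, comparator)` per scheduling round becomes unnecessary.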



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5188) FairScheduler performance bug

2016-11-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695205#comment-15695205
 ] 

zhangyubiao edited comment on YARN-5188 at 11/26/16 6:33 AM:
-

[~chenfolin], our team applied the YARN-4090 patch to 2.7.1 and found that a 
deadlock occurs. I think this patch may cause the same problem; would you mind 
taking a look?


was (Author: piaoyu zhang):
[~chenfolin] ,Our team attached the patch  Yarn-4090 to 2.7.1 and we found a 
deadlock occurs. I think this patch will cause the same problem ,would you like 
give a look?

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
>  My Hadoop cluster has recently encountered a performance problem. Details as 
> follows.
> There are two points that can cause this performance issue:
> 1: Application sort before assigning a container in FSLeafQueue. TreeSet is 
> not the best choice. Why not keep the list ordered, so that binary search can 
> keep it ordered when an application's resource usage changes?
> 2: Queue sort and assignContainerPreCheck recompute every leaf queue's 
> resource usage. Why not store the leaf queue usage in memory and update it 
> whenever a container is assigned or released?
>
>The efficiency of container assignment in the ResourceManager may fall 
> as the number of running and pending applications grows. In fact, the 
> cluster has too many PendingMB or PendingVcore, while the cluster's 
> current utilization rate may be below 20%.
>I checked the ResourceManager logs and found that each container 
> assignment may cost 5~10 ms, versus only 0~1 ms at usual times.
>  
>I used TestFairScheduler to reproduce the scene:
>  
>Just one queue: root.default
>  10240 apps.
>  
>Assign container avg time: 6753.9 us (6.7539 ms)
>  Apps sort time (FSLeafQueue: Collections.sort(runnableApps, 
> comparator);): 4657.01 us (4.657 ms)
>  Compute LeafQueue resource usage: 905.171 us (0.905171 ms)
>  
>  With just root.default, one assign-container op contains: (one apps 
> sort op) + 2 * (compute leaf queue usage op).
>According to the above, I think the assign-container op has 
> a performance problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5188) FairScheduler performance bug

2016-11-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695205#comment-15695205
 ] 

zhangyubiao commented on YARN-5188:
---

[~chenfolin], our team applied the YARN-4090 patch to 2.7.1 and found that a 
deadlock occurs. I think this patch may cause the same problem; would you mind 
taking a look?

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
>  My Hadoop cluster has recently encountered a performance problem. Details as 
> follows.
> There are two points that can cause this performance issue:
> 1: Application sort before assigning a container in FSLeafQueue. TreeSet is 
> not the best choice. Why not keep the list ordered, so that binary search can 
> keep it ordered when an application's resource usage changes?
> 2: Queue sort and assignContainerPreCheck recompute every leaf queue's 
> resource usage. Why not store the leaf queue usage in memory and update it 
> whenever a container is assigned or released?
>
>The efficiency of container assignment in the ResourceManager may fall 
> as the number of running and pending applications grows. In fact, the 
> cluster has too many PendingMB or PendingVcore, while the cluster's 
> current utilization rate may be below 20%.
>I checked the ResourceManager logs and found that each container 
> assignment may cost 5~10 ms, versus only 0~1 ms at usual times.
>  
>I used TestFairScheduler to reproduce the scene:
>  
>Just one queue: root.default
>  10240 apps.
>  
>Assign container avg time: 6753.9 us (6.7539 ms)
>  Apps sort time (FSLeafQueue: Collections.sort(runnableApps, 
> comparator);): 4657.01 us (4.657 ms)
>  Compute LeafQueue resource usage: 905.171 us (0.905171 ms)
>  
>  With just root.default, one assign-container op contains: (one apps 
> sort op) + 2 * (compute leaf queue usage op).
>According to the above, I think the assign-container op has 
> a performance problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-11-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695182#comment-15695182
 ] 

zhangyubiao commented on YARN-4090:
---

Found one Java-level deadlock:
=
"IPC Server handler 98 on 8032":
  waiting to lock monitor 0x7f4e48b1f808 (object 0x7f42e17a5ed8, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 76 on 8032"
"IPC Server handler 76 on 8032":
  waiting to lock monitor 0x7f4e388b94f8 (object 0x7f42df3e8450, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x7f4e48b1f808 (object 0x7f42e17a5ed8, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 76 on 8032"

Java stack information for the threads listed above:
===
"IPC Server handler 98 on 8032":
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
- waiting to lock <0x7f42e17a5ed8> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"IPC Server handler 76 on 8032":
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
- waiting to lock <0x7f42df3e8450> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:156)
- locked <0x7f42e17a5ed8> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"ResourceManager Event Processor":
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:307)
- waiting to lock <0x7f42e17a5ed8> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
- locked <0x7f42df3e8450> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
- locked <0x7f42e0c7cf50> (a 

[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-11-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695179#comment-15695179
 ] 

zhangyubiao commented on YARN-4090:
---

[~xinxianyin], our team applied the patch to 2.7.1 and found that a deadlock 
occurs.

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, 
> sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5846) Improve the fairscheduler attemptScheduler

2016-11-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650458#comment-15650458
 ] 

zhangyubiao commented on YARN-5846:
---

:)

> Improve the fairscheduler attemptScheduler 
> ---
>
> Key: YARN-5846
> URL: https://issues.apache.org/jira/browse/YARN-5846
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
> Environment: CentOS-7.1
>Reporter: zhengchenyu
>Priority: Critical
>  Labels: fairscheduler
> Fix For: 2.7.1
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> When assigning a container, we must consider two factors:
> (1) Sort the queues and applications, and select the proper request.
> (2) Then ensure this request's host is exactly this node (data locality), 
> or skip this loop.
> This algorithm treats sorting queues and applications as the primary factor. 
> When YARN considers data locality, for example with 
> yarn.scheduler.fair.locality.threshold.node=1 and 
> yarn.scheduler.fair.locality.threshold.rack=1 (or when 
> yarn.scheduler.fair.locality-delay-rack-ms and 
> yarn.scheduler.fair.locality-delay-node-ms are very large) and lots of 
> applications are running, the process of assigning containers becomes very slow.
> I think data locality is more important than the order of the queues and 
> applications. 
> I want a new algorithm like this:
>   (1) When the ResourceManager accepts a new request, notify the RMNodeImpl 
> and record the association between the RMNode and the request.
>   (2) When assigning containers for a node, assign containers directly from 
> the RMNodeImpl's association between the RMNode and the requests.
>   (3) Then consider the priority of queues and applications: within one 
> RMNodeImpl object, sort the associated requests.
>   (4) The sorting in the current algorithm is costly, especially when lots 
> of applications are running and sorting is called often. So I think we should 
> sort the queues and applications in a daemon thread, since a small error in 
> the queues' ordering is tolerable.
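Steps (1) and (2) of the proposal — recording which pending requests name a given host, then serving the host directly from that index on heartbeat — can be sketched as below. This is a toy index under stated assumptions; real RMNodeImpl bookkeeping would also track racks, priorities, and request counts:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class NodeLocalIndex {
    // host -> FIFO queue of pending request ids that asked for that host
    private final Map<String, Deque<String>> byHost = new HashMap<>();

    // (1) On a new resource request, remember the host->request association.
    public void record(String host, String requestId) {
        byHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(requestId);
    }

    // (2) On a node heartbeat, serve a node-local request directly instead
    // of sorting every queue and application first; null if none pending.
    public String nextFor(String host) {
        Deque<String> pending = byHost.get(host);
        return (pending == null || pending.isEmpty()) ? null : pending.poll();
    }
}
```

Step (3) would replace the FIFO deque with a structure ordered by queue and application priority, and step (4) would refresh that ordering from a background thread rather than inside the assignment loop.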



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526709#comment-15526709
 ] 

zhangyubiao edited comment on YARN-5666 at 9/27/16 4:48 PM:


ip:50320/stacks means I got the stacks from rmip:50320/stacks. [~templedf]


was (Author: piaoyu zhang):
ip:50320/stacks means I get the stacks for rmip:50320/stacks. [~templedf]

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2, stacks3
>
>
> "beans" : [ {
> "name" : 
> "Hadoop:service=ResourceManager,name=RpcDetailedActivityForPort8032",
> "modelerType" : "RpcDetailedActivityForPort8032",
> "tag.port" : "8032",
> "tag.Context" : "rpcdetailed",
> "tag.Hostname" : "X",
> "ApplicationNotFoundExceptionNumOps" : 531,
> "ApplicationNotFoundExceptionAvgTime" : 0.0,
> "SubmitApplicationNumOps" : 2085,
> "SubmitApplicationAvgTime" : 0.343749994,
> "GetApplicationReportNumOps" : 297647,
> "GetApplicationReportAvgTime" : 0.0,
> "GetNewApplicationNumOps" : 2163,
> "GetNewApplicationAvgTime" : 0.0,
> "GetApplicationsNumOps" : 102,
> "GetApplicationsAvgTime" : 15999.0,
> "GetClusterMetricsNumOps" : 2405,
> "GetClusterMetricsAvgTime" : 0.01801801801801802,
> "GetDelegationTokenNumOps" : 33,
> "GetDelegationTokenAvgTime" : 0.0,
> "GetClusterNodesNumOps" : 2400,
> "GetClusterNodesAvgTime" : 80.5403726708074,
> "GetQueueInfoNumOps" : 1900,
> "GetQueueInfoAvgTime" : 1212.0089153046054,
> "GetQueueUserAclsNumOps" : 1900,
> "GetQueueUserAclsAvgTime" : 1.18194458,
> "ForceKillApplicationNumOps" : 199,
> "ForceKillApplicationAvgTime" : 0.3688524590163933



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526709#comment-15526709
 ] 

zhangyubiao commented on YARN-5666:
---

ip:50320/stacks means I get the stacks for rmip:50320/stacks. [~templedf]

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2, stacks3
>
>
> "beans" : [ {
> "name" : 
> "Hadoop:service=ResourceManager,name=RpcDetailedActivityForPort8032",
> "modelerType" : "RpcDetailedActivityForPort8032",
> "tag.port" : "8032",
> "tag.Context" : "rpcdetailed",
> "tag.Hostname" : "X",
> "ApplicationNotFoundExceptionNumOps" : 531,
> "ApplicationNotFoundExceptionAvgTime" : 0.0,
> "SubmitApplicationNumOps" : 2085,
> "SubmitApplicationAvgTime" : 0.343749994,
> "GetApplicationReportNumOps" : 297647,
> "GetApplicationReportAvgTime" : 0.0,
> "GetNewApplicationNumOps" : 2163,
> "GetNewApplicationAvgTime" : 0.0,
> "GetApplicationsNumOps" : 102,
> "GetApplicationsAvgTime" : 15999.0,
> "GetClusterMetricsNumOps" : 2405,
> "GetClusterMetricsAvgTime" : 0.01801801801801802,
> "GetDelegationTokenNumOps" : 33,
> "GetDelegationTokenAvgTime" : 0.0,
> "GetClusterNodesNumOps" : 2400,
> "GetClusterNodesAvgTime" : 80.5403726708074,
> "GetQueueInfoNumOps" : 1900,
> "GetQueueInfoAvgTime" : 1212.0089153046054,
> "GetQueueUserAclsNumOps" : 1900,
> "GetQueueUserAclsAvgTime" : 1.18194458,
> "ForceKillApplicationNumOps" : 199,
> "ForceKillApplicationAvgTime" : 0.3688524590163933



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526703#comment-15526703
 ] 

zhangyubiao commented on YARN-5666:
---

And we set this value in yarn-site.xml:

yarn.scheduler.fair.max.assign = 5

I do not know whether it matters for this. [~templedf]


> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2, stacks3
>
>
> "beans" : [ {
> "name" : 
> "Hadoop:service=ResourceManager,name=RpcDetailedActivityForPort8032",
> "modelerType" : "RpcDetailedActivityForPort8032",
> "tag.port" : "8032",
> "tag.Context" : "rpcdetailed",
> "tag.Hostname" : "X",
> "ApplicationNotFoundExceptionNumOps" : 531,
> "ApplicationNotFoundExceptionAvgTime" : 0.0,
> "SubmitApplicationNumOps" : 2085,
> "SubmitApplicationAvgTime" : 0.343749994,
> "GetApplicationReportNumOps" : 297647,
> "GetApplicationReportAvgTime" : 0.0,
> "GetNewApplicationNumOps" : 2163,
> "GetNewApplicationAvgTime" : 0.0,
> "GetApplicationsNumOps" : 102,
> "GetApplicationsAvgTime" : 15999.0,
> "GetClusterMetricsNumOps" : 2405,
> "GetClusterMetricsAvgTime" : 0.01801801801801802,
> "GetDelegationTokenNumOps" : 33,
> "GetDelegationTokenAvgTime" : 0.0,
> "GetClusterNodesNumOps" : 2400,
> "GetClusterNodesAvgTime" : 80.5403726708074,
> "GetQueueInfoNumOps" : 1900,
> "GetQueueInfoAvgTime" : 1212.0089153046054,
> "GetQueueUserAclsNumOps" : 1900,
> "GetQueueUserAclsAvgTime" : 1.18194458,
> "ForceKillApplicationNumOps" : 199,
> "ForceKillApplicationAvgTime" : 0.3688524590163933



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526691#comment-15526691
 ] 

zhangyubiao commented on YARN-5666:
---

I also see the log:
[2016-09-28T00:40:51.241+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9754_01_000475 of capacity  
on host OneNode, which has 18 containers,  used and 
 available after allocation
[2016-09-28T00:40:51.242+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9710_01_000662 of capacity  
on host OneNode, which has 19 containers,  used and 
 available after allocation
[2016-09-28T00:40:56.429+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9720_01_003085 of capacity  
on host OneNode, which has 20 containers,  used and 
 available after allocation
[2016-09-28T00:40:59.497+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9714_01_001274 of capacity  
on host OneNode, which has 19 containers,  used and 
 available after allocation
[2016-09-28T00:41:01.982+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9714_01_001290 of capacity  
on host OneNode, which has 19 containers,  used and 
 available after allocation
[2016-09-28T00:41:06.907+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_8356_01_041217 of capacity  
on host OneNode, which has 19 containers,  used and 
 available after allocation
[2016-09-28T00:41:20.393+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_8356_01_041665 of capacity  
on host OneNode, which has 17 containers,  used and 
 available after allocation
[2016-09-28T00:41:20.394+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_8356_01_041666 of capacity  
on host OneNode, which has 18 containers,  used and 
 available after allocation
[2016-09-28T00:41:49.975+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9778_01_003641 of capacity  
on host OneNode, which has 14 containers,  used and 
 available after allocation
[2016-09-28T00:41:49.976+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9778_01_003642 of capacity  
on host OneNode, which has 15 containers,  used and 
 available after allocation
[2016-09-28T00:41:49.977+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9778_01_003643 of capacity  
on host OneNode, which has 16 containers,  used and 
 available after allocation
[2016-09-28T00:41:49.978+08:00] [INFO] 
resourcemanager.scheduler.SchedulerNode.allocateContainer(SchedulerNode.java 
153) [ResourceManager Event Processor] : Assigned container 
container_e20_1474984305031_9778_01_003644 of capacity  
on host OneNode, which has 17 containers,  used and 
 available after allocation
[2016-09-28T00:41:49.979+08:00] [INFO] 

[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526662#comment-15526662
 ] 

zhangyubiao commented on YARN-5666:
---

 [~templedf], I caught a stack dump while it had an enormous event queue. I just 
used ip:50320/stacks to get the stack.

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2, stacks3
>
>
> "beans" : [ {
> "name" : 
> "Hadoop:service=ResourceManager,name=RpcDetailedActivityForPort8032",
> "modelerType" : "RpcDetailedActivityForPort8032",
> "tag.port" : "8032",
> "tag.Context" : "rpcdetailed",
> "tag.Hostname" : "X",
> "ApplicationNotFoundExceptionNumOps" : 531,
> "ApplicationNotFoundExceptionAvgTime" : 0.0,
> "SubmitApplicationNumOps" : 2085,
> "SubmitApplicationAvgTime" : 0.343749994,
> "GetApplicationReportNumOps" : 297647,
> "GetApplicationReportAvgTime" : 0.0,
> "GetNewApplicationNumOps" : 2163,
> "GetNewApplicationAvgTime" : 0.0,
> "GetApplicationsNumOps" : 102,
> "GetApplicationsAvgTime" : 15999.0,
> "GetClusterMetricsNumOps" : 2405,
> "GetClusterMetricsAvgTime" : 0.01801801801801802,
> "GetDelegationTokenNumOps" : 33,
> "GetDelegationTokenAvgTime" : 0.0,
> "GetClusterNodesNumOps" : 2400,
> "GetClusterNodesAvgTime" : 80.5403726708074,
> "GetQueueInfoNumOps" : 1900,
> "GetQueueInfoAvgTime" : 1212.0089153046054,
> "GetQueueUserAclsNumOps" : 1900,
> "GetQueueUserAclsAvgTime" : 1.18194458,
> "ForceKillApplicationNumOps" : 199,
> "ForceKillApplicationAvgTime" : 0.3688524590163933
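As a side note, the per-operation averages quoted above come from the RM's /jmx servlet and can be checked programmatically; a minimal sketch, assuming you have already fetched the JSON (the `slow_rpc_ops` helper and the 1000 ms threshold are illustrative, not part of Hadoop):

```python
import json
from urllib.request import urlopen

def slow_rpc_ops(beans, threshold_ms=1000.0):
    """From the parsed "beans" list of a Hadoop /jmx response, return
    (operation, avg_ms) pairs whose average RPC handling time exceeds
    the threshold, slowest first."""
    slow = []
    for bean in beans:
        # Only look at the per-port detailed RPC metrics beans.
        if not bean.get("modelerType", "").startswith("RpcDetailedActivityForPort"):
            continue
        for key, value in bean.items():
            if key.endswith("AvgTime") and isinstance(value, (int, float)) and value > threshold_ms:
                slow.append((key[:-len("AvgTime")], value))
    return sorted(slow, key=lambda pair: -pair[1])

def fetch_beans(jmx_url):
    """Fetch and parse a /jmx endpoint, e.g. http://rm-host:8088/jmx
    (hypothetical address)."""
    return json.load(urlopen(jmx_url))["beans"]
```

With the figures quoted above, `slow_rpc_ops` would single out GetApplications (15999 ms) and GetQueueInfo (about 1212 ms).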



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5666:
--
Attachment: stacks3

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2, stacks3






[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526376#comment-15526376
 ] 

zhangyubiao commented on YARN-5666:
---

{
  "beans" : [ {
"name" : 
"Hadoop:service=ResourceManager,name=RpcDetailedActivityForPort8030",
"modelerType" : "RpcDetailedActivityForPort8030",
"tag.port" : "8030",
"tag.Context" : "rpcdetailed",
"tag.Hostname" : "",
"ApplicationMasterNotRegisteredExceptionNumOps" : 182,
"ApplicationMasterNotRegisteredExceptionAvgTime" : 0.0,
"RegisterApplicationMasterNumOps" : 4620,
"RegisterApplicationMasterAvgTime" : 6644.6,
"FinishApplicationMasterNumOps" : 7154,
"FinishApplicationMasterAvgTime" : 0.0,
"AllocateNumOps" : 1617043,
"AllocateAvgTime" : 586.0114942528736,
"ApplicationAttemptNotFoundExceptionNumOps" : 629,
"ApplicationAttemptNotFoundExceptionAvgTime" : 3.1095890410958904,
"InvalidApplicationMasterRequestExceptionNumOps" : 354,
"InvalidApplicationMasterRequestExceptionAvgTime" : 0.39997
  } ]
}

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2






[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-27 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526349#comment-15526349
 ] 

zhangyubiao commented on YARN-5666:
---

[2016-09-27T22:39:46.888+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java 235) [IPC Server handler 
81 on 8030] : Size of event-queue is 555000
[2016-09-27T22:39:46.888+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java 235) [IPC Server handler 
58 on 8030] : Size of event-queue is 555000
[2016-09-27T22:39:46.888+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java 235) [IPC Server handler 
199 on 8030] : Size of event-queue is 555000

And I find that the event queue reported by IPC Server handler 199 on port 8030 is very large.
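Those "Size of event-queue is ..." messages can be tracked over time to tell whether the dispatcher queue keeps growing; a minimal sketch against the log format shown above (the helper names are hypothetical):

```python
import re

# Matches AsyncDispatcher backlog reports such as:
#   ... AsyncDispatcher.handle(AsyncDispatcher.java 235) [IPC Server handler
#   81 on 8030] : Size of event-queue is 555000
QUEUE_RE = re.compile(r"Size of event-queue is (\d+)")

def queue_sizes(log_lines):
    """Yield every reported event-queue size, in log order."""
    for line in log_lines:
        match = QUEUE_RE.search(line)
        if match:
            yield int(match.group(1))

def is_growing(sizes):
    """True if the last reported size exceeds the first, i.e. the
    dispatcher is falling behind rather than draining."""
    sizes = list(sizes)
    return len(sizes) >= 2 and sizes[-1] > sizes[0]
```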


> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2






[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520526#comment-15520526
 ] 

zhangyubiao commented on YARN-5666:
---

[~rohithsharma] Would you like to take a look? Thanks.

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2






[jira] [Commented] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520514#comment-15520514
 ] 

zhangyubiao commented on YARN-5666:
---

@Daniel Templeton Would you like to take a look? Thanks.


> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2






[jira] [Comment Edited] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-25 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520514#comment-15520514
 ] 

zhangyubiao edited comment on YARN-5666 at 9/25/16 9:18 AM:


[~templedf] Would you like to take a look? Thanks.



was (Author: piaoyu zhang):
@Daniel Templeton  Would you like give a look ?Thanks


> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2






[jira] [Updated] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-25 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5666:
--
Attachment: stacks2

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks, stacks2






[jira] [Updated] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-25 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5666:
--
Attachment: stacks

> Yarn GetApplicationsAvgTime stuck a long time 
> --
>
> Key: YARN-5666
> URL: https://issues.apache.org/jira/browse/YARN-5666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, fairscheduler, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1 
> Hadoop-2.7.1
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: stacks






[jira] [Created] (YARN-5666) Yarn GetApplicationsAvgTime stuck a long time

2016-09-25 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-5666:
-

 Summary: Yarn GetApplicationsAvgTime stuck a long time 
 Key: YARN-5666
 URL: https://issues.apache.org/jira/browse/YARN-5666
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, fairscheduler, resourcemanager
Affects Versions: 2.7.1
 Environment: CentOS7.1 
Hadoop-2.7.1
Reporter: zhangyubiao
Priority: Blocker


"beans" : [ {
"name" : 
"Hadoop:service=ResourceManager,name=RpcDetailedActivityForPort8032",
"modelerType" : "RpcDetailedActivityForPort8032",
"tag.port" : "8032",
"tag.Context" : "rpcdetailed",
"tag.Hostname" : "X",
"ApplicationNotFoundExceptionNumOps" : 531,
"ApplicationNotFoundExceptionAvgTime" : 0.0,
"SubmitApplicationNumOps" : 2085,
"SubmitApplicationAvgTime" : 0.343749994,
"GetApplicationReportNumOps" : 297647,
"GetApplicationReportAvgTime" : 0.0,
"GetNewApplicationNumOps" : 2163,
"GetNewApplicationAvgTime" : 0.0,
"GetApplicationsNumOps" : 102,
"GetApplicationsAvgTime" : 15999.0,
"GetClusterMetricsNumOps" : 2405,
"GetClusterMetricsAvgTime" : 0.01801801801801802,
"GetDelegationTokenNumOps" : 33,
"GetDelegationTokenAvgTime" : 0.0,
"GetClusterNodesNumOps" : 2400,
"GetClusterNodesAvgTime" : 80.5403726708074,
"GetQueueInfoNumOps" : 1900,
"GetQueueInfoAvgTime" : 1212.0089153046054,
"GetQueueUserAclsNumOps" : 1900,
"GetQueueUserAclsAvgTime" : 1.18194458,
"ForceKillApplicationNumOps" : 199,
"ForceKillApplicationAvgTime" : 0.3688524590163933






[jira] [Resolved] (YARN-5209) Transmission ContainerExecutor Class Parameters By The Client

2016-07-11 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao resolved YARN-5209.
---
Resolution: Duplicate

> Transmission ContainerExecutor  Class Parameters By The Client
> --
>
> Key: YARN-5209
> URL: https://issues.apache.org/jira/browse/YARN-5209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
> Environment: CentOS7.1  Hadoop-2.7.1
>Reporter: zhangyubiao
>
> We are running DefaultContainerExecutor in the cluster now, 
> but we want to use DockerContainerExecutor in the future.
> I wonder if we can pass the parameters from the client, so that we can switch 
> from DefaultContainerExecutor to DockerContainerExecutor smoothly.






[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some system

2016-06-29 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356384#comment-15356384
 ] 

zhangyubiao commented on YARN-5301:
---

Maybe you should see https://issues.apache.org/jira/browse/YARN-2194

> NM mount cpu cgroups failed on some system
> --
>
> Key: YARN-5301
> URL: https://issues.apache.org/jira/browse/YARN-5301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
>
> on ubuntu  with linux kernel 3.19, , NM start failed if enable auto mount 
> cgroup. try command:
> ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpufail
> ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu  
>   succ
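The two container-executor commands quoted above differ because on some kernels the cpu and cpuacct controllers are co-mounted, so a mount request for plain `cpu` fails while `cpu,cpuacct` succeeds; a minimal sketch that inspects /proc/mounts to find which controller name is actually usable (the helper is illustrative, not part of the NM):

```python
def cgroup_mount_for(subsystem, mounts_text):
    """Given the contents of /proc/mounts, return the mount point whose
    option list contains the requested cgroup subsystem (e.g. 'cpu'),
    or None if it is not mounted. On kernels that co-mount controllers
    the options read 'cpu,cpuacct', which is why requesting plain 'cpu'
    as a separate hierarchy fails."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            options = fields[3].split(",")
            if subsystem in options:
                return fields[1]
    return None
```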






[jira] [Commented] (YARN-5209) Transmission ContainerExecutor Class Parameters By The Client

2016-06-11 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326192#comment-15326192
 ] 

zhangyubiao commented on YARN-5209:
---

Thanks [~templedf]

> Transmission ContainerExecutor  Class Parameters By The Client
> --
>
> Key: YARN-5209
> URL: https://issues.apache.org/jira/browse/YARN-5209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
> Environment: CentOS7.1  Hadoop-2.7.1
>Reporter: zhangyubiao






[jira] [Created] (YARN-5209) Transmission ContainerExecutor Parameters By The Client

2016-06-07 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-5209:
-

 Summary: Transmission ContainerExecutor Parameters By The Client
 Key: YARN-5209
 URL: https://issues.apache.org/jira/browse/YARN-5209
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.1
 Environment: CentOS7.1  Hadoop-2.7.1
Reporter: zhangyubiao


We are running DefaultContainerExecutor in the cluster now, 
but we want to use DockerContainerExecutor in the future.
I wonder if we can pass the parameters from the client, so that we can switch 
from DefaultContainerExecutor to DockerContainerExecutor smoothly.






[jira] [Updated] (YARN-5209) Transmission ContainerExecutor Class Parameters By The Client

2016-06-07 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-5209:
--
Summary: Transmission ContainerExecutor  Class Parameters By The Client  
(was: Transmission ContainerExecutor Parameters By The Client)

> Transmission ContainerExecutor  Class Parameters By The Client
> --
>
> Key: YARN-5209
> URL: https://issues.apache.org/jira/browse/YARN-5209
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
> Environment: CentOS7.1  Hadoop-2.7.1
>Reporter: zhangyubiao






[jira] [Resolved] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2016-06-05 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao resolved YARN-3979.
---
Resolution: Duplicate

> Am in ResourceLocalizationService hang 10 min cause RM kill  AM
> ---
>
> Key: YARN-3979
> URL: https://issues.apache.org/jira/browse/YARN-3979
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: CentOS 6.5  Hadoop-2.2.0
>Reporter: zhangyubiao
> Attachments: ERROR103.log
>
>
> 2015-07-27 02:46:17,348 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1437735375558
> _104282_01_01
> 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
> 2015-07-27 02:56:18,510 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for appattempt_1437735375558_104282_0
> 1 (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.api.ContainerManagementProtocolPB






[jira] [Updated] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2016-04-17 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-3979:
--
External issue URL:   (was: https://issues.apache.org/jira/browse/YARN-3990)

> Am in ResourceLocalizationService hang 10 min cause RM kill  AM
> ---
>
> Key: YARN-3979
> URL: https://issues.apache.org/jira/browse/YARN-3979
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: CentOS 6.5  Hadoop-2.2.0
>Reporter: zhangyubiao
> Attachments: ERROR103.log
>





[jira] [Updated] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2016-04-17 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-3979:
--
External issue URL: https://issues.apache.org/jira/browse/YARN-3990

> Am in ResourceLocalizationService hang 10 min cause RM kill  AM
> ---
>
> Key: YARN-3979
> URL: https://issues.apache.org/jira/browse/YARN-3979
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: CentOS 6.5  Hadoop-2.2.0
>Reporter: zhangyubiao
> Attachments: ERROR103.log
>





[jira] [Updated] (YARN-4856) RM /ws/v1/cluster/scheduler JSON format Error

2016-03-23 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-4856:
--
Summary: RM  /ws/v1/cluster/scheduler JSON format Error  (was: RM  
/ws/v1/cluster/scheduler JSON format err )

> RM  /ws/v1/cluster/scheduler JSON format Error
> --
>
> Key: YARN-4856
> URL: https://issues.apache.org/jira/browse/YARN-4856
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
> Environment: Hadoop-2.7.1
>Reporter: zhangyubiao
>  Labels: patch
>
> Hadoop-2.7.1  RM  /ws/v1/cluster/scheduler JSON format Error
> The root queue's childQueues field is 
> {"memory":3717120,"vCores":1848},"queueName":"root","schedulingPolicy":"fair","childQueues":{color:red}[{"type":"fairSchedulerLeafQueueInfo",
> {color}"maxApps":400,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":0,"vCores":0},"maxResources":{"memory":0,"vCores":0},
> But other queues' childQueues field is 
> {"maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad","schedulingPolicy":"fair","childQueues":{"type":{color:red}
> ["fairSchedulerLeafQueueInfo"],
> {color}"maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad.jd_ad_anti","schedulingPolicy":"fair","numPendingApps":0,"numActiveApps":0},"childQueues":{"type":"fairSchedulerLeafQueueInfo","maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad.jd_ad_formal_1","schedulingPolicy":"fair","numPendingApps":0,"numActiveApps":0},"childQueues":{"type":"fairSchedulerLeafQueueInfo","maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad.jd_ad_oozie","schedulingPolicy":"fair","numPendingApps":0,"numActiveApps":0}},{"maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":0,"vCores":0},"maxResources":{"memory":0,"vCores":0},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":0,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848}}
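Until the serialization is made consistent, a client has to cope with `childQueues` arriving either as a JSON array of typed objects (root) or as a single object (other queues); a minimal normalization sketch on the client side (helper names are hypothetical, not part of the YARN REST API):

```python
def normalize_child_queues(queue):
    """Return a queue's childQueues as a list, whether the scheduler
    REST endpoint serialized it as a JSON array or as a single object."""
    children = queue.get("childQueues")
    if children is None:
        return []
    if isinstance(children, list):
        return children
    return [children]

def walk_queues(queue):
    """Depth-first traversal over a fair-scheduler queue tree, tolerant
    of both childQueues shapes."""
    yield queue
    for child in normalize_child_queues(queue):
        yield from walk_queues(child)
```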



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4856) RM /ws/v1/cluster/scheduler JSON format err

2016-03-23 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-4856:
-

 Summary: RM  /ws/v1/cluster/scheduler JSON format err 
 Key: YARN-4856
 URL: https://issues.apache.org/jira/browse/YARN-4856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1
 Environment: Hadoop-2.7.1
Reporter: zhangyubiao


The Hadoop-2.7.1 RM /ws/v1/cluster/scheduler endpoint returns malformed JSON.

The root queue's childQueues field is:
{"memory":3717120,"vCores":1848},"queueName":"root","schedulingPolicy":"fair","childQueues":{color:red}[{"type":"fairSchedulerLeafQueueInfo",
{color}"maxApps":400,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":0,"vCores":0},"maxResources":{"memory":0,"vCores":0},

But the other queues' childQueues field is:
{"maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad","schedulingPolicy":"fair","childQueues":{"type":{color:red}
["fairSchedulerLeafQueueInfo"],
{color}"maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad.jd_ad_anti","schedulingPolicy":"fair","numPendingApps":0,"numActiveApps":0},"childQueues":{"type":"fairSchedulerLeafQueueInfo","maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad.jd_ad_formal_1","schedulingPolicy":"fair","numPendingApps":0,"numActiveApps":0},"childQueues":{"type":"fairSchedulerLeafQueueInfo","maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":2867200,"vCores":1400},"maxResources":{"memory":2867200,"vCores":1400},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":2867200,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848},"queueName":"root.bdp_jmart_ad.jd_ad_oozie","schedulingPolicy":"fair","numPendingApps":0,"numActiveApps":0}},{"maxApps":300,"queueMaxMapsForEachJob":2147483647,"queueMaxReducesForEachJob":2147483647,"minResources":{"memory":0,"vCores":0},"maxResources":{"memory":0,"vCores":0},"usedResources":{"memory":0,"vCores":0},"steadyFairResources":{"memory":0,"vCores":0},"fairResources":{"memory":0,"vCores":0},"clusterResources":{"memory":3717120,"vCores":1848}}
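As an illustration (this sketch is not part of the original report; the fragments below are hand-made models of the two shapes quoted above), the inconsistency is that the root queue serializes childQueues as a JSON array of queue objects, while the other queues serialize it as an object whose "type" field is itself an array, so a consumer has to normalize both shapes:

```python
import json

# Hand-made fragments modeled on the two shapes quoted in this report.
# Root queue: childQueues is an array of queue objects.
root_style = json.loads(
    '{"childQueues": [{"type": "fairSchedulerLeafQueueInfo", "maxApps": 400}]}'
)
# Other queues: childQueues is an object whose "type" field is an array.
child_style = json.loads(
    '{"childQueues": {"type": ["fairSchedulerLeafQueueInfo"], "maxApps": 300}}'
)

def queue_list(payload):
    """Normalize both shapes into a list of child-queue dicts."""
    child = payload["childQueues"]
    return child if isinstance(child, list) else [child]

# Both shapes now yield a uniform list of child queues.
print(len(queue_list(root_style)), len(queue_list(child_style)))
```

A consistent server-side fix would emit the array form everywhere; the helper above is only a client-side workaround sketch.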








[jira] [Created] (YARN-4124) ApplicationMaster Container Launch timed out

2015-09-07 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-4124:
-

 Summary: ApplicationMaster Container Launch timed out
 Key: YARN-4124
 URL: https://issues.apache.org/jira/browse/YARN-4124
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/unmanaged-AM-launcher
Affects Versions: 2.2.0
 Environment: CentOS 6.5 and Hadoop-2.2.0
Reporter: zhangyubiao
Priority: Blocker


ApplicationMaster Container Launch timed out
I find it very slow to get from ResourceLocalizationService to
DefaultContainerExecutor; I will upload the AM logs.





[jira] [Updated] (YARN-4124) ApplicationMaster Container Launch timed out

2015-09-07 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-4124:
--
Attachment: 337619.log

> ApplicationMaster Container Launch timed out
> 
>
> Key: YARN-4124
> URL: https://issues.apache.org/jira/browse/YARN-4124
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/unmanaged-AM-launcher
>Affects Versions: 2.2.0
> Environment: CentOS 6.5 and Hadoop-2.2.0
>Reporter: zhangyubiao
>Priority: Blocker
> Attachments: 337619.log
>
>
> ApplicationMaster Container Launch timed out
> I find it very slow from ResourceLocalizationService to 
> DefaultContainerExecutor ,I will upload the AM Logs 





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-08-18 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700886#comment-14700886
 ] 

zhangyubiao commented on YARN-3979:
---

Thanks for Rohith Sharma K S's patch. We stopped the log-copy job that caused
the problem, and we will test the patch in our test environment; if it's OK,
we will apply it to our production environments. Thank you for your help.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-30 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647275#comment-14647275
 ] 

zhangyubiao commented on YARN-3979:
---

I find that the CPU usage and load are high because we use crontab to copy the
RM logs. Today we stopped the copy, and the CPU usage and load returned to normal.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-30 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647303#comment-14647303
 ] 

zhangyubiao commented on YARN-3979:
---

I sent you the RM logs just now.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-30 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647358#comment-14647358
 ] 

zhangyubiao commented on YARN-3979:
---

Also, today we find that YARN's reserved memory is very large, and the AM gets stuck at launch.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647095#comment-14647095
 ] 

zhangyubiao commented on YARN-3979:
---

The cluster has about 1600 nodes, with about 550 apps running and about
200,000 apps completed. At one point all NodeManagers were lost and then
recovered a moment later. I use Hadoop-2.2.0 on CentOS 6.5.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647130#comment-14647130
 ] 

zhangyubiao commented on YARN-3979:
---

I have sent you an email with the RM jstack log,
and I will send you the app log soon.


 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645601#comment-14645601
 ] 

zhangyubiao commented on YARN-3979:
---

Thank you for the reply, @Rohith Sharma K S.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Updated] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-29 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-3979:
--
Attachment: ERROR103.log

The RM log is very large, so I grepped it for ERROR lines.

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao
 Attachments: ERROR103.log


 2015-07-27 02:46:17,348 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1437735375558
 _104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
 Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO 
 SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
  Authorization successful for appattempt_1437735375558_104282_0
 1 (auth:TOKEN) for protocol=interface 
 org.apache.hadoop.yarn.api.ContainerManagementProtocolPB





[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-28 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645381#comment-14645381
 ] 

zhangyubiao commented on YARN-3979:
---

I found the RM hanging and captured a pstack at that time.
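A dump like the one below can be summarized mechanically. As a sketch (the sample lines are abridged models of the pstack output that follows, not new data), counting each thread's innermost frame shows where threads are parked:

```python
import re
from collections import Counter

# Abridged sample lines modeled on the pstack output in this report.
dump = """\
Thread 370 (Thread 0x7f263e4f1700 (LWP 35718)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x7f263ed101f5 in Threads::destroy_vm() () from libjvm.so
Thread 369 (Thread 0x7f263dfa1700 (LWP 35719)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x7f263ed67668 in GangWorker::loop() () from libjvm.so
"""

def top_frames(text):
    """Count the innermost (#0) frame of every thread in a pstack dump."""
    frames = re.findall(r"^#0\s+\S+\s+in\s+(\S+)", text, flags=re.M)
    return Counter(frames)

print(top_frames(dump).most_common(1))
```

In the full dump, most threads parked in pthread_cond_wait are idle workers; the interesting ones are those that are not.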


Thread 370 (Thread 0x7f263e4f1700 (LWP 35718)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd3fed in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed101f5 in Threads::destroy_vm() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ea0c97b in jni_DestroyJavaVM () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x4000223f in JavaMain ()
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 369 (Thread 0x7f263dfa1700 (LWP 35719)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 368 (Thread 0x7f263dea0700 (LWP 35720)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 367 (Thread 0x7f263dd9f700 (LWP 35721)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 366 (Thread 0x7f263dc9e700 (LWP 35722)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 

[jira] [Updated] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-28 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-3979:
--
Description: 
2015-07-27 02:46:17,348 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_1437735375558
_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO 
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
 Authorization successful for appattempt_1437735375558_104282_0
1 (auth:TOKEN) for protocol=interface 
org.apache.hadoop.yarn.api.ContainerManagementProtocolPB

I found the RM hanging and captured a pstack at that time.


Thread 370 (Thread 0x7f263e4f1700 (LWP 35718)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd3fed in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed101f5 in Threads::destroy_vm() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ea0c97b in jni_DestroyJavaVM () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x4000223f in JavaMain ()
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 369 (Thread 0x7f263dfa1700 (LWP 35719)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 368 (Thread 0x7f263dea0700 (LWP 35720)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 367 (Thread 0x7f263dd9f700 (LWP 35721)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 366 (Thread 0x7f263dc9e700 (LWP 35722)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 

[jira] [Updated] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-28 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated YARN-3979:
--
Description: 
2015-07-27 02:46:17,348 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_1437735375558
_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO 
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
 Authorization successful for appattempt_1437735375558_104282_0
1 (auth:TOKEN) for protocol=interface 
org.apache.hadoop.yarn.api.ContainerManagementProtocolPB

  was:
2015-07-27 02:46:17,348 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_1437735375558
_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO 
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
 Authorization successful for appattempt_1437735375558_104282_0
1 (auth:TOKEN) for protocol=interface 
org.apache.hadoop.yarn.api.ContainerManagementProtocolPB

I find the RM hang and get the Pstack at that time .


Thread 370 (Thread 0x7f263e4f1700 (LWP 35718)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd3fed in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed101f5 in Threads::destroy_vm() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ea0c97b in jni_DestroyJavaVM () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x4000223f in JavaMain ()
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 369 (Thread 0x7f263dfa1700 (LWP 35719)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 368 (Thread 0x7f263dea0700 (LWP 35720)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#5  0x7f263ed675b4 in GangWorker::run() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#6  0x7f263ec0296f in java_start(Thread*) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#7  0x003abf407851 in start_thread () from /lib64/libpthread.so.0
#8  0x003abece811d in clone () from /lib64/libc.so.6
Thread 367 (Thread 0x7f263dd9f700 (LWP 35721)):
#0  0x003abf40b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f263ec01f8e in os::PlatformEvent::park() () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#2  0x7f263ebd3985 in Monitor::IWait(Thread*, long) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#3  0x7f263ebd414e in Monitor::wait(bool, long, bool) () from 
/software/servers/jdk1.6.0_25/jre/lib/amd64/server/libjvm.so
#4  0x7f263ed67668 in GangWorker::loop() () from 

[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-28 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645487#comment-14645487
 ] 

zhangyubiao commented on YARN-3979:
---

The first time, I found some jobs hanging for 10 minutes, so we changed
yarn.resourcemanager.client.thread-count from 50 to 100,
yarn.resourcemanager.scheduler.client.thread-count from 50 to 100, and
yarn.resourcemanager.resource-tracker.client.thread-count from 50 to 80.
A few days later we found that the load and CPU usage on the RM machine
sometimes became high.
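As a sketch, that change corresponds to a yarn-site.xml fragment like the following (the property names are the standard YARN keys named above; the values are our site-specific choices, not general recommendations):

```xml
<!-- Sketch of the thread-count changes described above; the values are
     site-specific choices, not general recommendations. -->
<property>
  <name>yarn.resourcemanager.client.thread-count</name>
  <value>100</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.client.thread-count</name>
  <value>100</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
  <value>80</value>
</property>
```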

I also find that the event queue in the RM logs has grown very large:
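Growth in lines like the ones below can be tracked mechanically; a sketch (the sample lines are modeled on this log excerpt):

```python
import re

# Sample lines modeled on the AsyncDispatcher log excerpt in this comment.
log = """\
2015-07-29 01:59:21,196 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4924000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4935000
"""

sizes = [int(n) for n in re.findall(r"Size of event-queue is (\d+)", log)]
# A steadily increasing size means the single dispatcher thread cannot
# drain events as fast as they are produced.
print(sizes, max(sizes) - min(sizes))
```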

2015-07-29 01:59:21,196 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4924000
2015-07-29 01:59:21,196 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4925000
2015-07-29 01:59:21,196 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4926000
2015-07-29 01:59:21,196 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4927000
2015-07-29 01:59:21,196 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4928000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4929000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4930000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4931000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4932000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4933000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4934000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4935000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4936000
2015-07-29 01:59:21,197 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4937000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4938000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4939000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4940000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4941000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4942000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4943000
2015-07-29 01:59:21,198 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4944000
2015-07-29 01:59:21,199 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4945000
2015-07-29 01:59:21,199 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4946000
2015-07-29 01:59:21,199 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4947000
2015-07-29 01:59:21,199 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4948000
2015-07-29 01:59:21,199 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4949000
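The flood above comes from YARN's AsyncDispatcher, which logs the backlog every time the queue size crosses another multiple of 1000. A minimal sketch of that threshold logic (simplified illustration, not the actual Hadoop source; the class name DispatcherQueueSketch is invented here):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DispatcherQueueSketch {
    // Log only when the backlog reaches another multiple of 1000 events,
    // mirroring the AsyncDispatcher messages quoted above (simplified).
    static boolean shouldLogSize(int queueSize) {
        return queueSize > 0 && queueSize % 1000 == 0;
    }

    public static void main(String[] args) {
        BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 5000; i++) {
            eventQueue.add("event-" + i);
            if (shouldLogSize(eventQueue.size())) {
                System.out.println("Size of event-queue is " + eventQueue.size());
            }
        }
    }
}
```

A backlog of ~4.9 million events, as logged here, means producers were enqueuing far faster than the single dispatcher thread could drain, which matches the high RM load described above.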

 Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 ---

 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao

 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558_104282_01_01
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_01 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
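The 10-minute gap in the log above lines up with the RM's AM liveness monitor, which expires an attempt that has not heartbeated within yarn.am.liveness-monitor.expiry-interval-ms (default 600000 ms, i.e. 10 minutes). A hedged yarn-site.xml sketch of raising that window; this is a workaround for the symptom, not a fix for the localizer hang itself, and the 20-minute value is only an example:

```xml
<!-- yarn-site.xml (sketch): the default 600000 ms matches the
     10-minute window after which the RM kills a silent AM attempt -->
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>1200000</value> <!-- example: 20 minutes; default is 600000 -->
</property>
```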



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM

2015-07-26 Thread zhangyubiao (JIRA)
zhangyubiao created YARN-3979:
-

 Summary: Am in ResourceLocalizationService hang 10 min cause RM kill  AM
 Key: YARN-3979
 URL: https://issues.apache.org/jira/browse/YARN-3979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: CentOS 6.5  Hadoop-2.2.0
Reporter: zhangyubiao


2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_01 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)