[jira] [Commented] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-11 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083450#comment-16083450
 ] 

daemon commented on YARN-6791:
--

[~templedf] Hi, could you help review my code? Thanks a lot!

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png, YARN-6791.001.patch, 
> YARN-6791.002.patch
>
>
> YARN's application ACL control is incomplete, which can be confusing at 
> times.
> !screenshot-1.png!
> YARN ACLs are disabled by default, but once they are enabled,
> users who are neither YARN admins nor queue admins cannot view application 
> details.
> The YARN RM web UI shows something like the following:
> !screenshot-2.png! 
> So once these configs are enabled, it becomes very inconvenient to view 
> application status through the RM web UI.
> There are two ways to solve the problem:
> 1. Improve the web UI so that a user can log in as any user they want,
> backed by a proper authentication mechanism.
> 2. Add a config to YarnConfiguration that allows certain users to view any 
> application without granting them modify permissions.
> This way, such users can view all applications but cannot kill other users' 
> applications.
> Of the two solutions, I choose the second: it is low cost and more useful.
> I will work on this and upload the patch later.
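
For illustration only, a minimal sketch of how option 2 could look: a 
view-only check layered on top of the normal application ACL check. The 
config key yarn.acl.view-only.users and the class below are hypothetical, 
not existing YARN APIs; the actual change is in the attached patches.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;

// Hypothetical sketch of option 2: listed users may VIEW any application,
// while MODIFY (e.g. kill) still goes through the normal ACL check.
public class ViewOnlyAclSketch {

  // Hypothetical config key, not an existing YarnConfiguration constant.
  static final String VIEW_ONLY_USERS_KEY = "yarn.acl.view-only.users";

  private final Set<String> viewOnlyUsers;

  public ViewOnlyAclSketch(Configuration conf) {
    viewOnlyUsers = new HashSet<>(
        Arrays.asList(conf.getTrimmedStrings(VIEW_ONLY_USERS_KEY)));
  }

  /**
   * Combines the hypothetical view-only list with the result of the
   * normal queue/application ACL check.
   */
  public boolean checkAccess(String user, ApplicationAccessType accessType,
      boolean normalAclResult) {
    if (accessType == ApplicationAccessType.VIEW_APP
        && viewOnlyUsers.contains(user)) {
      return true;            // always allowed to view
    }
    return normalAclResult;   // MODIFY_APP and everything else unchanged
  }
}
{code}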



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-13 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086624#comment-16086624
 ] 

daemon commented on YARN-6769:
--

[~yufeigu] Thanks, Yufei. My real name is zhouyunfan. 
Thank you so much for everything you have done for me!

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch, 
> YARN-6769.003.patch, YARN-6769.004.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.
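
As a rough illustration of the ordering rule being proposed (the real change 
modifies FairSharePolicy#compare; see the patch attached to this issue), a 
self-contained sketch that uses a simplified scalar demand instead of the 
Resource type:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified sketch: schedulables with zero demand sort last, so queues that
// actually want resources get the scheduling opportunity first.
public class ZeroDemandLastSketch {

  static class Sched {
    final String name;
    final long demand;   // simplified stand-in for Schedulable#getDemand()
    Sched(String name, long demand) { this.name = name; this.demand = demand; }
  }

  static final Comparator<Sched> ZERO_DEMAND_LAST = (s1, s2) -> {
    boolean none1 = s1.demand == 0;
    boolean none2 = s2.demand == 0;
    if (none1 && !none2) {
      return 1;    // s1 wants nothing: order it after s2
    } else if (none2 && !none1) {
      return -1;   // s2 wants nothing: order it after s1
    }
    return 0;      // otherwise fall back to the usual fair-share comparison
  };

  public static void main(String[] args) {
    List<Sched> queues = new ArrayList<>();
    queues.add(new Sched("idleQueue", 0));
    queues.add(new Sched("busyQueue", 4096));
    queues.sort(ZERO_DEMAND_LAST);
    // Prints busyQueue first, then idleQueue.
    queues.forEach(q -> System.out.println(q.name));
  }
}
{code}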



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6791:
-
Attachment: YARN-6791.002.patch

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png, YARN-6791.001.patch, 
> YARN-6791.002.patch
>
>
> YARN's application ACL control is incomplete, which can be confusing at 
> times.
> !screenshot-1.png!
> YARN ACLs are disabled by default, but once they are enabled,
> users who are neither YARN admins nor queue admins cannot view application 
> details.
> The YARN RM web UI shows something like the following:
> !screenshot-2.png! 
> So once these configs are enabled, it becomes very inconvenient to view 
> application status through the RM web UI.
> There are two ways to solve the problem:
> 1. Improve the web UI so that a user can log in as any user they want,
> backed by a proper authentication mechanism.
> 2. Add a config to YarnConfiguration that allows certain users to view any 
> application without granting them modify permissions.
> This way, such users can view all applications but cannot kill other users' 
> applications.
> Of the two solutions, I choose the second: it is low cost and more useful.
> I will work on this and upload the patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-09 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Attachment: YARN-6769.002.patch

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-09 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079643#comment-16079643
 ] 

daemon commented on YARN-6769:
--

[~hadoopqa] I am sorry for breaking the test; I have fixed it and uploaded a new 
patch file.
[~yufeigu] Yufei, please help review the code when you have free time. Thanks 
a lot.

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6802:
-
Attachment: YARN-6802.001.patch

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, YARN-6802.001.patch
>
>
> The RM web UI should support viewing leaf queue AM resource usage.
> !screenshot-2.png!
> I will upload my patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-11 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081923#comment-16081923
 ] 

daemon commented on YARN-6769:
--

[~yufeigu] Thanks, Yufei. You are right; I have already fixed those problems.

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch, 
> YARN-6769.003.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-09 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079869#comment-16079869
 ] 

daemon commented on YARN-6769:
--

[~templedf] Hi Daniel, could you help review my code? Thanks a lot!

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6516) FairScheduler:the algorithm of assignContainer is so slow for it only can assign a thousand containers per second

2017-07-09 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon reassigned YARN-6516:


Assignee: daemon

> FairScheduler:the algorithm of assignContainer is so slow for it only can 
> assign a thousand containers per second
> -
>
> Key: YARN-6516
> URL: https://issues.apache.org/jira/browse/YARN-6516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: JackZhou
>Assignee: daemon
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-10 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Attachment: YARN-6769.003.patch

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch, 
> YARN-6769.003.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6802:
-
Description: 
The RM web UI should support viewing leaf queue AM resource usage.
!screenshot-1.png!

I will upload my patch later.

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>
> The RM web UI should support viewing leaf queue AM resource usage.
> !screenshot-1.png!
> I will upload my patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6802:
-
Attachment: screenshot-2.png

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> The RM web UI should support viewing leaf queue AM resource usage.
> !screenshot-1.png!
> I will upload my patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6802:
-
Description: 
The RM web UI should support viewing leaf queue AM resource usage.

!screenshot-2.png!

I will upload my patch later.

  was:
The RM web UI should support viewing leaf queue AM resource usage.
!screenshot-1.png!

I will upload my patch later.


> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> The RM web UI should support viewing leaf queue AM resource usage.
> !screenshot-2.png!
> I will upload my patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-11 Thread daemon (JIRA)
daemon created YARN-6802:


 Summary: Support view leaf queue am resource usage in RM web ui
 Key: YARN-6802
 URL: https://issues.apache.org/jira/browse/YARN-6802
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 2.7.2
Reporter: daemon
Assignee: daemon
 Fix For: 2.8.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6802:
-
Attachment: screenshot-1.png

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-11 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6791:
-
Attachment: YARN-6791.001.patch

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png, YARN-6791.001.patch
>
>
> YARN's application ACL control is incomplete, which can be confusing at 
> times.
> !screenshot-1.png!
> YARN ACLs are disabled by default, but once they are enabled,
> users who are neither YARN admins nor queue admins cannot view application 
> details.
> The YARN RM web UI shows something like the following:
> !screenshot-2.png! 
> So once these configs are enabled, it becomes very inconvenient to view 
> application status through the RM web UI.
> There are two ways to solve the problem:
> 1. Improve the web UI so that a user can log in as any user they want,
> backed by a proper authentication mechanism.
> 2. Add a config to YarnConfiguration that allows certain users to view any 
> application without granting them modify permissions.
> This way, such users can view all applications but cannot kill other users' 
> applications.
> Of the two solutions, I choose the second: it is low cost and more useful.
> I will work on this and upload the patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Attachment: YARN-6769.004.patch

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch, 
> YARN-6769.003.patch, YARN-6769.004.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-13 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085530#comment-16085530
 ] 

daemon commented on YARN-6769:
--

[~yufeigu] Hi Yufei, are there any other problems in my new patch?

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch, YARN-6769.002.patch, 
> YARN-6769.003.patch, YARN-6769.004.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-13 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085601#comment-16085601
 ] 

daemon commented on YARN-6802:
--

[~sunilg] Can you review my patch?

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, YARN-6802.001.patch
>
>
> The RM web UI should support viewing leaf queue AM resource usage.
> !screenshot-2.png!
> I will upload my patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-10 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6791:
-
Attachment: screenshot-1.png

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-10 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6791:
-
Attachment: screenshot-2.png

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-10 Thread daemon (JIRA)
daemon created YARN-6791:


 Summary: Add a new acl control to make YARN acl control perfect
 Key: YARN-6791
 URL: https://issues.apache.org/jira/browse/YARN-6791
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 2.7.2
Reporter: daemon
 Fix For: 2.7.2






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-10 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6791:
-
Description: 
YARN's application ACL control is incomplete, which can be confusing at 
times.
!screenshot-1.png!

YARN ACLs are disabled by default, but once they are enabled,
users who are neither YARN admins nor queue admins cannot view application 
details.
The YARN RM web UI shows something like the following:
!screenshot-2.png! 

So once these configs are enabled, it becomes very inconvenient to view 
application status through the RM web UI.

There are two ways to solve the problem:
1. Improve the web UI so that a user can log in as any user they want,
backed by a proper authentication mechanism.
2. Add a config to YarnConfiguration that allows certain users to view any 
application without granting them modify permissions.
This way, such users can view all applications but cannot kill other users' 
applications.

Of the two solutions, I choose the second: it is low cost and more useful.

I will work on this and upload the patch later.

  was:
YARN's application ACL control is incomplete, which can be confusing at 
times.
!screenshot-1.png!

YARN ACLs are disabled by default, but once they are enabled,
users who are neither YARN admins nor queue admins cannot view the status 
of applications. The YARN RM web UI shows something like the following:
!screenshot-2.png! 

So once these configs are enabled, it becomes very inconvenient for users 
to view application status.
There are two ways to solve the problem:
1. Improve the web UI so that a user can log in as any user they want,
backed by a proper authentication mechanism.
2. Add a config to YarnConfiguration that allows certain users to view any 
application without granting them modify permissions.
This way, such users can view all applications but cannot kill other users' 
applications.

Of the two solutions, I choose the second: it is low cost and more useful.

I will work on this and upload the patch later.


> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> YARN's application ACL control is incomplete, which can be confusing at 
> times.
> !screenshot-1.png!
> YARN ACLs are disabled by default, but once they are enabled,
> users who are neither YARN admins nor queue admins cannot view application 
> details.
> The YARN RM web UI shows something like the following:
> !screenshot-2.png! 
> So once these configs are enabled, it becomes very inconvenient to view 
> application status through the RM web UI.
> There are two ways to solve the problem:
> 1. Improve the web UI so that a user can log in as any user they want,
> backed by a proper authentication mechanism.
> 2. Add a config to YarnConfiguration that allows certain users to view any 
> application without granting them modify permissions.
> This way, such users can view all applications but cannot kill other users' 
> applications.
> Of the two solutions, I choose the second: it is low cost and more useful.
> I will work on this and upload the patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-08 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079453#comment-16079453
 ] 

daemon commented on YARN-6769:
--

[~yufei] Thanks, Yufei. I have already uploaded my patch file; what should I do 
next?

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-08 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Attachment: YARN-6769.001.patch

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-08 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079453#comment-16079453
 ] 

daemon edited comment on YARN-6769 at 7/9/17 5:20 AM:
--

[~yufei] Thanks, Yufei. I have already uploaded my patch file; what should I do 
next?


was (Author: daemon):
[~yufei] Thanks, Yufei. I have already uploaded my patch file; what should I do 
next?

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-6769.001.patch
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-10 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6791:
-
Description: 
YARN's application ACL control is incomplete, which can be confusing at 
times.
!screenshot-1.png!

YARN ACLs are disabled by default, but once they are enabled,
users who are neither YARN admins nor queue admins cannot view the status 
of applications. The YARN RM web UI shows something like the following:
!screenshot-2.png! 

So once these configs are enabled, it becomes very inconvenient for users 
to view application status.
There are two ways to solve the problem:
1. Improve the web UI so that a user can log in as any user they want,
backed by a proper authentication mechanism.
2. Add a config to YarnConfiguration that allows certain users to view any 
application without granting them modify permissions.
This way, such users can view all applications but cannot kill other users' 
applications.

Of the two solutions, I choose the second: it is low cost and more useful.

I will work on this and upload the patch later.

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> YARN's application ACL control is incomplete, which can be confusing at 
> times.
> !screenshot-1.png!
> YARN ACLs are disabled by default, but once they are enabled,
> users who are neither YARN admins nor queue admins cannot view the status 
> of applications. The YARN RM web UI shows something like the following:
> !screenshot-2.png! 
> So once these configs are enabled, it becomes very inconvenient for users 
> to view application status.
> There are two ways to solve the problem:
> 1. Improve the web UI so that a user can log in as any user they want,
> backed by a proper authentication mechanism.
> 2. Add a config to YarnConfiguration that allows certain users to view any 
> application without granting them modify permissions.
> This way, such users can view all applications but cannot kill other users' 
> applications.
> Of the two solutions, I choose the second: it is low cost and more useful.
> I will work on this and upload the patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6791) Add a new acl control to make YARN acl control perfect

2017-07-10 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon reassigned YARN-6791:


Assignee: daemon

> Add a new acl control to make YARN acl control perfect
> --
>
> Key: YARN-6791
> URL: https://issues.apache.org/jira/browse/YARN-6791
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: daemon
>Assignee: daemon
> Fix For: 2.7.2
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> YARN's application ACL control is incomplete, which can be confusing at 
> times.
> !screenshot-1.png!
> YARN ACLs are disabled by default, but once they are enabled,
> users who are neither YARN admins nor queue admins cannot view the status 
> of applications. The YARN RM web UI shows something like the following:
> !screenshot-2.png! 
> So once these configs are enabled, it becomes very inconvenient for users 
> to view application status.
> There are two ways to solve the problem:
> 1. Improve the web UI so that a user can log in as any user they want,
> backed by a proper authentication mechanism.
> 2. Add a config to YarnConfiguration that allows certain users to view any 
> application without granting them modify permissions.
> This way, such users can view all applications but cannot kill other users' 
> applications.
> Of the two solutions, I choose the second: it is low cost and more useful.
> I will work on this and upload the patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-06 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Description: 
When the fair scheduler is used as the RM scheduler, all queues or 
applications are sorted before containers are assigned.
FairSharePolicy#compare is used as the comparator, but it has a flaw:
1. When a queue's resource usage exceeds its minShare (minResources), it is 
placed behind queues whose demand is zero,
so those zero-demand queues get a greater opportunity to receive resources 
even though they do not want any. This wastes scheduling time when assigning 
containers to queues or applications.

I have fixed this and will upload the patch to this JIRA.

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Priority: Minor
> Fix For: 2.9.0
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-06 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Comment: was deleted

(was: diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
index f8cdb45929..e930b80e45 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -79,6 +79,19 @@ public String getName() {
 
 @Override
 public int compare(Schedulable s1, Schedulable s2) {
+  Resource demand1 = s1.getDemand();
+  Resource demand2 = s2.getDemand();
+  // Put the schedulable which does not require resource to
+  // the end. So the other schedulable can get resource as soon as
+  // possible though it use resource greater then it minShare or demand.
+  if (demand1.equals(Resources.none()) &&
+  !demand2.equals(Resources.none())) {
+return 1;
+  } else if (demand2.equals(Resources.none()) &&
+  !demand1.equals(Resources.none())) {
+return -1;
+  }
+  
   double minShareRatio1, minShareRatio2;
   double useToWeightRatio1, useToWeightRatio2;
   double weight1, weight2;
@@ -86,9 +99,9 @@ public int compare(Schedulable s1, Schedulable s2) {
   Resource resourceUsage1 = s1.getResourceUsage();
   Resource resourceUsage2 = s2.getResourceUsage();
   Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null,
-  s1.getMinShare(), s1.getDemand());
+  s1.getMinShare(), demand1);
   Resource minShare2 = Resources.min(RESOURCE_CALCULATOR, null,
-  s2.getMinShare(), s2.getDemand());
+  s2.getMinShare(), demand2);
   boolean s1Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
   resourceUsage1, minShare1);
   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
)

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Priority: Minor
> Fix For: 2.9.0
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-06 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Comment: was deleted

(was: diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
index f8cdb45929..e930b80e45 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -79,6 +79,19 @@ public String getName() {
 
 @Override
 public int compare(Schedulable s1, Schedulable s2) {
+  Resource demand1 = s1.getDemand();
+  Resource demand2 = s2.getDemand();
+  // Put the schedulable which does not require resource to
+  // the end. So the other schedulable can get resource as soon as
+  // possible though it use resource greater then it minShare or demand.
+  if (demand1.equals(Resources.none()) &&
+  !demand2.equals(Resources.none())) {
+return 1;
+  } else if (demand2.equals(Resources.none()) &&
+  !demand1.equals(Resources.none())) {
+return -1;
+  }
+  
   double minShareRatio1, minShareRatio2;
   double useToWeightRatio1, useToWeightRatio2;
   double weight1, weight2;
@@ -86,9 +99,9 @@ public int compare(Schedulable s1, Schedulable s2) {
   Resource resourceUsage1 = s1.getResourceUsage();
   Resource resourceUsage2 = s2.getResourceUsage();
   Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null,
-  s1.getMinShare(), s1.getDemand());
+  s1.getMinShare(), demand1);
   Resource minShare2 = Resources.min(RESOURCE_CALCULATOR, null,
-  s2.getMinShare(), s2.getDemand());
+  s2.getMinShare(), demand2);
   boolean s1Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
   resourceUsage1, minShare1);
   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
)

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Priority: Minor
> Fix For: 2.9.0
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-06 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6769:
-
Comment: was deleted

(was: 
{code:java}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
index f8cdb45929..e930b80e45 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -79,6 +79,19 @@ public String getName() {
 
 @Override
 public int compare(Schedulable s1, Schedulable s2) {
+  Resource demand1 = s1.getDemand();
+  Resource demand2 = s2.getDemand();
+  // Put the schedulable which does not require resource to
+  // the end. So the other schedulable can get resource as soon as
+  // possible though it use resource greater then it minShare or demand.
+  if (demand1.equals(Resources.none()) &&
+  !demand2.equals(Resources.none())) {
+return 1;
+  } else if (demand2.equals(Resources.none()) &&
+  !demand1.equals(Resources.none())) {
+return -1;
+  }
+  
   double minShareRatio1, minShareRatio2;
   double useToWeightRatio1, useToWeightRatio2;
   double weight1, weight2;
@@ -86,9 +99,9 @@ public int compare(Schedulable s1, Schedulable s2) {
   Resource resourceUsage1 = s1.getResourceUsage();
   Resource resourceUsage2 = s2.getResourceUsage();
   Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null,
-  s1.getMinShare(), s1.getDemand());
+  s1.getMinShare(), demand1);
   Resource minShare2 = Resources.min(RESOURCE_CALCULATOR, null,
-  s2.getMinShare(), s2.getDemand());
+  s2.getMinShare(), demand2);
   boolean s1Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
   resourceUsage1, minShare1);
   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,

{code}
)

> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Priority: Minor
> Fix For: 2.9.0
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-06 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077525#comment-16077525
 ] 

daemon commented on YARN-6769:
--

diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
index f8cdb45929..e930b80e45 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
@@ -79,6 +79,19 @@ public String getName() {
 
 @Override
 public int compare(Schedulable s1, Schedulable s2) {
+  Resource demand1 = s1.getDemand();
+  Resource demand2 = s2.getDemand();
+  // Put the schedulable which does not require resource to
+  // the end. So the other schedulable can get resource as soon as
+  // possible though it use resource greater then it minShare or demand.
+  if (demand1.equals(Resources.none()) &&
+  !demand2.equals(Resources.none())) {
+return 1;
+  } else if (demand2.equals(Resources.none()) &&
+  !demand1.equals(Resources.none())) {
+return -1;
+  }
+  
   double minShareRatio1, minShareRatio2;
   double useToWeightRatio1, useToWeightRatio2;
   double weight1, weight2;
@@ -86,9 +99,9 @@ public int compare(Schedulable s1, Schedulable s2) {
   Resource resourceUsage1 = s1.getResourceUsage();
   Resource resourceUsage2 = s2.getResourceUsage();
   Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null,
-  s1.getMinShare(), s1.getDemand());
+  s1.getMinShare(), demand1);
   Resource minShare2 = Resources.min(RESOURCE_CALCULATOR, null,
-  s2.getMinShare(), s2.getDemand());
+  s2.getMinShare(), demand2);
   boolean s1Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
   resourceUsage1, minShare1);
   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,


> Put the no demand queue after the most in FairSharePolicy#compare
> -
>
> Key: YARN-6769
> URL: https://issues.apache.org/jira/browse/YARN-6769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
>Priority: Minor
> Fix For: 2.9.0
>
>
> When the fair scheduler is used as the RM scheduler, all queues or 
> applications are sorted before containers are assigned.
> FairSharePolicy#compare is used as the comparator, but it has a flaw:
> 1. When a queue's resource usage exceeds its minShare (minResources), it is 
> placed behind queues whose demand is zero,
> so those zero-demand queues get a greater opportunity to receive resources 
> even though they do not want any. This wastes scheduling time when assigning 
> containers to queues or applications.
> I have fixed this and will upload the patch to this JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6769) Put the no demand queue after the most in FairSharePolicy#compare

2017-07-06 Thread daemon (JIRA)
daemon created YARN-6769:


 Summary: Put the no demand queue after the most in 
FairSharePolicy#compare
 Key: YARN-6769
 URL: https://issues.apache.org/jira/browse/YARN-6769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.2
Reporter: daemon
Priority: Minor
 Fix For: 2.9.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6772) Several ways to improve fair scheduler schedule performance

2017-07-07 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6772:
-
Summary: Several ways to improve fair scheduler schedule performance  (was: 
Several way to improve fair scheduler schedule performance)

> Several ways to improve fair scheduler schedule performance
> ---
>
> Key: YARN-6772
> URL: https://issues.apache.org/jira/browse/YARN-6772
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.7.2
>
>
> There are several ways to 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6772) Several way to improve fair scheduler schedule performance

2017-07-07 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6772:
-
Description: There are several ways to 

> Several way to improve fair scheduler schedule performance
> --
>
> Key: YARN-6772
> URL: https://issues.apache.org/jira/browse/YARN-6772
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.7.2
>
>
> There are several ways to 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6772) Several way to improve fair scheduler schedule performance

2017-07-07 Thread daemon (JIRA)
daemon created YARN-6772:


 Summary: Several way to improve fair scheduler schedule performance
 Key: YARN-6772
 URL: https://issues.apache.org/jira/browse/YARN-6772
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.7.2
Reporter: daemon
 Fix For: 2.7.2






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6772) Several ways to improve fair scheduler schedule performance

2017-07-07 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6772:
-
Description: 
There are several ways to improve fair scheduler schedule performance, and it 
improve  a lot performance in my test environment.
We have run it in our production cluster, and the scheduler is pretty stable 
and faster.
It can assign over 5000 containers per second, and sometimes over 1 
containers.

  was:There are several ways to 


> Several ways to improve fair scheduler schedule performance
> ---
>
> Key: YARN-6772
> URL: https://issues.apache.org/jira/browse/YARN-6772
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.7.2
>
>
> There are several ways to improve fair scheduler schedule performance, and 
> they improve performance a lot in my test environment.
> We have run it in our production cluster, and the scheduler is pretty stable 
> and faster.
> It can assign over 5000 containers per second, and sometimes over 1 
> containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6320) FairScheduler:Identifying apps to assign in updateThread

2017-05-31 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032452#comment-16032452
 ] 

daemon commented on YARN-6320:
--

I think I have solved the problem and in my test environment the scheduler can 
assign over 5000 containers per second.

> FairScheduler:Identifying apps to assign in updateThread
> 
>
> Key: YARN-6320
> URL: https://issues.apache.org/jira/browse/YARN-6320
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Jie
>
> In FairScheduler today, we have 1) UpdateThread, which updates queue/app 
> status, fair share, and starvation info, and 2) nodeUpdate, triggered by NM 
> heartbeats, which does the scheduling. When we handle one nodeUpdate, we go 
> top-down from the root queue to the leaf queues and find the most needy 
> application to allocate a container to, according to each queue's fair share. 
> We also have to sort the children at each hierarchy level.
> My thought is that we keep a global sorted {{candidateAppList}} of apps that 
> need an assignment, and move the logic that "finds the app that should get 
> resources" from nodeUpdate to UpdateThread. In UpdateThread, we find 
> candidate apps to assign and put them into {{candidateAppList}}. In 
> nodeUpdate, we consume the list and allocate containers to apps (a rough 
> sketch follows below).
> As far as I can see, we get 3 benefits:
> 1, nodeUpdate() is invoked much more frequently than update() in 
> UpdateThread, especially in a large cluster. As a result we can avoid a lot 
> of unnecessary sorting.
> 2, It will coordinate better with YARN-5829; we can indicate which apps to 
> assign more directly rather than letting nodes find the best apps to assign.
> 3, It seems easier to introduce scheduling restrictions such as node labels 
> and affinity/anti-affinity into FS, since we can pre-allocate containers 
> asynchronously.
> [~kasha], [~templedf], [~yufeigu], I would like to hear your thoughts.
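
A rough sketch of the proposed split, assuming a periodic update thread and 
heartbeat-driven consumers. CandidateAppTracker, refresh, and pollMostNeedy 
are invented names for illustration only, not the actual FairScheduler API.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Illustrative only: a global, pre-sorted list of candidate apps that the
// update thread refreshes and node-heartbeat handlers consume.
public class CandidateAppTracker<A> {

  private final Comparator<A> needComparator;  // e.g. starvation/fair-share order
  private volatile ConcurrentLinkedQueue<A> candidates = new ConcurrentLinkedQueue<>();

  public CandidateAppTracker(Comparator<A> needComparator) {
    this.needComparator = needComparator;
  }

  // Called from the update thread: sort once per update interval
  // instead of once per node heartbeat.
  public void refresh(List<A> runnableApps) {
    List<A> sorted =
        runnableApps.stream().sorted(needComparator).collect(Collectors.toList());
    candidates = new ConcurrentLinkedQueue<>(sorted);
  }

  // Called from nodeUpdate(): take the most needy app, if any.
  public A pollMostNeedy() {
    return candidates.poll();
  }

  public static void main(String[] args) throws InterruptedException {
    CandidateAppTracker<String> tracker =
        new CandidateAppTracker<>(Comparator.naturalOrder());
    ScheduledExecutorService updateThread =
        Executors.newSingleThreadScheduledExecutor();
    updateThread.scheduleAtFixedRate(
        () -> tracker.refresh(List.of("app-b", "app-a", "app-c")),
        0, 500, TimeUnit.MILLISECONDS);

    Thread.sleep(100);                            // let one refresh run
    System.out.println(tracker.pollMostNeedy());  // "app-a" under natural order
    updateThread.shutdownNow();
  }
}
{code}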



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service

2017-06-08 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042759#comment-16042759
 ] 

daemon commented on YARN-5814:
--

[~BINGXUE QIU] hi, bingxue, can you upload your patch? 
It would be very useful to me!

>  Add druid as storage backend in YARN Timeline Service
> --
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: ATSv2
>Affects Versions: 3.0.0-alpha2
>Reporter: Bingxue Qiu
> Attachments: Add-Druid-in-YARN-Timeline-Service.pdf
>
>
> h3. Introduction
> I propose to add Druid as a storage backend in YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in 
> Alibaba clusters with thousands of nodes. We need to collect and store 
> meta/event/metric data, analyze utilization reports online across various 
> dimensions, and display the trends of allocated/used resources for the 
> cluster by joining and aggregating data. This helps us manage and optimize 
> the cluster by tracking resource utilization.
> To achieve our goal we switched to Druid as the storage instead of HBase and 
> have achieved sub-second OLAP performance in our production environment for 
> a few months. 
> h3. Analysis
> Currently YARN Timeline Service only supports aggregating metrics at a) the 
> flow level via FlowRunCoprocessor and b) the application level via 
> AppLevelTimelineCollector; offline (time-based periodic) aggregation for 
> flows/users/queues for reporting and analysis is planned but not yet 
> implemented. YARN Timeline Service chooses Apache HBase as the primary 
> storage backend, and HBase is not a good fit for OLAP.
> For arbitrary exploration of data, such as online analysis of utilization 
> reports across various dimensions (Queue, Flow, Users, Application, CPU, 
> Memory) by joining and aggregating data, Druid's custom column format 
> enables ad-hoc queries without pre-computation. The format also enables fast 
> scans on columns, which is important for good aggregation performance.
> To support online analysis of utilization reports across various dimensions, 
> display the trends of allocated/used resources for the cluster, and allow 
> arbitrary exploration of data, we propose to add Druid storage and implement 
> DruidWriter/DruidReader in YARN Timeline Service.
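
Purely to illustrate the shape such a backend could take, here is a 
hypothetical sketch that batches timeline-style metric events and posts them 
as newline-delimited JSON to an assumed Druid ingestion endpoint. It is not 
the DruidWriter from the attached design and not a real Druid client API; the 
endpoint, field names, and class name are all assumptions.

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Druid-backed timeline writer; not the real
// ATSv2 TimelineWriter interface and not a real Druid client.
public class DruidWriterSketch {

  // Assumed ingestion endpoint (e.g. an HTTP proxy in front of Druid ingestion).
  private final URI ingestUri;
  private final HttpClient http = HttpClient.newHttpClient();
  private final List<String> batch = new ArrayList<>();

  public DruidWriterSketch(String ingestUrl) {
    this.ingestUri = URI.create(ingestUrl);
  }

  // Buffer one metric event as a JSON line; field names are illustrative.
  public void write(String queue, String appId, long timestampMs,
                    long allocatedMb, long allocatedVcores) {
    batch.add(String.format(
        "{\"timestamp\":%d,\"queue\":\"%s\",\"appId\":\"%s\","
            + "\"allocatedMB\":%d,\"allocatedVcores\":%d}",
        timestampMs, queue, appId, allocatedMb, allocatedVcores));
  }

  // Flush the batch as newline-delimited JSON in a single POST.
  public int flush() throws Exception {
    if (batch.isEmpty()) {
      return 0;
    }
    String body = String.join("\n", batch);
    HttpRequest request = HttpRequest.newBuilder(ingestUri)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<Void> response =
        http.send(request, HttpResponse.BodyHandlers.discarding());
    batch.clear();
    return response.statusCode();
  }
}
{code}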



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)
daemon created YARN-6710:


 Summary: There is a heavy bug in FSLeafQueue#amResourceUsage which 
will let the fair scheduler not assign container to the queue
 Key: YARN-6710
 URL: https://issues.apache.org/jira/browse/YARN-6710
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.2
Reporter: daemon
 Fix For: 2.8.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-2.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-3.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-1.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 46 
applications pending. 
Those applications could not run even after several hours, and in the end I had 
to stop them.

I reproduced the scenario in my test environment, and I found a bug in 
FSLeafQueue. 
In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
real AM resource usage.
When the fair scheduler tries to assign a container to an application attempt, 
it performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than 
its real value.
So when the value of amResourceUsage is greater than the value of 
Resources.multiply(getFairShare(), maxAMShare),
the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
scheduler from assigning containers to the FSAppAttempt. 
In this scenario, all application attempts stay pending and never get any 
resources.

I found the reason why so many applications in my leaf queue are pending. I 
will describe it as follows:

When the fair scheduler first assigns a container to the application attempt, 
it does the following:
!screenshot-4.png!

When the fair scheduler removes the application attempt from the leaf queue, it 
does the following:
!screenshot-5.png!

But when the application attempt unregisters itself and all the containers in 
SchedulerApplicationAttempt#liveContainers 
are complete, an APP_ATTEMPT_REMOVED event is sent to the fair scheduler, but 
it is handled asynchronously.
Before the application attempt is removed from the FSLeafQueue, if there are 
pending requests in the FSAppAttempt,
the fair scheduler will still assign containers to the FSAppAttempt, because 
the size of liveContainers equals 1. 
So the FSLeafQueue adds the container resource to FSLeafQueue#amResourceUsage 
again, which makes the value of amResourceUsage larger than it really is. 
In the end, the value of FSLeafQueue#amResourceUsage is pretty large although 
there is no application in the queue.
When a new application comes and the value of FSLeafQueue#amResourceUsage is 
greater than the value of Resources.multiply(getFairShare(), maxAMShare), the 
scheduler will never assign containers to the queue.
All of the applications in the queue will always stay pending.
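
To make the check above concrete, here is a minimal sketch using plain 
memory-only arithmetic instead of the real Resources.multiply / resource 
calculator calls; the method and variable names are illustrative, not the 
actual FSLeafQueue code. It shows how an inflated amResourceUsage makes the 
guard reject every new AM even when the queue is empty.

{code:java}
// Simplified sketch of the maxAMShare guard, memory only.
public class AmShareGuardSketch {

  static boolean canRunAppAM(long amResourceUsageMb, long amDemandMb,
                             long queueFairShareMb, double maxAMShare) {
    long amShareLimitMb = (long) (queueFairShareMb * maxAMShare);
    return amResourceUsageMb + amDemandMb <= amShareLimitMb;
  }

  public static void main(String[] args) {
    long fairShareMb = 100_000;  // queue fair share
    double maxAMShare = 0.5;     // half the fair share may go to AMs
    long newAmMb = 2_048;        // the AM a new application needs

    // Healthy accounting: plenty of room for the new AM.
    System.out.println(canRunAppAM(10_000, newAmMb, fairShareMb, maxAMShare)); // true

    // Inflated accounting (the bug): amResourceUsage was double-counted past
    // the limit, so every new AM is rejected even though the queue is empty.
    System.out.println(canRunAppAM(60_000, newAmMb, fairShareMb, maxAMShare)); // false
  }
}
{code}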

  was:
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 46 
applications pending. 
Those applications could not run even after several hours, and in the end I had 
to stop them.

I reproduced the scenario in my test environment, and I found a bug in 
FSLeafQueue. 
In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
real AM resource usage.
When the fair scheduler tries to assign a container to an application attempt, 
it performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than 
its real value.
So when the value of amResourceUsage is greater than the value of 
Resources.multiply(getFairShare(), maxAMShare),
the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
scheduler from assigning containers to the FSAppAttempt. 
In this scenario, all application attempts stay pending and never get any 
resources.

I found the reason why so many applications in my leaf queue are pending. I 
will describe it as follows:


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it will do as 

[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though my cluster is idle, there are about 

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though my cluster is idle, there are about 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048658#comment-16048658
 ] 

daemon commented on YARN-6710:
--

[~dan...@cloudera.com] I am sorry, I am trying to express myself, but my English 
is poor, so it takes me a while to get my point across.

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater 
> than its real value.
> So when the value of amResourceUsage is greater than the value of 
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
> scheduler from assigning containers to the FSAppAttempt. 
> In this scenario, all application attempts stay pending and never get any 
> resources.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-4.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater 
> than its real value.
> So when the value of amResourceUsage is greater than the value of 
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
> scheduler from assigning containers to the FSAppAttempt. 
> In this scenario, all application attempts stay pending and never get any 
> resources.
> I found the reason why so many applications in my leaf queue are pending. I 
> will describe it as follows:



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 

  was:
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though my cluster is idle, there are about 


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 46 
applications pending. 
Those applications could not run even after several hours, and in the end I had 
to stop them.

I reproduced the scenario in my test environment, and I found a bug in 
FSLeafQueue. 
In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
real AM resource usage.
When the fair scheduler tries to assign a container to an application attempt, 
it performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than 
its real value.
So when the value of amResourceUsage is greater than the value of 
Resources.multiply(getFairShare(), maxAMShare),
the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
scheduler from assigning containers to the FSAppAttempt. 
In this scenario, all application attempts stay pending and never get any 
resources.

  was:
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater 
> than its real value.
> So when the value of amResourceUsage is greater than the value of 
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
> scheduler from assigning containers to the FSAppAttempt. 
> In this scenario, all application attempts stay pending and never get any 
> resources.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 46 
applications pending. 
Those applications could not run even after several hours, and in the end I had 
to stop them.

I reproduced the scenario in my test environment, and I found a bug in 
FSLeafQueue. 
In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
real AM resource usage.
When the fair scheduler tries to assign a container to an application attempt, 
it performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than 
its real value.
So when the value of amResourceUsage is greater than the value of 
Resources.multiply(getFairShare(), maxAMShare),
the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
scheduler from assigning containers to the FSAppAttempt. 
In this scenario, all application attempts stay pending and never get any 
resources.

I found the reason why so many applications in my leaf queue are pending. I 
will describe it as follows:

  was:
There are over three thousand nodes in my Hadoop production cluster, and we use 
the fair scheduler as our scheduler.
Though there are many free resources in my resource manager, there are 46 
applications pending. 
Those applications could not run even after several hours, and in the end I had 
to stop them.

I reproduced the scenario in my test environment, and I found a bug in 
FSLeafQueue. 
In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
real AM resource usage.
When the fair scheduler tries to assign a container to an application attempt, 
it performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than 
its real value.
So when the value of amResourceUsage is greater than the value of 
Resources.multiply(getFairShare(), maxAMShare),
the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
scheduler from assigning containers to the FSAppAttempt. 
In this scenario, all application attempts stay pending and never get any 
resources.


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater 
> than its real value.
> So when the value of amResourceUsage is greater than the value of 
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
> scheduler from assigning containers to the FSAppAttempt. 
> In this scenario, all application attempts stay pending and never get any 
> resources.
> I found the reason why so many applications in my leaf queue are pending. I 
> will describe it as follows:



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-5.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater 
> than its real value.
> So when the value of amResourceUsage is greater than the value of 
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
> scheduler from assigning containers to the FSAppAttempt. 
> In this scenario, all application attempts stay pending and never get any 
> resources.
> I found the reason why so many applications in my leaf queue are pending. I 
> will describe it as follows:



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-19 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053715#comment-16053715
 ] 

daemon commented on YARN-6710:
--

[~yufeigu] It may be clearer if I explain this in Chinese. On the YARN side, 
the main cause of this problem is:
1. After an application attempt finishes running, the AM sends an 
unregisterApplicationMaster RPC request to the RM. When the RM handles this 
message, it does some simple processing, sends an APP_ATTEMPT_REMOVED event to 
the FairScheduler, and returns. The handling of APP_ATTEMPT_REMOVED is 
asynchronous, so in the FairScheduler the corresponding FSAppAttempt is only 
removed some time later.

This leads to two fairly serious consequences:
1. During this interval, the FairScheduler will still assign containers to the 
FSAppAttempt. When assigning a container, if the condition 
if (getLiveContainers().size() == 1 && !getUnmanagedAM()) is satisfied, it 
keeps adding the AM resource to amResourceUsage, which makes the value of 
amResourceUsage much larger than the real value. In practice this can leave 
the jobs in the queue pending forever, never getting any resources, which is 
exactly the situation I described above.
For the problem of amResourceUsage being counted much larger than the real 
value, the community already has a patch; see this JIRA: 
https://issues.apache.org/jira/browse/YARN-3415.

2. It causes the FairScheduler to assign containers to application attempts 
that have already finished. Although the RM will, in its response to the NM 
heartbeat, tell the NM to clean up the corresponding containers, this still 
wastes resources, and with scheduling as fast as it is now the problem becomes 
even more noticeable.

Although the community version has already fixed the amResourceUsage problem, 
I think it only solves part of the problem domain. Problem 2 above also 
urgently needs to be solved. I see that YARN-3415 also made the Spark 
framework clear all of its pending resource requests before unregistering the 
application attempt, but YARN, as a general-purpose resource scheduling 
framework, needs to cover all the situations that may be encountered. For a 
general-purpose resource scheduling framework we cannot restrict how users use 
it, and we cannot rely on users releasing all pending requests before every 
unregisterApplicationMaster call.

So we need to do the corresponding check before assigning a container; this is 
an urgent problem to solve. Yufei, could you please re-evaluate, based on what 
I described, whether this problem needs to be fixed?

Thanks,
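
To make consequence 1 concrete, here is an illustrative-only model in plain 
Java; amResourceUsageMb, liveContainers, and assignContainer are invented 
names, not the real FSLeafQueue/FSAppAttempt fields. It shows how the window 
between unregister and the asynchronous APP_ATTEMPT_REMOVED handling can 
double-count the AM resource, and where a pre-assignment check would sit.

{code:java}
// Illustrative-only model of the double-counting window; names are invented.
public class AmDoubleCountSketch {

  static long amResourceUsageMb = 0;  // queue-level AM accounting
  static int liveContainers = 0;      // containers of one app attempt
  static boolean attemptRemoved = false;

  // Mirrors the shape of the "first container is the AM container" branch.
  static void assignContainer(long containerMb, boolean unmanagedAM) {
    if (attemptRemoved) {
      return;  // the kind of pre-assignment check argued for above
    }
    liveContainers++;
    if (liveContainers == 1 && !unmanagedAM) {
      amResourceUsageMb += containerMb;  // counted as AM resource
    }
  }

  public static void main(String[] args) {
    assignContainer(2048, false);  // the real AM container
    // ... the app runs, the AM unregisters, all containers complete ...
    liveContainers = 0;
    // APP_ATTEMPT_REMOVED has been sent but not yet handled, so the attempt
    // is still in the queue with pending requests; another assignment lands:
    assignContainer(2048, false);  // counted as an AM container again
    System.out.println(amResourceUsageMb);  // 4096: double the real AM usage
  }
}
{code}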


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we 
> use the fair scheduler as our scheduler.
> Though there are many free resources in my resource manager, there are 46 
> applications pending. 
> Those applications could not run even after several hours, and in the end I 
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in 
> FSLeafQueue. 
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the 
> real AM resource usage.
> When the fair scheduler tries to assign a container to an application 
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater 
> than its real value.
> So when the value of amResourceUsage is greater than the value of 
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which prevents the fair 
> scheduler from assigning containers to the FSAppAttempt. 
> In this scenario, all application attempts stay pending and never get any 
> resources.
> I found the reason why so many applications in my leaf queue are pending. I 
> will describe it as follows:
> When the fair scheduler first assigns a container to the application attempt, 
> it does the following:
> !screenshot-4.png!
> When the fair scheduler removes the application attempt from the leaf queue, 
> it does the following:
> !screenshot-5.png!
> But when the application attempt unregisters itself and all the containers in 
> SchedulerApplicationAttempt#liveContainers 
> are complete, an APP_ATTEMPT_REMOVED event is sent to the fair scheduler, but 
> it is handled asynchronously.
> Before the application attempt is removed from the FSLeafQueue, if there are 
> pending requests in the FSAppAttempt,
> the fair scheduler will still assign containers to the FSAppAttempt, because 
> the size of liveContainers equals 1. 
> So the FSLeafQueue adds the container resource to FSLeafQueue#amResourceUsage 
> again, which makes the value of amResourceUsage larger than it really is. 
> In the end, the value of FSLeafQueue#amResourceUsage is pretty large although 
> there is no application in the queue.
> When a new application comes and the value of FSLeafQueue#amResourceUsage is 
> greater than the value of Resources.multiply(getFairShare(), maxAMShare), the 
> scheduler will never assign containers to the queue.
> All of the applications in the queue will always stay pending.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org