[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2023-12-22 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799746#comment-17799746
 ] 

yanbin.zhang commented on YARN-7592:


[~slfan1989] Thank you for your prompt reply.

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2023-12-22 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799725#comment-17799725
 ] 

yanbin.zhang commented on YARN-7592:


[~slfan1989] Do you have any thoughts on this? This bug does not seem to have 
been resolved yet.

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}






[jira] [Created] (YARN-11633) [Federation] Improve LoadBasedRouterPolicy To Use Available vcores

2023-12-13 Thread yanbin.zhang (Jira)
yanbin.zhang created YARN-11633:
---

 Summary: [Federation] Improve LoadBasedRouterPolicy To Use 
Available vcores
 Key: YARN-11633
 URL: https://issues.apache.org/jira/browse/YARN-11633
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: federation
Affects Versions: 3.3.6
Reporter: yanbin.zhang


When selecting a subcluster, consider not only available memory but also 
available vcores.
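
A minimal sketch of the idea, not the actual LoadBasedRouterPolicy code. The SubClusterLoad class, its field names, and the selection rule (skip subclusters with no free vcores, prefer the most free memory, break ties by free vcores) are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class VcoreAwareSelection {
  /** Hypothetical view of one subcluster's free capacity. */
  public static class SubClusterLoad {
    final String id;
    final long availableMemoryMb;
    final int availableVCores;

    public SubClusterLoad(String id, long availableMemoryMb, int availableVCores) {
      this.id = id;
      this.availableMemoryMb = availableMemoryMb;
      this.availableVCores = availableVCores;
    }
  }

  /**
   * Returns the id of the subcluster with the most available memory among
   * those that still have free memory and at least one free vcore. Ties on
   * memory are broken by the number of free vcores. Returns null if no
   * subcluster qualifies.
   */
  public static String select(List<SubClusterLoad> loads) {
    SubClusterLoad best = null;
    for (SubClusterLoad l : loads) {
      if (l.availableMemoryMb <= 0 || l.availableVCores <= 0) {
        continue; // memory alone is not enough: vcores must be free too
      }
      if (best == null
          || l.availableMemoryMb > best.availableMemoryMb
          || (l.availableMemoryMb == best.availableMemoryMb
              && l.availableVCores > best.availableVCores)) {
        best = l;
      }
    }
    return best == null ? null : best.id;
  }

  public static void main(String[] args) {
    List<SubClusterLoad> loads = Arrays.asList(
        new SubClusterLoad("sc1", 8192, 0),   // most memory, but no vcores
        new SubClusterLoad("sc2", 4096, 4),
        new SubClusterLoad("sc3", 4096, 8));
    System.out.println(select(loads)); // prints "sc3"
  }
}
```

With a memory-only rule, sc1 would be chosen here even though it has no free vcores, which is the situation this issue aims to avoid.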






[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption

2023-12-04 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-11624:

Description: Disable AM-preemption for CapacityScheduler, like 
FairScheduler: -YARN-9537-  (was: Disable AM-preemption for CapacityScheduler 
like fair-scheduler: -YARN-9537-)

> CapacityScheduler: Add configuration to disable AM preemption
> -
>
> Key: YARN-11624
> URL: https://issues.apache.org/jira/browse/YARN-11624
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: yanbin.zhang
>Priority: Major
>
> Disable AM-preemption for CapacityScheduler, like FairScheduler: -YARN-9537-






[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption

2023-12-04 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-11624:

Description: Like FairScheduler feature: YARN-9537, for CapacityScheduler 
to disable AM-preemption.  (was: Like FairScheduler feature: YARN-9537, add 
global flag for CapacityScheduler to disable AM-preemption.)

> CapacityScheduler: Add configuration to disable AM preemption
> -
>
> Key: YARN-11624
> URL: https://issues.apache.org/jira/browse/YARN-11624
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: yanbin.zhang
>Priority: Major
>
> Like FairScheduler feature: YARN-9537, for CapacityScheduler to disable 
> AM-preemption.






[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption

2023-12-04 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-11624:

Description: Like FairScheduler feature: YARN-9537, add global flag for 
CapacityScheduler to disable AM-preemption.  (was: Like FairScheduler feature: 
YARN-10625, add global flag for CapacityScheduler to disable AM-preemption.)

> CapacityScheduler: Add configuration to disable AM preemption
> -
>
> Key: YARN-11624
> URL: https://issues.apache.org/jira/browse/YARN-11624
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: yanbin.zhang
>Priority: Major
>
> Like FairScheduler feature: YARN-9537, add global flag for CapacityScheduler 
> to disable AM-preemption.






[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption

2023-12-04 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-11624:

Summary: CapacityScheduler: Add configuration to disable AM preemption  
(was: CapacityScheduler: add global flag to disable AM-preemption)

> CapacityScheduler: Add configuration to disable AM preemption
> -
>
> Key: YARN-11624
> URL: https://issues.apache.org/jira/browse/YARN-11624
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: yanbin.zhang
>Priority: Major
>
> Like FairScheduler feature: YARN-10625, add global flag for CapacityScheduler 
> to disable AM-preemption.






[jira] [Created] (YARN-11624) CapacityScheduler: add global flag to disable AM-preemption

2023-12-04 Thread yanbin.zhang (Jira)
yanbin.zhang created YARN-11624:
---

 Summary: CapacityScheduler: add global flag to disable 
AM-preemption
 Key: YARN-11624
 URL: https://issues.apache.org/jira/browse/YARN-11624
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Reporter: yanbin.zhang


Like FairScheduler feature: YARN-10625, add global flag for CapacityScheduler 
to disable AM-preemption.






[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler

2023-12-04 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792755#comment-17792755
 ] 

yanbin.zhang commented on YARN-11115:
-

Take it up.

> Add configuration to disable AM preemption for capacity scheduler
> -
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yuan Luo
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I think it's necessary to add configuration to disable AM preemption for 
> capacity-scheduler, like fair-scheduler feature: YARN-9537.






[jira] [Created] (YARN-11623) FairScheduler: Document AM preemption related changes (YARN-9537 and YARN-10625)

2023-12-03 Thread yanbin.zhang (Jira)
yanbin.zhang created YARN-11623:
---

 Summary: FairScheduler: Document AM preemption related changes 
(YARN-9537 and YARN-10625)
 Key: YARN-11623
 URL: https://issues.apache.org/jira/browse/YARN-11623
 Project: Hadoop YARN
  Issue Type: Task
  Components: fairscheduler
Reporter: yanbin.zhang


Extend the FairScheduler documentation to cover the enhancements introduced in 
YARN-9537 and YARN-10625.






[jira] [Commented] (YARN-10631) Document AM preemption related changes (YARN-9537 and YARN-10625)

2023-12-03 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792656#comment-17792656
 ] 

yanbin.zhang commented on YARN-10631:
-

take it up

> Document AM preemption related changes (YARN-9537 and YARN-10625)
> -
>
> Key: YARN-10631
> URL: https://issues.apache.org/jira/browse/YARN-10631
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Preemption-related changes were introduced in YARN-9537 and YARN-10625.
> These also introduce new properties which are not documented for Fair 
> Scheduler. Extend the documentation with these enhancements.






[jira] (YARN-10900) Yarn nodes missing from router web app

2023-11-22 Thread yanbin.zhang (Jira)


[ https://issues.apache.org/jira/browse/YARN-10900 ]


yanbin.zhang deleted comment on YARN-10900:
-

was (Author: it_singer):
After enabling YARN Federation, I ran the command 
{code:java}
yarn jar hadoop-mapreduce-examples-3.3.0.jar pi 16 1000{code}
The job used services on other sub-clusters in my cluster and failed with the 
error 'Invalid AMRMToken from appattempt_xxx'. Have you seen this as well? How 
is your cluster configured? My version is 3.3.0. [~Babbleshack]  [~zhangjunj] 

> Yarn nodes missing from router web app
> --
>
> Key: YARN-10900
> URL: https://issues.apache.org/jira/browse/YARN-10900
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Affects Versions: 3.2.1
>Reporter: Babble Shack
>Priority: Major
> Fix For: 3.4.0
>
>
> Hi,
>  I am trying to configure YARN Federation mode. I seem to be able to schedule 
> to all nodes in my federation across each of my subclusters.
>  
>  However my federation router shows both of my subclusters, but nodes from 
> only a single cluster.
>  
> [Image: https://preview.redd.it/hdjwtn43ptj71.png?width=1437=png=webp=2d55343688c0de7a6f3da629e334cd318219c392]
>  Federation Page – Showing both clusters and both nodes
>
>  This page is showing both of my clusters, configured with a single <8 CPU, 
> 7GB> node.
> However the "Nodes" and "About" pages are invalid.
>  
> [Image: https://preview.redd.it/lawgst2yotj71.png?width=1373=png=webp=d06663904538bc993418c6184c3686cc2b02ea6e]
>  Nodes Page – showing nodes from only one cluster
>  
> [Image: https://preview.redd.it/dtuqblquotj71.png?width=482=png=webp=df740ff49df8b8de5015bccc936c9accee65aee0]
>  About Page – showing nodes from only one cluster
>
> Each node is configured as follows:
>  Minimum Memory Allocation: 512MB
>  Minimum CPU Allocation: 1
>  Maximum Memory Allocation: 7168
>  Maximum CPU Allocation: 7
>
> Federation configuration can be found at this link:
> https://drive.google.com/file/d/16xc2V7CvJLVQgsaDEOHhIaDrz5dnKxB_/view?usp=sharing
>
> Has anyone had an issue like this before, does anyone have any 
> solutions?






[jira] [Commented] (YARN-10900) Yarn nodes missing from router web app

2023-11-22 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788666#comment-17788666
 ] 

yanbin.zhang commented on YARN-10900:
-

After enabling YARN Federation, I ran the command 
{code:java}
yarn jar hadoop-mapreduce-examples-3.3.0.jar pi 16 1000{code}
The job used services on other sub-clusters in my cluster and failed with the 
error 'Invalid AMRMToken from appattempt_xxx'. Have you seen this as well? How 
is your cluster configured? My version is 3.3.0. [~Babbleshack]  [~zhangjunj] 

> Yarn nodes missing from router web app
> --
>
> Key: YARN-10900
> URL: https://issues.apache.org/jira/browse/YARN-10900
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Affects Versions: 3.2.1
>Reporter: Babble Shack
>Priority: Major
> Fix For: 3.4.0
>
>
> Hi,
>  I am trying to configure YARN Federation mode. I seem to be able to schedule 
> to all nodes in my federation across each of my subclusters.
>  
>  However my federation router shows both of my subclusters, but nodes from 
> only a single cluster.
>  
> [Image: https://preview.redd.it/hdjwtn43ptj71.png?width=1437=png=webp=2d55343688c0de7a6f3da629e334cd318219c392]
>  Federation Page – Showing both clusters and both nodes
>
>  This page is showing both of my clusters, configured with a single <8 CPU, 
> 7GB> node.
> However the "Nodes" and "About" pages are invalid.
>  
> [Image: https://preview.redd.it/lawgst2yotj71.png?width=1373=png=webp=d06663904538bc993418c6184c3686cc2b02ea6e]
>  Nodes Page – showing nodes from only one cluster
>  
> [Image: https://preview.redd.it/dtuqblquotj71.png?width=482=png=webp=df740ff49df8b8de5015bccc936c9accee65aee0]
>  About Page – showing nodes from only one cluster
>
> Each node is configured as follows:
>  Minimum Memory Allocation: 512MB
>  Minimum CPU Allocation: 1
>  Maximum Memory Allocation: 7168
>  Maximum CPU Allocation: 7
>
> Federation configuration can be found at this link:
> https://drive.google.com/file/d/16xc2V7CvJLVQgsaDEOHhIaDrz5dnKxB_/view?usp=sharing
>
> Has anyone had an issue like this before, does anyone have any 
> solutions?






[jira] [Updated] (YARN-11604) Fix code annotation errors such as class DefaultClientRequestInterceptor

2023-11-01 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-11604:

Description: 
Fix code annotation errors such as class DefaultClientRequestInterceptor:

!image-2023-11-02-10-46-50-762.png!

  was:Fix code annotation errors such as class DefaultClientRequestInterceptor


> Fix code annotation errors such as class DefaultClientRequestInterceptor
> 
>
> Key: YARN-11604
> URL: https://issues.apache.org/jira/browse/YARN-11604
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.3.6
>Reporter: yanbin.zhang
>Priority: Trivial
> Attachments: image-2023-11-02-10-46-50-762.png
>
>
> Fix code annotation errors such as class DefaultClientRequestInterceptor:
> !image-2023-11-02-10-46-50-762.png!






[jira] [Updated] (YARN-11604) Fix code annotation errors such as class DefaultClientRequestInterceptor

2023-11-01 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-11604:

Attachment: image-2023-11-02-10-46-50-762.png

> Fix code annotation errors such as class DefaultClientRequestInterceptor
> 
>
> Key: YARN-11604
> URL: https://issues.apache.org/jira/browse/YARN-11604
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.3.6
>Reporter: yanbin.zhang
>Priority: Trivial
> Attachments: image-2023-11-02-10-46-50-762.png
>
>
> Fix code annotation errors such as class DefaultClientRequestInterceptor






[jira] [Created] (YARN-11604) Fix code annotation errors such as class DefaultClientRequestInterceptor

2023-11-01 Thread yanbin.zhang (Jira)
yanbin.zhang created YARN-11604:
---

 Summary: Fix code annotation errors such as class 
DefaultClientRequestInterceptor
 Key: YARN-11604
 URL: https://issues.apache.org/jira/browse/YARN-11604
 Project: Hadoop YARN
  Issue Type: Bug
  Components: federation
Affects Versions: 3.3.6
Reporter: yanbin.zhang


Fix code annotation errors such as class DefaultClientRequestInterceptor






[jira] [Commented] (YARN-11600) After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css

2023-10-27 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780265#comment-17780265
 ] 

yanbin.zhang commented on YARN-11600:
-

Thank you very much [~zuston] 

> After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css
> ---
>
> Key: YARN-11600
> URL: https://issues.apache.org/jira/browse/YARN-11600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: yanbin.zhang
>Priority: Major
> Attachments: image-2023-10-26-09-52-30-975.png
>
>
> !image-2023-10-26-09-52-30-975.png!






[jira] [Created] (YARN-11600) After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css

2023-10-25 Thread yanbin.zhang (Jira)
yanbin.zhang created YARN-11600:
---

 Summary: After jetty is upgraded to 9.4.51.v20230217, sls cannot 
load js/css
 Key: YARN-11600
 URL: https://issues.apache.org/jira/browse/YARN-11600
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: yanbin.zhang
 Attachments: image-2023-10-26-09-52-30-975.png

!image-2023-10-26-09-52-30-975.png!






[jira] [Created] (YARN-11591) Fix some wrong symbols in Federation.md

2023-10-12 Thread yanbin.zhang (Jira)
yanbin.zhang created YARN-11591:
---

 Summary: Fix some wrong symbols in Federation.md
 Key: YARN-11591
 URL: https://issues.apache.org/jira/browse/YARN-11591
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: yanbin.zhang


Fix some wrong symbols in Federation.md






[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-07-04 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374442#comment-17374442
 ] 

yanbin.zhang commented on YARN-10178:
-

Can the problem be solved if the preemption function is turned off? [~zhuqi] 
[~tuyu] [~wangda] [~pbacsko]

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>     at java.util.TimSort.sort(TimSort.java:254)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> In Java 8, Arrays.sort uses the TimSort algorithm by default for object 
> arrays, and TimSort relies on the comparison method meeting a few requirements:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y, y > z --> x > z
> 3. x.compareTo(y) == 0 --> sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the elements being sorted do not satisfy these requirements, TimSort may 
> throw 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can 
> see that Capacity Scheduler compares queues using these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
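
The comparison requirements listed above can be broken when the compared values change while a sort is in progress. As a rough illustration, the sketch below copies each queue's usage once and sorts the copies, so the comparator always sees frozen values. LiveQueue, QueueUsage, snapshotAndSort, and the single usedCapacity field are hypothetical names; this is one common way to keep a comparison consistent, not necessarily the approach taken in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSort {
  /** Hypothetical queue whose usage may be updated by another thread. */
  public static class LiveQueue {
    final String name;
    volatile float usedCapacity;

    public LiveQueue(String name, float usedCapacity) {
      this.name = name;
      this.usedCapacity = usedCapacity;
    }
  }

  /** Immutable copy of the values the comparator needs. */
  static class QueueUsage {
    final String name;
    final float usedCapacity;

    QueueUsage(LiveQueue q) {
      this.name = q.name;
      this.usedCapacity = q.usedCapacity; // read once, before sorting
    }
  }

  /** Copies each queue's usage once, then sorts the copies by used capacity. */
  public static List<String> snapshotAndSort(List<LiveQueue> queues) {
    List<QueueUsage> snapshot = new ArrayList<>();
    for (LiveQueue q : queues) {
      snapshot.add(new QueueUsage(q));
    }
    // The comparator only reads frozen values, so its answers cannot change
    // while TimSort is running, even if the live queues are updated.
    snapshot.sort((x, y) -> {
      int c = Float.compare(x.usedCapacity, y.usedCapacity);
      return c != 0 ? c : x.name.compareTo(y.name);
    });
    List<String> order = new ArrayList<>();
    for (QueueUsage u : snapshot) {
      order.add(u.name);
    }
    return order;
  }

  public static void main(String[] args) {
    List<LiveQueue> queues = new ArrayList<>();
    queues.add(new LiveQueue("b", 0.7f));
    queues.add(new LiveQueue("a", 0.2f));
    queues.add(new LiveQueue("c", 0.2f));
    System.out.println(snapshotAndSort(queues)); // prints "[a, c, b]"
  }
}
```

The trade-off is that the resulting order reflects usage at snapshot time, which may already be slightly stale when the scheduler acts on it.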
> In Capacity Scheduler, the Global Scheduler AsyncThread uses 
> PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a 
> container to, constructs a CSAssignment, and uses the 
> submitResourceCommitRequest function to add the CSAssignment to the backlog.
> ResourceCommitterService then calls tryCommit on this CSAssignment. The 
> tryCommit function updates the queue resource usage:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>   if (app.accept(cluster, 

[jira] [Issue Comment Deleted] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-07-04 Thread yanbin.zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanbin.zhang updated YARN-10178:

Comment: was deleted

(was: If you turn off the preemption function and solve the problem 
?[~zhuqi][~wangda][~tuyu])

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>     at java.util.TimSort.sort(TimSort.java:254)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> In Java 8, Arrays.sort uses the TimSort algorithm by default for object 
> arrays, and TimSort relies on the comparison method meeting a few requirements:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y, y > z --> x > z
> 3. x.compareTo(y) == 0 --> sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the elements being sorted do not satisfy these requirements, TimSort may 
> throw 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can 
> see that Capacity Scheduler compares queues using these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler, the Global Scheduler AsyncThread uses 
> PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a 
> container to, constructs a CSAssignment, and uses the 
> submitResourceCommitRequest function to add the CSAssignment to the backlog.
> ResourceCommitterService then calls tryCommit on this CSAssignment. The 
> tryCommit function updates the queue resource usage:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>   if (app.accept(cluster, request, updatePending)
>   

[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-07-02 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373412#comment-17373412
 ] 

yanbin.zhang commented on YARN-10178:
-

If you turn off the preemption function and solve the problem 
?[~zhuqi][~wangda][~tuyu]

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>     at java.util.TimSort.sort(TimSort.java:254)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> In Java 8, Arrays.sort uses the TimSort algorithm by default for object 
> arrays, and TimSort relies on the comparison method meeting a few requirements:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y, y > z --> x > z
> 3. x.compareTo(y) == 0 --> sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the elements being sorted do not satisfy these requirements, TimSort may 
> throw 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can 
> see that Capacity Scheduler compares queues using these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler, the Global Scheduler AsyncThread uses 
> PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a 
> container to, constructs a CSAssignment, and uses the 
> submitResourceCommitRequest function to add the CSAssignment to the backlog.
> ResourceCommitterService then calls tryCommit on this CSAssignment. The 
> tryCommit function updates the queue resource usage:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>   if (app.accept(cluster, request,