[jira] [Assigned] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

2020-06-12 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10166:
--

Assignee: Youquan Lin

> Add detail log for ApplicationAttemptNotFoundException
> --
>
> Key: YARN-10166
> URL: https://issues.apache.org/jira/browse/YARN-10166
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Youquan Lin
>Assignee: Youquan Lin
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10166-001.patch, YARN-10166-002.patch, 
> YARN-10166-003.patch, YARN-10166-004.patch
>
>
> Suppose user A kills the app; ApplicationMasterService then calls 
> unregisterAttempt() for this app. Sometimes the app's AM continues to call the 
> allocate() method and reports an error as follows.
> {code:java}
> Application attempt appattempt_1582520281010_15271_01 doesn't exist in 
> ApplicationMasterService cache.
> {code}
> If user B has been watching the AM log, they may be confused about why the 
> attempt is no longer in the ApplicationMasterService cache. So I think we can 
> add more detail to the ApplicationAttemptNotFoundException message, as follows.
> {code:java}
> Application attempt appattempt_1582630210671_14658_01 doesn't exist in 
> ApplicationMasterService cache.App state: KILLED,finalStatus: KILLED 
> ,diagnostics: App application_1582630210671_14658 killed by userA from 
> 127.0.0.1
> {code}
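
The proposal quoted above can be sketched in plain Java. This is only an illustration under stated assumptions, not the YARN-10166 patch: the class name AttemptNotFoundMessage and the method describe are hypothetical, and the app state, final status and diagnostics are passed in as plain strings rather than looked up in the ResourceManager. The spacing of the appended details is also normalized here.

```java
// Hypothetical helper, not part of Hadoop: builds the exception text with
// optional details about why the attempt is gone.
class AttemptNotFoundMessage {

    static String describe(String attemptId, String appState,
                           String finalStatus, String diagnostics) {
        StringBuilder sb = new StringBuilder();
        sb.append("Application attempt ").append(attemptId)
          .append(" doesn't exist in ApplicationMasterService cache.");
        // Append details only when the application is still known (assumption:
        // a null state means the caller could not find the application).
        if (appState != null) {
            sb.append(" App state: ").append(appState)
              .append(", finalStatus: ").append(finalStatus)
              .append(", diagnostics: ").append(diagnostics);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "appattempt_1582630210671_14658_01", "KILLED", "KILLED",
            "App application_1582630210671_14658 killed by userA from 127.0.0.1"));
    }
}
```

With the details present, a user reading the AM log can see that the attempt disappeared because the app was killed, rather than only that it is missing from the cache.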



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

2020-06-12 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134335#comment-17134335
 ] 

Sunil G commented on YARN-10166:


Looks good to me!

I can check this in tomorrow if there are no objections!





[jira] [Comment Edited] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

2020-06-12 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134335#comment-17134335
 ] 

Sunil G edited comment on YARN-10166 at 6/12/20, 4:04 PM:
--

Looks good to me!

I can check this in tomorrow, if there are no objections!


was (Author: sunilg):
Looks good to me!

I can check this in tomorrow if there are no objections!





[jira] [Commented] (YARN-10333) YarnClient obtain Delegation Token for Log Aggregation Path

2020-07-08 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153326#comment-17153326
 ] 

Sunil G commented on YARN-10333:


This change looks fine to me.

cc [~ztang] [~bibinchundatt] [~rohithsharmaks] thoughts?

> YarnClient obtain Delegation Token for Log Aggregation Path
> ---
>
> Key: YARN-10333
> URL: https://issues.apache.org/jira/browse/YARN-10333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10333-001.patch, YARN-10333-002.patch, 
> YARN-10333-003.patch
>
>
> There are use cases where the YARN Log Aggregation Path is configured to a 
> FileSystem such as S3 or ABFS, different from the one configured in fs.defaultFS 
> (HDFS). Log aggregation fails because the client has a token only for fs.defaultFS 
> and not for the log aggregation path.
> This Jira is to improve YarnClient by obtaining a delegation token for the log 
> aggregation path and adding it to the Credentials of the Container Launch Context, 
> similar to how it does for the Timeline Delegation Token.




[jira] [Commented] (YARN-10364) Absolute Resource [memory=0] is considered as Percentage config type

2020-07-27 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166176#comment-17166176
 ] 

Sunil G commented on YARN-10364:


Yes. This is an issue, as we are considering 0 to be abs mode.

It's better to have a flag that is set and derived when the config is read, and 
then use that flag from then on. Seems like a miss earlier. 

> Absolute Resource [memory=0] is considered as Percentage config type
> 
>
> Key: YARN-10364
> URL: https://issues.apache.org/jira/browse/YARN-10364
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
>
> Absolute Resource [memory=0] is considered as Percentage config type. This 
> causes failure while converting queues from Percentage to Absolute Resources 
> automatically. 
> *Repro:*
> 1. Queue A = 100% and child queues Queue A.B = 0%, A.C=100%
> 2. While converting above to absolute resource automatically, capacity of 
> queue A = [memory=], A.B = [memory=0]
> This fails with the error below, because A is considered an Absolute Resource 
> whereas B is considered a Percentage config type.
> {code}
> 2020-07-23 09:36:40,499 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Failed 
> to re-init queues : Parent queue 'root.A' and child queue 'root.A.B' should 
> use either percentage based capacityconfiguration or absolute resource 
> together for label:
> {code}
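
One way to picture the flag suggested in the comment above is to record the config type at the moment the capacity string is parsed, instead of inferring it later from the numeric value. The following is a minimal sketch, assuming the bracket syntax shown in the repro; CapacityValue, Type and parse are hypothetical names and not CapacityScheduler classes.

```java
// Hypothetical value holder: the type is decided by the syntax that was read
// from the config, never by whether the amount happens to be zero.
class CapacityValue {
    enum Type { PERCENTAGE, ABSOLUTE_RESOURCE }

    final Type type;
    final double amount;

    CapacityValue(Type type, double amount) {
        this.type = type;
        this.amount = amount;
    }

    static CapacityValue parse(String raw) {
        String s = raw.trim();
        if (s.startsWith("[") && s.endsWith("]")) {
            // Bracketed form, e.g. [memory=0]: absolute mode even when the value is 0.
            double memory = 0;
            for (String part : s.substring(1, s.length() - 1).split(",")) {
                String[] kv = part.split("=", 2);
                if (kv.length == 2 && kv[0].trim().equals("memory")
                        && !kv[1].trim().isEmpty()) {
                    memory = Double.parseDouble(kv[1].trim());
                }
            }
            return new CapacityValue(Type.ABSOLUTE_RESOURCE, memory);
        }
        // Plain number, e.g. 0 or 100: percentage mode.
        return new CapacityValue(Type.PERCENTAGE, Double.parseDouble(s));
    }
}
```

Under this sketch a parent and child queue can be compared by their recorded type, so a child configured as [memory=0] is not mistaken for a percentage value of 0.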




[jira] [Commented] (YARN-10364) Absolute Resource [memory=0] is considered as Percentage config type

2020-08-08 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173598#comment-17173598
 ] 

Sunil G commented on YARN-10364:


+1. [~prabhujoseph] let's commit this.

Thanks [~BilwaST]

> Absolute Resource [memory=0] is considered as Percentage config type
> 
>
> Key: YARN-10364
> URL: https://issues.apache.org/jira/browse/YARN-10364
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10364.001.patch, YARN-10364.002.patch, 
> YARN-10364.003.patch
>
>
> Absolute Resource [memory=0] is considered as Percentage config type. This 
> causes failure while converting queues from Percentage to Absolute Resources 
> automatically. 
> *Repro:*
> 1. Queue A = 100% and child queues Queue A.B = 0%, A.C=100%
> 2. While converting above to absolute resource automatically, capacity of 
> queue A = [memory=], A.B = [memory=0]
> This fails with the error below, because A is considered an Absolute Resource 
> whereas B is considered a Percentage config type.
> {code}
> 2020-07-23 09:36:40,499 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Failed 
> to re-init queues : Parent queue 'root.A' and child queue 'root.A.B' should 
> use either percentage based capacityconfiguration or absolute resource 
> together for label:
> {code}




[jira] [Commented] (YARN-10364) Absolute Resource [memory=0] is considered as Percentage config type

2020-08-08 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173599#comment-17173599
 ] 

Sunil G commented on YARN-10364:


I think the test case failures are related. [~BilwaST] could you please help to check?





[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-08 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173732#comment-17173732
 ] 

Sunil G commented on YARN-10389:


Thanks [~tanu.ajmera] for working on this. 

A few comments:
 # Rename yarn.htpp.rmwebapp.custom.webservice.class to 
yarn.webapp.custom.webservice.class
 # In RMWebApp, is it possible for the conf object to be null? cc 
[~prabhujoseph]

 

> Option to override RMWebServices with custom WebService class
> -
>
> Key: YARN-10389
> URL: https://issues.apache.org/jira/browse/YARN-10389
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10389-001.patch, YARN-10389-002.patch, 
> YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch
>
>
> YARN-8047 provides support to add custom WebServices as part of RMWebApp.  
> Since each WebService has to have a separate WebService Path, /ws/v1/cluster 
> root path cannot be used globally.
> Another alternative is to provide an option to override the RMWebServices 
> with custom WebServices implementation which can extend the RMWebService, 
> this way /ws/v1/cluster path can be used globally.
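
The override idea, together with the null-conf question raised in the comment, can be sketched without any Hadoop dependency. Everything here is an assumption made for illustration: a plain Map stands in for the configuration object, DefaultWebServices and CustomWebServices are placeholder classes, and the key is the name suggested in the review comment, which may not be the name that was finally committed.

```java
import java.util.Map;

// Hypothetical resolver: picks the web service class named in the config and
// falls back to the default when the conf or the setting is missing.
class WebServiceResolver {
    static class DefaultWebServices { }
    static class CustomWebServices extends DefaultWebServices { }

    // Key as proposed in the review comment (assumption).
    static final String KEY = "yarn.webapp.custom.webservice.class";

    static Class<? extends DefaultWebServices> resolve(Map<String, String> conf) {
        if (conf == null) {
            return DefaultWebServices.class; // null conf: keep the default
        }
        String name = conf.get(KEY);
        if (name == null || name.isEmpty()) {
            return DefaultWebServices.class;
        }
        try {
            // Only accept classes that extend the default implementation.
            return Class.forName(name).asSubclass(DefaultWebServices.class);
        } catch (ClassNotFoundException | ClassCastException e) {
            return DefaultWebServices.class;
        }
    }
}
```

Requiring the custom class to extend the default one mirrors the description above, where the custom implementation extends RMWebServices so that the /ws/v1/cluster path keeps working.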




[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-12 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176401#comment-17176401
 ] 

Sunil G commented on YARN-10396:


[~bteke] Thanks for the patch.

Could you add some test cases to cover this scenario as well? Thanks.

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10396.001.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode, the maximum number of applications set at the 
> queue level gets overridden by the system-level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.{queuepath}.maximum-applications}} setting.
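
The behaviour asked for above can be sketched as a small pure function. This is not the committed fix: MaxApplications, effectiveMax and the UNDEFINED sentinel are hypothetical names, and treating any non-negative queue-level value as "explicitly set" is an assumption.

```java
// Hypothetical helper: prefer the queue-level setting when it is present,
// otherwise fall back to the system-level value scaled by absolute capacity.
class MaxApplications {
    // Assumed sentinel meaning "maximum-applications is not set on this queue".
    static final int UNDEFINED = -1;

    static int effectiveMax(int queueLevelMax, int systemMax, float absoluteCapacity) {
        if (queueLevelMax >= 0) {
            return queueLevelMax; // explicit per-queue setting wins
        }
        // Same arithmetic as the quoted ParentQueue snippet.
        return (int) (systemMax * absoluteCapacity);
    }
}
```

For example, with a system-level maximum of 10000 and an absolute capacity of 0.25, an unset queue falls back to the scaled system value, while a queue configured with 50 keeps 50.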




[jira] [Updated] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-18 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10396:
---
Fix Version/s: 3.1.5
   3.3.1
   3.2.2

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode, the maximum number of applications set at the 
> queue level gets overridden by the system-level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.{queuepath}.maximum-applications}} setting.




[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181192#comment-17181192
 ] 

Sunil G commented on YARN-10396:


[~ste...@apache.org] Apologies from my end, it's my bad.

The backport applied cleanly, so I pushed it quickly.

I'll revert it now.

 





[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181196#comment-17181196
 ] 

Sunil G commented on YARN-10396:


[~bteke], you have used a new test class from branch-3.3, which caused a 
compilation error. Please help to fix this for branch-3.2 and branch-3.1.
{code:java}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hadoop-yarn-server-resourcemanager: 
Compilation failure: Compilation failure:
[ERROR] 
/Users/sgovindan/Work/repos/apache/hadoop-commit/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java:[1024,5]
 cannot find symbol
[ERROR] symbol:   class CSQueueStore
[ERROR] location: class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue
[ERROR] 
/Users/sgovindan/Work/repos/apache/hadoop-commit/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java:[1024,31]
 cannot find symbol
[ERROR] symbol:   class CSQueueStore
[ERROR] location: class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue
[ERROR] -> [Help 1] {code}





[jira] [Updated] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10396:
---
Fix Version/s: (was: 3.1.5)
   (was: 3.2.2)

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode, the maximum number of applications set at the 
> queue level gets overridden by the system-level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.{queuepath}.maximum-applications}} setting.




[jira] [Commented] (YARN-10360) Support Multi Node Placement in SingleConstraintAppPlacementAllocator

2020-08-21 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181725#comment-17181725
 ] 

Sunil G commented on YARN-10360:


Yes. When we implemented this, our intention at that point was to enable it 
for the Locality allocator. Later, once the Placement allocator was added, it 
should have been considered.

Thanks [~prabhujoseph] for moving this to the base class as a default 
implementation. I am +1 on the approach and the patch, but there are some test 
failures. Please help to check them. Thanks.

> Support Multi Node Placement in SingleConstraintAppPlacementAllocator
> -
>
> Key: YARN-10360
> URL: https://issues.apache.org/jira/browse/YARN-10360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, multi-node-placement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10360-001.patch, YARN-10360-002.patch
>
>
> Currently, placement constraints are not supported when Multi Node Placement 
> is enabled. This Jira is to add Support for Multi Node Placement in 
> SingleConstraintAppPlacementAllocator.




[jira] [Commented] (YARN-10360) Support Multi Node Placement in SingleConstraintAppPlacementAllocator

2020-08-24 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183117#comment-17183117
 ] 

Sunil G commented on YARN-10360:


+1 for the patch.

Thanks [~prabhujoseph]





[jira] [Commented] (YARN-10429) [Umbrella] YARN UI2 Improvements

2020-09-08 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192036#comment-17192036
 ] 

Sunil G commented on YARN-10429:


Thanks [~akhilpb]

Will we cover ember-3 upgrade here?

> [Umbrella] YARN UI2 Improvements 
> -
>
> Key: YARN-10429
> URL: https://issues.apache.org/jira/browse/YARN-10429
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
>
> cc: [~sunilg]




[jira] [Commented] (YARN-10244) backport YARN-9848 to branch-3.2

2020-09-12 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194690#comment-17194690
 ] 

Sunil G commented on YARN-10244:


Yes, this is a blocker. Waiting for CI results.

> backport YARN-9848 to branch-3.2
> 
>
> Key: YARN-10244
> URL: https://issues.apache.org/jira/browse/YARN-10244
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: YARN-10244-branch-3.2.001.patch, 
> YARN-10244-branch-3.2.002.patch, YARN-10244-branch-3.2.003.patch
>
>
> Backporting YARN-9848 to branch-3.2.




[jira] [Commented] (YARN-10244) backport YARN-9848 to branch-3.2

2020-09-29 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204463#comment-17204463
 ] 

Sunil G commented on YARN-10244:


Thanks [~Steven Rand]

Let's get this in. 





[jira] [Commented] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-02 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206603#comment-17206603
 ] 

Sunil G commented on YARN-10451:


+1 pending jenkins.

> RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.
> ---
>
> Key: YARN-10451
> URL: https://issues.apache.org/jira/browse/YARN-10451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10451.001.patch, YARN-10451.002.patch
>
>
> The NodesPage in the RM (v1) UI will NPE when the {{yarn.resource-types}} 
> property defines {{yarn.io}}.




[jira] [Commented] (YARN-9848) revert YARN-4946

2020-10-15 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214812#comment-17214812
 ] 

Sunil G commented on YARN-9848:
---

Thanks [~aajisaka], this approach is much better now. 

Thank you.

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Blocker
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9848-01.patch, YARN-9848.002.patch, 
> YARN-9848.003.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.




[jira] [Commented] (YARN-10453) Add partition resource info to get-node-labels and label-mappings api responses

2020-10-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217221#comment-17217221
 ] 

Sunil G commented on YARN-10453:


Kicked Jenkins again.

> Add partition resource info to get-node-labels and label-mappings api 
> responses
> ---
>
> Key: YARN-10453
> URL: https://issues.apache.org/jira/browse/YARN-10453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-10453.001.patch, YARN-10453.002.patch
>
>
> This jira will add partition resource info to the responses of the 
> get-node-labels and label-mappings APIs.




[jira] [Commented] (YARN-10453) Add partition resource info to get-node-labels and label-mappings api responses

2020-10-20 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217493#comment-17217493
 ] 

Sunil G commented on YARN-10453:


+1.

I will commit this shortly if there are no issues.

> Add partition resource info to get-node-labels and label-mappings api 
> responses
> ---
>
> Key: YARN-10453
> URL: https://issues.apache.org/jira/browse/YARN-10453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-10453.001.patch, YARN-10453.002.patch, 
> YARN-10453.003.patch
>
>
> This jira will add partition resource info to the responses of the 
> get-node-labels and label-mappings APIs.




[jira] [Created] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)

Sunil G created YARN-10540:
--

         Summary: Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
             Key: YARN-10540
             URL: https://issues.apache.org/jira/browse/YARN-10540
         Project: Hadoop YARN
      Issue Type: Task
      Components: webapp
Affects Versions: 3.2.2
        Reporter: Sunil G


YARN-10450 added changes in the NodeInfo class.

Various exceptions are thrown when accessing the UI2 and UI1 Nodes pages.
{code:java}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.NodesPage$NodesBlock.render(NodesPage.java:164)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
    at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
    at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
    at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
    at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
    at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.nodes(RmController.java:70)
{code}
{code:java}
2020-12-19 22:55:54,846 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:450)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
{code}




[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Target Version/s: 3.2.2

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png




[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Attachment: Screenshot 2020-12-19 at 11.01.43 PM.png

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png




[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252243#comment-17252243
 ] 

Sunil G commented on YARN-10540:


!Screenshot 2020-12-19 at 11.02.14 PM.png!
!Screenshot 2020-12-19 at 11.01.43 PM.png!

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png




[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Attachment: Screenshot 2020-12-19 at 11.02.14 PM.png

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png




[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Priority: Critical  (was: Blocker)

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Critical
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png




[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252245#comment-17252245
 ] 

Sunil G commented on YARN-10540:


I brought up a local cluster on my Mac, and I can see the log below in the NM.

Since ResourceCalculatorPlugin can't be initialized, the NodeUtilization object 
will be null, which causes the issue described in this Jira.

Until now, nothing used NodeUtilization in NodeInfo, so this issue had never 
surfaced.
{code:java}
2020-12-19 22:37:05,096 WARN org.apache.hadoop.yarn.util.ResourceCalculatorPlugin: Failed to instantiate default resource calculator. Could not determine OS
2020-12-19 22:37:05,096 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl:  Using ResourceCalculatorPlugin : null
{code}
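
A minimal sketch of the failure mode and one possible guard, for illustration only. The classes NodeUtilizationGuard, Utilization and NodeView are hypothetical stand-ins, not the real YARN NodeInfo code and not the actual patch. The assumption is that a node without a working resource calculator reports a null utilization, and that falling back to zero is acceptable for display purposes.

```java
// Hypothetical stand-ins for illustration only; these are NOT the real YARN classes.
public class NodeUtilizationGuard {

    /** Simplified stand-in for the utilization a NodeManager reports. */
    public static final class Utilization {
        public final int physicalMemoryMb;
        public final float cpu;

        public Utilization(int physicalMemoryMb, float cpu) {
            this.physicalMemoryMb = physicalMemoryMb;
            this.cpu = cpu;
        }
    }

    /** Simplified stand-in for the per-node object rendered by the web UI. */
    public static final class NodeView {
        public final int usedMemoryMb;
        public final float usedCpu;

        public NodeView(Utilization utilization) {
            if (utilization == null) {
                // The node reported no utilization (for example, no resource
                // calculator could be created), so fall back to zeros instead
                // of dereferencing null.
                this.usedMemoryMb = 0;
                this.usedCpu = 0f;
            } else {
                this.usedMemoryMb = utilization.physicalMemoryMb;
                this.usedCpu = utilization.cpu;
            }
        }
    }

    public static void main(String[] args) {
        // A node that reported nothing still yields a renderable view.
        NodeView unreported = new NodeView(null);
        System.out.println(unreported.usedMemoryMb + " MB, cpu " + unreported.usedCpu);

        // A node that reported utilization passes its values through.
        NodeView reported = new NodeView(new Utilization(2048, 0.5f));
        System.out.println(reported.usedMemoryMb + " MB, cpu " + reported.usedCpu);
    }
}
```

Without the null check, the constructor would throw a NullPointerException on the first field access, which is the same shape of failure as the NodeInfo trace in this issue.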

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png




[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252303#comment-17252303
 ] 

Sunil G commented on YARN-10540:


Hi [~hexiaoqiao]

I tried the latest trunk build and these pages load for me. I am checking 
further in a VM.

Could you please click on the Nodes tab in this UI and check whether it works 
in your environment?

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Critical
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png, yarnui2onubuntu.png




[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252953#comment-17252953
 ] 

Sunil G commented on YARN-10540:


[~Jim_Brennan] [~ebadger] please share your thoughts, as we see this NPE only 
on Mac and in branch-3.2.2. Are we missing some patches?

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Critical
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Yarn-UI-Ubuntu.png, 
> yarnodes.png, yarnui2onubuntu.png




[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Attachment: Screenshot 2020-12-23 at 8.24.42 PM.png

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Assignee: Jim Brennan
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Screenshot 2020-12-23 at 
> 8.24.42 PM.png, YARN-10540.001.patch, Yarn-UI-Ubuntu.png, osx-yarn-ui2.png, 
> yarnodes.png, yarnui2onubuntu.png




[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254112#comment-17254112
 ] 

Sunil G commented on YARN-10540:


!Screenshot 2020-12-23 at 8.24.42 PM.png!

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Assignee: Jim Brennan
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Screenshot 2020-12-23 at 
> 8.24.42 PM.png, YARN-10540.001.patch, Yarn-UI-Ubuntu.png, osx-yarn-ui2.png, 
> yarnodes.png, yarnui2onubuntu.png
>
>
> YARN-10450 added changes in NodeInfo class.
> Various exceptions are showing while accessing UI2 and UI1 NODE pages. 
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.NodesPage$NodesBlock.render(NodesPage.java:164)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.nodes(RmController.java:70)
>  {code}
> {code:java}
> 2020-12-19 22:55:54,846 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:450)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> {code}




[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254114#comment-17254114
 ] 

Sunil G commented on YARN-10540:


[~hexiaoqiao], this is working for me in the latest build.

You are missing the CORS configuration in YARN. If you add it, the page will 
load.
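{code:java}
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.HttpCrossOriginFilterInitializer</value>
</property>
<property>
  <description>Enable/disable the cross-origin (CORS) filter.</description>
  <name>hadoop.http.cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Comma separated list of origins that are allowed for web
    services needing cross-origin (CORS) support. Wildcards (*) and patterns
    allowed</description>
  <name>hadoop.http.cross-origin.allowed-origins</name>
  <value>*</value>
</property>
<property>
  <description>Comma separated list of methods that are allowed for web
    services needing cross-origin (CORS) support.</description>
  <name>hadoop.http.cross-origin.allowed-methods</name>
  <value>GET,POST,HEAD</value>
</property>
<property>
  <description>Comma separated list of headers that are allowed for web
    services needing cross-origin (CORS) support.</description>
  <name>hadoop.http.cross-origin.allowed-headers</name>
  <value>X-Requested-With,Content-Type,Accept,Origin</value>
</property>
<property>
  <description>The number of seconds a pre-flighted request can be cached
    for web services needing cross-origin (CORS) support.</description>
  <name>hadoop.http.cross-origin.max-age</name>
  <value>1800</value>
</property>
{code}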

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Assignee: Jim Brennan
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Screenshot 2020-12-23 at 
> 8.24.42 PM.png, YARN-10540.001.patch, Yarn-UI-Ubuntu.png, osx-yarn-ui2.png, 
> yarnodes.png, yarnui2onubuntu.png
>
>
> YARN-10450 added changes in NodeInfo class.
> Various exceptions are showing while accessing UI2 and UI1 NODE pages. 
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.NodesPage$NodesBlock.render(NodesPage.java:164)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.nodes(RmController.java:70)
>  {code}
> {code:java}
> 2020-12-19 22:55:54,846 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:450)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> {code}




[jira] [Assigned] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-05 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10559:
--

Assignee: VADAGA ANANYO RAO

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Fix For: 3.1.4
>
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> YARN-10559.0001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple large, 
> long-running applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same queue.




[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-05 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258863#comment-17258863
 ] 

Sunil G commented on YARN-10559:


[~epayne], could you please take a look at this improvement?

We hit this issue when a single user was using the entire queue and 
fairness-based preemption was not happening due to some checks. This is a 
simple effort to fix it as an initial approach. 

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Fix For: 3.1.4
>
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> YARN-10559.0001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple large, 
> long-running applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same queue.




[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262580#comment-17262580
 ] 

Sunil G commented on YARN-10504:


I checked the latest patch over the weekend. There are no major comments from 
my side on this.

However, I think some more detailed refactoring is needed to provide a common 
way to handle weight, percentage, absolute values, etc. This could help to 
avoid some hardcoded values as well. So let's create another Jira to track 
this.

If the test cases are fine, then I am generally positive about getting this in soon. Thanks.

Thanks, [~bteke] [~wangda] [~zhuqi] for the efforts on this.

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.006.patch, YARN-10504.007.patch, YARN-10504.008.patch, 
> YARN-10504.009.patch, YARN-10504.010.patch, YARN-10504.ver-1.patch, 
> YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To make it possible to create queues flexibly in Capacity Scheduler, a 
> weight mode should be introduced. The existing {{capacity}} property should 
> be used with a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes
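
The recalculation step in the last bullet can be sketched roughly as below. This is only an illustration, not code from the YARN-10504 patch: the class name WeightModeSketch, the method toPercentages, and the queue name root.batch are hypothetical, and the sketch assumes that a queue's relative capacity is its weight divided by the sum of its siblings' weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of the YARN-10504 patch. */
final class WeightModeSketch {

  /**
   * Converts the weights of sibling queues into relative capacities (0..100).
   * Assumption: share = weight / sum of sibling weights.
   */
  static Map<String, Double> toPercentages(Map<String, Double> weights) {
    double total = 0.0;
    for (double w : weights.values()) {
      // Validate each weight before using it.
      if (w < 0.0 || Double.isNaN(w)) {
        throw new IllegalArgumentException("Invalid weight: " + w);
      }
      total += w;
    }
    Map<String, Double> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      // If every weight is zero, report zero capacity instead of dividing by zero.
      result.put(e.getKey(), total == 0.0 ? 0.0 : 100.0 * e.getValue() / total);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("root.users", 1.0);
    weights.put("root.batch", 3.0);
    System.out.println(toPercentages(weights));
  }
}
```

With this shape, the same method could be called again whenever the queue structure changes, since the result depends only on the current sibling weights.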




[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263089#comment-17263089
 ] 

Sunil G commented on YARN-10559:


[~ananyo_rao] Thanks for the efforts. A few minor comments.
 # Please take a look at the 7 newly introduced checkstyle warnings and check 
how many of them can be fixed. 
[https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/457/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 # In setFairShareForApps, there is a chance of a divide by 0. Let me 
quote the code here.
{code:java}
int numOfAppsInQueue = tq.leafQueue.getAllApplications().size();
Resource fairShareAcrossApps = Resources.divideAndCeil(
  this.rc, queueReassignableResource, numOfAppsInQueue);
{code}
We are handling this internally, but it's still better to add a check.

 # It's better to ensure two points here: we should get a non-negative resource, 
and it's better to check and ensure fairShareForApp cannot be 0.
{code:java}
Resource fairShareForApp = Resources.min(
rc, clusterResource, fairShareAcrossApps, fairShareWithinUL);{code}
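
The two checks suggested above could look roughly like the sketch below. It is not taken from the patch: plain long values stand in for YARN's Resource type, and the class name FairShareSketch and its method signature are hypothetical. It assumes the caller skips fair-share preemption when no usable share is returned.

```java
import java.util.OptionalLong;

/** Hypothetical stand-in for the quoted logic; longs replace YARN's Resource. */
final class FairShareSketch {

  /**
   * Returns the fair share for one app, or empty when no positive share exists
   * (no apps in the queue, nothing reassignable, or a zero user-limit share).
   */
  static OptionalLong fairShareForApp(long queueReassignable,
                                      int numOfAppsInQueue,
                                      long fairShareWithinUL) {
    // Explicit guard against divide by zero instead of relying on the callee.
    if (numOfAppsInQueue <= 0) {
      return OptionalLong.empty();
    }
    // Clamp to a non-negative amount before dividing.
    long reassignable = Math.max(queueReassignable, 0L);
    // Ceiling division, mirroring the intent of divideAndCeil.
    long acrossApps = reassignable / numOfAppsInQueue
        + (reassignable % numOfAppsInQueue == 0 ? 0 : 1);
    long share = Math.min(acrossApps, fairShareWithinUL);
    // A share of 0 (or less) is reported as "no share" so callers can skip it.
    return share > 0 ? OptionalLong.of(share) : OptionalLong.empty();
  }

  public static void main(String[] args) {
    System.out.println(fairShareForApp(10L, 3, 100L));
    System.out.println(fairShareForApp(10L, 0, 100L));
  }
}
```

Returning an empty value rather than 0 makes the "cannot be 0" condition visible in the type, which is one possible design; an early return in the caller would work as well.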

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple large, 
> long-running applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same queue.




[jira] [Comment Edited] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263089#comment-17263089
 ] 

Sunil G edited comment on YARN-10559 at 1/12/21, 5:42 AM:
--

[~ananyo_rao] Thanks for the efforts. A few minor comments.
 # Please take a look at the 7 newly introduced checkstyle warnings and check 
how many of them can be fixed. 
[https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/457/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 # In setFairShareForApps, there is a chance of a divide by 0. Let me 
quote the code here.
{code:java}
int numOfAppsInQueue = tq.leafQueue.getAllApplications().size();
Resource fairShareAcrossApps = Resources.divideAndCeil(
  this.rc, queueReassignableResource, numOfAppsInQueue);
{code}
We are handling this internally, but it's still better to add a check.
 # It's better to ensure two points here: we should get a non-negative resource, 
and it's better to check and ensure fairShareForApp cannot be 0.
{code:java}
Resource fairShareForApp = Resources.min(
rc, clusterResource, fairShareAcrossApps, fairShareWithinUL);{code}



> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple large, 
> long-running applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same queue.




[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263095#comment-17263095
 ] 

Sunil G commented on YARN-10559:


Also, you may need to change the *break* statement to *continue* in the 
skipContainerBasedOnIntraQueuePolicy method.
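
The method body is not quoted in this thread, so the sketch below only illustrates the general difference between the two statements in a loop over candidates. Everything in it is hypothetical: the class BreakVsContinueSketch, the integer candidates, and the "negative value means skip" policy are stand-ins, not the actual preemption code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the actual preemption policy code. */
final class BreakVsContinueSketch {

  /** Skips only the candidates that fail the check and keeps evaluating the rest. */
  static List<Integer> selectWithContinue(List<Integer> candidates) {
    List<Integer> selected = new ArrayList<>();
    for (int c : candidates) {
      if (c < 0) {
        continue; // skip this candidate only
      }
      selected.add(c);
    }
    return selected;
  }

  /** Stops at the first failing candidate, so later valid ones are never seen. */
  static List<Integer> selectWithBreak(List<Integer> candidates) {
    List<Integer> selected = new ArrayList<>();
    for (int c : candidates) {
      if (c < 0) {
        break; // abandons all remaining candidates as well
      }
      selected.add(c);
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Integer> candidates = List.of(1, -1, 2);
    System.out.println(selectWithContinue(candidates));
    System.out.println(selectWithBreak(candidates));
  }
}
```

If the loop in question has this shape, a break would leave later containers unevaluated once one container is skipped, which is presumably the behavior the comment wants to avoid.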

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple large, 
> long-running applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same queue.




[jira] [Commented] (YARN-10575) Hadoop

2021-01-18 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267108#comment-17267108
 ] 

Sunil G commented on YARN-10575:


[~Pushpalatha_13] Was this Jira created by mistake?
Please reopen and add more context. I will close this for now.

> Hadoop
> --
>
> Key: YARN-10575
> URL: https://issues.apache.org/jira/browse/YARN-10575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Pushpalatha S K
>Priority: Major
>





[jira] [Comment Edited] (YARN-10575) Hadoop

2021-01-18 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267108#comment-17267108
 ] 

Sunil G edited comment on YARN-10575 at 1/18/21, 9:15 AM:
--

[~Pushpalatha_13] Was this Jira created by mistake?
Please reopen and add more context if needed.


was (Author: sunilg):
[~Pushpalatha_13] Was this Jira created by mistake?
Please reopen and add more context. I will close this for now.

> Hadoop
> --
>
> Key: YARN-10575
> URL: https://issues.apache.org/jira/browse/YARN-10575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Pushpalatha S K
>Priority: Major
>





[jira] [Assigned] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-18 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10512:
--

Assignee: Sunil G  (was: Szilard Nemeth)

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".




[jira] [Assigned] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-18 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10512:
--

Assignee: Szilard Nemeth  (was: Sunil G)

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".




[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-18 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267366#comment-17267366
 ] 

Sunil G commented on YARN-10512:


[~snemeth] Thanks for working on this. And thanks [~gandras] for reviews.
I am also +1 for the v3 patch.

If there are no major comments, I can merge this. 

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".




[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267832#comment-17267832
 ] 

Sunil G commented on YARN-10512:


[~snemeth], I could see some more checkstyle issues. Are those fixable?

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch, YARN-10512.004.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".




[jira] [Commented] (YARN-10572) Merge YARN-8557 and YARN-10352, and rebase based YARN-10380.

2021-01-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268373#comment-17268373
 ] 

Sunil G commented on YARN-10572:


[~zhuqi], the description and task items of this Jira are vague. All work is 
happening in the Apache "trunk" and not in a branch, hence I am not sure 
about the work described here.

So please help to update the description in line with the patch, which could be 
a bug fix or improvement, to clearly state the task. Thanks.
cc [~leftnoteasy]

> Merge YARN-8557 and YARN-10352, and rebase based YARN-10380.
> 
>
> Key: YARN-10572
> URL: https://issues.apache.org/jira/browse/YARN-10572
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10572.001.patch
>
>
> The work is:
> 1. Because of YARN-10380, we should rebase YARN-10352.
> 2. Also merge YARN-8557 for skipping the not-running case.
> 3. Refactor some methods in YARN-10380.




[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268385#comment-17268385
 ] 

Sunil G commented on YARN-10512:


[~snemeth], my suggestion is to create a followup jira to handle the changes 
suggested by [~pbacsko].
If there are no issues, I will push this now.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch, YARN-10512.004.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".




[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268402#comment-17268402
 ] 

Sunil G commented on YARN-10512:


[~snemeth], I pushed to trunk, but branch-3.3 and 3.2 have conflicts. Please 
help to rebase.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch, YARN-10512.004.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".




[jira] [Commented] (YARN-10579) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include weight values for queues

2021-01-20 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268712#comment-17268712
 ] 

Sunil G commented on YARN-10579:


[~snemeth] test failures are related.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> weight values for queues
> --
>
> Key: YARN-10579
> URL: https://issues.apache.org/jira/browse/YARN-10579
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10579.001.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
>  We would like to expose the weight values for all queues with the RM's 
> /scheduler REST endpoint.




[jira] [Commented] (YARN-10598) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to extend the creation type with additional information

2021-01-26 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272546#comment-17272546
 ] 

Sunil G commented on YARN-10598:


[~bteke] [~snemeth]
queueType cannot be changed, as that would be an incompatible change. It is 
better to add only a new variable.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to extend the 
> creation type with additional information
> --
>
> Key: YARN-10598
> URL: https://issues.apache.org/jira/browse/YARN-10598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10598.001.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> Auto-queue creation has also been implemented with YARN-10506.
> Connected to this effort, we would like to expose the type of the queue with 
> the RM's /scheduler REST endpoint.
> To extend/modify the values added in YARN-10581 these 3 fields will describe 
> a queue:
>  * queueType : *parent/leaf*
>  * creationMethod : *static/dynamicLegacy/dynamicFlexible*
>  * autoCreationEligibility : *off/legacy/flexible*
> After this change here are some example cases:
>  * Static parent queue which has the auto-creation-enabled-v2 false:
>  ** queueType : *parent*
>  ** creationMethod : *static*
>  ** autoCreationEligibility : *off*
>  * Static managed parent (can have dynamic children):
>  ** queueType : *parent*
>  ** creationMethod : *static*
>  ** autoCreationEligibility : *legacy*
>  * Legacy auto-created leaf queue (cannot have children):
>  ** queueType : *leaf*
>  ** creationMethod : *dynamicLegacy*
>  ** autoCreationEligibility : *off*
>  * Auto-created (v2) parent queue, (implicitly) auto-creation-enabled-v2 
> true: 
>  ** queueType : *parent*
>  ** creationMethod : *dynamicFlexible*
>  ** autoCreationEligibility : *flexible*
>  * Auto-created (v2) leaf queue (cannot have children):
>  ** queueType : *leaf*
>  ** creationMethod : *dynamicFlexible*
>  ** autoCreationEligibility : *off*
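
The three fields and the five example cases above can be written down as data, which makes the allowed values easy to check. The sketch below is an assumption-laden illustration: the field names and values are taken from the description, but the `QueueInfo` class, its attribute names and the `to_fields` helper are hypothetical and are not part of YARN.

```python
from dataclasses import dataclass

# Allowed values, as listed in the issue description.
QUEUE_TYPES = ("parent", "leaf")
CREATION_METHODS = ("static", "dynamicLegacy", "dynamicFlexible")
AUTO_CREATION_ELIGIBILITIES = ("off", "legacy", "flexible")


@dataclass(frozen=True)
class QueueInfo:
    """Hypothetical holder for the three descriptive queue fields."""
    queue_type: str
    creation_method: str
    auto_creation_eligibility: str

    def __post_init__(self):
        # Reject any value that the description does not list.
        if self.queue_type not in QUEUE_TYPES:
            raise ValueError(f"bad queueType: {self.queue_type}")
        if self.creation_method not in CREATION_METHODS:
            raise ValueError(f"bad creationMethod: {self.creation_method}")
        if self.auto_creation_eligibility not in AUTO_CREATION_ELIGIBILITIES:
            raise ValueError(
                f"bad autoCreationEligibility: {self.auto_creation_eligibility}")

    def to_fields(self) -> dict:
        """Return the fields under the names used in the description."""
        return {
            "queueType": self.queue_type,
            "creationMethod": self.creation_method,
            "autoCreationEligibility": self.auto_creation_eligibility,
        }


# The five example cases from the description.
EXAMPLE_CASES = {
    "static parent, auto-creation-enabled-v2 false":
        QueueInfo("parent", "static", "off"),
    "static managed parent": QueueInfo("parent", "static", "legacy"),
    "legacy auto-created leaf": QueueInfo("leaf", "dynamicLegacy", "off"),
    "auto-created (v2) parent":
        QueueInfo("parent", "dynamicFlexible", "flexible"),
    "auto-created (v2) leaf": QueueInfo("leaf", "dynamicFlexible", "off"),
}
```

The sketch deliberately validates each field on its own and does not encode any rule about which combinations are legal, since the description only gives examples.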




[jira] [Commented] (YARN-10617) Fifo and Fair intra-queue preemption goes on indefinitely when apps are in pending state due to max AM limit reached

2021-02-08 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281535#comment-17281535
 ] 

Sunil G commented on YARN-10617:


Hi [~ananyo_rao]
Yes. In the preemption module, we get all apps from the scheduler. Hence some 
of the apps may be in a pending state and can't be scheduled (due to the AM 
limit etc.). So I think this is a quick fix.

> Fifo and Fair intra-queue preemption goes on indefinitely when apps are in 
> pending state due to max AM limit reached
> 
>
> Key: YARN-10617
> URL: https://issues.apache.org/jira/browse/YARN-10617
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: YARN-10617.patch
>
>
> This case occurs when:
> 1. An application is submitted to a cluster running at the max-AM limit.
> 2. The new job requests an AM resource, so it has 1 pending request.
> 3. To fulfil this request, the preemption logic preempts 1 resource from a 
> running app.
> 4. Because the cluster is at max-AM limit, the scheduler re-assigns the 
> preempted container back to the running app.
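
One way to picture the loop described above, and the kind of filter that would break it, is the sketch below. It is not the logic of the attached patch: the `App` class, its fields and `demand_for_preemption` are hypothetical names, and the rule "ignore apps that are still waiting for their AM while the AM limit is reached" is an assumption drawn from the discussion.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class App:
    """Hypothetical, simplified view of an application in a queue."""
    app_id: str
    pending_containers: int
    am_running: bool  # False while the app is still waiting for its AM


def demand_for_preemption(apps: Iterable[App], am_limit_reached: bool) -> int:
    """Return the pending demand that preemption should try to satisfy.

    When the AM limit is reached, an app without a running AM cannot be
    scheduled anyway, so its pending request is left out of the demand.
    """
    demand = 0
    for app in apps:
        if am_limit_reached and not app.am_running:
            continue  # preempting for this app would not help it start
        demand += app.pending_containers
    return demand


apps = [
    App("running-app", pending_containers=0, am_running=True),
    App("new-app", pending_containers=1, am_running=False),
]
```

With the filter in place the new app contributes no demand while the limit is reached, so nothing is preempted only to be handed straight back to the running app.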




[jira] [Assigned] (YARN-11056) Incorrect capitalization of NVIDIA in the docs

2022-05-17 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-11056:
--

Assignee: Bharati Jadhav

> Incorrect capitalization of NVIDIA in the docs 
> ---
>
> Key: YARN-11056
> URL: https://issues.apache.org/jira/browse/YARN-11056
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gera Shegalov
>Assignee: Bharati Jadhav
>Priority: Trivial
>  Labels: doc, doc-site
>
> According to [https://www.nvidia.com/en-us/about-nvidia/legal-info/]  the 
> spelling should be all-caps NVIDIA
> Examples of differing capitalization 
> https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingGpus.md
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (YARN-10789) RM HA startup can fail due to race conditions in ZKConfigurationStore

2021-05-27 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352382#comment-17352382
 ] 

Sunil G commented on YARN-10789:


Thanks [~tarunparimi].
Please add a WARN log for all NodeExists exceptions as well.

> RM HA startup can fail due to race conditions in ZKConfigurationStore
> -
>
> Key: YARN-10789
> URL: https://issues.apache.org/jira/browse/YARN-10789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-10789.001.patch
>
>
> We are observing the error below randomly during Hadoop install and initial 
> RM startup when HA is enabled and 
> yarn.scheduler.configuration.store.class=zk is configured. This causes one 
> of the RMs to fail to start.
> {code:java}
> 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /confstore/CONF_STORE
> {code}
> We are trying to create the znode /confstore/CONF_STORE when we initialize 
> the ZKConfigurationStore. But the problem is that the ZKConfigurationStore is 
> initialized when CapacityScheduler does a serviceInit. This serviceInit is 
> done by both Active and Standby RM. So we can run into a race condition in 
> which both Active and Standby try to create the same znode when both RMs are 
> started at the same time.
> ZKRMStateStore on the other hand avoids such race conditions, by creating the 
> znodes only after serviceStart. serviceStart only happens for the active RM 
> which won the leader election, unlike serviceInit which happens irrespective 
> of leader election.
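
The race can be illustrated with a create-if-absent helper that treats "already exists" as a benign outcome and logs a warning, in the spirit of the WARN log requested in the comment above. The sketch uses an in-memory stand-in for ZooKeeper: `FakeStore`, `NodeExistsError` and `ensure_node` are hypothetical names, this is not the ZooKeeper or Curator client API, and it is not necessarily how the patch resolves the issue.

```python
import logging
import threading

log = logging.getLogger("confstore-sketch")


class NodeExistsError(Exception):
    """Stand-in for ZooKeeper's NodeExists error."""


class FakeStore:
    """Tiny in-memory stand-in for a znode tree (illustration only)."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create(self, path: str, data: bytes = b"") -> None:
        with self._lock:
            if path in self._nodes:
                raise NodeExistsError(path)
            self._nodes[path] = data

    def exists(self, path: str) -> bool:
        with self._lock:
            return path in self._nodes


def ensure_node(store: FakeStore, path: str) -> bool:
    """Create path if absent; return True only if this caller created it.

    If another caller (for example the other RM) got there first, the
    node is already present, so we log a warning instead of failing.
    """
    try:
        store.create(path)
        return True
    except NodeExistsError:
        log.warning("%s already exists, probably created concurrently", path)
        return False


store = FakeStore()
created_first = ensure_node(store, "/confstore/CONF_STORE")
created_second = ensure_node(store, "/confstore/CONF_STORE")
```

Both calls leave the node in place; only the first reports that it created it, which mirrors two RMs initializing at the same time.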




[jira] [Commented] (YARN-11078) Set env vars in a cross platform compatible way

2022-02-20 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495180#comment-17495180
 ] 

Sunil G commented on YARN-11078:


Hi [~gaurava],
you may need to add a new profile flag for Windows here, using the set command.

> Set env vars in a cross platform compatible way
> ---
>
> Key: YARN-11078
> URL: https://issues.apache.org/jira/browse/YARN-11078
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>
> Prior to running a node.js command, a *TMPDIR* environment variable is set - 
> [hadoop/package.json at 11d144d2284be29da1f49e163db0763636dcf058 · 
> apache/hadoop 
> (github.com)|https://github.com/apache/hadoop/blob/11d144d2284be29da1f49e163db0763636dcf058/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/package.json#L11-L13]
> {code:json}
> "build:mvn": "TMPDIR=tmp node/node ./node_modules/ember-cli/bin/ember build 
> -prod"
> {code}
> This causes the command execution to fail on Windows since environment 
> variables are set using the *set* command on Windows. The equivalent command 
> on Windows would be -
> {code:json}
> "build:mvn": "set TMPDIR=tmp; node/node ./node_modules/ember-cli/bin/ember 
> build -prod"
> {code}
> There's no cross-platform way to set the environment variables. Thus, we need 
> to be able to choose either of the commands based on the platform on which 
> Hadoop is built.
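
Outside of npm script syntax, the usual way around shell-specific `VAR=value cmd` or `set VAR=value` forms is to pass the variable through the process-spawning API. The sketch below shows that idea only; it is not what the Hadoop build does. `run_with_env` is a hypothetical helper, and the child command is a small Python one-liner used so that the example runs anywhere.

```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env):
    """Run cmd with extra environment variables, without any shell syntax.

    The variables are handed to the child process directly, so the same
    call works on Linux, macOS and Windows.
    """
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(
        cmd, env=env, check=True, capture_output=True, text=True)


# Demo: the child process simply prints the TMPDIR value it received.
result = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"],
    {"TMPDIR": "tmp"},
)
```

A launcher of this kind avoids needing one command per platform, at the cost of adding a wrapper script to the build.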



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (YARN-11096) Support node load based scheduling

2022-03-23 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511519#comment-17511519
 ] 

Sunil G commented on YARN-11096:


I don't think there is a design limitation here.
It would be better if we could get a common implementation. cc 
[~bibinchundatt] [~snemeth]

> Support node load based scheduling
> --
>
> Key: YARN-11096
> URL: https://issues.apache.org/jira/browse/YARN-11096
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Deegue
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ResourceManager can schedule according to the node load reported by 
> NodeManager through heartbeat.
>  
> We can set up a threshold and automatically skip nodes with high load when 
> scheduling.
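
The threshold idea can be sketched as a simple filter over the most recently reported load per node. Everything here is assumed for illustration: the load is taken to be a single number per node, and `schedulable_nodes`, the node names and the threshold value are made up rather than taken from the pull request.

```python
from typing import Dict, List


def schedulable_nodes(node_loads: Dict[str, float], threshold: float) -> List[str]:
    """Return the nodes whose reported load is at or below the threshold.

    node_loads maps a node id to the load value from its latest heartbeat.
    Nodes above the threshold are skipped for this scheduling pass.
    """
    return sorted(
        node for node, load in node_loads.items() if load <= threshold)


# Made-up heartbeat data.
reported = {"node-1": 0.35, "node-2": 0.92, "node-3": 0.80}
candidates = schedulable_nodes(reported, threshold=0.80)
```

Skipped nodes become candidates again as soon as a later heartbeat reports a load under the threshold.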




[jira] [Commented] (YARN-9699) Migration tool that help to generate CS config based on FS config

2019-09-27 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939317#comment-16939317
 ] 

Sunil G commented on YARN-9699:
---

I would vote for a standalone command like the one below
{code:java}

Usage: yarn resourcemanager [-format-state-store]
 {code}
The above command doesn't need YARN to be running.

> Migration tool that help to generate CS config based on FS config
> -
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>





[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store

2019-09-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941127#comment-16941127
 ] 

Sunil G commented on YARN-9864:
---

Thanks [~Prabhu Joseph] for adding the CLI command as well.

A couple more questions here:
 # Currently I can see that the existing sched-conf REST client call is also 
using the user.name param. Could you please confirm that it is not breaking any 
old behavior?
 # Could you please double-check that while you delete the ZK path, there won't 
be any concurrent writes to it?

Thanks

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch, 
> YARN-9864-003.patch, YARN-9864-004.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.
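
The behaviour described above, together with the concurrent-write concern raised in the comment, can be pictured with the sketch below. It is a toy in-memory model under stated assumptions: `ConfStore` and its methods are hypothetical names, a single lock stands in for whatever coordination the real ZK or LevelDB store uses, and the property value is made up.

```python
import threading


class ConfStore:
    """Toy configuration store: local defaults plus runtime mutations."""

    def __init__(self, local_conf: dict):
        self._local = dict(local_conf)   # stands in for the local XML file
        self._conf = dict(local_conf)    # stands in for the stored config
        self._lock = threading.Lock()

    def mutate(self, key: str, value: str) -> None:
        with self._lock:
            self._conf[key] = value

    def format(self) -> None:
        """Drop all stored mutations and reinitialize from the local config.

        Holding the same lock as mutate() means a write cannot interleave
        with the reset.
        """
        with self._lock:
            self._conf = dict(self._local)

    def get(self, key: str):
        with self._lock:
            return self._conf.get(key)


KEY = "yarn.scheduler.capacity.maximum-applications"
store = ConfStore({KEY: "10000"})
store.mutate(KEY, "1")
before_format = store.get(KEY)
store.format()
after_format = store.get(KEY)
```

In a real deployment the lock would have to span processes, which is why the question about concurrent writes during the ZK path deletion matters.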




[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store

2019-09-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941203#comment-16941203
 ] 

Sunil G commented on YARN-9864:
---

Thanks [~Prabhu Joseph]

Looks fine, +1

Will commit tomorrow if there are no objections.

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch, 
> YARN-9864-003.patch, YARN-9864-004.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.




[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store

2019-09-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941501#comment-16941501
 ] 

Sunil G commented on YARN-9864:
---

Thanks [~Prabhu Joseph] 

Committed to trunk and branch-3.2; however, the branch-3.1 cherry-pick is failing.

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch, 
> YARN-9864-003.patch, YARN-9864-004.patch, YARN-9864-005.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.




[jira] [Commented] (YARN-9801) SchedConfCli does not work with https RM

2019-09-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941537#comment-16941537
 ] 

Sunil G commented on YARN-9801:
---

+1 for this patch. I think we can get this in now.

> SchedConfCli does not work with https RM
> 
>
> Key: YARN-9801
> URL: https://issues.apache.org/jira/browse/YARN-9801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9801-001.patch, YARN-9801-002.patch
>
>
> SchedConfCli does not work with https RM
> {code}
> [yarn@rmhost-1 /]$ yarn schedulerconf -global yarn.scheduler.capacity.maximum-applications=1
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: javax.net.ssl.SSLHandshakeException: Error while authenticating with endpoint: https://:8090/ws/v1/cluster/scheduler-conf
>   at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:529)
>   at org.apache.hadoop.yarn.client.cli.SchedConfCLI.updateSchedulerConfOnRMNode(SchedConfCLI.java:178)
>   at org.apache.hadoop.yarn.webapp.util.WebAppUtils.execOnActiveRM(WebAppUtils.java:102)
>   at org.apache.hadoop.yarn.client.cli.SchedConfCLI.run(SchedConfCLI.java:143)
>   at org.apache.hadoop.yarn.client.cli.SchedConfCLI.main(SchedConfCLI.java:77)
> Caused by: javax.net.ssl.SSLHandshakeException: Error while authenticating with endpoint: https://:8090/ws/v1/cluster/scheduler-conf
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
>   at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:216)
>   at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
>   at org.apache.hadoop.yarn.client.cli.SchedConfCLI$1.getHttpURLConnection(SchedConfCLI.java:157)
>   at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:165)
>   at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
>   ... 8 more
> Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
>   at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
>   at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)
>   at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
>   at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
>   at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
>   at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
>   at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
>   at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
>   at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
>   at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
>   at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
>   at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
>   at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
>   at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
>   at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
>   at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:189)
>   ... 12 more
> Caused by: sun.security.validator.ValidatorException: PKIX pat

[jira] [Updated] (YARN-9801) SchedConfCli does not work with https mode

2019-10-01 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9801:
--
Summary: SchedConfCli does not work with https mode  (was: SchedConfCli 
does not work with https RM)

> SchedConfCli does not work with https mode
> --
>
> Key: YARN-9801
> URL: https://issues.apache.org/jira/browse/YARN-9801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9801-001.patch, YARN-9801-002.patch, 
> YARN-9801-003.patch
>
>
> SchedConfCli does not work with https RM

[jira] [Updated] (YARN-9864) Format CS Configuration present in Configuration Store

2019-10-01 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9864:
--
Hadoop Flags: Reviewed

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch, 
> YARN-9864-003.patch, YARN-9864-004.patch, YARN-9864-005.patch, 
> YARN-9864-branch-3.1.001.patch, YARN-9864-branch-3.1.002.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.




[jira] [Updated] (YARN-9864) Format CS Configuration present in Configuration Store

2019-10-01 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9864:
--
Fix Version/s: 3.1.4, 3.2.2, 3.3.0

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch, 
> YARN-9864-003.patch, YARN-9864-004.patch, YARN-9864-005.patch, 
> YARN-9864-branch-3.1.001.patch, YARN-9864-branch-3.1.002.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.




[jira] [Commented] (YARN-9699) Migration tool that help to generate CS config based on FS config (Phase 1)

2019-10-04 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944622#comment-16944622
 ] 

Sunil G commented on YARN-9699:
---

Thanks [~pbacsko]
 # Please remove fair-scheduler.xml from the root path. If it is needed, please 
move it to test/resources.
 # The yarn resourcemanager CLI is used here, so the RM does not need to be 
running. I am thinking about the case of a live system that we need to upgrade. 
There are more thoughts on which CLI is better; however, we can address this 
later, considering the completeness of this patch. I'll open a Jira to discuss 
the CLI in detail later.
 # I would like to see more test results. Kindly attach them with the patch.

> Migration tool that help to generate CS config based on FS config (Phase 1)
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch, YARN-9699.001.patch, 
> YARN-9699.002.patch
>
>





[jira] [Commented] (YARN-9873) Version Number for each Scheduler Config Change

2019-10-04 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944628#comment-16944628
 ] 

Sunil G commented on YARN-9873:
---

+1

> Version Number for each Scheduler Config Change
> ---
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.
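
A minimal sketch of the idea, assuming the version is a counter that is bumped on every mutation and that a timestamp is recorded alongside it. The class `VersionedSchedulerConf`, its attributes and the injected clock are hypothetical; the actual patch may represent the version differently.

```python
import itertools


class VersionedSchedulerConf:
    """Toy scheduler config that tracks a version for every change."""

    def __init__(self, initial: dict, clock):
        self._conf = dict(initial)
        self._clock = clock          # callable returning the current time
        self.version = 0             # number of mutations applied so far
        self.last_changed = None     # clock value at the latest mutation

    def mutate(self, updates: dict) -> int:
        """Apply updates, bump the version and record when it happened."""
        self._conf.update(updates)
        self.version += 1
        self.last_changed = self._clock()
        return self.version

    def get(self, key: str):
        return self._conf.get(key)


# A fake clock (1000, 1001, ...) keeps the example deterministic.
fake_clock = itertools.count(1000).__next__
KEY = "yarn.scheduler.capacity.maximum-applications"
conf = VersionedSchedulerConf({KEY: "10000"}, fake_clock)
first = conf.mutate({KEY: "1"})
second = conf.mutate({KEY: "2"})
```

Comparing a remembered version with the current one tells a client whether the configuration changed in between, and `last_changed` answers when the latest change happened.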




[jira] [Assigned] (YARN-9873) Version Number for each Scheduler Config Change

2019-10-04 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-9873:
-

Assignee: Sunil G  (was: Prabhu Joseph)

> Version Number for each Scheduler Config Change
> ---
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Commented] (YARN-9873) Version Number for each Scheduler Config Change

2019-10-04 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944629#comment-16944629
 ] 

Sunil G commented on YARN-9873:
---

I went through this patch; overall it looks good.

I also tested it locally. This helps to have a version for every scheduler config change.

> Version Number for each Scheduler Config Change
> ---
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Assigned] (YARN-9873) Mutation API Config Change updates Version Number

2019-10-04 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-9873:
-

Assignee: Prabhu Joseph  (was: Sunil G)

> Mutation API Config Change updates Version Number 
> --
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Commented] (YARN-9873) Mutation API Config Change updates Version Number

2019-10-04 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944644#comment-16944644
 ] 

Sunil G commented on YARN-9873:
---

Thanks [~Prabhu Joseph]

> Mutation API Config Change updates Version Number 
> --
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Reopened] (YARN-9873) Mutation API Config Change updates Version Number

2019-10-04 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reopened YARN-9873:
---

> Mutation API Config Change updates Version Number 
> --
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Commented] (YARN-9873) Mutation API Config Change updates Version Number

2019-10-04 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944969#comment-16944969
 ] 

Sunil G commented on YARN-9873:
---

There is an issue in this patch with version number retrieval. Reverting for now.

[~Prabhu Joseph], kindly help to address the version number issue.

> Mutation API Config Change updates Version Number 
> --
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Commented] (YARN-9873) Mutation API Config Change updates Version Number

2019-10-09 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947523#comment-16947523
 ] 

Sunil G commented on YARN-9873:
---

Thanks [~Prabhu Joseph] for addressing the comments.

+1. Pushing this in.

> Mutation API Config Change updates Version Number 
> --
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch, 
> YARN-9873-003.patch, YARN-9873-004.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Updated] (YARN-9873) Mutation API Config Change need to update Version Number

2019-10-09 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9873:
--
Summary: Mutation API Config Change need to update Version Number   (was: 
Mutation API Config Change updates Version Number )

> Mutation API Config Change need to update Version Number 
> -
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch, 
> YARN-9873-003.patch, YARN-9873-004.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Commented] (YARN-9873) Mutation API Config Change need to update Version Number

2019-10-09 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947524#comment-16947524
 ] 

Sunil G commented on YARN-9873:
---

Committed to trunk/branch-3.2

> Mutation API Config Change need to update Version Number 
> -
>
> Key: YARN-9873
> URL: https://issues.apache.org/jira/browse/YARN-9873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9873-001.patch, YARN-9873-002.patch, 
> YARN-9873-003.patch, YARN-9873-004.patch
>
>
> Version Number support for each Scheduler Config Change. This also helps to 
> know when the last change happened.




[jira] [Commented] (YARN-9900) Revert Invalid Config and Refresh Support in SchedulerConfig Format

2019-10-15 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952486#comment-16952486
 ] 

Sunil G commented on YARN-9900:
---

{code:java}
try {
  rm.getRMContext().getRMAdminService().refreshQueues();
} catch (IOException | YarnException e) {
  LOG.error("Exception thrown when formatting configuration.", e);
  mutableConfigurationProvider.revertFromOldConfig(conf);
  throw e;
}
rm.getRMContext().getRMAdminService().refreshQueues(); {code}
1. Here refreshQueues is called again right after the try block, so it runs twice. Please check this. Thanks [~Prabhu Joseph]

2. Rename revertFromOldConfig ==> revertToOldConfig
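
A sketch of how the flow could look with both comments applied: refreshQueues is called once, inside the try block, and the revert method carries the suggested name revertToOldConfig. The interfaces below (QueueRefresher, ConfigReverter) and the class FormatSchedulerConf are stand-ins made up so the snippet compiles on its own; they are not the actual RM classes, and the real code also catches YarnException.

```java
import java.io.IOException;

// Stand-in for the RM admin service (assumed shape).
interface QueueRefresher {
  void refreshQueues() throws IOException;
}

// Stand-in for the mutable configuration provider (assumed shape).
interface ConfigReverter {
  void revertToOldConfig(String oldConf);
}

class FormatSchedulerConf {
  // Refreshes queues once; on failure restores the previous config and rethrows.
  static void refreshOrRevert(QueueRefresher admin, ConfigReverter provider,
      String oldConf) throws IOException {
    try {
      admin.refreshQueues();
    } catch (IOException e) {
      System.err.println("Exception thrown when formatting configuration: " + e);
      provider.revertToOldConfig(oldConf);
      throw e;
    }
    // No second refreshQueues() call after the try block.
  }
}
```

On the success path the refresh runs exactly once and nothing is reverted; on the failure path the old config is restored before the exception reaches the caller.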

> Revert Invalid Config and Refresh Support in SchedulerConfig Format
> ---
>
> Key: YARN-9900
> URL: https://issues.apache.org/jira/browse/YARN-9900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0, 3.2.2, 3.1.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9900-001.patch, YARN-9900-002.patch
>
>
> Format Scheduler Config Option has to revert to the previous scheduler 
> configuration in case of invalid capacity-scheduler.xml contents. And refresh 
> has to be done after format so  that RM need not be restarted.




[jira] [Commented] (YARN-9900) Revert Invalid Config and Refresh Support in SchedulerConfig Format

2019-10-16 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952778#comment-16952778
 ] 

Sunil G commented on YARN-9900:
---

+1. Committing this shortly

> Revert Invalid Config and Refresh Support in SchedulerConfig Format
> ---
>
> Key: YARN-9900
> URL: https://issues.apache.org/jira/browse/YARN-9900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0, 3.2.2, 3.1.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9900-001.patch, YARN-9900-002.patch, 
> YARN-9900-003.patch
>
>
> Format Scheduler Config Option has to revert to the previous scheduler 
> configuration in case of invalid capacity-scheduler.xml contents. And refresh 
> has to be done after format so  that RM need not be restarted.




[jira] [Updated] (YARN-9900) Revert to previous state when Invalid Config is applied and Refresh Support in SchedulerConfig Format

2019-10-16 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9900:
--
Summary: Revert to previous state when Invalid Config is applied and 
Refresh Support in SchedulerConfig Format  (was: Revert Invalid Config and 
Refresh Support in SchedulerConfig Format)

> Revert to previous state when Invalid Config is applied and Refresh Support 
> in SchedulerConfig Format
> -
>
> Key: YARN-9900
> URL: https://issues.apache.org/jira/browse/YARN-9900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0, 3.2.2, 3.1.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9900-001.patch, YARN-9900-002.patch, 
> YARN-9900-003.patch
>
>
> Format Scheduler Config Option has to revert to the previous scheduler 
> configuration in case of invalid capacity-scheduler.xml contents. And refresh 
> has to be done after format so  that RM need not be restarted.




[jira] [Commented] (YARN-9900) Revert to previous state when Invalid Config is applied and Refresh Support in SchedulerConfig Format

2019-10-16 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952787#comment-16952787
 ] 

Sunil G commented on YARN-9900:
---

[~Prabhu Joseph] committed to trunk and branch-3.2.

The patch cannot be cleanly applied to branch-3.1. Please share a patch for that branch.

> Revert to previous state when Invalid Config is applied and Refresh Support 
> in SchedulerConfig Format
> -
>
> Key: YARN-9900
> URL: https://issues.apache.org/jira/browse/YARN-9900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0, 3.2.2, 3.1.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9900-001.patch, YARN-9900-002.patch, 
> YARN-9900-003.patch
>
>
> Format Scheduler Config Option has to revert to the previous scheduler 
> configuration in case of invalid capacity-scheduler.xml contents. And refresh 
> has to be done after format so  that RM need not be restarted.




[jira] [Commented] (YARN-9909) Offline format of YarnConfigurationStore

2019-10-16 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953293#comment-16953293
 ] 

Sunil G commented on YARN-9909:
---

+1 for this patch.

> Offline format of YarnConfigurationStore
> 
>
> Key: YARN-9909
> URL: https://issues.apache.org/jira/browse/YARN-9909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9909-001.patch
>
>
> YARN-9864 provides format option which removes the persistent configuration 
> from the backing YarnConfigurationStore and initializes the new one from 
> capacity-scheduler.xml. This is used when RM is running and to refresh with 
> new CapacityScheduler configs.
> This Jira provides an option to format the configuration store when RM is not 
> running. This is useful to setup capacity scheduler configs before RM startup.




[jira] [Commented] (YARN-9909) Offline format of YarnConfigurationStore

2019-10-16 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953298#comment-16953298
 ] 

Sunil G commented on YARN-9909:
---

Committed this to trunk, but it fails to apply to branch-3.2 and branch-3.1.
[~Prabhu Joseph] please help by sharing patches for those branches.

> Offline format of YarnConfigurationStore
> 
>
> Key: YARN-9909
> URL: https://issues.apache.org/jira/browse/YARN-9909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9909-001.patch
>
>
> YARN-9864 provides format option which removes the persistent configuration 
> from the backing YarnConfigurationStore and initializes the new one from 
> capacity-scheduler.xml. This is used when RM is running and to refresh with 
> new CapacityScheduler configs.
> This Jira provides an option to format the configuration store when RM is not 
> running. This is useful to setup capacity scheduler configs before RM startup.




[jira] [Commented] (YARN-9830) Improve ContainerAllocationExpirer it blocks scheduling

2019-10-23 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957882#comment-16957882
 ] 

Sunil G commented on YARN-9830:
---

Thanks [~bibinchundatt] 

I think this change is fine. It moves us to a more fine-grained lock, and I think that is much better for performance.

[~cheersyang] [~jhung] [~rohithsharmaks] could you please take a look and share your thoughts?
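
To illustrate what "fine-grained lock" can mean here, below is a rough sketch of a liveliness monitor whose register call does per-key updates on a concurrent map instead of taking one monitor lock shared by every caller, which is the kind of lock the stack trace in the description shows threads blocked on. This is only an assumption about the general approach; SimpleLivelinessMonitor is a hypothetical class and is not the code in the patch or in AbstractLivelinessMonitor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical simplified monitor, for illustration only.
class SimpleLivelinessMonitor<K> {
  // Concurrent map: register/unregister do not contend on a single object monitor.
  private final ConcurrentMap<K, Long> lastSeen = new ConcurrentHashMap<>();
  private final long expireIntervalMs;

  SimpleLivelinessMonitor(long expireIntervalMs) {
    this.expireIntervalMs = expireIntervalMs;
  }

  void register(K key, long nowMs) {
    lastSeen.put(key, nowMs);
  }

  void unregister(K key) {
    lastSeen.remove(key);
  }

  boolean isRegistered(K key) {
    return lastSeen.containsKey(key);
  }

  // Removes and counts entries not seen within the expiry interval.
  int expire(long nowMs) {
    int expired = 0;
    for (Map.Entry<K, Long> e : lastSeen.entrySet()) {
      // remove(key, value) only succeeds if the entry was not refreshed meanwhile.
      if (nowMs - e.getValue() > expireIntervalMs
          && lastSeen.remove(e.getKey(), e.getValue())) {
        expired++;
      }
    }
    return expired;
  }
}
```

The trade-off is that the expiry scan sees a weakly consistent view of the map, which is usually acceptable for a periodic expiry check.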

> Improve ContainerAllocationExpirer it blocks scheduling
> ---
>
> Key: YARN-9830
> URL: https://issues.apache.org/jira/browse/YARN-9830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bibin Chundatt
>Priority: Critical
>  Labels: perfomance
> Attachments: YARN-9830.001.patch
>
>
> {quote}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor.register(AbstractLivelinessMonitor.java:106)
> - waiting to lock <0x7fa348749550> (a 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:601)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:592)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> - locked <0x7fc8852f8200> (a 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:474)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> {quote}




[jira] [Commented] (YARN-9937) Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo

2019-10-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963046#comment-16963046
 ] 

Sunil G commented on YARN-9937:
---

Changes look fine to me. I can get this in if there are no issues.

> Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
> 
>
> Key: YARN-9937
> URL: https://issues.apache.org/jira/browse/YARN-9937
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Screen Shot 2019-10-28 at 8.54.53 PM.png, 
> YARN-9937-001.patch, YARN-9937-002.patch, YARN-9937-003.patch
>
>
> Below are the missing queue configs which are not part of RMWebServices 
> scheduler endpoint. 
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime




[jira] [Commented] (YARN-9937) Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo

2019-10-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963064#comment-16963064
 ] 

Sunil G commented on YARN-9937:
---

A couple of quick comments.
 # maxAMResource -> maxAMResourceLimit
 # Change the map of ACLs to a list of ACLs, so we can create ACLsInfo and ACLInfo
classes where a List of ACLInfo is the element in ACLsInfo
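
A rough sketch of the shape suggested in the second comment: an ACLInfo entry per access type and an ACLsInfo wrapper that holds them as a list. The field and method names (accessType, accessControlList, add, getAcl) are assumptions for illustration, and any XML/JSON binding annotations the web service DAO classes would need are left out so the snippet stands alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One ACL entry; field names are illustrative, not taken from the patch.
class ACLInfo {
  private final String accessType;         // e.g. "SUBMIT_APP"
  private final String accessControlList;  // e.g. "user1,user2 group1"

  ACLInfo(String accessType, String accessControlList) {
    this.accessType = accessType;
    this.accessControlList = accessControlList;
  }

  String getAccessType() {
    return accessType;
  }

  String getAccessControlList() {
    return accessControlList;
  }
}

// Wrapper whose single element is the list of ACL entries.
class ACLsInfo {
  private final List<ACLInfo> acl = new ArrayList<>();

  void add(ACLInfo info) {
    acl.add(info);
  }

  List<ACLInfo> getAcl() {
    return Collections.unmodifiableList(acl);
  }
}
```

Compared with a map keyed by access type, a list of small objects tends to serialize into a more regular structure for clients of the scheduler endpoint.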

> Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
> 
>
> Key: YARN-9937
> URL: https://issues.apache.org/jira/browse/YARN-9937
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Screen Shot 2019-10-28 at 8.54.53 PM.png, 
> YARN-9937-001.patch, YARN-9937-002.patch, YARN-9937-003.patch
>
>
> Below are the missing queue configs which are not part of RMWebServices 
> scheduler endpoint. 
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime




[jira] [Commented] (YARN-9937) Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo

2019-10-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963137#comment-16963137
 ] 

Sunil G commented on YARN-9937:
---

Thanks 

+1 for the latest patch, pending Jenkins.

> Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
> 
>
> Key: YARN-9937
> URL: https://issues.apache.org/jira/browse/YARN-9937
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Screen Shot 2019-10-28 at 8.54.53 PM.png, 
> YARN-9937-001.patch, YARN-9937-002.patch, YARN-9937-003.patch, 
> YARN-9937-004.patch
>
>
> Below are the missing queue configs which are not part of RMWebServices 
> scheduler endpoint. 
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime




[jira] [Commented] (YARN-9937) Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo

2019-10-30 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963330#comment-16963330
 ] 

Sunil G commented on YARN-9937:
---

Committing shortly. Thanks [~prabhujoseph]

> Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
> 
>
> Key: YARN-9937
> URL: https://issues.apache.org/jira/browse/YARN-9937
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Screen Shot 2019-10-28 at 8.54.53 PM.png, 
> YARN-9937-001.patch, YARN-9937-002.patch, YARN-9937-003.patch, 
> YARN-9937-004.patch
>
>
> Below are the missing queue configs which are not part of RMWebServices 
> scheduler endpoint. 
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime




[jira] [Commented] (YARN-9949) Add missing queue configs for root queue in RMWebService#CapacitySchedulerInfo

2019-11-02 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965305#comment-16965305
 ] 

Sunil G commented on YARN-9949:
---

Could you please add a test case for this?

> Add missing queue configs for root queue in 
> RMWebService#CapacitySchedulerInfo 
> ---
>
> Key: YARN-9949
> URL: https://issues.apache.org/jira/browse/YARN-9949
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9949-001.patch
>
>
> YARN-9937 has added below missing queue configs but missed to add for root 
> queue.
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime
> 5. Ordering Policy.




[jira] [Commented] (YARN-9949) Add missing queue configs for root queue in RMWebService#CapacitySchedulerInfo

2019-11-02 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965545#comment-16965545
 ] 

Sunil G commented on YARN-9949:
---

+1 for this

Committing shortly

> Add missing queue configs for root queue in 
> RMWebService#CapacitySchedulerInfo 
> ---
>
> Key: YARN-9949
> URL: https://issues.apache.org/jira/browse/YARN-9949
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9949-001.patch, YARN-9949-002.patch
>
>
> YARN-9937 has added below missing queue configs but missed to add for root 
> queue.
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime
> 5. Ordering Policy.




[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966397#comment-16966397
 ] 

Sunil G commented on YARN-9950:
---

Hi [~prabhujoseph]

1. I think the following fix may be incomplete.
{code:java}
// Unset Ordering Policy of Parent Queue converted from
// Leaf Queue after addQueue
String parentQueueOrderingPolicy = CapacitySchedulerConfiguration.PREFIX
  + parentQueue + CapacitySchedulerConfiguration.DOT + ORDERING_POLICY;
if (siblingQueues.size() == 1) {
  proposedConf.unset(parentQueueOrderingPolicy);
  confUpdate.put(parentQueueOrderingPolicy, null);
}{code}
When an existing parent queue has its ordering policy set to PRIORITY based and
a new child queue is added to that parent queue, the above code can unset it,
reverting the ordering policy to Resource based. This is incorrect.

2. For parent queue,
{code:java}
// Unset Ordering Policy of Leaf Queue converted from
// Parent Queue after removeQueue
String leafQueueOrderingPolicy = CapacitySchedulerConfiguration.PREFIX
  + parentQueuePath + CapacitySchedulerConfiguration.DOT
  + ORDERING_POLICY;
proposedConf.unset(leafQueueOrderingPolicy);
confUpdate.put(leafQueueOrderingPolicy, null); {code}
The naming doesn't seem accurate here. We are removing a queue from a
parent, hence the variable should be renamed: leafQueueOrderingPolicy =>
parentQueueOrderingPolicy
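
To make the concern in point 1 concrete, here is a small sketch of the intended guard: the ordering policy key is unset only when the parent had no children before the add, that is, when it is really being converted from a leaf queue. A plain Map stands in for the scheduler configuration, and OrderingPolicyGuard and updatesAfterAddQueue are hypothetical names; the two string constants are written out here on the assumption that they match the CapacitySchedulerConfiguration values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only; not the code in the patch.
class OrderingPolicyGuard {
  static final String PREFIX = "yarn.scheduler.capacity.";  // assumed constant value
  static final String ORDERING_POLICY = "ordering-policy";  // assumed constant value

  // Returns the config updates needed after adding a child under parentQueue.
  // existingChildren lists the parent's children before the add.
  static Map<String, String> updatesAfterAddQueue(String parentQueue,
      List<String> existingChildren) {
    Map<String, String> confUpdate = new HashMap<>();
    if (existingChildren.isEmpty()) {
      // The parent was a leaf until now, so its leaf-only policy (fifo or fair)
      // has to be unset; a null value marks the key for removal.
      confUpdate.put(PREFIX + parentQueue + "." + ORDERING_POLICY, null);
    }
    // An existing parent keeps whatever ordering policy it already has.
    return confUpdate;
  }
}
```

With this guard, adding root.A.A2 under a parent that already has root.A.A1 leaves a priority based policy on root.A untouched.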

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for leaf queue and utilization or 
> priority-utilization for parent queue. When the existing leaf queue (root.A) 
> becomes parent queue - the ordering policy (fifo or fair) has to be unset. 
> Else YARN RM will fail as fifo or fair is not a valid queue ordering policy 
> for parent queue.
> Similarly while removing a queue, unset ordering policy of leaf queue which 
> converted from parent queue.




[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-03 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966418#comment-16966418
 ] 

Sunil G commented on YARN-9950:
---

Thanks [~prabhujoseph] 

This makes sense. +1 from me, pending jenkins.

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch, YARN-9950-002.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for leaf queue and utilization or 
> priority-utilization for parent queue. When the existing leaf queue (root.A) 
> becomes parent queue - the ordering policy (fifo or fair) has to be unset. 
> Else YARN RM will fail as fifo or fair is not a valid queue ordering policy 
> for parent queue.
> Similarly while removing a queue, unset ordering policy of leaf queue which 
> converted from parent queue.




[jira] [Commented] (YARN-9950) Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue respectively

2019-11-04 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966809#comment-16966809
 ] 

Sunil G commented on YARN-9950:
---

Committing this now. Thanks [~prabhujoseph]

> Unset Ordering Policy of Leaf/Parent queue converted from Parent/Leaf queue 
> respectively
> 
>
> Key: YARN-9950
> URL: https://issues.apache.org/jira/browse/YARN-9950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9950-001.patch, YARN-9950-002.patch
>
>
> RM fails to start when adding a queue (say root.A.A1) under a leaf queue (say 
> root.A) with ordering policy fifo.
> YARN supports fifo or fair for leaf queue and utilization or 
> priority-utilization for parent queue. When the existing leaf queue (root.A) 
> becomes parent queue - the ordering policy (fifo or fair) has to be unset. 
> Else YARN RM will fail as fifo or fair is not a valid queue ordering policy 
> for parent queue.
> Similarly while removing a queue, unset ordering policy of leaf queue which 
> converted from parent queue.




[jira] [Commented] (YARN-9920) YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from FairScheduler

2019-11-06 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968219#comment-16968219
 ] 

Sunil G commented on YARN-9920:
---

cc [~wilfreds], could you also please take a look?

> YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from 
> FairScheduler
> --
>
> Key: YARN-9920
> URL: https://issues.apache.org/jira/browse/YARN-9920
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, security
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9920-001.patch, YARN-9920-002.patch, 
> YARN-9920-003.patch
>
>
> YarnAuthorizationProvider AccessRequest has null RemoteAddress in case of 
> FairScheduler. FSQueue#hasAccess uses Server.getRemoteAddress() which will be 
> null when the call is from RMWebServices and EventDispatcher. It works fine 
> when called by IPC Server Handler.
> FSQueue#hasAccess is called from three places; in (2) and (3) 
> Server.getRemoteAddress returns null.
> *1. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> FSQueue#hasAccess 
> -> Server.getRemoteAddress returns correct Remote IP.*
>  
> *2. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> 
> AppAddedSchedulerEvent*
>     *EventDispatcher -> FairScheduler#addApplication -> FSQueue.hasAccess -> 
> Server.getRemoteAddress returns null*
>   
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:509)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1268)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:133)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> {code}
>  
> *3. RMWebServices -> QueueACLsManager#checkAccess -> FSQueue.hasAccess -> 
> Server.getRemoteAddress returns null.*
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.checkAccess(FairScheduler.java:1610)
> at org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:84)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:270)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:553)
> {code}
>  
> Have verified with CapacityScheduler and it works fine.
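
The description says the remote address is available on the IPC Server handler thread but null when the same check runs from EventDispatcher or RMWebServices. One way such behaviour can arise is when the caller's address is tracked per thread. The sketch below illustrates that with a plain ThreadLocal. It is an assumption for illustration only: the class, the field, and the observe method are hypothetical, and nothing here is Hadoop's actual implementation.

```java
/** Hypothetical sketch: a per-thread caller address is invisible to other threads. */
public class RemoteAddressSketch {
  // Assumed stand-in for "the address of the call this thread is serving".
  private static final ThreadLocal<String> CURRENT_CALLER = new ThreadLocal<>();

  /** Returns the caller address bound to the current thread, or null. */
  public static String getRemoteAddress() {
    return CURRENT_CALLER.get();
  }

  /**
   * Binds callerIp to the calling thread (the "handler"), then reads the
   * address once on that thread and once on a separate "dispatcher" thread.
   * Returns {seenByHandler, seenByDispatcher}.
   */
  public static String[] observe(String callerIp) {
    final String[] seen = new String[2];
    CURRENT_CALLER.set(callerIp);
    try {
      seen[0] = getRemoteAddress();
      Thread dispatcher = new Thread(() -> seen[1] = getRemoteAddress());
      dispatcher.start();
      dispatcher.join(); // join makes the dispatcher's write visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      CURRENT_CALLER.remove();
    }
    return seen;
  }

  public static void main(String[] args) {
    String[] seen = observe("10.0.0.1");
    System.out.println("handler sees: " + seen[0]);
    System.out.println("dispatcher sees: " + seen[1]);
  }
}
```

Under this assumption, an access check that runs on another thread would need the address captured on the handler thread and passed along explicitly.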




[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-11-12 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972552#comment-16972552
 ] 

Sunil G commented on YARN-9052:
---

Really appreciate [~snemeth]'s efforts here in cleaning up MockRM with a 
builder. Kudos!

I will try to help with reviews on this one.

cc [~leftnoteasy] [~cheersyang] [~rohithsharmaks]

> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9052.001.patch, YARN-9052.002.patch, 
> YARN-9052.003.patch, YARN-9052.testlogs.patch
>
>
> MockRM has 31 definitions of submitApp, most of them taking more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which 
> makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RmApp will be 
> produced for tests, as they often pass a lot of empty / null values as parameters.
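
To illustrate the direction proposed above, here is a minimal builder sketch. It is not the API from the attached patches: the class name, field names, and default values are all hypothetical and chosen only to show how named setters with defaults can replace long positional parameter lists.

```java
/** Hypothetical sketch of a builder replacing many submitApp overloads. */
public class SubmitBuilderSketch {

  /** Immutable description of an app submission (fields are illustrative). */
  public static final class AppSubmission {
    public final int memoryMb;
    public final String name;
    public final String user;
    public final String queue;
    public final boolean unmanagedAm;

    private AppSubmission(Builder b) {
      this.memoryMb = b.memoryMb;
      this.name = b.name;
      this.user = b.user;
      this.queue = b.queue;
      this.unmanagedAm = b.unmanagedAm;
    }
  }

  /** Builder with defaults, so tests only state the values they care about. */
  public static final class Builder {
    private int memoryMb = 1024;
    private String name = "test-app";
    private String user = "test-user";
    private String queue = "default";
    private boolean unmanagedAm = false;

    public Builder withMemoryMb(int memoryMb) { this.memoryMb = memoryMb; return this; }
    public Builder withName(String name) { this.name = name; return this; }
    public Builder withUser(String user) { this.user = user; return this; }
    public Builder withQueue(String queue) { this.queue = queue; return this; }
    public Builder withUnmanagedAm(boolean unmanagedAm) { this.unmanagedAm = unmanagedAm; return this; }

    public AppSubmission build() { return new AppSubmission(this); }
  }

  public static void main(String[] args) {
    // Only the two relevant values are spelled out; the rest use defaults.
    AppSubmission s = new Builder().withQueue("root.a").withMemoryMb(2048).build();
    System.out.println(s.name + " -> " + s.queue + " (" + s.memoryMb + " MB)");
  }
}
```

A call site written this way names every value it sets, so a reader no longer has to count positions among null or empty arguments.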




[jira] [Commented] (YARN-9984) FSPreemptionThread crash with NullPointerException

2019-11-18 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976438#comment-16976438
 ] 

Sunil G commented on YARN-9984:
---

This is a straightforward one. I think a UT may be tougher for this.

Since it's a null check, let's get this in. +1

I will commit this later today if there are no objections.

> FSPreemptionThread crash with NullPointerException
> --
>
> Key: YARN-9984
> URL: https://issues.apache.org/jira/browse/YARN-9984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9984.001.patch
>
>
> When an application is unregistered there is a chance that there are still 
> containers running on a node for that application. In all cases we handle the 
> application missing from the RM gracefully (log a message and continue) 
> except for the FS pre-emption thread.
> In case the application is removed but some containers are still linked to a 
> node the FSPreemptionThread will crash with a NPE when it tries to retrieve 
> the application id for the attempt:
> {code:java}
> FSAppAttempt app =
> scheduler.getSchedulerApp(container.getApplicationAttemptId());
> ApplicationId appId = app.getApplicationId();{code}
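
The snippet quoted above dereferences the lookup result without checking it. A minimal sketch of the "log a message and continue" handling mentioned in the description follows. It deliberately avoids the real YARN types: the scheduler lookup is modelled as a Map from attempt id to application id, and all names are hypothetical, so this shows the shape of the null check rather than the contents of the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: skip containers whose app attempt is no longer registered. */
public class PreemptionNullCheckSketch {

  /**
   * registeredAttempts stands in for the scheduler lookup: it maps an attempt id
   * to its application id and returns null once the app has been unregistered.
   */
  public static List<String> appIdsForContainers(
      Map<String, String> registeredAttempts, List<String> containerAttemptIds) {
    List<String> appIds = new ArrayList<>();
    for (String attemptId : containerAttemptIds) {
      String appId = registeredAttempts.get(attemptId);
      if (appId == null) {
        // The attempt is gone but a container still points at it:
        // log and continue instead of dereferencing null.
        System.out.println("Skipping container of unknown attempt " + attemptId);
        continue;
      }
      appIds.add(appId);
    }
    return appIds;
  }

  public static void main(String[] args) {
    Map<String, String> registered = new HashMap<>();
    registered.put("attempt_1", "app_1");
    // attempt_2 was unregistered while one of its containers is still on a node.
    List<String> result =
        appIdsForContainers(registered, Arrays.asList("attempt_1", "attempt_2"));
    System.out.println(result);
  }
}
```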




[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH

2019-11-18 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976615#comment-16976615
 ] 

Sunil G commented on YARN-8373:
---

[~wilfreds] could we add a test to check that reading from nodes doesn't cause 
any problems, or are existing tests already covering this?

On another note, does continuousSchedulingAttempt need the writeLock by any 
chance?

> RM  Received RMFatalEvent of type CRITICAL_THREAD_CRASH
> ---
>
> Key: YARN-8373
> URL: https://issues.apache.org/jira/browse/YARN-8373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.9.0
>Reporter: Girish Bhat
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: newbie
> Attachments: YARN-8373.001.patch, YARN-8373.002.patch, 
> YARN-8373.003.patch, YARN-8373.004.patch, YARN-8373.005.patch
>
>
>  
>  
> {noformat}
> sudo -u yarn /usr/local/hadoop/latest/bin/yarn version
> Hadoop 2.9.0
> Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50
> Compiled by arsuresh on 2017-11-13T23:15Z
> Compiled with protoc 2.5.0
> From source with checksum 0a76a9a32a5257331741f8d5932f183
> This command was run using /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar
> {noformat}
> This is for version 2.9.0 
>  
> {noformat}
> 2018-05-25 05:53:12,742 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FairSchedulerContinuousScheduling, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
> at java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
> 2018-05-25 05:53:12,743 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down the resource manager.
> 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: a critical thread, FairSchedulerContinuousScheduling, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
> at java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
> 2018-05-25 05:53:12,772 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> {noformat}
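
The log above shows TimSort rejecting the comparator used while sorting the node list. One common way to hit this exception is that the values being compared change while the sort is running, for example when another thread updates them. Whether that is the exact cause here is not stated in this message, so the following is only a sketch of one defensive pattern under that assumption: capture each sort key once, then sort the captured copies. The Node type and all names are hypothetical, not the YARN classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: sort on a snapshot of keys that may change concurrently. */
public class SnapshotSortSketch {

  /** Stand-in for a node whose available memory another thread may update. */
  public static final class Node {
    public final String name;
    public final AtomicLong availableMb;

    public Node(String name, long availableMb) {
      this.name = name;
      this.availableMb = new AtomicLong(availableMb);
    }
  }

  /** A node paired with the key value read once, before sorting starts. */
  private static final class Snapshot {
    final Node node;
    final long key;

    Snapshot(Node node, long key) {
      this.node = node;
      this.key = key;
    }
  }

  /** Returns the nodes ordered by available memory, largest first. */
  public static List<Node> sortedByAvailable(List<Node> nodes) {
    List<Snapshot> snapshots = new ArrayList<>(nodes.size());
    for (Node n : nodes) {
      // Read the mutable value exactly once per node.
      snapshots.add(new Snapshot(n, n.availableMb.get()));
    }
    // The comparator only sees the frozen keys, so it stays consistent
    // even if availableMb changes while the sort is in progress.
    snapshots.sort(Comparator.comparingLong((Snapshot s) -> s.key).reversed());
    List<Node> sorted = new ArrayList<>(snapshots.size());
    for (Snapshot s : snapshots) {
      sorted.add(s.node);
    }
    return sorted;
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("a", 100), new Node("b", 300), new Node("c", 200));
    for (Node n : sortedByAvailable(nodes)) {
      System.out.println(n.name);
    }
  }
}
```

The trade-off is that the resulting order reflects the values at snapshot time, which may already be slightly stale when the list is used.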




[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH

2019-11-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977241#comment-16977241
 ] 

Sunil G commented on YARN-8373:
---

+1, thanks [~wilfreds]

> RM  Received RMFatalEvent of type CRITICAL_THREAD_CRASH
> ---
>
> Key: YARN-8373
> URL: https://issues.apache.org/jira/browse/YARN-8373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.9.0
>Reporter: Girish Bhat
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: newbie
> Attachments: YARN-8373.001.patch, YARN-8373.002.patch, 
> YARN-8373.003.patch, YARN-8373.004.patch, YARN-8373.005.patch
>
>




[jira] [Updated] (YARN-9984) FSPreemptionThread can cause NullPointerException while app is unregistered with containers running on a node

2019-11-19 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9984:
--
Summary: FSPreemptionThread can cause NullPointerException while app is 
unregistered with containers running on a node  (was: FSPreemptionThread crash 
with NullPointerException)

> FSPreemptionThread can cause NullPointerException while app is unregistered 
> with containers running on a node
> -
>
> Key: YARN-9984
> URL: https://issues.apache.org/jira/browse/YARN-9984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9984.001.patch
>
>
> When an application is unregistered there is a chance that there are still 
> containers running on a node for that application. In all cases we handle the 
> application missing from the RM gracefully (log a message and continue) 
> except for the FS pre-emption thread.
> In case the application is removed but some containers are still linked to a 
> node the FSPreemptionThread will crash with a NPE when it tries to retrieve 
> the application id for the attempt:
> {code:java}
> FSAppAttempt app =
> scheduler.getSchedulerApp(container.getApplicationAttemptId());
> ApplicationId appId = app.getApplicationId();{code}




[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH

2019-11-19 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977493#comment-16977493
 ] 

Sunil G commented on YARN-8373:
---

I think it's the name. It should have been branch-3.1. Let me re-upload.

> RM  Received RMFatalEvent of type CRITICAL_THREAD_CRASH
> ---
>
> Key: YARN-8373
> URL: https://issues.apache.org/jira/browse/YARN-8373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.9.0
>Reporter: Girish Bhat
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-8373-branch.3.1.001.patch, YARN-8373.001.patch, 
> YARN-8373.002.patch, YARN-8373.003.patch, YARN-8373.004.patch, 
> YARN-8373.005.patch
>
>




[jira] [Updated] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH

2019-11-19 Thread Sunil G (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8373:
--
Attachment: YARN-8373-branch-3.1.001.patch

> RM  Received RMFatalEvent of type CRITICAL_THREAD_CRASH
> ---
>
> Key: YARN-8373
> URL: https://issues.apache.org/jira/browse/YARN-8373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.9.0
>Reporter: Girish Bhat
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-8373-branch-3.1.001.patch, 
> YARN-8373-branch.3.1.001.patch, YARN-8373.001.patch, YARN-8373.002.patch, 
> YARN-8373.003.patch, YARN-8373.004.patch, YARN-8373.005.patch
>
>




[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-11-25 Thread Sunil G (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981391#comment-16981391
 ] 

Sunil G commented on YARN-9052:
---

Thanks [~snemeth]

Thanks [~shuzirra] for spending a good amount of time going through this patch.

I will spend some more time on this today. Thanks.

> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: 
> YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt,
>  YARN-9052-testlogs003-justfailed.txt, 
> YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt,
>  YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, 
> YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, 
> YARN-9052.004.withlogs.patch, YARN-9052.005.patch, YARN-9052.006.patch, 
> YARN-9052.007.patch, YARN-9052.testlogs.002.patch, 
> YARN-9052.testlogs.002.patch, YARN-9052.testlogs.003.patch, 
> YARN-9052.testlogs.patch
>
>
> MockRM has 31 definitions of submitApp, most of them taking more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which 
> makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RmApp will be 
> produced for tests, as they often pass a lot of empty / null values as parameters.



