[jira] [Updated] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9664:
---
Attachment: YARN-9664.001.patch

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9664.001.patch, YARN-9664.001.patch
>
>
> Currently some diagnostics are not easy for common users to understand, and 
> some places still need to be improved, such as missing partition information 
> and a lack of necessary activities. This issue is to improve these 
> shortcomings.






[jira] [Commented] (YARN-9644) First RMContext object is always leaked during switch over

2019-07-02 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877536#comment-16877536
 ] 

Sunil Govindan commented on YARN-9644:
--

[~bibinchundatt], could you please share the 3.2 patch?

> First RMContext object is always leaked during switch over
> --
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch, 
> YARN-9644.003.patch
>
>
> As per my understanding, the following two issues cause the leak:
> * The WebApp holds a reference to the first ApplicationMasterService instance, 
> which has an RMContext with an ActiveServiceContext (holding the RMApps and 
> nodes maps). The WebApp lives for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and is 
> never unregistered on transitionToStandby.
> On transition to standby and back to active, a new RMContext gets created, but 
> the two issues above keep the first RMContext alive until RM shutdown.
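
A minimal sketch of the fix idea for the second point, assuming Hadoop's 
{{org.apache.hadoop.metrics2.util.MBeans}} helper is used for the registration; 
the class and method names below are illustrative, not the actual RM code:

{code:java}
import javax.management.ObjectName;
import org.apache.hadoop.metrics2.util.MBeans;

// Illustrative holder showing the register/unregister pairing that would
// avoid keeping the first RMContext reachable through the MBean server.
class RmNmInfoLifecycle {
  private ObjectName rmNmInfoBeanName; // remembered so it can be unregistered later

  void onTransitionToActive(Object rmNmInfo) {
    // MBeans.register returns the ObjectName under which the bean was published
    rmNmInfoBeanName = MBeans.register("ResourceManager", "RMNMInfo", rmNmInfo);
  }

  void onTransitionToStandby() {
    if (rmNmInfoBeanName != null) {
      MBeans.unregister(rmNmInfoBeanName); // releases the reference held by JMX
      rmNmInfoBeanName = null;
    }
  }
}
{code}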






[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-07-02 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877534#comment-16877534
 ] 

Sunil Govindan commented on YARN-7621:
--

This makes sense to me.

We trim the last section of the queue path and query based on it, so users get a 
seamless shift. +1 for this. Thanks [~Tao Yang] and [~cheersyang].

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently the queue definition in ApplicationSubmissionContext differs between 
> CapacityScheduler and FairScheduler: FairScheduler expects a queue path while 
> CapacityScheduler expects a queue name. The queue definition for 
> CapacityScheduler is unambiguous because it does not allow duplicate leaf queue 
> names, but the difference makes it hard to switch between FairScheduler and 
> CapacityScheduler. I propose to support submitting apps with a queue path for 
> CapacityScheduler, to make the interface clearer and the scheduler switch 
> smoother.
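
A minimal sketch of the trimming described in the comment above (illustrative 
only, not the actual patch): since CapacityScheduler does not allow duplicate 
leaf queue names, a submitted queue path such as {{root.a.b}} can be reduced to 
its last section before the queue lookup.

{code:java}
// Derive the leaf queue name from a full queue path, e.g. "root.a.b" -> "b".
static String leafQueueNameOf(String queuePath) {
  int lastDot = queuePath.lastIndexOf('.');
  return lastDot < 0 ? queuePath : queuePath.substring(lastDot + 1);
}
{code}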






[jira] [Updated] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9664:
---
Attachment: YARN-9664.001.patch

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9664.001.patch
>
>
> Currently some diagnostics are not easy for common users to understand, and 
> some places still need to be improved, such as missing partition information 
> and a lack of necessary activities. This issue is to improve these 
> shortcomings.






[jira] [Updated] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9664:
---
Attachment: (was: YARN-9664.001.patch)

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>
> Currently some diagnostics are not easy for common users to understand, and 
> some places still need to be improved, such as missing partition information 
> and a lack of necessary activities. This issue is to improve these 
> shortcomings.






[jira] [Commented] (YARN-9658) Fix UT failures in TestLeafQueue

2019-07-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877495#comment-16877495
 ] 

Tao Yang commented on YARN-9658:


Thanks [~cheersyang] for the review and commit.

> Fix UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its default of 5 seconds, so the 
> cleanup thread keeps running in a tight loop with no interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM because the 
> incomplete mock internally generated many throwable objects.
> Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.
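
A minimal sketch of the described fix in the test setup, assuming the mock is 
created with Mockito and that {{RMContext#getYarnConfiguration}} is what 
ActivitiesManager reads (illustrative, not the exact patch):

{code:java}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;

// Give the mocked RMContext a real YarnConfiguration so ActivitiesManager can
// resolve its default cleanup interval instead of looping without any pause.
RMContext rmContext = mock(RMContext.class);
when(rmContext.getYarnConfiguration()).thenReturn(new YarnConfiguration());
{code}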






[jira] [Commented] (YARN-9658) Fix UT failures in TestLeafQueue

2019-07-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877473#comment-16877473
 ] 

Hudson commented on YARN-9658:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16856 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16856/])
YARN-9658. Fix UT failures in TestLeafQueue. Contributed by Tao Yang. (wwei: 
rev 15d82fcb750450777634e3b7599e527bd8239221)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


> Fix UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its default of 5 seconds, so the 
> cleanup thread keeps running in a tight loop with no interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM because the 
> incomplete mock internally generated many throwable objects.
> Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.






[jira] [Commented] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877466#comment-16877466
 ] 

Tao Yang commented on YARN-9664:


Attached v1 patch for review.
Updates:
* Adjust the structure and diagnostics of the activities response
** Refactor activity diagnostics to make them more sensible: begin with the 
activity level and adjust some content
** Refactor field names: allocationState -> activityState, requestAllocation -> 
requestAllocations, allocationAttempt -> allocationAttempts
** Adjust the order of some fields
** Correct unreasonable activity states
** Add activity diagnostics for the initial checks at the beginning of the 
scheduling process
* Support recording and showing the partition name in scheduler activities
* Add activity levels (QUEUE/APP/REQUEST/NODE) and improve the recording process 
to get better classifications
* UT
** Add new test cases (testQueueSkippedBecauseOfHeadroom and 
testNodeSkippedBecauseOfRelaxLocality) in TestRMWebServicesSchedulerActivities 
to test diagnostics at the request/node level
** Add testPartitionInSchedulerActivities in 
TestRMWebServicesForCSWithPartitions to test partition information
** Update frequently-used strings to be constants

Example requests for fetching these responses are sketched below.
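
The updated responses can be fetched from the RM REST API for review; a hedged 
example, where the host, port, node id and application id are placeholders and 
the exact endpoint paths may differ by version:

{noformat}
# scheduler activities recorded for an allocation attempt on a given node
curl "http://<rm-host>:8088/ws/v1/cluster/scheduler/activities?nodeId=<node-id>"

# app activities for a specific application
curl "http://<rm-host>:8088/ws/v1/cluster/scheduler/app-activities/<application-id>"
{noformat}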

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9664.001.patch
>
>
> Currently some diagnostics are not easy for common users to understand, and 
> some places still need to be improved, such as missing partition information 
> and a lack of necessary activities. This issue is to improve these 
> shortcomings.






[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-07-02 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877465#comment-16877465
 ] 

Weiwei Yang commented on YARN-7621:
---

cc [~leftnoteasy], [~sunilg], [~wilfreds].

This issue is important for users who want to migrate from FS to CS.

Adding a label to tag it.

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently the queue definition in ApplicationSubmissionContext differs between 
> CapacityScheduler and FairScheduler: FairScheduler expects a queue path while 
> CapacityScheduler expects a queue name. The queue definition for 
> CapacityScheduler is unambiguous because it does not allow duplicate leaf queue 
> names, but the difference makes it hard to switch between FairScheduler and 
> CapacityScheduler. I propose to support submitting apps with a queue path for 
> CapacityScheduler, to make the interface clearer and the scheduler switch 
> smoother.






[jira] [Updated] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-07-02 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7621:
--
Priority: Major  (was: Minor)

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently the queue definition in ApplicationSubmissionContext differs between 
> CapacityScheduler and FairScheduler: FairScheduler expects a queue path while 
> CapacityScheduler expects a queue name. The queue definition for 
> CapacityScheduler is unambiguous because it does not allow duplicate leaf queue 
> names, but the difference makes it hard to switch between FairScheduler and 
> CapacityScheduler. I propose to support submitting apps with a queue path for 
> CapacityScheduler, to make the interface clearer and the scheduler switch 
> smoother.






[jira] [Updated] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-07-02 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7621:
--
Labels: fs2cs  (was: )

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently the queue definition in ApplicationSubmissionContext differs between 
> CapacityScheduler and FairScheduler: FairScheduler expects a queue path while 
> CapacityScheduler expects a queue name. The queue definition for 
> CapacityScheduler is unambiguous because it does not allow duplicate leaf queue 
> names, but the difference makes it hard to switch between FairScheduler and 
> CapacityScheduler. I propose to support submitting apps with a queue path for 
> CapacityScheduler, to make the interface clearer and the scheduler switch 
> smoother.






[jira] [Updated] (YARN-9658) Fix UT failures in TestLeafQueue

2019-07-02 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9658:
--
Summary: Fix UT failures in TestLeafQueue  (was: UT failures in 
TestLeafQueue)

> Fix UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its default of 5 seconds, so the 
> cleanup thread keeps running in a tight loop with no interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM because the 
> incomplete mock internally generated many throwable objects.
> Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.






[jira] [Commented] (YARN-9658) UT failures in TestLeafQueue

2019-07-02 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877464#comment-16877464
 ] 

Weiwei Yang commented on YARN-9658:
---

+1

> UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its default of 5 seconds, so the 
> cleanup thread keeps running in a tight loop with no interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM because the 
> incomplete mock internally generated many throwable objects.
> Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.






[jira] [Updated] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9664:
---
Description: Currently some diagnostics are not easy enough to understand 
for common users, and I found some places still need to be improved such as no 
partition information and lacking of necessary activities. This issue is to 
improve these shortcomings.  (was: Currently some diagnostics are not easy 
enough to understand for common users, and I found some places still need to be 
improved such as no partition information and lacking of necessary activities 
in some places. This issue is to improve these.)

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9664.001.patch
>
>
> Currently some diagnostics are not easy for common users to understand, and 
> some places still need to be improved, such as missing partition information 
> and a lack of necessary activities. This issue is to improve these 
> shortcomings.






[jira] [Updated] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9664:
---
Attachment: YARN-9664.001.patch

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9664.001.patch
>
>
> Currently some diagnostics are not easy for common users to understand, and 
> some places still need to be improved, such as missing partition information 
> and a lack of necessary activities. This issue is to improve these 
> shortcomings.






[jira] [Created] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-07-02 Thread Tao Yang (JIRA)
Tao Yang created YARN-9664:
--

 Summary: Improve response of scheduler/app activities for better 
understanding
 Key: YARN-9664
 URL: https://issues.apache.org/jira/browse/YARN-9664
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tao Yang
Assignee: Tao Yang


Currently some diagnostics are not easy for common users to understand, and some 
places still need to be improved, such as missing partition information and a 
lack of necessary activities in some places. This issue is to improve these.






[jira] [Assigned] (YARN-9468) Fix inaccurate documentations in Placement Constraints

2019-07-02 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi reassigned YARN-9468:


Assignee: hunshenshi  (was: Charan Hebri)

> Fix inaccurate documentations in Placement Constraints
> --
>
> Key: YARN-9468
> URL: https://issues.apache.org/jira/browse/YARN-9468
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> Document Placement Constraints
> *First* 
> {code:java}
> zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code}
>  * place 5 containers with tag “hbase” with affinity to a rack on which 
> containers with tag “zk” are running (i.e., an “hbase” container 
> should{color:#ff} not{color} be placed at a rack where an “zk” container 
> is running, given that “zk” is the TargetTag of the second constraint);
> The word _*not*_ in the parenthetical should be deleted.
>  
> *Second*
> {code:java}
> PlacementSpec => "" | KeyVal;PlacementSpec
> {code}
> The semicolon should be replaced by a colon.
>  
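
For clarity, the grammar line after applying the second fix would read:

{noformat}
PlacementSpec => "" | KeyVal:PlacementSpec
{noformat}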






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-02 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877458#comment-16877458
 ] 

hunshenshi commented on YARN-9655:
--

Thanks for review [~cheersyang]

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch, YARN-9655.branch-3.0.patch
>
>
> In YARN Federation mode using the FederationInterceptor, the AM reports an 
> error after the application is submitted:
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that the applicationPriority is lost from the AllocateResponse.
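
A minimal sketch of the fix idea (the method and variable names are 
illustrative, not the actual patch): when the FederationInterceptor builds the 
response returned to the AM, the application priority from the home cluster's 
response needs to be carried over.

{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;

// Copy the application priority from the home sub-cluster response into the
// merged response handed back to the AM, so the AM never sees a null priority.
static void carryOverApplicationPriority(AllocateResponse homeResponse,
                                         AllocateResponse mergedResponse) {
  if (homeResponse.getApplicationPriority() != null) {
    mergedResponse.setApplicationPriority(homeResponse.getApplicationPriority());
  }
}
{code}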






[jira] [Assigned] (YARN-9455) SchedulerInvalidResoureRequestException has a typo in its class (and file) name

2019-07-02 Thread WEI-HSIAO-LEE (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WEI-HSIAO-LEE reassigned YARN-9455:
---

Assignee: (was: WEI-HSIAO-LEE)

> SchedulerInvalidResoureRequestException has a typo in its class (and file) 
> name
> ---
>
> Key: YARN-9455
> URL: https://issues.apache.org/jira/browse/YARN-9455
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie
>
> The class name should be: SchedulerInvalidResourceRequestException






[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-02 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9655:
--
Fix Version/s: 2.9.3

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch, YARN-9655.branch-3.0.patch
>
>
> In YARN Federation mode using the FederationInterceptor, the AM reports an 
> error after the application is submitted:
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that the applicationPriority is lost from the AllocateResponse.






[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877048#comment-16877048
 ] 

Hadoop QA commented on YARN-9660:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
29m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9660 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973432/YARN-9660-001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 7569cfa2be73 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e966edd |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 448 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24341/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9660-001.patch
>
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busyb

[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-02 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9655:
--
Fix Version/s: 3.0.4

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch, YARN-9655.branch-3.0.patch
>
>
> In YARN Federation mode using the FederationInterceptor, the AM reports an 
> error after the application is submitted:
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that the applicationPriority is lost from the AllocateResponse.






[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-02 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877018#comment-16877018
 ] 

Sunil Govindan commented on YARN-9629:
--

Thanks [~adam.antal].

The first point makes sense if the default value is -1, so this code will always 
kick in when nothing is configured.

For #2, once a value is configured it is better to state it plainly, as in that 
last else condition. Hence taking it out and showing the same log in both kinds 
of scenario still makes sense to me. You can trim the first log if its content 
is duplicated, but skipping the else conditions looks cleaner and more generic.

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch
>
>
> The minimum valid value for the log-aggregation parameter 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}} is 
> MIN_LOG_ROLLING_INTERVAL, which has been hardcoded since its addition in 
> YARN-2583.
> It was empirically set to 1 hour, as lower values would put the NodeManagers 
> under pressure too frequently. For bigger clusters that is indeed a valid 
> limitation, but for smaller clusters it is a sensible and valid customer use 
> case to use lower values, even a not-so-low 30 minutes. At this point this can 
> only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes.
> I suggest making this minimum configurable, although a warning should be 
> logged at NodeManager startup when the configured value is lower than 1 hour.
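
With the proposal, a smaller cluster could then configure something like the 
following in yarn-site.xml (a hedged sketch: the 30-minute value is only an 
example, and any new property for the configurable minimum is not shown because 
its name would be an assumption):

{code:xml}
<!-- Roll logs every 30 minutes; below the current hardcoded 1-hour minimum,
     so with the proposal the NM would accept it but log a startup warning. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>1800</value>
</property>
{code}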






[jira] [Comment Edited] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-02 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877015#comment-16877015
 ] 

Adam Antal edited comment on YARN-9629 at 7/2/19 2:24 PM:
--

Thanks for the review [~sunilg].
1) 
Firstly, if the user does not override the default configs, this message is 
displayed, and it can be concerning if WARN-level logs show up during a standard 
startup. 
Secondly, I think this does not fall into the WARN category: we want to inform 
the users that the rolling feature is disabled, not warn them about it.
Thirdly, it was an INFO-level log before this patch, so I'd keep it that way if 
there's no strong reason to raise the level.

Taking all of this into account, I suggest keeping this message at INFO level.

About adding extra information: this is a static method and we only have a 
Configuration object during {{serviceInit}} anyway. That part is also not bound 
to a specific application, so we cannot really add more information besides the 
NM's id (but the message appears in that NM's own log, so it does not really 
make sense to do that).

2) 
This information is included in the other branches of that if condition. If the 
configured value does not pass the minimum constraint while debug mode is 
enabled, we just say that the interval has been set to that value; otherwise we 
warn the customer that the configured value violates the minimum constraint. 
Either way we provide the information, so there's no need to display it again 
(and no need to move that last log out of that else clause).

Does this rationale make sense to you, [~sunilg]?


was (Author: adam.antal):
Thanks for the review [~sunilg].
1) 
Firstly if the user does not override default configs this message is displayed 
and can be concerning if WARN level logs are displayed during standard startup. 
Secondly I think this does not falls into WARN level category. We want to 
inform the users about that the rolling feature is not enabled and not warning 
them about this.
Thirdly it was in INFO level log before this patch, so I'd keep it that way if 
there's no big reason to raise level.
Taking into account all of this I suggest to keep this in INFO level.

About adding an extra information: this is a static method and also we have 
only a Configuration object during {{serviceInit}}. That part is also not bound 
to a specific application, so we can not really add more information beside the 
NM's id (but it's the log of that NM where this message is displayed, so it 
does not really make sense).

2) 
This information is included in other branches of that if condition. If the 
configuration value does not pass the minimum constraint in case of debug mode 
we just say that this has been set to that value, otherwise we should warn the 
customer that the the configured value violates the minimum constraint. In 
either way, we provide the information, so there's no need to display is again 
(so there's no need to move that last log out of that else cause).

Does these rationale make sense to you [~sunilg]?

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch
>
>
> The minimum valid value for the log-aggregation parameter 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}} is 
> MIN_LOG_ROLLING_INTERVAL, which has been hardcoded since its addition in 
> YARN-2583.
> It was empirically set to 1 hour, as lower values would put the NodeManagers 
> under pressure too frequently. For bigger clusters that is indeed a valid 
> limitation, but for smaller clusters it is a sensible and valid customer use 
> case to use lower values, even a not-so-low 30 minutes. At this point this can 
> only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes.
> I suggest making this minimum configurable, although a warning should be 
> logged at NodeManager startup when the configured value is lower than 1 hour.






[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-02 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877015#comment-16877015
 ] 

Adam Antal commented on YARN-9629:
--

Thanks for the review [~sunilg].
1) 
Firstly, if the user does not override the default configs, this message is 
displayed, and it can be concerning if WARN-level logs show up during a standard 
startup. 
Secondly, I think this does not fall into the WARN category: we want to inform 
the users that the rolling feature is disabled, not warn them about it.
Thirdly, it was an INFO-level log before this patch, so I'd keep it that way if 
there's no strong reason to raise the level.
Taking all of this into account, I suggest keeping this at INFO level.

About adding extra information: this is a static method and we only have a 
Configuration object during {{serviceInit}}. That part is also not bound to a 
specific application, so we cannot really add more information besides the NM's 
id (but the message appears in that NM's own log, so it does not really make 
sense).

2) 
This information is included in the other branches of that if condition. If the 
configured value does not pass the minimum constraint while debug mode is 
enabled, we just say that the interval has been set to that value; otherwise we 
warn the customer that the configured value violates the minimum constraint. 
Either way we provide the information, so there's no need to display it again 
(and no need to move that last log out of that else clause).

Does this rationale make sense to you [~sunilg]?

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch
>
>
> The minimum valid value for the log-aggregation parameter 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}} is 
> MIN_LOG_ROLLING_INTERVAL, which has been hardcoded since its addition in 
> YARN-2583.
> It was empirically set to 1 hour, as lower values would put the NodeManagers 
> under pressure too frequently. For bigger clusters that is indeed a valid 
> limitation, but for smaller clusters it is a sensible and valid customer use 
> case to use lower values, even a not-so-low 30 minutes. At this point this can 
> only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes.
> I suggest making this minimum configurable, although a warning should be 
> logged at NodeManager startup when the configured value is lower than 1 hour.






[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-02 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876992#comment-16876992
 ] 

Peter Bacsko commented on YARN-9660:


Uploaded patch v1 to enhance the markdown file.

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9660-001.patch
>
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
> there's only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular 
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launch_container.sh}} fails 
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image and tagging it appropriately. Just an example like 
> {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative.
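
A hedged example of the cgroupfs switch mentioned in #1, assuming a 
systemd-based host; exact steps vary by OS and Docker version:

{noformat}
# /etc/docker/daemon.json -- tell the Docker daemon to use the cgroupfs driver
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

# then restart the daemon
systemctl restart docker
{noformat}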






[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-02 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9660:
---
Attachment: YARN-9660-001.patch

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9660-001.patch
>
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
> there's only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular 
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launch_container.sh}} fails 
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image and tagging it appropriately. Just an example like 
> {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative.






[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876929#comment-16876929
 ] 

Hadoop QA commented on YARN-9647:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m  8s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9647 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973415/YARN-9647.002.patch |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 582f77bc5327 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e966edd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24340/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24340/testReport/ |
| Max. process+thread count | 417 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24340/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>  

[jira] [Commented] (YARN-9651) Resource Manager throws NPE

2019-07-02 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876838#comment-16876838
 ] 

hunshenshi commented on YARN-9651:
--

[~zhangqw] Will every application cause the RM to shut down?

How can I reproduce this scenario?

> Resource Manager throws NPE
> ---
>
> Key: YARN-9651
> URL: https://issues.apache.org/jira/browse/YARN-9651
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
> Environment: os: centos 7.1
> hadoop 3.1.1 release
>  
>Reporter: zhangqw
>Priority: Major
>
> We use the Hadoop 3.1.1 release; the RM stopped with an NPE while running some regular jobs.
> {code:java}
> 2019-06-13 17:06:06,664 FATAL event.EventDispatcher 
> (EventDispatcher.java:run(75)) - Error in handling event type 
> APP_ATTEMPT_ADDED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:1158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.transferStateFromPreviousAttempt(FiCaSchedulerApp.java:852)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:982)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1730)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:167)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> I checked the [related issue: 
> YARN-2340|https://issues.apache.org/jira/browse/YARN-2340], but it is 
> already fixed in the version we are running. 
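
Pending a reproduction, one purely hypothetical hardening, sketched below and not taken from any patch, would be to guard the state transfer against a previous attempt that has already been cleaned up, so a race between APP_ATTEMPT_ADDED and application removal cannot take down the event dispatcher. All names in the sketch are illustrative:

{code:java}
// Hypothetical illustration only -- not the actual YARN code or patch.
class AttemptStateTransfer {
  static final class Attempt {
    final String attemptId;
    Object schedulingInfo;               // may be null if already cleaned up
    Attempt(String id, Object info) { attemptId = id; schedulingInfo = info; }
  }

  // Guard against a missing previous attempt instead of dereferencing null
  // and killing the scheduler's event dispatcher thread.
  void transferStateFromPreviousAttempt(Attempt current, Attempt previous) {
    if (previous == null || previous.schedulingInfo == null) {
      System.err.println("WARN: no previous attempt state for "
          + current.attemptId + "; skipping transfer");
      return;
    }
    current.schedulingInfo = previous.schedulingInfo;   // simplified transfer
  }
}
{code}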



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-02 Thread KWON BYUNGCHANG (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KWON BYUNGCHANG updated YARN-9647:
--
Attachment: YARN-9647.002.patch

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> If /data2 is unhealthy, the Docker launch fails even though the container 
> could still use /data1 as its local-dir and log-dir. 
> The error message is below:
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> The root cause is that normalize_mounts() in docker-util.c returns -1 because 
> it cannot resolve the real path of /data2/hadoop/yarn/local (note that /data2 
> has a disk fault at this point).
> However, the disks backing the NM local-dirs and log-dirs can fail at any 
> time; the Docker launch should succeed as long as usable local-dirs and 
> log-dirs remain.
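
For illustration, a minimal Java sketch of the resilience idea described above, assuming the goal is simply to skip mounts whose real path cannot be resolved rather than fail the whole launch. The class and method names are hypothetical; the real check lives in the C container-executor:

{code:java}
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class HealthyMounts {
  // Keep only configured dirs whose real path resolves; a faulted or
  // unmounted disk fails toRealPath() just as realpath() fails inside
  // normalize_mounts().
  static List<String> resolvableDirs(List<String> configuredDirs) {
    return configuredDirs.stream()
        .filter(dir -> {
          try {
            Paths.get(dir).toRealPath();
            return true;
          } catch (IOException e) {
            return false;            // skip this dir, keep the launch alive
          }
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> dirs = Arrays.asList(
        "/data1/hadoop/yarn/local", "/data2/hadoop/yarn/local");
    System.out.println("usable mounts: " + resolvableDirs(dirs));
  }
}
{code}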



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-07-02 Thread liyakun (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876750#comment-16876750
 ] 

liyakun commented on YARN-9480:
---

Thanks [~tangzhankun] and [~Weiwei Yang]. [~Yunyao Zhang], please help resolve 
this issue as soon as possible.

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: Yunyao Zhang
>Priority: Major
>
> At present, during startContainers(), if the NM does not already know the 
> application, it enters the INIT_APPLICATION step. In the application-init 
> step, createAppDir() is executed, and it is a blocking operation.
> createAppDir() has to interact with an external file system, so it is bound 
> by that file system's SLA. Once the external file system exhibits high 
> latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In 
> fact, I have seen the NM stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual 
> time of log upload (in another thread). Also, according to the 
> logRetentionPolicy, many containers may never reach that step, which would 
> save a lot of interactions with the external file system.
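
A minimal sketch of the direction proposed above, assuming a dedicated single background thread is acceptable; all names here are hypothetical rather than taken from any patch:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AppDirInit {
  // Single background thread dedicated to remote-FS app-dir creation, so the
  // ContainerManagerImpl dispatcher thread never waits on the external FS.
  private final ExecutorService appDirExecutor =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "log-aggregation-app-dir-init");
        t.setDaemon(true);
        return t;
      });

  Future<?> onApplicationInit(String appId) {
    // Returns immediately; the blocking call runs off-thread and can be
    // awaited later, e.g. right before the first log upload.
    return appDirExecutor.submit(() -> createAppDirOnRemoteFs(appId));
  }

  private void createAppDirOnRemoteFs(String appId) {
    // Placeholder for the blocking interaction with the external file system.
  }
}
{code}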



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-07-02 Thread liyakun (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyakun reassigned YARN-9480:
-

Assignee: Yunyao Zhang  (was: liyakun)

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: Yunyao Zhang
>Priority: Major
>
> At present, during startContainers(), if the NM does not already know the 
> application, it enters the INIT_APPLICATION step. In the application-init 
> step, createAppDir() is executed, and it is a blocking operation.
> createAppDir() has to interact with an external file system, so it is bound 
> by that file system's SLA. Once the external file system exhibits high 
> latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In 
> fact, I have seen the NM stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual 
> time of log upload (in another thread). Also, according to the 
> logRetentionPolicy, many containers may never reach that step, which would 
> save a lot of interactions with the external file system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9327) Improve synchronisation in ProtoUtils#convertToProtoFormat block

2019-07-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876744#comment-16876744
 ] 

Hudson commented on YARN-9327:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16850 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16850/])
YARN-9327. Improve synchronisation in ProtoUtils#convertToProtoFormat (sunilg: 
rev 0c8813f135f8c17f88660bb92529c15bb3a157ca)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
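
The quoted description below shows every caller serializing on the ProtoUtils class monitor. One way to narrow that lock, offered here only as a sketch of instance-level locking and not as a claim about what the committed patch does, is to synchronize on the Resource being converted (types as in the quoted snippet):

{code:java}
// Sketch: lock per Resource object instead of the whole ProtoUtils class,
// so conversions of unrelated Resource instances can proceed in parallel.
public static ResourceProto convertToProtoFormat(Resource r) {
  synchronized (r) {
    return ResourcePBImpl.getProto(r);
  }
}
{code}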


> Improve synchronisation in ProtoUtils#convertToProtoFormat block
> 
>
> Key: YARN-9327
> URL: https://issues.apache.org/jira/browse/YARN-9327
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9327.001.patch, YARN-9327.002.patch, 
> YARN-9327.003.patch
>
>
> {code}
>   public static synchronized ResourceProto convertToProtoFormat(Resource r) {
> return ResourcePBImpl.getProto(r);
>   }
> {code}
> {noformat}
> "IPC Server handler 41 on 23764" #324 daemon prio=5 os_prio=0 
> tid=0x7f181de72800 nid=0x222 waiting for monitor entry 
> [0x7ef153dad000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils.convertToProtoFormat(ProtoUtils.java:404)
>   - waiting to lock <0x7ef2d8bcf6d8> (a java.lang.Class for 
> org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.convertToProtoFormat(NodeReportPBImpl.java:315)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:262)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:289)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:228)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:844)
>   - locked <0x7f0fed968a30> (a 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:72)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$7$1.next(AllocateResponsePBImpl.java:810)
>   - locked <0x7f0fed96f500> (a 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$7$1)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$7$1.next(AllocateResponsePBImpl.java:799)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:13810)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:158)
>   - locked <0x7f0fed968a30> (a 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:198)
>   - eliminated <0x7f0fed968a30> (a 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:103)
>   - locked <0x7f0fed968a30> (a 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:824)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at jav

[jira] [Commented] (YARN-9644) First RMContext object is always leaked during switch over

2019-07-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876745#comment-16876745
 ] 

Hudson commented on YARN-9644:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16850 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16850/])
YARN-9644. First RMContext object is always leaked during switch over. (sunilg: 
rev e966edd025332394701fe0d2cfa0d76731183aaf)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMNMInfo.java


> First RMContext object is always leaked during switch over
> --
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch, 
> YARN-9644.003.patch
>
>
> As per my understanding, the following two issues cause the leak:
> * WebApp holds a reference to the first applicationMasterServer instance, 
> which has an rmcontext with an ActiveServiceContext (holding the RMApps and 
> nodes maps). The WebApp remains for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and 
> never unregistered on transitionToStandBy.
> On transition to standby and back to active, a new RMContext gets created, 
> but the above two issues cause the first RMContext to persist until RM 
> shutdown.
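
A sketch of the unregister-on-standby half of the idea, assuming Hadoop's org.apache.hadoop.metrics2.util.MBeans helper; the surrounding class and method names are illustrative, not taken from the patch:

{code:java}
import javax.management.ObjectName;
import org.apache.hadoop.metrics2.util.MBeans;

public class RMNMInfoLifecycle {
  private ObjectName rmnmInfoBeanName;

  void onTransitionToActive(Object rmnmInfo) {
    // Remember the ObjectName so the bean can be dropped later.
    rmnmInfoBeanName = MBeans.register("ResourceManager", "RMNMInfo", rmnmInfo);
  }

  void onTransitionToStandby() {
    if (rmnmInfoBeanName != null) {
      // Releases the MBean server's strong reference, letting the old
      // RMContext chain become garbage-collectible.
      MBeans.unregister(rmnmInfoBeanName);
      rmnmInfoBeanName = null;
    }
  }
}
{code}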



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org