[jira] [Updated] (YARN-9505) Add container allocation latency for Opportunistic Scheduler

2019-04-23 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9505:

Attachment: YARN-9505.002.patch

> Add container allocation latency for Opportunistic Scheduler
> 
>
> Key: YARN-9505
> URL: https://issues.apache.org/jira/browse/YARN-9505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9505.001.patch, YARN-9505.002.patch
>
>
> This will help in tuning the opportunistic scheduler and its configuration 
> parameters.
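
The attached patches are not reproduced in this digest. As a rough sketch of what an allocation-latency metric could look like with the Hadoop metrics2 library (the class, registry, and metric names below are illustrative assumptions, not taken from the patch):

{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative only: records how long an opportunistic container request
// waited before it was satisfied.
public class OpportunisticAllocationLatency {
  private final MetricsRegistry registry =
      new MetricsRegistry("OpportunisticSchedulerMetrics");

  private final MutableRate allocateLatency = registry.newRate(
      "AllocateLatency", "Opportunistic container allocation latency (ms)");
  private final MutableQuantiles allocateLatencyQuantiles = registry.newQuantiles(
      "AllocateLatencyQuantiles", "Opportunistic allocation latency",
      "ops", "latencyMs", 60);

  // Called by the scheduler once a queued request has been allocated.
  public void addAllocateLatency(long latencyMs) {
    allocateLatency.add(latencyMs);
    allocateLatencyQuantiles.add(latencyMs);
  }
}
{code}

Registering the source with the metrics system is omitted here; the actual patch may integrate with an existing metrics class instead.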






[jira] [Updated] (YARN-9505) Add container allocation latency for Opportunistic Scheduler

2019-04-23 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9505:

Attachment: YARN-9505.001.patch

> Add container allocation latency for Opportunistic Scheduler
> 
>
> Key: YARN-9505
> URL: https://issues.apache.org/jira/browse/YARN-9505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9505.001.patch
>
>
> This will help in tuning the opportunistic scheduler and its configuration 
> parameters.






[jira] [Created] (YARN-9505) Add container allocation latency for Opportunistic Scheduler

2019-04-23 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9505:
---

 Summary: Add container allocation latency for Opportunistic 
Scheduler
 Key: YARN-9505
 URL: https://issues.apache.org/jira/browse/YARN-9505
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


This will help in tuning the opportunistic scheduler and its configuration 
parameters.






[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-22 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823302#comment-16823302
 ] 

Abhishek Modi commented on YARN-2889:
-

Thanks [~elgoiri] for review and committing it.

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch, 
> YARN-2889.003.patch, YARN-2889.004.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).
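
As a hedged sketch of the idea only (the configuration key, default, and class below are hypothetical, not necessarily what the attached patches use), the cap can be applied when the allocator walks the outstanding opportunistic asks for one heartbeat:

{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// Illustrative per-heartbeat cap for opportunistic container allocation.
public class PerHeartbeatCap {
  // Hypothetical property name and default value.
  static final String MAX_OPP_PER_AM_HEARTBEAT =
      "yarn.resourcemanager.opportunistic.max-container-allocations-per-am-heartbeat";
  static final int DEFAULT_MAX_OPP_PER_AM_HEARTBEAT = 100;

  private final int maxPerHeartbeat;

  public PerHeartbeatCap(Configuration conf) {
    this.maxPerHeartbeat = conf.getInt(
        MAX_OPP_PER_AM_HEARTBEAT, DEFAULT_MAX_OPP_PER_AM_HEARTBEAT);
  }

  // Hand out at most maxPerHeartbeat containers now; anything beyond the cap
  // stays pending and is served on later heartbeats (a negative value
  // disables the cap).
  public <T> List<T> cap(List<T> wouldAllocate) {
    if (maxPerHeartbeat < 0 || wouldAllocate.size() <= maxPerHeartbeat) {
      return wouldAllocate;
    }
    return wouldAllocate.subList(0, maxPerHeartbeat);
  }
}
{code}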






[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-20 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822434#comment-16822434
 ] 

Abhishek Modi commented on YARN-2889:
-

The checkstyle issue is due to the method having too many parameters.

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch, 
> YARN-2889.003.patch, YARN-2889.004.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).






[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-19 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2889:

Attachment: YARN-2889.004.patch

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch, 
> YARN-2889.003.patch, YARN-2889.004.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).






[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-19 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822072#comment-16822072
 ] 

Abhishek Modi commented on YARN-9339:
-

Thanks [~elgoiri] for the review. I ran both tests locally with my changes and 
they pass.

testDecreaseAfterIncreaseWithAllocationExpiration keeps failing randomly in 
other builds as well - I will file a jira to fix it.

TestFairSchedulerPreemption also fails intermittently - there is already an 
open jira for this: YARN-9333.

 

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch, 
> YARN-9339.003.patch, YARN-9339.004.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.
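
A minimal sketch of the kind of check the fix enables (assuming a MockRM/CapacityScheduler test context like testMoveAppBasic; the helper name below is illustrative, not part of the patch):

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.junit.Assert;

public class AppsPendingMoveCheck {
  // A test such as TestCapacityScheduler#testMoveAppBasic could call this
  // around the move to guard against the regression described above.
  static void assertPendingUnchangedByMove(CapacityScheduler scheduler,
      ApplicationId appId, String targetQueue) throws YarnException {
    QueueMetrics rootMetrics = scheduler.getRootQueueMetrics();
    int pendingBefore = rootMetrics.getAppsPending();

    scheduler.moveApplication(appId, targetQueue);

    // Moving an app between queues must not change the cluster-wide
    // pending-apps count; before the fix it jumped from 1 to 2.
    Assert.assertEquals(pendingBefore, rootMetrics.getAppsPending());
  }
}
{code}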






[jira] [Updated] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-19 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9339:

Attachment: YARN-9339.004.patch

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch, 
> YARN-9339.003.patch, YARN-9339.004.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.






[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-19 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821690#comment-16821690
 ] 

Abhishek Modi commented on YARN-9339:
-

Thanks [~elgoiri] for review. Attached v4 patch with fixes.

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch, 
> YARN-9339.003.patch, YARN-9339.004.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.






[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-18 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820823#comment-16820823
 ] 

Abhishek Modi commented on YARN-2889:
-

Thanks [~elgoiri] for the review:
 * Regarding avoiding "luser": it is used across all the tests. Should I 
change them all to something else?
 * There are already tests covering a large number of allocations: 
testLotsOfContainersRackLocalAllocationSameSchedKey and 
testLotsOfContainersRackLocalAllocation already cover those cases.

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch, 
> YARN-2889.003.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).






[jira] [Commented] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-18 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820808#comment-16820808
 ] 

Abhishek Modi commented on YARN-9448:
-

Thanks [~elgoiri] for review. Attached v4 patch with the fix.

> Fix Opportunistic Scheduling for node local allocations.
> 
>
> Key: YARN-9448
> URL: https://issues.apache.org/jira/browse/YARN-9448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9448.001.patch, YARN-9448.002.patch, 
> YARN-9448.003.patch, YARN-9448.004.patch
>
>
> Right now, an opportunistic container might not get allocated on a rack-local 
> node even if one is available.
> Nodes are currently blacklisted if any container other than a node-local 
> container is allocated on them. If a container was previously allocated on a 
> node, that node is not considered again even when there is an ask for a 
> node-local request.
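
A hedged sketch of the intended selection behavior only (plain strings stand in for the real node and request types; this is not the actual OpportunisticContainerAllocator code):

{code:java}
import java.util.List;
import java.util.Set;

// Illustrative node selection: a node that was already used for another
// allocation should still satisfy an ask that explicitly names that node.
public final class NodeLocalSelection {
  static String pickNode(String requestedHost, List<String> candidateNodes,
      Set<String> alreadyUsedNodes) {
    // Node-local first: before the fix, a previously used node was skipped
    // unconditionally, so this ask could never be satisfied node-locally.
    for (String node : candidateNodes) {
      if (node.equals(requestedHost)) {
        return node;
      }
    }
    // Otherwise fall back to any node that has not been used yet.
    for (String node : candidateNodes) {
      if (!alreadyUsedNodes.contains(node)) {
        return node;
      }
    }
    return null;   // nothing suitable this round
  }
}
{code}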






[jira] [Updated] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-18 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9448:

Attachment: YARN-9448.004.patch

> Fix Opportunistic Scheduling for node local allocations.
> 
>
> Key: YARN-9448
> URL: https://issues.apache.org/jira/browse/YARN-9448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9448.001.patch, YARN-9448.002.patch, 
> YARN-9448.003.patch, YARN-9448.004.patch
>
>
> Right now, an opportunistic container might not get allocated on a rack-local 
> node even if one is available.
> Nodes are currently blacklisted if any container other than a node-local 
> container is allocated on them. If a container was previously allocated on a 
> node, that node is not considered again even when there is an ask for a 
> node-local request.






[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-17 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820331#comment-16820331
 ] 

Abhishek Modi commented on YARN-9339:
-

None of the failures are related to this patch.

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch, 
> YARN-9339.003.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.






[jira] [Updated] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-16 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9339:

Attachment: YARN-9339.003.patch

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch, 
> YARN-9339.003.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.






[jira] [Commented] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-15 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818264#comment-16818264
 ] 

Abhishek Modi commented on YARN-9448:
-

Thanks [~elgoiri] for the review. Attached the v3 patch with more detailed 
comments in the test.

> Fix Opportunistic Scheduling for node local allocations.
> 
>
> Key: YARN-9448
> URL: https://issues.apache.org/jira/browse/YARN-9448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9448.001.patch, YARN-9448.002.patch, 
> YARN-9448.003.patch
>
>
> Right now, an opportunistic container might not get allocated on a rack-local 
> node even if one is available.
> Nodes are currently blacklisted if any container other than a node-local 
> container is allocated on them. If a container was previously allocated on a 
> node, that node is not considered again even when there is an ask for a 
> node-local request.






[jira] [Updated] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-15 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9448:

Attachment: YARN-9448.003.patch

> Fix Opportunistic Scheduling for node local allocations.
> 
>
> Key: YARN-9448
> URL: https://issues.apache.org/jira/browse/YARN-9448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9448.001.patch, YARN-9448.002.patch, 
> YARN-9448.003.patch
>
>
> Right now, an opportunistic container might not get allocated on a rack-local 
> node even if one is available.
> Nodes are currently blacklisted if any container other than a node-local 
> container is allocated on them. If a container was previously allocated on a 
> node, that node is not considered again even when there is an ask for a 
> node-local request.






[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-15 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818212#comment-16818212
 ] 

Abhishek Modi commented on YARN-2889:
-

Thanks [~elgoiri] for review. Attached v3 patch addressing the comments.

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch, 
> YARN-2889.003.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).






[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-15 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2889:

Attachment: YARN-2889.003.patch

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch, 
> YARN-2889.003.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).






[jira] [Commented] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-14 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817517#comment-16817517
 ] 

Abhishek Modi commented on YARN-9474:
-

Thanks [~elgoiri] for review and committing it.

> Remove hard coded sleep from Opportunistic Scheduler tests.
> ---
>
> Key: YARN-9474
> URL: https://issues.apache.org/jira/browse/YARN-9474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9474.001.patch, YARN-9474.002.patch
>
>
> Remove hard coded sleep from Opportunistic Scheduler tests and improve logs.
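
The general pattern is to replace a fixed Thread.sleep(...) with a bounded poll on the condition the test actually cares about (Hadoop's test utilities provide a similar waitFor helper that the patch may use instead); a generic sketch:

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Illustrative replacement for hard coded sleeps in tests: poll a condition
// at a short interval and fail fast with a clear message on timeout.
public final class TestWaitUtil {
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);   // short poll, not a fixed multi-second sleep
    }
  }
}
{code}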






[jira] [Commented] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-13 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817176#comment-16817176
 ] 

Abhishek Modi commented on YARN-9474:
-

The null assert for rmContainer is not required, as we already do the null 
check above and assign it to a new variable. It was a redundant assert, so I 
removed it.

> Remove hard coded sleep from Opportunistic Scheduler tests.
> ---
>
> Key: YARN-9474
> URL: https://issues.apache.org/jira/browse/YARN-9474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9474.001.patch, YARN-9474.002.patch
>
>
> Remove hard coded sleep from Opportunistic Scheduler tests and improve logs.






[jira] [Commented] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816829#comment-16816829
 ] 

Abhishek Modi commented on YARN-9474:
-

The test failure and findbugs warnings are not related to this patch.

> Remove hard coded sleep from Opportunistic Scheduler tests.
> ---
>
> Key: YARN-9474
> URL: https://issues.apache.org/jira/browse/YARN-9474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9474.001.patch, YARN-9474.002.patch
>
>







[jira] [Commented] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816575#comment-16816575
 ] 

Abhishek Modi commented on YARN-9474:
-

Thanks [~elgoiri] for reviewing it. I have addressed the review comments in 
the v2 patch.

> Remove hard coded sleep from Opportunistic Scheduler tests.
> ---
>
> Key: YARN-9474
> URL: https://issues.apache.org/jira/browse/YARN-9474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9474.001.patch, YARN-9474.002.patch
>
>







[jira] [Updated] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-12 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9474:

Attachment: YARN-9474.002.patch

> Remove hard coded sleep from Opportunistic Scheduler tests.
> ---
>
> Key: YARN-9474
> URL: https://issues.apache.org/jira/browse/YARN-9474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9474.001.patch, YARN-9474.002.patch
>
>







[jira] [Commented] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816288#comment-16816288
 ] 

Abhishek Modi commented on YARN-9474:
-

The findbugs warning is unrelated to this change.

[~elgoiri] [~giovanni.fumarola], could you please review this? Thanks.

> Remove hard coded sleep from Opportunistic Scheduler tests.
> ---
>
> Key: YARN-9474
> URL: https://issues.apache.org/jira/browse/YARN-9474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9474.001.patch
>
>







[jira] [Created] (YARN-9474) Remove hard coded sleep from Opportunistic Scheduler tests.

2019-04-12 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9474:
---

 Summary: Remove hard coded sleep from Opportunistic Scheduler 
tests.
 Key: YARN-9474
 URL: https://issues.apache.org/jira/browse/YARN-9474
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi









[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816109#comment-16816109
 ] 

Abhishek Modi commented on YARN-9435:
-

Thanks [~giovanni.fumarola] for review and committing it to trunk.

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.
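
A minimal, hedged sketch of what such a metrics source could look like with the metrics2 annotations (class, field, and method names are illustrative, not copied from the attached patches):

{code:java}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Illustrative opportunistic-scheduler counters published by the RM.
@Metrics(about = "Opportunistic scheduler metrics", context = "yarn")
public class OpportunisticSchedulerMetricsSketch {

  @Metric("# of allocated opportunistic containers")
  MutableCounterLong allocatedOppContainers;
  @Metric("# of released opportunistic containers")
  MutableCounterLong releasedOppContainers;
  @Metric("# of node-local opportunistic allocations")
  MutableCounterLong nodeLocalOppAllocations;
  @Metric("# of rack-local opportunistic allocations")
  MutableCounterLong rackLocalOppAllocations;

  static OpportunisticSchedulerMetricsSketch create() {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    return ms.register("OpportunisticSchedulerMetrics",
        "Metrics for the opportunistic scheduler",
        new OpportunisticSchedulerMetricsSketch());
  }

  public void incrAllocatedOppContainers(int count) {
    allocatedOppContainers.incr(count);
  }
}
{code}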






[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815646#comment-16815646
 ] 

Abhishek Modi commented on YARN-9339:
-

Thanks [~elgoiri] for review. I will address the comments in next patch.

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.






[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815335#comment-16815335
 ] 

Abhishek Modi commented on YARN-9435:
-

None of the findbugs warnings is related to the patch.

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.






[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815133#comment-16815133
 ] 

Abhishek Modi commented on YARN-9435:
-

Thanks [~giovanni.fumarola] for reviewing this. I have attached the v4 patch 
after removing the sleep.

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.






[jira] [Updated] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9435:

Attachment: YARN-9435.004.patch

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.






[jira] [Updated] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-06 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9448:

Attachment: YARN-9448.002.patch

> Fix Opportunistic Scheduling for node local allocations.
> 
>
> Key: YARN-9448
> URL: https://issues.apache.org/jira/browse/YARN-9448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9448.001.patch, YARN-9448.002.patch
>
>
> Right now, an opportunistic container might not get allocated on a rack-local 
> node even if one is available.
> Nodes are currently blacklisted if any container other than a node-local 
> container is allocated on them. If a container was previously allocated on a 
> node, that node is not considered again even when there is an ask for a 
> node-local request.






[jira] [Commented] (YARN-3488) AM get timeline service info from RM rather than Application specific configuration.

2019-04-05 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811452#comment-16811452
 ] 

Abhishek Modi commented on YARN-3488:
-

Findbugs warnings are not related to this patch.

> AM get timeline service info from RM rather than Application specific 
> configuration.
> 
>
> Key: YARN-3488
> URL: https://issues.apache.org/jira/browse/YARN-3488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications
>Reporter: Junping Du
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3488.001.patch, YARN-3488.002.patch, 
> YARN-3488.003.patch, YARN-3488.004.patch
>
>
> Since the v1 timeline service, we have had an MR configuration to 
> enable/disable putting history events to the timeline service. For the 
> ongoing v2 timeline service effort, we currently have different 
> methods/structures between v1 and v2 for consuming TimelineClient, so the 
> application has to be aware of which version of the timeline service is used.
> There are basically two options here:
> The first option is, as currently done in DistributedShell or MR, to let the 
> application have a specific configuration that indicates whether ATS is 
> enabled and which version to use, e.g. 
> MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA, etc.
> The other option is to let the application figure out the timeline-related 
> info from YARN/RM; this can be done through registerApplicationMaster() in 
> ApplicationMasterProtocol, with a return value indicating whether the service 
> is "off", "v1_on", or "v2_on".
> We prefer the latter option because the application owner doesn't have to be 
> aware of RM/YARN infrastructure details. Please note that we should remain 
> compatible (consistent behavior with the same setting) with released 
> configurations.
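 
Purely as an illustration of the second option (the enum and the idea of carrying this state in the register response are hypothetical here, not an existing API), the AM side would branch roughly like this:

{code:java}
// Illustrative only: how an AM could react to a timeline-service indicator
// returned by the RM at registration time, instead of reading its own config.
public final class TimelineAwareAm {

  // Hypothetical states matching the "off" / "v1_on" / "v2_on" values above.
  enum TimelineServiceState { OFF, V1_ON, V2_ON }

  static void configureTimelinePublishing(TimelineServiceState state) {
    switch (state) {
      case V2_ON:
        // Create a TimelineV2Client and publish entities to ATSv2.
        break;
      case V1_ON:
        // Fall back to the v1 TimelineClient and its event model.
        break;
      case OFF:
      default:
        // Skip timeline publishing entirely.
        break;
    }
  }
}
{code}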






[jira] [Commented] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-04-05 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811445#comment-16811445
 ] 

Abhishek Modi commented on YARN-9382:
-

Thanks [~vrushalic] for reviewing and committing it.

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: atsv2
> Fix For: 3.3.0
>
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch, 
> YARN-9382.003.patch
>
>
> There are some events missing in the container lifecycle. We need to add 
> support for publishing events when a container gets killed, paused, or 
> resumed.
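
A hedged sketch of publishing one such lifecycle event with the ATSv2 client API (the real change presumably goes through the NM's timeline publisher; the helper and event ids below are illustrative):

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEvent;
import org.apache.hadoop.yarn.client.api.TimelineV2Client;

public final class ContainerEventPublisher {
  // Publishes a single container lifecycle event, e.g. "CONTAINER_KILLED",
  // "CONTAINER_PAUSED" or "CONTAINER_RESUMED".
  static void publishContainerEvent(TimelineV2Client client, String containerId,
      String eventId, long timestamp) throws Exception {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_CONTAINER");
    entity.setId(containerId);

    TimelineEvent event = new TimelineEvent();
    event.setId(eventId);
    event.setTimestamp(timestamp);
    entity.addEvent(event);

    // Async put, so a slow or unreachable backend does not block the caller.
    client.putEntitiesAsync(entity);
  }
}
{code}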






[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in timeline collector when backend is unreachable for async calls

2019-04-05 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811444#comment-16811444
 ] 

Abhishek Modi commented on YARN-9335:
-

Thanks [~vrushalic] for review and committing it. 

> [atsv2] Restrict the number of elements held in timeline collector when 
> backend is unreachable for async calls
> --
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: atvs
> Fix For: 3.3.0
>
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch, YARN-9335.004.patch
>
>
> For ATSv2, if the backend is unreachable, the amount of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much should be retained in memory by 
> the timeline collector when the backend is not reachable.
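
A minimal sketch of the bounding idea only (the real change lives inside the timeline collector/writer path; the class and drop policy here are assumptions):

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative bounded buffer for async timeline writes: when the backend is
// unreachable the buffer fills up and further entities are dropped, keeping
// NM memory bounded instead of growing without limit.
public final class BoundedAsyncBuffer<E> {
  private final BlockingQueue<E> pending;

  public BoundedAsyncBuffer(int capacity) {
    this.pending = new LinkedBlockingQueue<>(capacity);
  }

  /** Returns false (entity dropped) when the buffer is already full. */
  public boolean offer(E entity) {
    return pending.offer(entity);
  }

  /** Drained by the async writer thread when the backend is reachable. */
  public E take() throws InterruptedException {
    return pending.take();
  }
}
{code}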






[jira] [Commented] (YARN-3488) AM get timeline service info from RM rather than Application specific configuration.

2019-04-05 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811166#comment-16811166
 ] 

Abhishek Modi commented on YARN-3488:
-

Attached a new patch after merging with trunk.

> AM get timeline service info from RM rather than Application specific 
> configuration.
> 
>
> Key: YARN-3488
> URL: https://issues.apache.org/jira/browse/YARN-3488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications
>Reporter: Junping Du
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3488.001.patch, YARN-3488.002.patch, 
> YARN-3488.003.patch, YARN-3488.004.patch
>
>
> Since the v1 timeline service, we have had an MR configuration to 
> enable/disable putting history events to the timeline service. For the 
> ongoing v2 timeline service effort, we currently have different 
> methods/structures between v1 and v2 for consuming TimelineClient, so the 
> application has to be aware of which version of the timeline service is used.
> There are basically two options here:
> The first option is, as currently done in DistributedShell or MR, to let the 
> application have a specific configuration that indicates whether ATS is 
> enabled and which version to use, e.g. 
> MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA, etc.
> The other option is to let the application figure out the timeline-related 
> info from YARN/RM; this can be done through registerApplicationMaster() in 
> ApplicationMasterProtocol, with a return value indicating whether the service 
> is "off", "v1_on", or "v2_on".
> We prefer the latter option because the application owner doesn't have to be 
> aware of RM/YARN infrastructure details. Please note that we should remain 
> compatible (consistent behavior with the same setting) with released 
> configurations.






[jira] [Created] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-05 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9448:
---

 Summary: Fix Opportunistic Scheduling for node local allocations.
 Key: YARN-9448
 URL: https://issues.apache.org/jira/browse/YARN-9448
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now, an opportunistic container might not get allocated on a rack-local 
node even if one is available.

Nodes are currently blacklisted if any container other than a node-local 
container is allocated on them. If a container was previously allocated on a 
node, that node is not considered again even when there is an ask for a 
node-local request.






[jira] [Updated] (YARN-3488) AM get timeline service info from RM rather than Application specific configuration.

2019-04-05 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-3488:

Attachment: YARN-3488.004.patch

> AM get timeline service info from RM rather than Application specific 
> configuration.
> 
>
> Key: YARN-3488
> URL: https://issues.apache.org/jira/browse/YARN-3488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications
>Reporter: Junping Du
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3488.001.patch, YARN-3488.002.patch, 
> YARN-3488.003.patch, YARN-3488.004.patch
>
>
> Since the v1 timeline service, we have had an MR configuration to 
> enable/disable putting history events to the timeline service. For the 
> ongoing v2 timeline service effort, we currently have different 
> methods/structures between v1 and v2 for consuming TimelineClient, so the 
> application has to be aware of which version of the timeline service is used.
> There are basically two options here:
> The first option is, as currently done in DistributedShell or MR, to let the 
> application have a specific configuration that indicates whether ATS is 
> enabled and which version to use, e.g. 
> MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA, etc.
> The other option is to let the application figure out the timeline-related 
> info from YARN/RM; this can be done through registerApplicationMaster() in 
> ApplicationMasterProtocol, with a return value indicating whether the service 
> is "off", "v1_on", or "v2_on".
> We prefer the latter option because the application owner doesn't have to be 
> aware of RM/YARN infrastructure details. Please note that we should remain 
> compatible (consistent behavior with the same setting) with released 
> configurations.






[jira] [Resolved] (YARN-8647) Add a flag to disable move app between queues

2019-04-05 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi resolved YARN-8647.
-
Resolution: Won't Fix

> Add a flag to disable move app between queues
> -
>
> Key: YARN-8647
> URL: https://issues.apache.org/jira/browse/YARN-8647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: sarun singla
>Assignee: Abhishek Modi
>Priority: Critical
>
> For large clusters where we have a number of users submitting applications, 
> we can end up in scenarios where app developers try to move their 
> applications between queues using something like 
> {code:java}
> yarn application -movetoqueue  -queue {code}
> Today there is no way to disable this feature if one does not want 
> application developers to use it.
> *Solution:*
> We should probably add an option on the RM side to disable the move-queue 
> feature at the cluster level.
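
A sketch of what the proposed flag could look like on the RM side (the property name and helper below are hypothetical; the issue was ultimately resolved as Won't Fix):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class MoveToQueueGuard {
  // Hypothetical cluster-level switch; not an existing YARN property.
  static final String MOVE_APP_ACROSS_QUEUES_ENABLED =
      "yarn.resourcemanager.move-application-across-queues.enabled";

  // Called before the RM services a move request; rejects it when the
  // cluster admin has disabled the feature.
  static void checkMoveAllowed(Configuration conf) throws YarnException {
    if (!conf.getBoolean(MOVE_APP_ACROSS_QUEUES_ENABLED, true)) {
      throw new YarnException("Moving applications between queues is disabled"
          + " on this cluster (" + MOVE_APP_ACROSS_QUEUES_ENABLED + "=false)");
    }
  }
}
{code}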






[jira] [Commented] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-04-05 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810524#comment-16810524
 ] 

Abhishek Modi commented on YARN-9382:
-

None of the findbugs warnings are related to this patch.

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch, 
> YARN-9382.003.patch
>
>
> There are some events missing in the container lifecycle. We need to add 
> support for publishing events when a container gets killed, paused, or 
> resumed.






[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in timeline collector when backend is unreachable for async calls

2019-04-04 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810036#comment-16810036
 ] 

Abhishek Modi commented on YARN-9335:
-

Thanks [~vrushalic]. There were some conflicts due to some changes in trunk. 
Attached a new patch after resolving the conflicts.

> [atsv2] Restrict the number of elements held in timeline collector when 
> backend is unreachable for async calls
> --
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch, YARN-9335.004.patch
>
>
> For ATSv2, if the backend is unreachable, the amount of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much should be retained in memory by 
> the timeline collector when the backend is not reachable.






[jira] [Updated] (YARN-9335) [atsv2] Restrict the number of elements held in timeline collector when backend is unreachable for async calls

2019-04-04 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9335:

Attachment: YARN-9335.004.patch

> [atsv2] Restrict the number of elements held in timeline collector when 
> backend is unreachable for async calls
> --
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch, YARN-9335.004.patch
>
>
> For ATSv2, if the backend is unreachable, the amount of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much should be retained in memory by 
> the timeline collector when the backend is not reachable.






[jira] [Commented] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-04-04 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809934#comment-16809934
 ] 

Abhishek Modi commented on YARN-9382:
-

[~vrushalic] I looked into it; due to some recent changes in trunk there was 
a conflict. I attached a new patch after resolving the conflicts. Thanks.

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch, 
> YARN-9382.003.patch
>
>
> There are some events missing in the container lifecycle. We need to add 
> support for publishing events when a container gets killed, paused, or 
> resumed.






[jira] [Updated] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-04-04 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9382:

Attachment: YARN-9382.003.patch

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch, 
> YARN-9382.003.patch
>
>
> There are some events missing in the container lifecycle. We need to add 
> support for publishing events when a container gets killed, paused, or 
> resumed.






[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in timeline collector when backend is unreachable for async calls

2019-04-03 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809539#comment-16809539
 ] 

Abhishek Modi commented on YARN-9335:
-

Thanks [~vrushalic]. I will check at my end.

Let me also run the complete UTs with the patch, as I am afraid it could cause 
some other UT failures since we have made the writes async.

> [atsv2] Restrict the number of elements held in timeline collector when 
> backend is unreachable for async calls
> --
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch
>
>
> For ATSv2, if the backend is unreachable, the amount of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much should be retained in memory by 
> the timeline collector when the backend is not reachable.






[jira] [Commented] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-04-03 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809536#comment-16809536
 ] 

Abhishek Modi commented on YARN-9382:
-

Thanks Vrushali - let me check at my end.

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch
>
>
> There are some events missing in the container lifecycle. We need to add 
> support for publishing events when a container gets killed, paused, or 
> resumed.






[jira] [Updated] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-03 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9435:

Attachment: YARN-9435.003.patch

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.






[jira] [Updated] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-02 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9435:

Attachment: YARN-9435.002.patch

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch
>
>
> Right now there are no metrics available for Opportunistic Scheduler at 
> ResourceManager. As part of this jira, we will add metrics like number of 
> allocated opportunistic containers, released opportunistic containers, node 
> level allocations, rack level allocations etc. for Opportunistic Scheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-02 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9435:
---

 Summary: Add Opportunistic Scheduler metrics in ResourceManager.
 Key: YARN-9435
 URL: https://issues.apache.org/jira/browse/YARN-9435
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now there are no metrics available for the Opportunistic Scheduler at the 
ResourceManager. As part of this jira, we will add metrics such as the number of 
allocated opportunistic containers, released opportunistic containers, node-level 
allocations, rack-level allocations, etc. for the Opportunistic Scheduler.
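
For illustration, a minimal sketch of what such a metrics source could look like 
with the standard metrics2 annotations; the class, metric and method names below 
are placeholders and not necessarily what the patch will use:

{code:java}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Hypothetical sketch of a metrics source for the Opportunistic Scheduler;
// all names here are illustrative, not the ones in the actual patch.
@Metrics(about = "Opportunistic Scheduler metrics", context = "yarn")
public class OpportunisticSchedulerMetricsSketch {

  @Metric("# of allocated opportunistic containers")
  MutableCounterLong allocatedOContainers;

  @Metric("# of released opportunistic containers")
  MutableCounterLong releasedOContainers;

  @Metric("# of node-local opportunistic allocations")
  MutableCounterLong nodeLocalAllocations;

  @Metric("# of rack-local opportunistic allocations")
  MutableCounterLong rackLocalAllocations;

  // Register with the default metrics system so the counters show up
  // alongside the other RM metrics.
  public static OpportunisticSchedulerMetricsSketch create() {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    return ms.register("OpportunisticSchedulerMetrics",
        "Metrics for the Opportunistic Scheduler",
        new OpportunisticSchedulerMetricsSketch());
  }

  public void incrAllocated(int n) {
    allocatedOContainers.incr(n);
  }

  public void incrReleased(int n) {
    releasedOContainers.incr(n);
  }
}
{code}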



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager

2019-04-01 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807362#comment-16807362
 ] 

Abhishek Modi commented on YARN-9428:
-

Thanks [~giovanni.fumarola] for review and committing it. Thanks.

> Add metrics for paused containers in NodeManager
> 
>
> Key: YARN-9428
> URL: https://issues.apache.org/jira/browse/YARN-9428
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9428.001.patch, YARN-9428.002.patch
>
>
> Add metrics for paused containers in NodeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-01 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807363#comment-16807363
 ] 

Abhishek Modi commented on YARN-2889:
-

[~giovanni.fumarola] could you please review it. Thanks.

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-01 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2889:

Description: 
We introduce a way to limit the number of opportunistic containers that will be 
allocated on each AM heartbeat.
 This way we can restrict the number of opportunistic containers handed out by 
the system, as well as throttle down misbehaving AMs (asking for too many 
opportunistic containers).

  was:
We introduce a way to limit the number of queueable requests that each AM can 
submit to the LocalRM.
This way we can restrict the number of queueable containers handed out by the 
system, as well as throttle down misbehaving AMs (asking for too many queueable 
containers).


> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-01 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2889:

Attachment: YARN-2889.002.patch

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch
>
>
> We introduce a way to limit the number of queueable requests that each AM can 
> submit to the LocalRM.
> This way we can restrict the number of queueable containers handed out by the 
> system, as well as throttle down misbehaving AMs (asking for too many 
> queueable containers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager

2019-04-01 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807082#comment-16807082
 ] 

Abhishek Modi commented on YARN-9428:
-

Thanks [~giovanni.fumarola] for review. Attached 002 patch with the fixes. 
Thanks.

> Add metrics for paused containers in NodeManager
> 
>
> Key: YARN-9428
> URL: https://issues.apache.org/jira/browse/YARN-9428
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9428.001.patch, YARN-9428.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9428) Add metrics for paused containers in NodeManager

2019-04-01 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9428:

Attachment: YARN-9428.002.patch

> Add metrics for paused containers in NodeManager
> 
>
> Key: YARN-9428
> URL: https://issues.apache.org/jira/browse/YARN-9428
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9428.001.patch, YARN-9428.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-03-31 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-2889:
---

Assignee: Abhishek Modi  (was: Arun Suresh)

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
>
> We introduce a way to limit the number of queueable requests that each AM can 
> submit to the LocalRM.
> This way we can restrict the number of queueable containers handed out by the 
> system, as well as throttle down misbehaving AMs (asking for too many 
> queueable containers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2889) Limit in the number of opportunistic container allocated per AM heartbeat

2019-03-31 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2889:

Summary: Limit in the number of opportunistic container allocated per AM 
heartbeat  (was: Limit in the number of opportunistic container requests per AM)

> Limit in the number of opportunistic container allocated per AM heartbeat
> -
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
>Priority: Major
>
> We introduce a way to limit the number of queueable requests that each AM can 
> submit to the LocalRM.
> This way we can restrict the number of queueable containers handed out by the 
> system, as well as throttle down misbehaving AMs (asking for too many 
> queueable containers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-03-31 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2889:

Summary: Limit the number of opportunistic container allocated per AM 
heartbeat  (was: Limit in the number of opportunistic container allocated per 
AM heartbeat)

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
>Priority: Major
>
> We introduce a way to limit the number of queueable requests that each AM can 
> submit to the LocalRM.
> This way we can restrict the number of queueable containers handed out by the 
> system, as well as throttle down misbehaving AMs (asking for too many 
> queueable containers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9428) Add metrics for paused containers in NodeManager

2019-03-31 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806188#comment-16806188
 ] 

Abhishek Modi commented on YARN-9428:
-

[~giovanni.fumarola] could you please review it. Thanks.

> Add metrics for paused containers in NodeManager
> 
>
> Key: YARN-9428
> URL: https://issues.apache.org/jira/browse/YARN-9428
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9428.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9428) Add metrics for paused containers in NodeManager

2019-03-31 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9428:

Attachment: YARN-9428.001.patch

> Add metrics for paused containers in NodeManager
> 
>
> Key: YARN-9428
> URL: https://issues.apache.org/jira/browse/YARN-9428
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9428.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9428) Add metrics for paused containers in NodeManager

2019-03-31 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9428:
---

 Summary: Add metrics for paused containers in NodeManager
 Key: YARN-9428
 URL: https://issues.apache.org/jira/browse/YARN-9428
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9366) Make logs in TimelineClient implementation specific to application

2019-03-27 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802875#comment-16802875
 ] 

Abhishek Modi commented on YARN-9366:
-

Thanks for the patch [~prabham]. 

In the YarnException, you are passing all the timeline entities that were not 
published. The toString method of TimelineEntities uses the toString method of 
TimelineEntity, but that is not overridden. You need to use 
dumptimelineRecordsToJson to convert the TimelineEntity into a readable format.

I also think you should log these entities in debug logs only.
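
For illustration, a rough sketch of that suggestion; the helper class is 
hypothetical, and it assumes the TimelineUtils JSON dump helper is the utility 
referred to above:

{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntities;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.util.timeline.TimelineUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative helper: dump un-published entities in readable JSON form,
// and only when debug logging is enabled.
public final class UnpublishedEntityLogger {
  private static final Logger LOG =
      LoggerFactory.getLogger(UnpublishedEntityLogger.class);

  private UnpublishedEntityLogger() {
  }

  static void logUnpublished(TimelineEntities entities) {
    if (!LOG.isDebugEnabled()) {
      return;  // avoid the JSON conversion cost unless debug is on
    }
    List<TimelineEntity> list = entities.getEntities();
    for (TimelineEntity entity : list) {
      try {
        LOG.debug("Entity not published: {}",
            TimelineUtils.dumpTimelineRecordtoJSON(entity));
      } catch (Exception e) {
        LOG.debug("Could not serialize entity {}", entity.getId(), e);
      }
    }
  }
}
{code}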

> Make logs in TimelineClient implementation specific to application 
> ---
>
> Key: YARN-9366
> URL: https://issues.apache.org/jira/browse/YARN-9366
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Reporter: Prabha Manepalli
>Assignee: Prabha Manepalli
>Priority: Minor
> Attachments: YARN-9366.v1.patch
>
>
> For every container launched on a NM node, a timeline client is created to 
> publish entities to the corresponding application's timeline collector. And 
> there would be multiple timeline clients running at the same time. Current 
> implementation of timeline client logs are insufficient to isolate publishing 
> problems related to one application. Hence, creating this Jira to improvise 
> the logs in TimelineV2ClientImpl.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-03-24 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800150#comment-16800150
 ] 

Abhishek Modi commented on YARN-9382:
-

Thanks [~vrushalic] for review. I have attached an updated patch with fixes. 
Thanks.

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch
>
>
> There are some events missing in container lifecycle. We need to add support 
> for adding events for when container gets killed, paused and resumed. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-03-24 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9382:

Attachment: YARN-9382.002.patch

> Publish container killed, paused and resumed events to ATSv2.
> -
>
> Key: YARN-9382
> URL: https://issues.apache.org/jira/browse/YARN-9382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9382.001.patch, YARN-9382.002.patch
>
>
> There are some events missing in container lifecycle. We need to add support 
> for adding events for when container gets killed, paused and resumed. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8636) TimelineSchemaCreator command not working

2019-03-24 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi resolved YARN-8636.
-
Resolution: Duplicate

> TimelineSchemaCreator command not working
> -
>
> Key: YARN-8636
> URL: https://issues.apache.org/jira/browse/YARN-8636
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Abhishek Modi
>Priority: Minor
>
> {code:java}
> nodemanager/bin> ./hadoop 
> org.apache.yarn.timelineservice.storage.TimelineSchemaCreator -create
> Error: Could not find or load main class 
> org.apache.yarn.timelineservice.storage.TimelineSchemaCreator
> {code}
> share/hadoop/yarn/timelineservice/ is not part of class path



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9342) Moving log4j1 to log4j2 in hadoop-yarn

2019-03-24 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800066#comment-16800066
 ] 

Abhishek Modi commented on YARN-9342:
-

At Microsoft we have switched to Log4j2 and it has provided significant 
improvements in performance.

> Moving log4j1 to log4j2 in hadoop-yarn
> --
>
> Key: YARN-9342
> URL: https://issues.apache.org/jira/browse/YARN-9342
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> 1. Log4j2 Asynchronous Logging will give significant improvement in the 
> performance.
> 2. Log4j2 does not have below locking issue which Log4j1 has.
> {code}
> "Thread-16" #40 daemon prio=5 os_prio=0 tid=0x7f181f9bb800 nid=0x125 
> waiting for monitor entry [0x7ef163bab000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.log4j.Category.callAppenders(Category.java:204)
>   - locked <0x7ef2d803e2b8> (a org.apache.log4j.spi.RootLogger)
>   at org.apache.log4j.Category.forcedLog(Category.java:391)
>   at org.apache.log4j.Category.log(Category.java:856)
>   at 
> org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
> {code}
> https://bz.apache.org/bugzilla/show_bug.cgi?id=57714



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9402) Opportunistic containers should not be scheduled on Decommissioning nodes.

2019-03-21 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798355#comment-16798355
 ] 

Abhishek Modi commented on YARN-9402:
-

Thanks [~giovanni.fumarola] for review and committing it.

> Opportunistic containers should not be scheduled on Decommissioning nodes.
> --
>
> Key: YARN-9402
> URL: https://issues.apache.org/jira/browse/YARN-9402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9402.001.patch
>
>
> Right now, opportunistic containers can get scheduled on Decommissioning 
> nodes which we are draining and thus can lead to killing of those containers 
> when node is decommissioned. As part of this jira, we will skip allocation of 
> opportunistic containers on Decommissioning nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9402) Opportunistic containers should not be scheduled on Decommissioning nodes.

2019-03-20 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797782#comment-16797782
 ] 

Abhishek Modi commented on YARN-9402:
-

Test failure is unrelated.

[~giovanni.fumarola] could you please review it. Thanks.

> Opportunistic containers should not be scheduled on Decommissioning nodes.
> --
>
> Key: YARN-9402
> URL: https://issues.apache.org/jira/browse/YARN-9402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9402.001.patch
>
>
> Right now, opportunistic containers can get scheduled on Decommissioning 
> nodes which we are draining and thus can lead to killing of those containers 
> when node is decommissioned. As part of this jira, we will skip allocation of 
> opportunistic containers on Decommissioning nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-03-20 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797401#comment-16797401
 ] 

Abhishek Modi commented on YARN-9339:
-

[~giovanni.fumarola] could you please review it. Thanks.

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9335) [atsv2] Restrict the number of elements held in timeline collector when backend is unreachable for async calls

2019-03-20 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9335:

Summary: [atsv2] Restrict the number of elements held in timeline collector 
when backend is unreachable for async calls  (was: [atsv2] Restrict the number 
of elements held in NM timeline collector when backend is unreachable for async 
calls)

> [atsv2] Restrict the number of elements held in timeline collector when 
> backend is unreachable for async calls
> --
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch
>
>
> For ATSv2 , if the backend is unreachable, the number/size of data held in 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory. 
> Filing jira to set a limit on how many/much should be retained by the 
> timeline collector in memory in case the backend is not reachable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-03-20 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797107#comment-16797107
 ] 

Abhishek Modi commented on YARN-9339:
-

The checkstyle warning is due to a redundant public modifier, but that pattern is 
used across the whole file, so nothing much can be done about it.

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9402) Opportunistic containers should not be scheduled on Decommissioning nodes.

2019-03-20 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9402:
---

 Summary: Opportunistic containers should not be scheduled on 
Decommissioning nodes.
 Key: YARN-9402
 URL: https://issues.apache.org/jira/browse/YARN-9402
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now, opportunistic containers can get scheduled on Decommissioning nodes 
which we are draining, and this can lead to those containers being killed when the 
node is decommissioned. As part of this jira, we will skip allocation of 
opportunistic containers on Decommissioning nodes.
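
For illustration, the intended check amounts to filtering candidate nodes by 
state, roughly as below; where the check is actually hooked in may differ in the 
patch:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

// Illustrative filter: drop DECOMMISSIONING nodes from the candidate list
// before handing it to the opportunistic container allocator.
public final class OpportunisticNodeFilter {

  private OpportunisticNodeFilter() {
  }

  static List<RMNode> excludeDecommissioning(List<RMNode> candidates) {
    List<RMNode> usable = new ArrayList<>(candidates.size());
    for (RMNode node : candidates) {
      if (node.getState() != NodeState.DECOMMISSIONING) {
        usable.add(node);
      }
    }
    return usable;
  }
}
{code}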



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-03-20 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9339:

Attachment: YARN-9339.002.patch

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9392) Handle missing scheduler events in Opportunistic Scheduler.

2019-03-19 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796705#comment-16796705
 ] 

Abhishek Modi commented on YARN-9392:
-

Thanks [~giovanni.fumarola] for review and committing it.

> Handle missing scheduler events in Opportunistic Scheduler.
> ---
>
> Key: YARN-9392
> URL: https://issues.apache.org/jira/browse/YARN-9392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9392.001.patch
>
>
> At present newly added scheduler events are not being ignored by 
> Opportunistic scheduler causing error messages in logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9390) Add support for configurable Resource Calculator in Opportunistic Scheduler.

2019-03-19 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796704#comment-16796704
 ] 

Abhishek Modi commented on YARN-9390:
-

Thanks [~giovanni.fumarola] for review.

Right now, in the capacity scheduler there is a configuration to change the 
resource calculator, but there is no way to change it for opportunistic 
scheduling. DefaultResourceCalculator is more performant than 
DominantResourceCalculator, as it doesn't compare the other resource types. The 
idea is to provide flexibility: give users who run the capacity scheduler with 
DefaultResourceCalculator an option to use opportunistic scheduling with the 
same resource calculator.
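
For illustration, a minimal sketch of such a configuration hook; the property key 
below is made up for this example and the key added by the patch may differ:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;

public final class OpportunisticCalculatorLoader {

  // Hypothetical property key, for illustration only.
  static final String RESOURCE_CALCULATOR_KEY =
      "yarn.opportunistic-container-allocation.resource-calculator";

  private OpportunisticCalculatorLoader() {
  }

  static ResourceCalculator load(Configuration conf) {
    Class<? extends ResourceCalculator> clazz = conf.getClass(
        RESOURCE_CALCULATOR_KEY,
        DominantResourceCalculator.class,  // keep today's behaviour as default
        ResourceCalculator.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}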

> Add support for configurable Resource Calculator in Opportunistic Scheduler.
> 
>
> Key: YARN-9390
> URL: https://issues.apache.org/jira/browse/YARN-9390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9390.001.patch
>
>
> Right now, Opportunistic scheduler uses hard coded DominantResourceCalculator 
> and there is no option to change it to other resource calculators. This Jira 
> is to make resource calculator configurable for Opportunistic scheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for async calls

2019-03-19 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9335:

Attachment: YARN-9335.003.patch

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for async calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch
>
>
> For ATSv2 , if the backend is unreachable, the number/size of data held in 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory. 
> Filing jira to set a limit on how many/much should be retained by the 
> timeline collector in memory in case the backend is not reachable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for async calls

2019-03-19 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796142#comment-16796142
 ] 

Abhishek Modi commented on YARN-9335:
-

Thanks [~sadineni] for pointing it out. Attached 003 patch with the fix.

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for async calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch, 
> YARN-9335.003.patch
>
>
> For ATSv2 , if the backend is unreachable, the number/size of data held in 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory. 
> Filing jira to set a limit on how many/much should be retained by the 
> timeline collector in memory in case the backend is not reachable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9392) Handle missing scheduler events in Opportunistic Scheduler.

2019-03-16 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9392:

Attachment: YARN-9392.001.patch

> Handle missing scheduler events in Opportunistic Scheduler.
> ---
>
> Key: YARN-9392
> URL: https://issues.apache.org/jira/browse/YARN-9392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9392.001.patch
>
>
> At present newly added scheduler events are not being ignored by 
> Opportunistic scheduler causing error messages in logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker

2019-03-16 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-5414:
---

Assignee: Abhishek Modi  (was: Arun Suresh)

> Integrate NodeQueueLoadMonitor with ClusterNodeTracker
> --
>
> Key: YARN-5414
> URL: https://issues.apache.org/jira/browse/YARN-5414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: container-queuing, distributed-scheduling, scheduler
>Reporter: Arun Suresh
>Assignee: Abhishek Modi
>Priority: Major
>
> The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides 
> convenience methods like sort and filter.
> The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of 
> maintaining its own data-structure of node information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9392) Handle missing scheduler events in Opportunistic Scheduler.

2019-03-16 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9392:
---

 Summary: Handle missing scheduler events in Opportunistic 
Scheduler.
 Key: YARN-9392
 URL: https://issues.apache.org/jira/browse/YARN-9392
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


At present, newly added scheduler events are not ignored by the Opportunistic 
scheduler, which causes error messages in the logs.
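
For illustration, the idea is to route event types the opportunistic path does 
not handle to a quiet default branch instead of reporting an error; a rough 
sketch with illustrative event names:

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: events the opportunistic path does not care about fall through
// to a debug-level default branch instead of being logged as errors.
public final class EventHandlingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(EventHandlingSketch.class);

  private EventHandlingSketch() {
  }

  static void handle(SchedulerEvent event) {
    switch (event.getType()) {
      case NODE_ADDED:
      case NODE_REMOVED:
      case NODE_UPDATE:
        // ... handled by the opportunistic scheduling path ...
        break;
      default:
        // Events owned by the underlying scheduler; nothing to do here.
        LOG.debug("Ignoring event {} in opportunistic scheduler",
            event.getType());
    }
  }
}
{code}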



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9390) Add support for configurable Resource Calculator in Opportunistic Scheduler.

2019-03-15 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793706#comment-16793706
 ] 

Abhishek Modi commented on YARN-9390:
-

[~giovanni.fumarola] could you please review it. Thanks.

> Add support for configurable Resource Calculator in Opportunistic Scheduler.
> 
>
> Key: YARN-9390
> URL: https://issues.apache.org/jira/browse/YARN-9390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9390.001.patch
>
>
> Right now, Opportunistic scheduler uses hard coded DominantResourceCalculator 
> and there is no option to change it to other resource calculators. This Jira 
> is to make resource calculator configurable for Opportunistic scheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9390) Add support for configurable Resource Calculator in Opportunistic Scheduler.

2019-03-15 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9390:
---

 Summary: Add support for configurable Resource Calculator in 
Opportunistic Scheduler.
 Key: YARN-9390
 URL: https://issues.apache.org/jira/browse/YARN-9390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now, the Opportunistic scheduler uses a hard-coded DominantResourceCalculator 
and there is no option to change it to another resource calculator. This Jira is 
to make the resource calculator configurable for the Opportunistic scheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for async calls

2019-03-14 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792921#comment-16792921
 ] 

Abhishek Modi commented on YARN-9335:
-

[~vrushalic] [~rohithsharma] could you please review this patch. Thanks.

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for async calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch
>
>
> For ATSv2 , if the backend is unreachable, the number/size of data held in 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory. 
> Filing jira to set a limit on how many/much should be retained by the 
> timeline collector in memory in case the backend is not reachable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3488) AM get timeline service info from RM rather than Application specific configuration.

2019-03-13 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791383#comment-16791383
 ] 

Abhishek Modi commented on YARN-3488:
-

[~rohithsharma] [~vrushalic] could you please review and commit it whenever you 
get some time. Thanks.

> AM get timeline service info from RM rather than Application specific 
> configuration.
> 
>
> Key: YARN-3488
> URL: https://issues.apache.org/jira/browse/YARN-3488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications
>Reporter: Junping Du
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3488.001.patch, YARN-3488.002.patch, 
> YARN-3488.003.patch
>
>
> Since v1 timeline service, we have MR configuration to enable/disable putting 
> history event to timeline service. For today's v2 timeline service ongoing 
> effort, currently we have different methods/structures between v1 and v2 for 
> consuming TimelineClient, so application have to be aware of which version 
> timeline service get used there.
> There are basically two options here:
> First option is as current way in DistributedShell or MR to let application 
> has specific configuration to point out that if enabling ATS and which 
> version could be, like: MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA, etc.
> The other option is to let application to figure out timeline related info 
> from YARN/RM, it can be done through registerApplicationMaster() in 
> ApplicationMasterProtocol with return value for service "off", "v1_on", or 
> "v2_on".
> We prefer the latter option because application owner doesn't have to aware 
> RM/YARN infrastructure details. Please note that we should keep compatible 
> (consistent behavior with the same setting) with released configurations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9338) Timeline related testcases are failing

2019-03-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790845#comment-16790845
 ] 

Abhishek Modi commented on YARN-9338:
-

Thanks [~vrushalic]. I have run all the tests locally and they pass.

> Timeline related testcases are failing
> --
>
> Key: YARN-9338
> URL: https://issues.apache.org/jira/browse/YARN-9338
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9338.001.patch, YARN-9338.002.patch, 
> YARN-9338.003.patch, YARN-9338.004.patch
>
>
> Timeline related testcases are failing
> {code}
> [ERROR] Failures: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV2Enabled:262->runTest:245->validateV2:382->verifyEntity:417
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics:259->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishApplicationMetrics:224->verifyEntity:332
>  Expected 4 events to be published expected:<4> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishContainerMetrics:291->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR] Errors: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV1V2Enabled:252->runTest:242->testSetup:123
>  » YarnRuntime
> [ERROR] Failures: 
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2:313->testDSShell:317->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow:329->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow:323->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR] Failures: 
> [ERROR]   
> TestMRTimelineEventHandling.testMRNewTimelineServiceEventHandling:240->checkNewTimelineEvent:304->verifyEntity:462
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9381) The yarn-default.xml has two identical property named yarn.timeline-service.http-cross-origin.enabled

2019-03-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790840#comment-16790840
 ] 

Abhishek Modi commented on YARN-9381:
-

[~cheersyang] [~giovanni.fumarola] could you please review this.

> The yarn-default.xml has two identical property named 
> yarn.timeline-service.http-cross-origin.enabled
> -
>
> Key: YARN-9381
> URL: https://issues.apache.org/jira/browse/YARN-9381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2
>Reporter: jenny
>Assignee: Abhishek Modi
>Priority: Trivial
> Attachments: YARN-9381.001.patch, image-2019-03-12-16-51-19-748.png
>
>
> The yarn-default.xml file has two identical property named 
> yarn.timeline-service.http-cross-origin.enabled.
> !image-2019-03-12-16-51-19-748.png|width=298,height=98!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9381) The yarn-default.xml has two identical property named yarn.timeline-service.http-cross-origin.enabled

2019-03-12 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-9381:
---

Assignee: Abhishek Modi

> The yarn-default.xml has two identical property named 
> yarn.timeline-service.http-cross-origin.enabled
> -
>
> Key: YARN-9381
> URL: https://issues.apache.org/jira/browse/YARN-9381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2
>Reporter: jenny
>Assignee: Abhishek Modi
>Priority: Trivial
> Attachments: image-2019-03-12-16-51-19-748.png
>
>
> The yarn-default.xml file has two identical property named 
> yarn.timeline-service.http-cross-origin.enabled.
> !image-2019-03-12-16-51-19-748.png|width=298,height=98!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for async calls

2019-03-12 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9335:

Attachment: YARN-9335.002.patch

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for async calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9335.001.patch, YARN-9335.002.patch
>
>
> For ATSv2 , if the backend is unreachable, the number/size of data held in 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory. 
> Filing jira to set a limit on how many/much should be retained by the 
> timeline collector in memory in case the backend is not reachable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9383) Publish federation events to ATSv2.

2019-03-12 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9383:
---

 Summary: Publish federation events to ATSv2.
 Key: YARN-9383
 URL: https://issues.apache.org/jira/browse/YARN-9383
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


With federation enabled, containers for a single application might get spawned 
across multiple sub-clusters. Right now this information is not published to 
ATSv2. As part of this jira, we are going to publish federation-related info in 
container events to ATSv2.
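
For illustration, the minimal shape of this change is attaching the sub-cluster 
id to the container entity's info map before publishing; the info key and the 
plumbing below are assumptions:

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

// Illustrative only: the info key name and where the sub-cluster id comes
// from are assumptions, not necessarily what the patch uses.
public final class FederationInfoDecorator {

  static final String SUBCLUSTER_INFO_KEY = "SUBCLUSTER_ID";

  private FederationInfoDecorator() {
  }

  static TimelineEntity withSubCluster(TimelineEntity containerEntity,
      String subClusterId) {
    // Readers of the container entity can then see which sub-cluster ran it.
    containerEntity.addInfo(SUBCLUSTER_INFO_KEY, subClusterId);
    return containerEntity;
  }
}
{code}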



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9382) Publish container killed, paused and resumed events to ATSv2.

2019-03-12 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9382:
---

 Summary: Publish container killed, paused and resumed events to 
ATSv2.
 Key: YARN-9382
 URL: https://issues.apache.org/jira/browse/YARN-9382
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


There are some events missing in the container lifecycle. We need to add support 
for publishing events when a container gets killed, paused, or resumed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9338) Timeline related testcases are failing

2019-03-10 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9338:

Attachment: YARN-9338.004.patch

> Timeline related testcases are failing
> --
>
> Key: YARN-9338
> URL: https://issues.apache.org/jira/browse/YARN-9338
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9338.001.patch, YARN-9338.002.patch, 
> YARN-9338.003.patch, YARN-9338.004.patch
>
>
> Timeline related testcases are failing
> {code}
> [ERROR] Failures: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV2Enabled:262->runTest:245->validateV2:382->verifyEntity:417
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics:259->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishApplicationMetrics:224->verifyEntity:332
>  Expected 4 events to be published expected:<4> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishContainerMetrics:291->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR] Errors: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV1V2Enabled:252->runTest:242->testSetup:123
>  » YarnRuntime
> [ERROR] Failures: 
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2:313->testDSShell:317->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow:329->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow:323->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR] Failures: 
> [ERROR]   
> TestMRTimelineEventHandling.testMRNewTimelineServiceEventHandling:240->checkNewTimelineEvent:304->verifyEntity:462
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for async calls

2019-03-09 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9335:

Summary: [atsv2] Restrict the number of elements held in NM timeline 
collector when backend is unreachable for async calls  (was: [atsv2] Restrict 
the number of elements held in NM timeline collector when backend is 
unreachable for asycn calls)

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for async calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
>
> For ATSv2 , if the backend is unreachable, the number/size of data held in 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory. 
> Filing jira to set a limit on how many/much should be retained by the 
> timeline collector in memory in case the backend is not reachable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for asycn calls

2019-03-09 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788667#comment-16788667
 ] 

Abhishek Modi commented on YARN-9335:
-

Sure [~Prabhu Joseph]. You can take over the sync writes part; I will attach a 
patch for the async one. Thanks.

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for asycn calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
>
> For ATSv2, if the backend is unreachable, the number/size of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much data should be retained by the 
> timeline collector in memory in case the backend is not reachable.






[jira] [Updated] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable for asycn calls

2019-03-07 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9335:

Summary: [atsv2] Restrict the number of elements held in NM timeline 
collector when backend is unreachable for asycn calls  (was: [atsv2] Restrict 
the number of elements held in NM timeline collector when backend is 
unreachable)

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable for asycn calls
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
>
> For ATSv2, if the backend is unreachable, the number/size of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much data should be retained by the 
> timeline collector in memory in case the backend is not reachable.






[jira] [Commented] (YARN-9335) [atsv2] Restrict the number of elements held in NM timeline collector when backend is unreachable

2019-03-07 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787541#comment-16787541
 ] 

Abhishek Modi commented on YARN-9335:
-

There are two major issues right now. The HBase client has a very long retry 
timeout, which causes threads to get blocked writing entities for async calls. 
For sync writes, threads get blocked on synchronized blocks, which bloats the 
event queue, putting heavy memory pressure on the NM and delaying the 
processing of other events.
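
A minimal, hedged sketch of the kind of cap discussed above for the async path: a 
fixed-capacity buffer that stops accumulating entities once a configured limit is 
reached. The class and method names here (BoundedEntityBuffer, addEntity, 
drainEntities) are hypothetical and are not taken from the actual patch.
{code}
// Hypothetical sketch only -- not the YARN-9335 implementation. It shows one
// way to bound the entities held in NM memory when the backend is unreachable:
// a fixed-capacity queue that rejects any entity offered beyond the limit.
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedEntityBuffer<T> {

  private final BlockingQueue<T> pending;

  public BoundedEntityBuffer(int maxPendingEntities) {
    this.pending = new ArrayBlockingQueue<>(maxPendingEntities);
  }

  /** Returns false (instead of blocking) when the buffer is already full. */
  public boolean addEntity(T entity) {
    return pending.offer(entity);
  }

  /** Drains up to 'max' buffered entities for the next write attempt. */
  public int drainEntities(Collection<? super T> sink, int max) {
    return pending.drainTo(sink, max);
  }
}
{code}
A caller that gets false back could bump a dropped-entities metric rather than 
letting the pending queue grow without bound while the HBase client retries.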

> [atsv2] Restrict the number of elements held in NM timeline collector when 
> backend is unreachable
> -
>
> Key: YARN-9335
> URL: https://issues.apache.org/jira/browse/YARN-9335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
>
> For ATSv2, if the backend is unreachable, the number/size of data held in the 
> timeline collector's memory increases significantly. This is not good for the 
> NM memory.
> Filing this jira to set a limit on how much data should be retained by the 
> timeline collector in memory in case the backend is not reachable.






[jira] [Commented] (YARN-8218) Add application launch time to ATSV1

2019-03-07 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787006#comment-16787006
 ] 

Abhishek Modi commented on YARN-8218:
-

Thanks [~vrushalic] for reviewing and committing it.

> Add application launch time to ATSV1
> 
>
> Key: YARN-8218
> URL: https://issues.apache.org/jira/browse/YARN-8218
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kanwaljeet Sachdev
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8218.001.patch
>
>
> YARN-7088 publishes application launch time to RMStore and also adds it to 
> the YARN UI. It would be a nice enhancement to have the launchTime event 
> published into the Application history server as well.
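
As a rough, hedged sketch of what publishing such an event through the ATSv1 
records API could look like: the entity/event type strings and the helper method 
below are illustrative only, not the actual YARN-8218 change, while the timeline 
record classes and the putEntities call are the existing v1 client API.
{code}
// Illustrative sketch only -- not the YARN-8218 patch. The type strings and
// the helper are made up; TimelineEntity/TimelineEvent/TimelineClient are the
// existing ATSv1 classes.
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class LaunchTimePublisher {

  private LaunchTimePublisher() {
  }

  static void publishLaunchTime(TimelineClient client, ApplicationId appId,
      long launchTime) throws IOException, YarnException {
    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("YARN_APPLICATION");               // illustrative
    entity.setEntityId(appId.toString());

    TimelineEvent launchEvent = new TimelineEvent();
    launchEvent.setEventType("YARN_APPLICATION_LAUNCHED");  // illustrative
    launchEvent.setTimestamp(launchTime);
    entity.addEvent(launchEvent);

    client.putEntities(entity);
  }
}
{code}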






[jira] [Commented] (YARN-8218) Add application launch time to ATSV1

2019-03-06 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785751#comment-16785751
 ] 

Abhishek Modi commented on YARN-8218:
-

Gentle reminder [~vrushalic]. Thanks.

> Add application launch time to ATSV1
> 
>
> Key: YARN-8218
> URL: https://issues.apache.org/jira/browse/YARN-8218
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kanwaljeet Sachdev
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8218.001.patch
>
>
> YARN-7088 publishes application launch time to RMStore and also adds it to 
> the YARN UI. It would be a nice enhancement to have the launchTime event 
> published into the Application history server as well.






[jira] [Commented] (YARN-3488) AM get timeline service info from RM rather than Application specific configuration.

2019-03-06 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785746#comment-16785746
 ] 

Abhishek Modi commented on YARN-3488:
-

Gentle reminder [~rohithsharma] [~vrushalic]

> AM get timeline service info from RM rather than Application specific 
> configuration.
> 
>
> Key: YARN-3488
> URL: https://issues.apache.org/jira/browse/YARN-3488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications
>Reporter: Junping Du
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3488.001.patch, YARN-3488.002.patch, 
> YARN-3488.003.patch
>
>
> Since the v1 timeline service, we have had an MR configuration to 
> enable/disable publishing history events to the timeline service. For the 
> ongoing v2 timeline service effort, we currently have different 
> methods/structures between v1 and v2 for consuming TimelineClient, so the 
> application has to be aware of which timeline service version is in use.
> There are basically two options here:
> The first option is the current approach in DistributedShell or MR: let the 
> application carry a specific configuration that indicates whether ATS is 
> enabled and which version is used, e.g. 
> MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA.
> The other option is to let the application figure out timeline-related info 
> from YARN/RM; this can be done through registerApplicationMaster() in 
> ApplicationMasterProtocol with a return value of "off", "v1_on", or "v2_on" 
> for the service.
> We prefer the latter option because the application owner doesn't have to be 
> aware of RM/YARN infrastructure details. Please note that we should stay 
> compatible (consistent behavior with the same setting) with released 
> configurations.
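
A hedged sketch of how the second option above might look from the AM side. The 
getTimelineServiceVersion() accessor on the register response is hypothetical (no 
such method exists today); the AMRMClient registration call and the v1/v2 
timeline client factories are existing client APIs, and amRMClient, appHostName, 
appHostPort, trackingUrl, appId and conf are assumed to be in scope.
{code}
// Hypothetical sketch for the second option: the register response carries a
// timeline-service indicator. getTimelineServiceVersion() is NOT an existing
// method; the registration call and the timeline client factories are.
RegisterApplicationMasterResponse response =
    amRMClient.registerApplicationMaster(appHostName, appHostPort, trackingUrl);

String timelineService = response.getTimelineServiceVersion();  // hypothetical
if ("v2_on".equals(timelineService)) {
  TimelineV2Client v2Client = TimelineV2Client.createTimelineClient(appId);
  v2Client.init(conf);
  v2Client.start();
} else if ("v1_on".equals(timelineService)) {
  TimelineClient v1Client = TimelineClient.createTimelineClient();
  v1Client.init(conf);
  v1Client.start();
}
// "off": the AM skips publishing timeline data entirely.
{code}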






[jira] [Updated] (YARN-9338) Timeline related testcases are failing

2019-03-06 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9338:

Attachment: YARN-9338.003.patch

> Timeline related testcases are failing
> --
>
> Key: YARN-9338
> URL: https://issues.apache.org/jira/browse/YARN-9338
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9338.001.patch, YARN-9338.002.patch, 
> YARN-9338.003.patch
>
>
> Timeline related testcases are failing
> {code}
> [ERROR] Failures: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV2Enabled:262->runTest:245->validateV2:382->verifyEntity:417
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics:259->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishApplicationMetrics:224->verifyEntity:332
>  Expected 4 events to be published expected:<4> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishContainerMetrics:291->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR] Errors: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV1V2Enabled:252->runTest:242->testSetup:123
>  » YarnRuntime
> [ERROR] Failures: 
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2:313->testDSShell:317->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow:329->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow:323->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR] Failures: 
> [ERROR]   
> TestMRTimelineEventHandling.testMRNewTimelineServiceEventHandling:240->checkNewTimelineEvent:304->verifyEntity:462
> {code}






[jira] [Commented] (YARN-9338) Timeline related testcases are failing

2019-03-03 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782734#comment-16782734
 ] 

Abhishek Modi commented on YARN-9338:
-

Thanks [~Prabhu Joseph] for pointing it out. Attached YARN-9338.002.patch to 
fix this.

> Timeline related testcases are failing
> --
>
> Key: YARN-9338
> URL: https://issues.apache.org/jira/browse/YARN-9338
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9338.001.patch, YARN-9338.002.patch
>
>
> Timeline related testcases are failing
> {code}
> [ERROR] Failures: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV2Enabled:262->runTest:245->validateV2:382->verifyEntity:417
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics:259->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishApplicationMetrics:224->verifyEntity:332
>  Expected 4 events to be published expected:<4> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishContainerMetrics:291->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR] Errors: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV1V2Enabled:252->runTest:242->testSetup:123
>  » YarnRuntime
> [ERROR] Failures: 
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2:313->testDSShell:317->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow:329->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow:323->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR] Failures: 
> [ERROR]   
> TestMRTimelineEventHandling.testMRNewTimelineServiceEventHandling:240->checkNewTimelineEvent:304->verifyEntity:462
> {code}






[jira] [Updated] (YARN-9338) Timeline related testcases are failing

2019-03-03 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9338:

Attachment: YARN-9338.002.patch

> Timeline related testcases are failing
> --
>
> Key: YARN-9338
> URL: https://issues.apache.org/jira/browse/YARN-9338
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9338.001.patch, YARN-9338.002.patch
>
>
> Timeline related testcases are failing
> {code}
> [ERROR] Failures: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV2Enabled:262->runTest:245->validateV2:382->verifyEntity:417
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics:259->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishApplicationMetrics:224->verifyEntity:332
>  Expected 4 events to be published expected:<4> but was:<1>
> [ERROR]   
> TestSystemMetricsPublisherForV2.testPublishContainerMetrics:291->verifyEntity:332
>  Expected 2 events to be published expected:<2> but was:<1>
> [ERROR] Errors: 
> [ERROR]   
> TestCombinedSystemMetricsPublisher.testTimelineServiceEventPublishingV1V2Enabled:252->runTest:242->testSetup:123
>  » YarnRuntime
> [ERROR] Failures: 
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:352->publishWithRetries:320->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [ERROR]   
> TestTimelineAuthFilterForV2.testPutTimelineEntities:343->access$000:87->publishAndVerifyEntity:307
>  expected:<1> but was:<2>
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2:313->testDSShell:317->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow:329->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR]   
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow:323->testDSShell:458->checkTimelineV2:557->verifyEntityForTimelineV2:710
>  Unexpected number of DS_APP_ATTEMPT_START event published. expected:<1> but 
> was:<0>
> [ERROR] Failures: 
> [ERROR]   
> TestMRTimelineEventHandling.testMRNewTimelineServiceEventHandling:240->checkNewTimelineEvent:304->verifyEntity:462
> {code}





