[jira] [Comment Edited] (YARN-6593) [API] Introduce Placement Constraint object

2017-05-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021954#comment-16021954
 ] 

Arun Suresh edited comment on YARN-6593 at 5/23/17 10:09 PM:
-

Given that the proto definitions are what should ideally be perceived as the 
API, and since we agreed to merge both TargetConstraint and 
CardinalityConstraint into a single proto struct, I don't see why we should not 
expose SimplePlacementConstraint externally as well.

In any case, we can have Builders/Validators that take care of this 
internally, right?
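
To make the Builder/Validator idea concrete, here is a minimal sketch (all 
names and fields below are hypothetical, not the actual YARN-6593 API) of a 
merged constraint whose builder does the validation:
{code}
// Hypothetical sketch: one constraint type covering both target and cardinality
// semantics, with invalid combinations rejected by the builder at build time.
public final class SimplePlacementConstraint {
  private final String targetExpression; // e.g. an allocation tag to match
  private final int minCardinality;
  private final int maxCardinality;

  private SimplePlacementConstraint(String target, int min, int max) {
    this.targetExpression = target;
    this.minCardinality = min;
    this.maxCardinality = max;
  }

  public static final class Builder {
    private String target;
    private int min = 0;
    private int max = Integer.MAX_VALUE;

    public Builder target(String targetExpression) {
      this.target = targetExpression;
      return this;
    }

    public Builder cardinality(int minCard, int maxCard) {
      this.min = minCard;
      this.max = maxCard;
      return this;
    }

    public SimplePlacementConstraint build() {
      // Validation lives here, so the single proto struct can stay flexible.
      if (target == null && min == 0 && max == Integer.MAX_VALUE) {
        throw new IllegalStateException("constraint must set a target or a cardinality");
      }
      if (min < 0 || min > max) {
        throw new IllegalStateException("invalid cardinality range: " + min + ".." + max);
      }
      return new SimplePlacementConstraint(target, min, max);
    }
  }
}
{code}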

Since this feature is still nascent and in the process of development, we 
should probably err on the side of flexibility rather than stricter syntax. All 
dependent work (YARN native services) can go on in parallel. Once we have a 
working implementation and we are close to a release, we can "late-bind" to a 
more restrictive API.


was (Author: asuresh):
Given that the proto definitions are what should ideally be perceived as the 
API, and since we agreed to merge both TargetConstraint and 
CardinalityConstraint into a single proto struct, I don't see why we should not 
expose SimplePlacementConstraint externally as well.

In any case, we can have Builders/Validators that take care of this 
internally, right?

> [API] Introduce Placement Constraint object
> ---
>
> Key: YARN-6593
> URL: https://issues.apache.org/jira/browse/YARN-6593
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-6593.001.patch, YARN-6593.002.patch
>
>
> This JIRA introduces an object for defining placement constraints.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6614) Deprecate DistributedSchedulingProtocol and add required fields directly to ApplicationMasterProtocol

2017-05-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016761#comment-16016761
 ] 

Arun Suresh commented on YARN-6614:
---

[~leftnoteasy], thanks for chiming in.

So, when we designed the DistributedSchedulingProtocol, I agree the goal was to 
ensure that it does not affect the existing protocol. But the drawback of the 
existing design is that the DS protocol adds extra methods, which unfortunately 
complicates both the AMRMProxy RequestInterceptors on the NM and the work on 
YARN-6355.
The root cause of the problem is that the version of protobuf we have does not 
support inheritance/extensions, which would have allowed us to just extend the 
Request and Response objects rather than the protocol itself.
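
To illustrate the direction (a sketch only; the accessor names below are 
assumptions, not the committed API), the extra DS fields would live as optional 
accessors on the base response instead of on a separate protocol:
{code}
// Sketch only: illustrative shape, not the actual YARN-6614 patch.
// The idea is that the base response carries the (optional) distributed-scheduling
// fields directly, so no separate protocol with extra methods is needed.
import org.apache.hadoop.yarn.api.records.Resource;

public abstract class RegisterApplicationMasterResponseSketch {
  // ... existing RegisterApplicationMasterResponse accessors ...

  // Hypothetical DS accessors, only populated when distributed scheduling is on:
  public abstract Resource getMinContainerResource();
  public abstract Resource getMaxContainerResource();
  public abstract long getContainerTokenExpiryInterval();
}
{code}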

> Deprecate DistributedSchedulingProtocol and add required fields directly to 
> ApplicationMasterProtocol
> -
>
> Key: YARN-6614
> URL: https://issues.apache.org/jira/browse/YARN-6614
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6614.001.patch, YARN-6614.002.patch
>
>
> The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
> protocol over the {{ApplicationMasterProtocol}}.
> This JIRA proposes to deprecate the protocol itself and move the extra fields 
> of the {{RegisterDistributedSchedulingAMResponse}} and 
> {{DistributedSchedulingAllocateResponse}} to the 
> {{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.
> This will simplify the code quite a bit and make it easier to expose it as a 
> preprocessor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6614) Deprecate DistributedSchedulingProtocol and add required fields directly to ApplicationMasterProtocol

2017-05-16 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6614:
--
Description: 
The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
protocol over the {{ApplicationMasterProtocol}}.

This JIRA proposes to deprecate the protocol itself and move the extra fields 
of the {{RegisterDistributedSchedulingAMResponse}} and 
{{DistributedSchedulingAllocateResponse}} to the 
{{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.

This will simplify the code quite a bit and make it easier to expose it as a 
preprocessor.

  was:
The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
protocol over the {{ApplicationMasterProtocol}}.

This JIRA proposes to deprecate the protocol itself and move the extra fields 
of the {{RegisterDistributedSchedulingAMResponse}} and 
{{DistributedSchedulingAllocateResponse}} to the 
{{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.

This will simplify the code quite a bit and make it easier to reimplement the 
feature as a preprocessor.


> Deprecate DistributedSchedulingProtocol and add required fields directly to 
> ApplicationMasterProtocol
> -
>
> Key: YARN-6614
> URL: https://issues.apache.org/jira/browse/YARN-6614
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> The {{DistributedSchedulingProtocol}} was initially designed as a wrapper 
> protocol over the {{ApplicationMasterProtocol}}.
> This JIRA proposes to deprecate the protocol itself and move the extra fields 
> of the {{RegisterDistributedSchedulingAMResponse}} and 
> {{DistributedSchedulingAllocateResponse}} to the 
> {{RegisterApplicationMasterResponse}} and {{AllocateResponse}} respectively.
> This will simplify the code quite a bit and make it easier to expose it as a 
> preprocessor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6691) Update YARN daemon startup/shutdown scripts to include Router service

2017-06-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041645#comment-16041645
 ] 

Arun Suresh commented on YARN-6691:
---

[~giovanni.fumarola], the patch looks generally OK, but since we are 
introducing the Router in this patch, you should probably remove all the 
backward-compatibility sections.
+1 pending that.

> Update YARN daemon startup/shutdown scripts to include Router service
> -
>
> Key: YARN-6691
> URL: https://issues.apache.org/jira/browse/YARN-6691
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-6691-YARN-2915.v1.patch, 
> YARN-6691-YARN-2915.v2.patch
>
>
> YARN-5410 introduced a new YARN service, the Router. This JIRA proposes to 
> update the YARN daemon startup/shutdown scripts to include the Router service.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6355) Preprocessor framework for AM and Client interactions with the RM

2017-06-07 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6355:
--
Attachment: YARN-6355.002.patch

Updating patch with some code cleanup, javadocs and testcases

> Preprocessor framework for AM and Client interactions with the RM
> -
>
> Key: YARN-6355
> URL: https://issues.apache.org/jira/browse/YARN-6355
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>  Labels: amrmproxy, resourcemanager
> Attachments: YARN-6355.001.patch, YARN-6355.002.patch
>
>
> Currently on the NM, we have the {{AMRMProxy}} framework to intercept the AM 
> <-> RM communication and enforce policies. This is used both by YARN 
> federation (YARN-2915) as well as Distributed Scheduling (YARN-2877).
> This JIRA proposes to introduce a similar framework on the RM side, so that 
> pluggable policies can be enforced on the ApplicationMasterService centrally 
> as well.
> This would be similar in spirit to a Java Servlet Filter Chain, where the 
> order of the interceptors can be declared externally.
> One possible use case would be:
> the {{OpportunisticContainerAllocatorAMService}} is implemented as a wrapper 
> over the {{ApplicationMasterService}}. It would probably be better to 
> implement it as an Interceptor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5978) ContainerScheduler and Container state machine changes to support ExecType update

2017-05-31 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-5978:
-

Assignee: Arun Suresh

> ContainerScheduler and Container state machine changes to support ExecType 
> update
> -
>
> Key: YARN-5978
> URL: https://issues.apache.org/jira/browse/YARN-5978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6306) NMClient API change for container upgrade

2017-05-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001549#comment-16001549
 ] 

Arun Suresh commented on YARN-6306:
---

Thanks for the rev [~jianhe]. I agree having non-abstract methods might be 
better.
Will address your comments shortly

> NMClient API change for container upgrade
> -
>
> Key: YARN-6306
> URL: https://issues.apache.org/jira/browse/YARN-6306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Arun Suresh
> Attachments: YARN-6306.001.patch
>
>
> This JIRA is to track the addition of the Upgrade API (Re-Initialize, Restart, 
> Rollback and Commit) to the NMClient and NMClientAsync.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-05-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001708#comment-16001708
 ] 

Arun Suresh commented on YARN-1197:
---

[~mingma], even though YARN-6216 renders the feature scheduler-agnostic, most 
of the unit tests and the testing were done using the CapacityScheduler. It 
would be nice if we had some basic FairScheduler test cases for it as well. 
Maybe we can add them as part of YARN-1655 before closing it.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.2015.06.24.pdf, 
> YARN-1197_Design.2015.07.07.pdf, YARN-1197_Design.2015.08.21.pdf, 
> YARN-1197_Design.pdf, YARN-1197 old-design-docs-patches-for-reference.zip
>
>
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and 
> allocate a new container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control of resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6306) NMClient API change for container upgrade

2017-05-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999085#comment-15999085
 ] 

Arun Suresh commented on YARN-6306:
---

[~jianhe], do you want to take a quick scan before I post another patch with 
the checkstyle and findbugs fixes?

> NMClient API change for container upgrade
> -
>
> Key: YARN-6306
> URL: https://issues.apache.org/jira/browse/YARN-6306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Arun Suresh
> Attachments: YARN-6306.001.patch
>
>
> This JIRA is to track the addition of the Upgrade API (Re-Initialize, Restart, 
> Rollback and Commit) to the NMClient and NMClientAsync.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6355) Preprocessor framework for AM and Client interactions with the RM

2017-05-24 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6355:
--
Attachment: YARN-6355.001.patch

Attaching initial version of the patch.
* I think we should stick to the "Interceptor" name, since we need something 
that both preprocesses the request and possibly handles the response as well; a 
preprocessor implies that we only pre-process the request. See the sketch after 
this list.
* I have modified the OpportunisticContainerAllocatorAMService to use the 
framework.
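
Roughly, the interface shape I have in mind (illustrative names only; the real 
one is in the attached patch):
{code}
// Illustrative sketch of the interceptor idea; names are not the committed API.
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;

public interface AMServiceInterceptorSketch {
  // Sees the request on the way in to the ApplicationMasterService...
  AllocateRequest preProcess(ApplicationId appId, AllocateRequest request)
      throws YarnException;

  // ...and the response on the way out, which a pure "preprocessor" would not.
  AllocateResponse postProcess(ApplicationId appId, AllocateResponse response)
      throws YarnException;
}
{code}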

> Preprocessor framework for AM and Client interactions with the RM
> -
>
> Key: YARN-6355
> URL: https://issues.apache.org/jira/browse/YARN-6355
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>  Labels: amrmproxy, resourcemanager
> Attachments: YARN-6355.001.patch
>
>
> Currently on the NM, we have the {{AMRMProxy}} framework to intercept the AM 
> <-> RM communication and enforce policies. This is used both by YARN 
> federation (YARN-2915) as well as Distributed Scheduling (YARN-2877).
> This JIRA proposes to introduce a similar framework on the RM side, so that 
> pluggable policies can be enforced on the ApplicationMasterService centrally 
> as well.
> This would be similar in spirit to a Java Servlet Filter Chain, where the 
> order of the interceptors can be declared externally.
> One possible use case would be:
> the {{OpportunisticContainerAllocatorAMService}} is implemented as a wrapper 
> over the {{ApplicationMasterService}}. It would probably be better to 
> implement it as an Interceptor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6692) Delay pause when container is localizing

2017-06-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037865#comment-16037865
 ] 

Arun Suresh edited comment on YARN-6692 at 6/5/17 11:53 PM:


This situation cannot really happen since, after YARN-4597, container queuing 
happens only AFTER localization completes. Queuing happens before running, and 
only running containers can be paused, so it is not possible to pause a 
container during localization.

There might be some deployments where the localization of opportunistic 
containers must also be delayed until right before running (essentially, 
localize after queuing, which is how it was done earlier with the 
QueueingContainerManager). To address these use-cases, we should probably first 
raise a JIRA to introduce a configuration flag that lets the NM decide whether 
queuing should happen before or after localization, and then have this JIRA 
depend on that. A sketch follows below.
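
For illustration, the flag could be a simple boolean NM property; the property 
name below is made up, and the actual name would be decided in that new JIRA:
{code}
// Hypothetical property name, for illustration only.
Configuration conf = new YarnConfiguration();
conf.setBoolean(
    "yarn.nodemanager.opportunistic-containers.queue-before-localization", true);
{code}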



was (Author: asuresh):
This situation cannot really happen since, after YARN-4597, container queuing 
happens only AFTER localization completes. Queuing happens before running, and 
only running containers can be paused, so it is not possible to pause a 
container during localization.

There might be some deployments where the localization of opportunistic 
containers must also be delayed until right before running (essentially, 
localize after queuing, which is how it was done earlier with the 
QueueingContainerManager). For these situations, we should probably first raise 
a JIRA to introduce a configuration flag that lets the NM decide whether 
queuing should happen before or after localization, and then have this JIRA 
depend on that.


> Delay pause when container is localizing
> 
>
> Key: YARN-6692
> URL: https://issues.apache.org/jira/browse/YARN-6692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jose Miguel Arreola
>Assignee: Jose Miguel Arreola
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> If a container receives a Pause event while localizing, allow the container to 
> finish localizing and then pause it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6692) Delay pause when container is localizing

2017-06-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037865#comment-16037865
 ] 

Arun Suresh commented on YARN-6692:
---

This situation cannot really happen since, after YARN-4597, container queuing 
happens only AFTER localization completes. Queuing happens before running, and 
only running containers can be paused, so it is not possible to pause a 
container during localization.

There might be some deployments where the localization of opportunistic 
containers must also be delayed until right before running (essentially, 
localize after queuing, which is how it was done earlier with the 
QueueingContainerManager). For these situations, we should probably first raise 
a JIRA to introduce a configuration flag that lets the NM decide whether 
queuing should happen before or after localization, and then have this JIRA 
depend on that.


> Delay pause when container is localizing
> 
>
> Key: YARN-6692
> URL: https://issues.apache.org/jira/browse/YARN-6692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jose Miguel Arreola
>Assignee: Jose Miguel Arreola
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> If a container receives a Pause event while localizing, allow the container to 
> finish localizing and then pause it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7185) ContainerScheduler should only look at availableResource for GUARANTEED containers when opportunistic scheduling is enabled

2017-09-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162456#comment-16162456
 ] 

Arun Suresh commented on YARN-7185:
---

Thanks for the patch [~wangda].
In general, it LGTM.
Minor nit though: can you see if you can add the testcase to 
{{TestContainerSchedulerQueuing}}, if the setup/teardown is not too complicated?

> ContainerScheduler should only look at availableResource for GUARANTEED 
> containers when opportunistic scheduling is enabled 
> 
>
> Key: YARN-7185
> URL: https://issues.apache.org/jira/browse/YARN-7185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Tan, Wangda
>Priority: Blocker
> Attachments: YARN-7185.001.patch, YARN-7185.002.patch, 
> YARN-7185.003.patch
>
>
> Found an issue: 
> When DefaultContainerCalculator is enabled and opportunistic container 
> allocation is disabled, it is possible that for an NM:
> {code} 
> Σ(allocated-container.vcores) > nm.configured-vcores 
> {code} 
> When this happens, ContainerScheduler will report errors like:
> bq. ContainerScheduler 
> (ContainerScheduler.java:pickOpportunisticContainersToKill(458)) - There are 
> no sufficient resources to start guaranteed.
> This will be an incompatible change after 2.8 because before YARN-6706, we 
> could start containers when DefaultContainerCalculator was configured and 
> vcores were overallocated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5972) Support Pausing/Freezing of opportunistic containers

2017-09-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166607#comment-16166607
 ] 

Arun Suresh commented on YARN-5972:
---

Thanks [~jlowe], cherry-picked the 3 JIRAs and committed

> Support Pausing/Freezing of opportunistic containers
> 
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: container-pause-resume.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources get freed up on the node; the 
> paused container can then resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the docker pause functionality. Windows also has 
> OS-level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then the preempt call 
> would pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7199) TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is failing in trunk

2017-09-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168375#comment-16168375
 ] 

Arun Suresh edited comment on YARN-7199 at 9/15/17 7:12 PM:


Thanks for the patch [~botong]. But I think the actual issue is that the 
{{ContainerRequestBuilder#build}} method is not populating the resource profile 
field with the default value {{ProfileCapability.DEFAULT_PROFILE}}. It looks 
like the public {{ContainerRequest}} constructors do set the default value 
correctly; unfortunately, the builder uses the private constructor, which does 
not.
The proper fix should be to modify the private constructor to set the default 
value AND add a resourceProfile() method to the builder, which a client can use 
to set it to something else. See the sketch below.
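
Something along these lines (a sketch, not the committed patch):
{code}
// Sketch of the proposed fix; illustrative, not the committed patch.
// (Nested in ContainerRequest, like the real builder.)
public static class ContainerRequestBuilder {
  // Default the profile here, mirroring what the public constructors do:
  private ProfileCapability profile = ProfileCapability.DEFAULT_PROFILE;

  // New builder method so a client can override the default:
  public ContainerRequestBuilder resourceProfile(ProfileCapability profileCapability) {
    this.profile = profileCapability;
    return this;
  }
}
{code}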


was (Author: asuresh):
Thanks for the patch [~botong]. But I think the actual issue is that the 
{{ContainerRequestBuilder#build}} method is not populating the resource profile 
field with the default value {{ProfileCapability.DEFAULT_PROFILE}}. It looks 
like the public {{ContainerRequest}} constructors set the value correctly; 
unfortunately, the builder uses the private constructor, which does not.
The proper fix should be to modify the private constructor to set the value to 
default AND add the resourceProfile() method in the builder, which can be used 
by a client to set it to something else.

> TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is 
> failing in trunk
> -
>
> Key: YARN-7199
> URL: https://issues.apache.org/jira/browse/YARN-7199
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
> Attachments: YARN-7199.v1.patch, YARN-7199.v2.patch
>
>
> java.lang.IllegalArgumentException: The profile name cannot be null
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.yarn.api.records.ProfileCapability.newInstance(ProfileCapability.java:68)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:512)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests(TestAMRMClientContainerRequest.java:59)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7199) TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is failing in trunk

2017-09-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168375#comment-16168375
 ] 

Arun Suresh commented on YARN-7199:
---

Thanks for the patch [~botong]. But I think the actual issue is that the 
{{ContainerRequestBuilder#build}} method is not populating the resource profile 
field with the default value {{ProfileCapability.DEFAULT_PROFILE}}. It looks 
like the public {{ContainerRequest}} constructors set the value correctly; 
unfortunately, the builder uses the private constructor, which does not.
The proper fix should be to modify the private constructor to set the value to 
default AND add the resourceProfile() method in the builder, which can be used 
by a client to set it to something else.

> TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is 
> failing in trunk
> -
>
> Key: YARN-7199
> URL: https://issues.apache.org/jira/browse/YARN-7199
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
> Attachments: YARN-7199.v1.patch, YARN-7199.v2.patch
>
>
> java.lang.IllegalArgumentException: The profile name cannot be null
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.yarn.api.records.ProfileCapability.newInstance(ProfileCapability.java:68)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:512)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests(TestAMRMClientContainerRequest.java:59)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7203) Add container ExecutionType into ContainerReport

2017-09-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168397#comment-16168397
 ] 

Arun Suresh commented on YARN-7203:
---

Thanks for the patch [~botong]

Some comments:
* You don't need to create new methods to convert the ExecType in 
{{ContainerReportPBImpl}}; just use the {{ProtoUtils}} convert methods directly.
* In the following snippet, since you want the default value to be GUARANTEED, 
make sure you return GUARANTEED, since our PBImpls are wrappers:
{code}
public ExecutionType getExecutionType() {
  ContainerReportProtoOrBuilder p = viaProto ? proto : builder;
  if (!p.hasExecutionType()) {
    return null;
  }
  return convertFromProtoFormat(p.getExecutionType());
}
{code}
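
i.e. the corrected getter could look like this (sketch):
{code}
public ExecutionType getExecutionType() {
  ContainerReportProtoOrBuilder p = viaProto ? proto : builder;
  if (!p.hasExecutionType()) {
    // PBImpls are thin wrappers over the proto, so the default belongs here.
    return ExecutionType.GUARANTEED;
  }
  return convertFromProtoFormat(p.getExecutionType());
}
{code}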

> Add container ExecutionType into ContainerReport
> 
>
> Key: YARN-7203
> URL: https://issues.apache.org/jira/browse/YARN-7203
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-7203.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-15 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7192:
--
Attachment: YARN-7192.004.patch

Updating patch

bq. A single null check in a function consumes less CPU cycles than a fetch + 
call and a ret to a virtual function. There are two cases it can cause a 
problem. Code ran very frequently like this may add up and make the overall 
product slower. It creates a precedence that others will follow, and it may 
cause performance issues in other code added later.
To allay your concerns, I have made the no-op listener a static field and made 
the 'listener' field final. Both of these will ensure that the JIT inlines the 
method call to an actual no-op, which should be even better than the null check 
(see the sketch below). To be honest, I really dislike null checks, not just 
because the code looks ugly, but because all future code that uses the listener 
would have to re-perform the null check. I would've used the JDK 8 Optional, 
but I'd like this to backport to branch-2 without modification.

Fixed the checkstyles.
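
For reference, the pattern now looks roughly like this (a sketch; see the 
attached patch for the real code):
{code}
// Sketch of the pattern described above, not the patch itself.
public class ContainerImplSketch {
  // One shared no-op instance...
  private static final StateTransitionListener NOOP_LISTENER =
      new NoopStateTransitionListener();

  // ...held in a final field, so the JIT can devirtualize and inline the
  // listener calls down to an actual no-op.
  private final StateTransitionListener listener;

  ContainerImplSketch(StateTransitionListener configured) {
    this.listener = (configured != null) ? configured : NOOP_LISTENER;
  }
}
{code}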

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch, 
> YARN-7192.003.patch, YARN-7192.004.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7192:
--
Attachment: YARN-7192.003.patch

Updated the patch. Sorry, I missed including the yarn-default.xml change earlier.

bq. I do not see the reason for having NoopStateTransitionListener. Why do not 
we just skip calling in case of null and use null as the noop listener?
Hmm, it is not technically needed; I just prefer it to having if-checks. Do 
you have a strong preference against it?

bq. What happens if there is no state transition, just a call? If the current 
state is the same as the previous state we still call post transition.
Yup, I figure it should still call both pre and post, right? The implementation 
should figure out, from the states passed to the pre and post hooks, that no 
transition has happened. See the sketch below.

Fixed everything else.
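
To spell out the intended hook semantics (a sketch; the signatures and generic 
parameter names are illustrative):
{code}
// Sketch of the listener hooks discussed above; illustrative signatures.
public interface StateTransitionListenerSketch<OPERAND, EVENT, STATE extends Enum<STATE>> {
  // Invoked before the transition is attempted.
  void preTransition(OPERAND op, STATE beforeState, EVENT eventToBeProcessed);

  // Invoked after the transition, even when beforeState == afterState, so the
  // implementation can detect that no actual state change happened.
  void postTransition(OPERAND op, STATE beforeState, STATE afterState, EVENT processedEvent);
}
{code}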



> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch, 
> YARN-7192.003.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167282#comment-16167282
 ] 

Arun Suresh commented on YARN-7192:
---

The testcase failure is tracked in YARN-7196. I think the checkstyle issues can 
be ignored.

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch, 
> YARN-7192.003.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-15 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7192:
--
Attachment: YARN-7192.006.patch

fixing checkstyles

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch, 
> YARN-7192.003.patch, YARN-7192.004.patch, YARN-7192.005.patch, 
> YARN-7192.006.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-18 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7196:
--
Attachment: YARN-7196.002.patch

Updating patch.

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7196.002.patch, YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7199) TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is failing in trunk

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170423#comment-16170423
 ] 

Arun Suresh commented on YARN-7199:
---

+1

> TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is 
> failing in trunk
> -
>
> Key: YARN-7199
> URL: https://issues.apache.org/jira/browse/YARN-7199
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Blocker
> Attachments: YARN-7199.v1.patch, YARN-7199.v2.patch, 
> YARN-7199.v3.patch
>
>
> java.lang.IllegalArgumentException: The profile name cannot be null
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.yarn.api.records.ProfileCapability.newInstance(ProfileCapability.java:68)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:512)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests(TestAMRMClientContainerRequest.java:59)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7203) Add container ExecutionType into ContainerReport

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170425#comment-16170425
 ] 

Arun Suresh commented on YARN-7203:
---

+1, will commit this shortly

> Add container ExecutionType into ContainerReport
> 
>
> Key: YARN-7203
> URL: https://issues.apache.org/jira/browse/YARN-7203
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-7203.v1.patch, YARN-7203.v2.patch, 
> YARN-7203.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7199) Fix TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests

2017-09-18 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7199:
--
Summary: Fix 
TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests  (was: 
TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is 
failing in trunk)

> Fix TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests
> -
>
> Key: YARN-7199
> URL: https://issues.apache.org/jira/browse/YARN-7199
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Blocker
> Attachments: YARN-7199.v1.patch, YARN-7199.v2.patch, 
> YARN-7199.v3.patch
>
>
> java.lang.IllegalArgumentException: The profile name cannot be null
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.yarn.api.records.ProfileCapability.newInstance(ProfileCapability.java:68)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:512)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests(TestAMRMClientContainerRequest.java:59)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7203) Add container ExecutionType into ContainerReport

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170529#comment-16170529
 ] 

Arun Suresh commented on YARN-7203:
---

Thanks for the update [~botong].
The latest patch LGTM; will commit pending Jenkins.

> Add container ExecutionType into ContainerReport
> 
>
> Key: YARN-7203
> URL: https://issues.apache.org/jira/browse/YARN-7203
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-7203.v1.patch, YARN-7203.v2.patch, 
> YARN-7203.v3.patch, YARN-7203.v4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170313#comment-16170313
 ] 

Arun Suresh commented on YARN-7196:
---

[~djp], do you mind if I take this up? Are you fine with 
[this|https://issues.apache.org/jira/browse/YARN-7196?focusedCommentId=16168638&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16168638]?

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
> Attachments: YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-18 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-7196:
-

Assignee: Arun Suresh

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7199) Fix TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests

2017-09-18 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7199:
--
Fix Version/s: 3.1.0

> Fix TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests
> -
>
> Key: YARN-7199
> URL: https://issues.apache.org/jira/browse/YARN-7199
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: YARN-7199.v1.patch, YARN-7199.v2.patch, 
> YARN-7199.v3.patch
>
>
> java.lang.IllegalArgumentException: The profile name cannot be null
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.yarn.api.records.ProfileCapability.newInstance(ProfileCapability.java:68)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:512)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests(TestAMRMClientContainerRequest.java:59)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7203) Add container ExecutionType into ContainerReport

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170473#comment-16170473
 ] 

Arun Suresh commented on YARN-7203:
---

Actually, [~botong], I just realized that we haven't added any new tests. Can 
you update {{TestYarnClient}}?

> Add container ExecutionType into ContainerReport
> 
>
> Key: YARN-7203
> URL: https://issues.apache.org/jira/browse/YARN-7203
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-7203.v1.patch, YARN-7203.v2.patch, 
> YARN-7203.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170410#comment-16170410
 ] 

Arun Suresh commented on YARN-7196:
---

I noticed that {{testPromotionOfOpportunisticContainers}} does almost exactly 
what {{testContainerUpdateExecTypeOpportunisticToGuaranteed}} does, barring an 
extra Assert. So I moved the assert over to 
{{testPromotionOfOpportunisticContainers}} and removed that test.

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7196.002.patch, YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170788#comment-16170788
 ] 

Arun Suresh commented on YARN-7196:
---

[~djp], let me know if the latest patch is fine (in which case I will fix the 
unused imports while committing).

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7196.002.patch, YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7203) Add container ExecutionType into ContainerReport

2017-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170857#comment-16170857
 ] 

Arun Suresh commented on YARN-7203:
---

Committing this shortly. The test failure is unrelated and the checkstyle issue 
is unavoidable.

> Add container ExecutionType into ContainerReport
> 
>
> Key: YARN-7203
> URL: https://issues.apache.org/jira/browse/YARN-7203
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-7203.v1.patch, YARN-7203.v2.patch, 
> YARN-7203.v3.patch, YARN-7203.v4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172636#comment-16172636
 ] 

Arun Suresh commented on YARN-7196:
---

Thanks for the review and commit [~djp]

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-7196.002.patch, YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172518#comment-16172518
 ] 

Arun Suresh commented on YARN-7196:
---

[~djp] / [~wangda], what do you think of the latest patch?

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7196.002.patch, YARN-7196.patch
>
>
> The testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7240) Add more states and transitions to stabilize the NM Container state machine

2017-09-21 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-7240:
-

 Summary: Add more states and transitions to stabilize the NM 
Container state machine
 Key: YARN-7240
 URL: https://issues.apache.org/jira/browse/YARN-7240
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Arun Suresh
Assignee: kartheek muthyala


There seem to be a few intermediate states that can be added to improve the 
stability of the NM container state machine.

For example:
* The REINITIALIZING state should probably be split into REINITIALIZING and 
REINITIALIZING_AWAITING_KILL. 
* Container updates are currently handled in the ContainerScheduler, but it 
would probably be better to plumb them through the container state machine as a 
new state, say UPDATING, with a new container event (see the sketch after this 
list).

The plan is also to add some extra tests to try to cover every transition.
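
A rough sketch of the resulting states (the two split states are from the list 
above; everything else is illustrative, not a committed design):
{code}
// Illustrative only: candidate additions to the NM ContainerState enum.
public enum ContainerStateSketch {
  REINITIALIZING,               // new-version resources being localized
  REINITIALIZING_AWAITING_KILL, // localized; waiting for the old process to exit
  UPDATING                      // hypothetical state for container updates, instead
                                // of handling them inside the ContainerScheduler
}
{code}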



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-09-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177369#comment-16177369
 ] 

Arun Suresh commented on YARN-4511:
---

Would like to take a closer look at this as well; should be able to give it a 
rev over the weekend or early next week.

> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-09-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173450#comment-16173450
 ] 

Arun Suresh commented on YARN-7219:
---

Thanks for raising this [~rchiang]. Yeah, let's update everything to 7.

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Priority: Critical
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7229) Add the metric for size of event queue in AsyncDispatcher

2017-09-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173773#comment-16173773
 ] 

Arun Suresh commented on YARN-7229:
---

+1 to this. Additionally, in our production clusters, we also maintain a count 
of each event type that has yet to be dispatched to a handler, and we emit this 
to a dedicated log when the overall count grows too high. A sketch follows 
below.
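
A minimal sketch of what we keep (illustrative, not the proposed YARN-7229 
patch):
{code}
// Sketch only: per-event-type pending counters, dumped to a dedicated log
// when the overall queue size crosses a threshold.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class PendingEventCounters {
  private final ConcurrentHashMap<String, AtomicLong> pending = new ConcurrentHashMap<>();

  void enqueued(String eventType) {
    pending.computeIfAbsent(eventType, t -> new AtomicLong()).incrementAndGet();
  }

  void dispatched(String eventType) {
    AtomicLong count = pending.get(eventType);
    if (count != null) {
      count.decrementAndGet();
    }
  }

  @Override
  public String toString() {
    return pending.toString(); // e.g. {NODE_UPDATE=1234, APP_ATTEMPT_ADDED=2}
  }
}
{code}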

> Add the metric for size of event queue in AsyncDispatcher
> -
>
> Key: YARN-7229
> URL: https://issues.apache.org/jira/browse/YARN-7229
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>
> The size of the event queue in AsyncDispatcher is a good signal for monitoring 
> daemon performance. Let's make it an RM metric.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168473#comment-16168473
 ] 

Arun Suresh commented on YARN-7192:
---

Thanks for the review, [~miklos.szeg...@cloudera.com].
[~jlowe], are you fine with the latest patch?

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch, 
> YARN-7192.003.patch, YARN-7192.004.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7206) IllegalArgumentException in AMRMClient

2017-09-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168461#comment-16168461
 ] 

Arun Suresh commented on YARN-7206:
---

This is already being tracked here : YARN-7199

> IllegalArgumentException in  AMRMClient 
> 
>
> Key: YARN-7206
> URL: https://issues.apache.org/jira/browse/YARN-7206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Jian He
>
> {code}
> java.lang.IllegalArgumentException: The profile name cannot be null
>   at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.yarn.api.records.ProfileCapability.newInstance(ProfileCapability.java:68)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:512)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:194)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168638#comment-16168638
 ] 

Arun Suresh commented on YARN-7196:
---

Thanks for looking into it and for the patch, [~djp].
I think a better fix might be to just move both the 
{{testContainerUpdateExecTypeOpportunisticToGuaranteed()}} and 
{{testContainerUpdateExecTypeGuaranteedToOpportunistic}} testcases to the 
{{TestContainerSchedulerQueuing}} test class (it is very similar to 
{{TestContainerManager}}, but it already has a setup with queue length > 0). 
Then we can just remove the setup from {{TestContainerManager}}.


> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
> Attachments: YARN-7196.patch
>
>
> The Testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-13 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-7192:
-

 Summary: Add a pluggable StateMachine Listener that is notified of 
NM Container State changes
 Key: YARN-7192
 URL: https://issues.apache.org/jira/browse/YARN-7192
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun Suresh
Assignee: Arun Suresh


This JIRA is to add support for a pluggable class in the NodeManager that is 
notified of changes to the Container StateMachine state and the events that 
caused the change.

The proposal is to modify the basic StateMachine class to add support for a 
hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-13 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7192:
--
Attachment: YARN-7192.001.patch

Attaching initial patch

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6838) Add support to LinuxContainerExecutor to support container PAUSE

2017-09-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6838:
--
Target Version/s: 3.1.0

> Add support to LinuxContainerExecutor to support container PAUSE
> 
>
> Key: YARN-6838
> URL: https://issues.apache.org/jira/browse/YARN-6838
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> This JIRA tracks the changes needed to the {{LinuxContainerExecutor}},  
> {{LinuxContainerRuntime}}, {{DockerLinuxContainerRuntime}} and the 
> {{container-executor}} linux binary to support container PAUSE using cgroups 
> freezer module



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-14 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-7196:
-

 Summary: Fix finicky TestContainerManager tests
 Key: YARN-7196
 URL: https://issues.apache.org/jira/browse/YARN-7196
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: The Testcase 
{{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seem to fail every 
once in a while. Maybe have to change the way the event is triggered.
Reporter: Arun Suresh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7009) TestNMClient.testNMClientNoCleanupOnStop is flaky by design

2017-09-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167076#comment-16167076
 ] 

Arun Suresh commented on YARN-7009:
---

Thanks for the patch, [~miklos.szeg...@cloudera.com].
I was thinking that, instead of explicitly making a check in the ContainerImpl 
constructor, we could integrate with YARN-7192: for the testcases, you could 
probably just override {{createNMContext()}}, or maybe even plug in an 
implementation of the NMContext that returns a decorated Listener.

> TestNMClient.testNMClientNoCleanupOnStop is flaky by design
> ---
>
> Key: YARN-7009
> URL: https://issues.apache.org/jira/browse/YARN-7009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7009.000.patch, YARN-7009.001.patch, 
> YARN-7009.002.patch
>
>
> The sleeps that wait for a transition to reinit and then back to running are 
> not long enough; they can miss the reinit event.
> {code}
> java.lang.AssertionError: Exception is not expected: 
> org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform RE_INIT on 
> [container_1502735389852_0001_01_01]. Current state is [REINITIALIZING, 
> isReInitializing=true].
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
>   at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testReInitializeContainer(TestNMClient.java:567)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:405)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:214)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform 
> RE_INIT on [container_1502735389852_0001_01_01]. Current state is 
> [REINITIALIZING, isReInitializing=true].
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
>   at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
>   at 
> 

[jira] [Commented] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167028#comment-16167028
 ] 

Arun Suresh commented on YARN-7192:
---

Thanks for the detailed review [~jlowe].
bq. Would it make sense for the listener API to directly support multiple 
listeners?
While I agree this would be useful, and yes, we do have multi-handlers etc., I 
was thinking of keeping it simple initially (for many of the situations you 
pointed out :)) to solve our particular use-case. Is it OK if we do this in a 
follow-up?
I agree with the rest of your suggestions. Will upload a patch shortly 
addressing them.

[~miklos.szeg...@cloudera.com], thanks for taking a look, and for raising 
YARN-7009. I agree it is a similar approach. Incidentally, I had initially 
started with a decorator approach, but given that I wanted to expose this as an 
external user interface, I decided against it as I did not want the client to 
access the internal state machine. Furthermore, I had wanted an explicit 
*before* and *after* hook.

bq. My concern is that it might cause performance/concurrency issues to call 
listeners synchronously from the state machine.
I understand: the transitions are synchronous w.r.t. the internal state 
machine, and the listener hooks are called within the same synchronized 
context. But I agree, a listener implementation can possibly make other 
synchronous calls and block things (though the current Transitions can do the 
same too). I am thinking we document that fact so users do not shoot themselves 
in the foot. Thoughts?
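
To make the before/after hook concrete, a simplified sketch - the interface 
shape and the call site are illustrative assumptions, not the exact API in the 
patch:

{code}
/**
 * Simplified sketch of an explicit before/after transition hook; the
 * interface and the way the state machine invokes it are hypothetical.
 */
interface StateTransitionListener<STATE, EVENT> {
  void preTransition(STATE currentState, EVENT event);
  void postTransition(STATE oldState, STATE newState, EVENT event);
}

class ListenableStateMachine<STATE, EVENT> {
  private STATE state;
  private final StateTransitionListener<STATE, EVENT> listener;

  ListenableStateMachine(STATE initial, StateTransitionListener<STATE, EVENT> l) {
    this.state = initial;
    this.listener = l;
  }

  // Both hooks run inside the same synchronized context as the transition
  // itself - which is exactly why a slow listener can block the state
  // machine, as discussed above.
  synchronized void doTransition(EVENT event, STATE next) {
    listener.preTransition(state, event);
    STATE old = state;
    state = next;
    listener.postTransition(old, state, event);
  }
}
{code}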

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7192:
--
Attachment: YARN-7192.002.patch

Updated the patch based on suggestions - fixed the testcases / checkstyle and 
added some javadocs.

[~roniburd], I like the Async / Sync idea, but given that the async listener 
might need some extra coding - and other considerations like creating a 
dispatcher and handler (we can maybe reuse the AsyncDispatcher, for example) - 
let's move that to another improvement JIRA.

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

2017-09-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176435#comment-16176435
 ] 

Arun Suresh commented on YARN-6623:
---

Thanks for the heads up, [~djp]. Kindly do push this into branch-2 as well.

> Add support to turn off launching privileged containers in the 
> container-executor
> -
>
> Key: YARN-6623
> URL: https://issues.apache.org/jira/browse/YARN-6623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6623.001.patch, YARN-6623.002.patch, 
> YARN-6623.003.patch, YARN-6623.004.patch, YARN-6623.005.patch, 
> YARN-6623.006.patch, YARN-6623.007.patch, YARN-6623.008.patch, 
> YARN-6623.009.patch, YARN-6623.010.patch
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6595) [API] Add Placement Constraints at the application level

2017-09-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175795#comment-16175795
 ] 

Arun Suresh commented on YARN-6595:
---

[~kkaranasos], do give the latest patch a quick look.

> [API] Add Placement Constraints at the application level
> 
>
> Key: YARN-6595
> URL: https://issues.apache.org/jira/browse/YARN-6595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-6595-YARN-6592.001.patch
>
>
> This JIRA allows placement constraints to be specified at the application 
> level.
> This will be used for placement constraints between different components of 
> the application.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7009) TestNMClient.testNMClientNoCleanupOnStop is flaky by design

2017-09-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175131#comment-16175131
 ] 

Arun Suresh commented on YARN-7009:
---

Thanks for the update, [~miklos.szeg...@cloudera.com].

Can we move the {{DebugSumContainerStateListener}} out of the {{NodeManager}} 
class and somewhere into the test package? I like the use of the lambda 
expression - but we must make sure to replace it for the 2.x patch.

> TestNMClient.testNMClientNoCleanupOnStop is flaky by design
> ---
>
> Key: YARN-7009
> URL: https://issues.apache.org/jira/browse/YARN-7009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7009.000.patch, YARN-7009.001.patch, 
> YARN-7009.002.patch, YARN-7009.003.patch
>
>
> The sleeps that wait for a transition to reinit and then back to running are 
> not long enough; they can miss the reinit event.
> {code}
> java.lang.AssertionError: Exception is not expected: 
> org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform RE_INIT on 
> [container_1502735389852_0001_01_01]. Current state is [REINITIALIZING, 
> isReInitializing=true].
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
>   at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testReInitializeContainer(TestNMClient.java:567)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:405)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:214)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform 
> RE_INIT on [container_1502735389852_0001_01_01]. Current state is 
> [REINITIALIZING, isReInitializing=true].
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
>   at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
>   at 
> 

[jira] [Updated] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-15 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7196:
--
Description: 
The Testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
fail every once in a while. We may have to change the way the event is 
triggered.


> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>
> The Testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seems to 
> fail every once in a while. We may have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-15 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7196:
--
Environment: (was: The Testcase 
{{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seem to fail every 
once in a while. Maybe have to change the way the event is triggered.)

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7192) Add a pluggable StateMachine Listener that is notified of NM Container State changes

2017-09-15 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7192:
--
Attachment: YARN-7192.005.patch

Thanks [~jlowe] for the review.
Updating the patch (005) with a {{MultiStateTransitionListener}} as well :)
I also updated the NodeManager config to take a list of listeners, rather than 
just one.
Also updated the testcases to verify that the wiring works.
Do take a look.

Caveat - This is similar to the {{AsyncDispatcher::MultiListenerHandler}} and 
calls the listeners in order. And, as you noted, an earlier one can screw up a 
later one, etc. I have put in a javadoc comment stating the same.

> Add a pluggable StateMachine Listener that is notified of NM Container State 
> changes
> 
>
> Key: YARN-7192
> URL: https://issues.apache.org/jira/browse/YARN-7192
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7192.001.patch, YARN-7192.002.patch, 
> YARN-7192.003.patch, YARN-7192.004.patch, YARN-7192.005.patch
>
>
> This JIRA is to add support for a pluggable class in the NodeManager that is 
> notified of changes to the Container StateMachine state and the events that 
> caused the change.
> The proposal is to modify the basic StateMachine class to add support for a 
> hook that is called before and after a transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7275) NM Statestore cleanup for Container updates

2017-10-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196102#comment-16196102
 ] 

Arun Suresh commented on YARN-7275:
---

Thanks for the patch, [~kartheek].

Couple of comments:
* We can also avoid explicitly having to write the container version key, since 
the container token also has the version. Thus, during recovery, you should 
obtain the version from the token and set the 'rcs.version' when you see a 
container update token.
* Once the containers are recovered, we also need to update the 
ContainerScheduler data structures. Create a {{recoverActiveContainer}} method 
in the ContainerScheduler which takes a recovered container and places it in 
the queue (if it is queued or paused) or puts it in runningContainers (if it is 
running), then call that method in 
{{ContainerManagerImpl::recoverActiveContainer}} right after you add the 
container to the context (see the sketch below).
* Unfortunately, we do need to worry a bit about rollback - essentially, if 
2.8.x is upgraded to 2.9.x or 3.0.0 and then rolled back. To that extent, I 
guess all you need to do is: if it is a resource change, in addition to storing 
the container update token, use the old resource update key and store the 
changed resource as well. During recovery, if you see a resource change key, we 
just ignore it.
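
A rough sketch of the routing described in the second bullet - the names 
follow the comment, but the bodies are hypothetical, not the actual patch:

{code}
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/**
 * Illustrative sketch of the recovery routing; method and field names
 * follow the comment above, the implementation is made up.
 */
class ContainerSchedulerSketch {
  enum RecoveredState { QUEUED, PAUSED, RUNNING }

  private final Queue<String> queuedContainers = new ArrayDeque<>();
  private final Set<String> runningContainers = new HashSet<>();

  /** Route a recovered container back into the right data structure. */
  void recoverActiveContainer(String containerId, RecoveredState state) {
    switch (state) {
      case QUEUED:
      case PAUSED:
        queuedContainers.add(containerId);  // queued/paused go back to the queue
        break;
      case RUNNING:
        runningContainers.add(containerId); // running containers tracked directly
        break;
    }
  }
}
{code}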

> NM Statestore cleanup for Container updates
> ---
>
> Key: YARN-7275
> URL: https://issues.apache.org/jira/browse/YARN-7275
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
>Priority: Blocker
> Attachments: YARN-7275.001.patch, YARN-7275.002.patch
>
>
> Currently, only resource updates are recorded in the NM state store, we need 
> to add ExecutionType updates as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5978) ContainerScheduler and ContainerManager changes to support ExecType update

2017-10-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190934#comment-16190934
 ] 

Arun Suresh commented on YARN-5978:
---

Thanks for following up, [~ajisakaa]. I had ignored the test since that 
specific test never runs for me locally...

> ContainerScheduler and ContainerManager changes to support ExecType update
> --
>
> Key: YARN-5978
> URL: https://issues.apache.org/jira/browse/YARN-5978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-5978.001.patch, YARN-5978.002.patch, 
> YARN-5978.003.patch
>
>
> ContainerScheduler should support updateContainer API for
> - Container Resource update
> - ExecType update that can change an opportunistic to guaranteed and 
> vice-versa
> Adding a new ContainerState event, UpdateContainerStateEvent to support 
> UPDATE_CONTAINER call from RM.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7044) TestContainerAllocation#testAMContainerAllocationWhenDNSUnavailable fails on trunk

2017-10-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190932#comment-16190932
 ] 

Arun Suresh commented on YARN-7044:
---

Nice catch, [~ajisakaa]!
The patch looks good. +1 pending Jenkins.

> TestContainerAllocation#testAMContainerAllocationWhenDNSUnavailable fails on 
> trunk
> --
>
> Key: YARN-7044
> URL: https://issues.apache.org/jira/browse/YARN-7044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, test
>Affects Versions: 3.0.0-beta1
>Reporter: Wangda Tan
>Assignee: Akira Ajisaka
> Attachments: YARN-7044.001.patch
>
>
> {code}
> Failed
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable
> Failing for the past 2 builds (Since Failed#16961 )
> Took 30 sec.
> Error Message
> test timed out after 30000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 30000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable(TestContainerAllocation.java:330)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7258) Add Node and Rack Hints to Opportunistic Scheduler

2017-10-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191701#comment-16191701
 ] 

Arun Suresh commented on YARN-7258:
---

Thanks for the patch, [~kartheek].
Some comments:
* It looks generally good. We might need to add some more tests - to 
differentiate between requests with numcontainers > 1 and those with 
numcontainers == 1.
* Also, it looks like we might hit a {{ConcurrentModificationException}} when 
we remove the scheduler keys from the outstanding opportunistic requests; the 
sketch below shows the usual way to avoid that.
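
For reference, the standard pattern: remove through the iterator rather than 
through the map (illustrative code, not the patch itself):

{code}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Removing entries while iterating must go through the iterator itself;
 * calling map.remove() inside the loop throws
 * ConcurrentModificationException on the next iteration.
 */
public class SafeRemovalSketch {
  public static void main(String[] args) {
    Map<String, Integer> outstanding = new HashMap<>();
    outstanding.put("schedulerKey-1", 0);
    outstanding.put("schedulerKey-2", 3);

    Iterator<Map.Entry<String, Integer>> it = outstanding.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() == 0) {
        it.remove();  // safe structural removal during iteration
      }
    }
    System.out.println(outstanding);  // prints {schedulerKey-2=3}
  }
}
{code}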

> Add Node and Rack Hints to Opportunistic Scheduler
> --
>
> Key: YARN-7258
> URL: https://issues.apache.org/jira/browse/YARN-7258
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Attachments: YARN-7258.001.patch
>
>
> Currently, the Opportunistic Scheduler ignores the node and rack information 
> and allocates strictly on the least loaded node (based on queue length) at 
> the time it received the request. This JIRA is to track changes needed to 
> allow the OpportunisticContainerAllocator to take the node/rack name as hints.
> The flow would be:
> # If requested node found in the top K leastLoaded nodes, allocate on that 
> node
> # Else, allocate on least loaded node on the same rack from the top K least 
> Loaded nodes.
> # Else, allocate on least loaded node.
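
A rough sketch of that three-step fallback (illustrative only; not the actual 
OpportunisticContainerAllocator code, and it assumes the top-K list is 
non-empty and sorted from least to most loaded):

{code}
import java.util.List;
import java.util.function.Function;

/** Illustrative sketch of the node/rack hint fallback described above. */
class NodeHintSketch {
  static String pickNode(String requestedNode, String requestedRack,
                         List<String> topKLeastLoaded,
                         Function<String, String> rackOf) {
    if (topKLeastLoaded.contains(requestedNode)) {
      return requestedNode;                    // 1. node hint found in top K
    }
    for (String node : topKLeastLoaded) {
      if (rackOf.apply(node).equals(requestedRack)) {
        return node;                           // 2. least loaded node on same rack
      }
    }
    return topKLeastLoaded.get(0);             // 3. overall least loaded node
  }
}
{code}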



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-10-03 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190357#comment-16190357
 ] 

Arun Suresh commented on YARN-7219:
---

+1

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
> Attachments: YARN-7219.001.patch
>
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7299) TestDistributedScheduler is failing

2017-10-09 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197260#comment-16197260
 ] 

Arun Suresh commented on YARN-7299:
---

Thanks for raising this, [~jlowe].
Yeah... we changed the behavior a bit. Unfortunately, it looks like this test 
was not run since it belongs to a different package. We will provide a patch 
shortly.


> TestDistributedScheduler is failing
> ---
>
> Key: YARN-7299
> URL: https://issues.apache.org/jira/browse/YARN-7299
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jason Lowe
>
> TestDistributedScheduler has been failing consistently in trunk:
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.75 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler
> testDistributedScheduler(org.apache.hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler)
>   Time elapsed: 0.67 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<4> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler.testDistributedScheduler(TestDistributedScheduler.java:118)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7258) Add Node and Rack Hints to Opportunistic Scheduler

2017-10-05 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7258:
--
Fix Version/s: 2.9.0

> Add Node and Rack Hints to Opportunistic Scheduler
> --
>
> Key: YARN-7258
> URL: https://issues.apache.org/jira/browse/YARN-7258
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7258.001.patch, YARN-7258.002.patch, 
> YARN-7258.003.patch, YARN-7258.004.patch
>
>
> Currently, the Opportunistic Scheduler ignores the node and rack information 
> and allocates strictly on the least loaded node (based on queue length) at 
> the time it received the request. This JIRA is to track changes needed to 
> allow the OpportunisticContainerAllocator to take the node/rack name as hints.
> The flow would be:
> # If requested node found in the top K leastLoaded nodes, allocate on that 
> node
> # Else, allocate on least loaded node on the same rack from the top K least 
> Loaded nodes.
> # Else, allocate on least loaded node.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7258) Add Node and Rack Hints to Opportunistic Scheduler

2017-10-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193838#comment-16193838
 ] 

Arun Suresh commented on YARN-7258:
---

Committed this to branch-3.0 and branch-2 as well

> Add Node and Rack Hints to Opportunistic Scheduler
> --
>
> Key: YARN-7258
> URL: https://issues.apache.org/jira/browse/YARN-7258
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7258.001.patch, YARN-7258.002.patch, 
> YARN-7258.003.patch, YARN-7258.004.patch
>
>
> Currently, the Opportunistic Scheduler ignores the node and rack information 
> and allocates strictly on the least loaded node (based on queue length) at 
> the time it received the request. This JIRA is to track changes needed to 
> allow the OpportunisticContainerAllocator to take the node/rack name as hints.
> The flow would be:
> # If requested node found in the top K leastLoaded nodes, allocate on that 
> node
> # Else, allocate on least loaded node on the same rack from the top K least 
> Loaded nodes.
> # Else, allocate on least loaded node.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7258) Add Node and Rack Hints to Opportunistic Scheduler

2017-10-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193200#comment-16193200
 ] 

Arun Suresh commented on YARN-7258:
---

+1 for the latest patch.
Committed this to trunk. Thanks for the patch [~kartheek]

> Add Node and Rack Hints to Opportunistic Scheduler
> --
>
> Key: YARN-7258
> URL: https://issues.apache.org/jira/browse/YARN-7258
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Attachments: YARN-7258.001.patch, YARN-7258.002.patch, 
> YARN-7258.003.patch, YARN-7258.004.patch
>
>
> Currently, the Opportunistic Scheduler ignores the node and rack information 
> and allocates strictly on the least loaded node (based on queue length) at 
> the time it received the request. This JIRA is to track changes needed to 
> allow the OpportunisticContainerAllocator to take the node/rack name as hints.
> The flow would be:
> # If requested node found in the top K leastLoaded nodes, allocate on that 
> node
> # Else, allocate on least loaded node on the same rack from the top K least 
> Loaded nodes.
> # Else, allocate on least loaded node.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7275) NM Statestore cleanup for Container updates

2017-10-16 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206220#comment-16206220
 ] 

Arun Suresh commented on YARN-7275:
---

Thanks for the update, [~kartheek].
+1, the latest patch lgtm.

The test failure is captured by YARN-7299

Will commit it shortly.

> NM Statestore cleanup for Container updates
> ---
>
> Key: YARN-7275
> URL: https://issues.apache.org/jira/browse/YARN-7275
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
>Priority: Blocker
> Attachments: YARN-7275.001.patch, YARN-7275.002.patch, 
> YARN-7275.003.patch, YARN-7275.004.patch, YARN-7275.005.patch, 
> YARN-7275.006.patch
>
>
> Currently, only resource updates are recorded in the NM state store, we need 
> to add ExecutionType updates as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7343) Add a junit test for ContainerScheduler recovery

2017-10-17 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-7343:
-

Assignee: Sampada Dehankar  (was: kartheek muthyala)

> Add a junit test for ContainerScheduler recovery
> 
>
> Key: YARN-7343
> URL: https://issues.apache.org/jira/browse/YARN-7343
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: kartheek muthyala
>Assignee: Sampada Dehankar
>Priority: Minor
>
> With queuing at NM, Container recovery becomes interesting. Add a junit test 
> for recovering containers in different states. This should test the recovery 
> with the ContainerScheduler class that was introduced for enabling container 
> queuing on contention of resources. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7327) Allocate containers asynchronously by default

2017-10-13 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7327:
--
Summary: Allocate containers asynchronously by default  (was: Launch 
containers asynchronously by default)

> Allocate containers asynchronously by default
> -
>
> Key: YARN-7327
> URL: https://issues.apache.org/jira/browse/YARN-7327
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Craig Ingram
>Priority: Trivial
> Attachments: yarn-async-scheduling.png
>
>
> I was recently doing some research into Spark on YARN's startup time and 
> observed slow, synchronous allocation of containers/executors. I am testing 
> on a 4 node bare metal cluster w/48 cores and 128GB memory per node. YARN was 
> only allocating about 3 containers per second. Moreover when starting 3 Spark 
> applications at the same time with each requesting 44 containers, the first 
> application would get all 44 requested containers and then the next 
> application would start getting containers and so on.
>  
> From looking at the code, it appears this is by design. There is an 
> undocumented configuration variable that will enable asynchronous allocation 
> of containers. I'm sure I'm missing something, but why is this not the 
> default? Is there a bug or race condition in this code path? I've done some 
> testing with it and it's been working and is significantly faster.
>  
> Here's the config:
> `yarn.scheduler.capacity.schedule-asynchronously.enable`
>  
> Any help understanding this would be appreciated.
>  
> Thanks,
> Craig
>  
> If you're curious about the performance difference with this setting, here 
> are the results:
>  
> The following tool was used for the benchmarks:
> https://github.com/SparkTC/spark-bench
> h2. async scheduler research
> The goal of this test is to determine if running Spark on YARN with async 
> scheduling of containers reduces the amount of time required for an 
> application to receive all of its requested resources. This setting should 
> also reduce the overall runtime of short-lived applications/stages or 
> notebook paragraphs. This setting could prove crucial to achieving optimal 
> performance when sharing resources on a cluster with dynalloc enabled.
> h3. Test Setup
> Must update /etc/hadoop/conf/capacity-scheduler.xml (or through Ambari) 
> between runs.  
> `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
> conf files request executors counts of:  
> * 2
> * 20
> * 50
> * 100
> The apps are being submitted to the default queue on each cluster which caps 
> at 48 cores on dynalloc and 72 cores on baremetal. The default queue was 
> expanded for the last two tests on baremetal so it could potentially take 
> advantage of all 144 cores.
> h3. Test Environments
> h4. dynalloc
> 4 VMs in Fyre (1 master, 3 workers)
> 8 CPUs/16 GB per node
> model name: QEMU Virtual CPU version 2.5+  
> h4. baremetal
> 4 baremetal instances in Fyre (1 master, 3 workers)
> 48 CPUs/128GB per node
> model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz  
> h3. Using spark-bench with timedsleep workload sync
> h4. dynalloc
> || requested containers | avg | stdev||
> |2 | 23.814900 | 1.110725|
> |20 | 29.770250 | 0.830528|
> |50 | 44.486600 | 0.593516|
> |100 | 44.337700 | 0.490139|
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers | avg | stdev||
> |2 | 14.827000 | 0.292290|
> |20 | 19.613150 | 0.155421|
> |50 | 30.768400 | 0.083400|
> |100 | 40.931850 | 0.092160|
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers | avg | stdev||
> |2 | 14.833050 | 0.334061|
> |20 | 19.575000 | 0.212836|
> |50 | 30.765350 | 0.111035|
> |100 | 41.763300 | 0.182700|
> h3. Using spark-bench with timedsleep workload async
> h4. dynalloc
> || requested containers | avg | stdev||
> |2 | 22.575150 | 0.574296|
> |20 | 26.904150 | 1.244602|
> |50 | 44.721800 | 0.655388|
> |100 | 44.57 | 0.514540|
> h5. 2nd run  
> || requested containers | avg | stdev||
> |2 | 22.441200 | 0.715875|
> |20 | 26.683400 | 0.583762|
> |50 | 44.227250 | 0.512568|
> |100 | 44.238750 | 0.329712|
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers | avg | stdev||
> |2 | 12.902350 | 0.125505|
> |20 | 13.830600 | 0.169598|
> |50 | 16.738050 | 0.265091|
> |100 | 40.654500 | 0.111417|
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers | avg | stdev||
> |2 | 12.987150 | 0.118169|
> |20 | 13.837150 | 0.145871|
> |50 | 16.816300 | 0.253437|
> |100 | 23.113450 | 0.320744|



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, 

[jira] [Commented] (YARN-7327) Allocate containers asynchronously by default

2017-10-13 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204337#comment-16204337
 ] 

Arun Suresh commented on YARN-7327:
---

Changed the title: it looks like this is more related to the RM allocating 
containers than launching them on the NM.

Thanks for trying this. Asynchronous scheduling (in the up-until-now released 
2.x branches of YARN) is fairly experimental, and it does lead to some 
unnecessary locking and race conditions. [~leftnoteasy] has re-factored most of 
the asynchronous scheduling code paths; the rework should be available in 
2.9.0, and you can give it a shot in 3.0.0-beta1 as well.

The default scheduling mode (what you refer to as synchronous scheduling) is 
actually node-heartbeat-triggered scheduling. There are certain cases where, I 
guess, the default scheduling might still be more apt - for example, if most of 
your requests have strict data locality requirements. Also, in a moderately 
loaded cluster, I suspect you might see higher latencies - I have yet to test 
this though.

But in general, this is a direction we are actively looking at. BTW, for 
extremely short duration tasks, there is also an option to use OPPORTUNISTIC 
containers (YARN-2877 and YARN-5220), but you need support in the AM for that.
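
For anyone trying this, the switch quoted in the description goes into 
capacity-scheduler.xml; a minimal example (the value is illustrative, and an 
RM restart is assumed to pick it up):

{code}
<!-- capacity-scheduler.xml: enable asynchronous container allocation. -->
<property>
  <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
  <value>true</value>
</property>
{code}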


> Allocate containers asynchronously by default
> -
>
> Key: YARN-7327
> URL: https://issues.apache.org/jira/browse/YARN-7327
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Craig Ingram
>Priority: Trivial
> Attachments: yarn-async-scheduling.png
>
>
> I was recently doing some research into Spark on YARN's startup time and 
> observed slow, synchronous allocation of containers/executors. I am testing 
> on a 4 node bare metal cluster w/48 cores and 128GB memory per node. YARN was 
> only allocating about 3 containers per second. Moreover when starting 3 Spark 
> applications at the same time with each requesting 44 containers, the first 
> application would get all 44 requested containers and then the next 
> application would start getting containers and so on.
>  
> From looking at the code, it appears this is by design. There is an 
> undocumented configuration variable that will enable asynchronous allocation 
> of containers. I'm sure I'm missing something, but why is this not the 
> default? Is there a bug or race condition in this code path? I've done some 
> testing with it and it's been working and is significantly faster.
>  
> Here's the config:
> `yarn.scheduler.capacity.schedule-asynchronously.enable`
>  
> Any help understanding this would be appreciated.
>  
> Thanks,
> Craig
>  
> If you're curious about the performance difference with this setting, here 
> are the results:
>  
> The following tool was used for the benchmarks:
> https://github.com/SparkTC/spark-bench
> h2. async scheduler research
> The goal of this test is to determine if running Spark on YARN with async 
> scheduling of containers reduces the amount of time required for an 
> application to receive all of its requested resources. This setting should 
> also reduce the overall runtime of short-lived applications/stages or 
> notebook paragraphs. This setting could prove crucial to achieving optimal 
> performance when sharing resources on a cluster with dynalloc enabled.
> h3. Test Setup
> Must update /etc/hadoop/conf/capacity-scheduler.xml (or through Ambari) 
> between runs.  
> `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
> conf files request executors counts of:  
> * 2
> * 20
> * 50
> * 100
> The apps are being submitted to the default queue on each cluster which caps 
> at 48 cores on dynalloc and 72 cores on baremetal. The default queue was 
> expanded for the last two tests on baremetal so it could potentially take 
> advantage of all 144 cores.
> h3. Test Environments
> h4. dynalloc
> 4 VMs in Fyre (1 master, 3 workers)
> 8 CPUs/16 GB per node
> model name: QEMU Virtual CPU version 2.5+  
> h4. baremetal
> 4 baremetal instances in Fyre (1 master, 3 workers)
> 48 CPUs/128GB per node
> model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz  
> h3. Using spark-bench with timedsleep workload sync
> h4. dynalloc
> || requested containers | avg | stdev||
> |2 | 23.814900 | 1.110725|
> |20 | 29.770250 | 0.830528|
> |50 | 44.486600 | 0.593516|
> |100 | 44.337700 | 0.490139|
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers | avg | stdev||
> |2 | 14.827000 | 0.292290|
> |20 | 19.613150 | 0.155421|
> |50 | 30.768400 | 0.083400|
> |100 | 40.931850 | 0.092160|
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers | avg | stdev||
> |2 | 14.833050 | 0.334061|
> |20 | 19.575000 | 0.212836|
> |50 | 30.765350 | 0.111035|
> |100 | 41.763300 | 0.182700|
> h3. Using spark-bench with 

[jira] [Updated] (YARN-7327) CapacityScheduler: Allocate containers asynchronously by default

2017-10-13 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7327:
--
Summary: CapacityScheduler: Allocate containers asynchronously by default  
(was: Allocate containers asynchronously by default)

> CapacityScheduler: Allocate containers asynchronously by default
> 
>
> Key: YARN-7327
> URL: https://issues.apache.org/jira/browse/YARN-7327
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Craig Ingram
>Priority: Trivial
> Attachments: yarn-async-scheduling.png
>
>
> I was recently doing some research into Spark on YARN's startup time and 
> observed slow, synchronous allocation of containers/executors. I am testing 
> on a 4 node bare metal cluster w/48 cores and 128GB memory per node. YARN was 
> only allocating about 3 containers per second. Moreover when starting 3 Spark 
> applications at the same time with each requesting 44 containers, the first 
> application would get all 44 requested containers and then the next 
> application would start getting containers and so on.
>  
> From looking at the code, it appears this is by design. There is an 
> undocumented configuration variable that will enable asynchronous allocation 
> of containers. I'm sure I'm missing something, but why is this not the 
> default? Is there a bug or race condition in this code path? I've done some 
> testing with it and it's been working and is significantly faster.
>  
> Here's the config:
> `yarn.scheduler.capacity.schedule-asynchronously.enable`
>  
> Any help understanding this would be appreciated.
>  
> Thanks,
> Craig
>  
> If you're curious about the performance difference with this setting, here 
> are the results:
>  
> The following tool was used for the benchmarks:
> https://github.com/SparkTC/spark-bench
> h2. async scheduler research
> The goal of this test is to determine if running Spark on YARN with async 
> scheduling of containers reduces the amount of time required for an 
> application to receive all of its requested resources. This setting should 
> also reduce the overall runtime of short-lived applications/stages or 
> notebook paragraphs. This setting could prove crucial to achieving optimal 
> performance when sharing resources on a cluster with dynalloc enabled.
> h3. Test Setup
> Must update /etc/hadoop/conf/capacity-scheduler.xml (or through Ambari) 
> between runs.  
> `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
> conf files request executor counts of:  
> * 2
> * 20
> * 50
> * 100
> The apps are being submitted to the default queue on each cluster which caps 
> at 48 cores on dynalloc and 72 cores on baremetal. The default queue was 
> expanded for the last two tests on baremetal so it could potentially take 
> advantage of all 144 cores.
> h3. Test Environments
> h4. dynalloc
> 4 VMs in Fyre (1 master, 3 workers)
> 8 CPUs/16 GB per node
> model name: QEMU Virtual CPU version 2.5+  
> h4. baremetal
> 4 baremetal instances in Fyre (1 master, 3 workers)
> 48 CPUs/128GB per node
> model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz  
> h3. Using spark-bench with timedsleep workload sync
> h4. dynalloc
> || requested containers | avg | stdev||
> |2 | 23.814900 | 1.110725|
> |20 | 29.770250 | 0.830528|
> |50 | 44.486600 | 0.593516|
> |100 | 44.337700 | 0.490139|
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers | avg | stdev||
> |2 | 14.827000 | 0.292290|
> |20 | 19.613150 | 0.155421|
> |50 | 30.768400 | 0.083400|
> |100 | 40.931850 | 0.092160|
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers | avg | stdev||
> |2 | 14.833050 | 0.334061|
> |20 | 19.575000 | 0.212836|
> |50 | 30.765350 | 0.111035|
> |100 | 41.763300 | 0.182700|
> h3. Using spark-bench with timedsleep workload async
> h4. dynalloc
> || requested containers | avg | stdev||
> |2 | 22.575150 | 0.574296|
> |20 | 26.904150 | 1.244602|
> |50 | 44.721800 | 0.655388|
> |100 | 44.57 | 0.514540|
> h5. 2nd run  
> || requested containers | avg | stdev||
> |2 | 22.441200 | 0.715875|
> |20 | 26.683400 | 0.583762|
> |50 | 44.227250 | 0.512568|
> |100 | 44.238750 | 0.329712|
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers | avg | stdev||
> |2 | 12.902350 | 0.125505|
> |20 | 13.830600 | 0.169598|
> |50 | 16.738050 | 0.265091|
> |100 | 40.654500 | 0.111417|
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers | avg | stdev||
> |2 | 12.987150 | 0.118169|
> |20 | 13.837150 | 0.145871|
> |50 | 16.816300 | 0.253437|
> |100 | 23.113450 | 0.320744|
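
For anyone reproducing the benchmark, here is a minimal sketch (not part of this 
JIRA) of flipping the flag programmatically, e.g. from a test harness; it assumes 
only the stock Hadoop Configuration API and the property name quoted above:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class AsyncSchedulingToggle {
  // Property name as quoted in the report above; in capacity-scheduler.xml it
  // would be set as an ordinary <property> entry instead.
  private static final String SCHEDULE_ASYNC =
      "yarn.scheduler.capacity.schedule-asynchronously.enable";

  public static Configuration withAsyncScheduling(boolean enabled) {
    Configuration conf = new Configuration();
    conf.setBoolean(SCHEDULE_ASYNC, enabled); // current default is false
    return conf;
  }

  public static void main(String[] args) {
    Configuration conf = withAsyncScheduling(true);
    System.out.println(SCHEDULE_ASYNC + " = "
        + conf.getBoolean(SCHEDULE_ASYNC, false));
  }
}
{code}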



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-4859) [Bug] Unable to submit a job to a reservation when using FairScheduler

2017-10-16 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206521#comment-16206521
 ] 

Arun Suresh commented on YARN-4859:
---

Sorry to chime in late [~yufeigu], but if you are testing this, please note: I 
don't know if it is documented anywhere, but reservations will work in FS 
queues that are configured with the DRF policy (at least this was the case when I 
tested this last year).

> [Bug] Unable to submit a job to a reservation when using FairScheduler
> --
>
> Key: YARN-4859
> URL: https://issues.apache.org/jira/browse/YARN-4859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Subru Krishnan
>Assignee: Yufei Gu
>Priority: Blocker
>
> Jobs submitted to a reservation get stuck at the scheduled stage when using 
> the FairScheduler. I came across this while working on YARN-4827 (documentation 
> for configuring the ReservationSystem for the FairScheduler).
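
For anyone setting up a similar test, here is a rough sketch of how a client ties a 
submission to a reservation. The queue name is hypothetical, and per the comment 
above, with the FairScheduler that queue would need to use the DRF policy:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ReservationId;
import org.apache.hadoop.yarn.util.Records;

public class ReservationSubmitSketch {
  public static void main(String[] args) {
    // Hypothetical id; in a real client it comes back from the
    // ReservationSystem when the reservation is accepted.
    ReservationId reservationId =
        ReservationId.newInstance(System.currentTimeMillis(), 1);

    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    ctx.setQueue("root.reservable");      // hypothetical reservable queue
    ctx.setReservationID(reservationId);  // routes the app into the reservation

    System.out.println("Submitting into reservation " + ctx.getReservationID());
  }
}
{code}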



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4859) [Bug] Unable to submit a job to a reservation when using FairScheduler

2017-10-16 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206521#comment-16206521
 ] 

Arun Suresh edited comment on YARN-4859 at 10/16/17 8:05 PM:
-

Sorry to chime in late [~yufeigu], but if you are testing this, please note: I 
don't know if it is documented anywhere, but reservations will only work in FS 
queues that are configured with the DRF policy (at least this was the case when I 
tested this last year).


was (Author: asuresh):
Sorry to chime in late [~yufeigu], but if you are testing this, please note: I 
don't know if it is documented anywhere, but reservations will work in FS 
queues that are configured with the DRF policy (at least this was the case when I 
tested this last year).

> [Bug] Unable to submit a job to a reservation when using FairScheduler
> --
>
> Key: YARN-4859
> URL: https://issues.apache.org/jira/browse/YARN-4859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Subru Krishnan
>Assignee: Yufei Gu
>Priority: Blocker
>
> Jobs submitted to a reservation get stuck at the scheduled stage when using 
> the FairScheduler. I came across this while working on YARN-4827 (documentation 
> for configuring the ReservationSystem for the FairScheduler).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7275) NM Statestore cleanup for Container updates

2017-10-16 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7275:
--
Fix Version/s: 3.0.0
   2.9.0

> NM Statestore cleanup for Container updates
> ---
>
> Key: YARN-7275
> URL: https://issues.apache.org/jira/browse/YARN-7275
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7275.001.patch, YARN-7275.002.patch, 
> YARN-7275.003.patch, YARN-7275.004.patch, YARN-7275.005.patch, 
> YARN-7275.006.patch
>
>
> Currently, only resource updates are recorded in the NM state store; we need 
> to add ExecutionType updates as well.
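
As a schematic of what is being asked for, here is a toy illustration (hypothetical 
names, not the actual NM statestore API) of keeping the latest ExecutionType next 
to the resource updates the store already records, so both survive an NM restart:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ExecutionType;

public class ContainerUpdateLedger {
  private final Map<String, String> resourceByContainer =
      new ConcurrentHashMap<>();
  private final Map<String, ExecutionType> execTypeByContainer =
      new ConcurrentHashMap<>();

  // Already covered today: resource updates are written to the store.
  public void storeResourceUpdate(String containerId, String resource) {
    resourceByContainer.put(containerId, resource);
  }

  // The missing piece: persist ExecutionType changes (e.g. a promotion from
  // OPPORTUNISTIC to GUARANTEED) as well.
  public void storeExecutionTypeUpdate(String containerId, ExecutionType type) {
    execTypeByContainer.put(containerId, type);
  }

  public ExecutionType recoverExecutionType(String containerId) {
    return execTypeByContainer.getOrDefault(containerId, ExecutionType.GUARANTEED);
  }
}
{code}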



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6849) NMContainerStatus should have the Container ExecutionType.

2017-09-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158858#comment-16158858
 ] 

Arun Suresh commented on YARN-6849:
---

+1. The test failures are unrelated.
Committing this shortly.


> NMContainerStatus should have the Container ExecutionType.
> --
>
> Key: YARN-6849
> URL: https://issues.apache.org/jira/browse/YARN-6849
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Attachments: YARN-6849.001.patch, YARN-6849.002.patch
>
>
> Currently only the ContainerState is sent to the RM in the NMContainerStatus. 
> This lets the restarted RM know whether the container is queued or not, but it 
> won't know the ExecutionType.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6849) NMContainerStatus should have the Container ExecutionType.

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6849:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-5085

> NMContainerStatus should have the Container ExecutionType.
> --
>
> Key: YARN-6849
> URL: https://issues.apache.org/jira/browse/YARN-6849
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6849.001.patch, YARN-6849.002.patch
>
>
> Currently only the ContainerState is sent to the RM in the NMContainerStatus. 
> This lets the restarted RM know whether the container is queued or not, but it 
> won't know the ExecutionType. ExecutionType updates (container promotions) 
> also cannot happen unless the RM knows about OPPORTUNISTIC / QUEUED 
> containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6849) NMContainerStatus should have the Container ExecutionType.

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6849:
--
Description: Currently only the ContainerState is sent to the RM in the 
NMContainerStatus. This lets the restarted RM know whether the container is queued 
or not, but it won't know the ExecutionType. ExecutionType updates (container 
promotions) also cannot happen unless the RM knows about OPPORTUNISTIC / QUEUED 
containers.  (was: Currently only the ContainerState is sent to the RM in the 
NMContainerStatus. This lets the restarted RM know whether the container is queued 
or not, but it won't know the ExecutionType.)

> NMContainerStatus should have the Container ExecutionType.
> --
>
> Key: YARN-6849
> URL: https://issues.apache.org/jira/browse/YARN-6849
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6849.001.patch, YARN-6849.002.patch
>
>
> Currently only the ContainerState is sent to the RM in the NMContainerStatus. 
> This lets the restarted RM know whether the container is queued or not, but it 
> won't know the ExecutionType. ExecutionType updates (container promotions) 
> also cannot happen unless the RM knows about OPPORTUNISTIC / QUEUED 
> containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-4509) Promote containers from OPPORTUNISTIC to GUARANTEED

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh resolved YARN-4509.
---
  Resolution: Duplicate
Target Version/s:   (was: )

> Promote containers from OPPORTUNISTIC to GUARANTEED
> ---
>
> Key: YARN-4509
> URL: https://issues.apache.org/jira/browse/YARN-4509
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> YARN-2882 adds the notion of OPPORTUNISTIC containers. We should define 
> the protocol for promoting these containers to GUARANTEED.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4509) Promote containers from OPPORTUNISTIC to GUARANTEED

2017-09-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158894#comment-16158894
 ] 

Arun Suresh commented on YARN-4509:
---

Marking as duplicate of YARN-5085

> Promote containers from OPPORTUNISTIC to GUARANTEED
> ---
>
> Key: YARN-4509
> URL: https://issues.apache.org/jira/browse/YARN-4509
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> YARN-2882 adds the notion of OPPORTUNISTIC containers. We should define 
> the protocol for promoting these containers to GUARANTEED.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5221) Expose UpdateResourceRequest API to allow AM to request for change in container properties

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5221:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-5085

> Expose UpdateResourceRequest API to allow AM to request for change in 
> container properties
> --
>
> Key: YARN-5221
> URL: https://issues.apache.org/jira/browse/YARN-5221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5221.001.patch, YARN-5221.002.patch, 
> YARN-5221.003.patch, YARN-5221.004.patch, YARN-5221.005.patch, 
> YARN-5221.006.patch, YARN-5221.007.patch, YARN-5221.008.patch, 
> YARN-5221.009.patch, YARN-5221.010.patch, YARN-5221.011.patch, 
> YARN-5221.012.patch, YARN-5221.013.patch, YARN-5221-branch-2.8-v1.patch, 
> YARN-5221-branch-2-v1.patch
>
>
> YARN-1197 introduced APIs to allow an AM to request for Increase and Decrease 
> of Container Resources after initial allocation.
> YARN-5085 proposes to allow an AM to request for a change of Container 
> ExecutionType.
> This JIRA proposes to unify both of the above into an Update Container API.
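
To make the unified API concrete, here is a small sketch built only from the public 
records API: the same UpdateContainerRequest record expresses both a resource 
change and an ExecutionType change. In a live AM these requests would be sent to 
the RM through the allocate call (e.g. via AMRMClient#requestContainerUpdate):

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerUpdateType;
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.UpdateContainerRequest;

public class UpdateRequestSketch {
  public static void main(String[] args) {
    // Synthetic ids for illustration; a real AM uses the id of a container
    // it was actually allocated.
    ApplicationId appId =
        ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ContainerId containerId = ContainerId.newContainerId(
        ApplicationAttemptId.newInstance(appId, 1), 1);

    // Resource change: grow the container to 2 GB / 2 vcores.
    UpdateContainerRequest increase = UpdateContainerRequest.newInstance(
        0 /* container version */, containerId,
        ContainerUpdateType.INCREASE_RESOURCE,
        Resource.newInstance(2048, 2), null);

    // ExecutionType change: promote OPPORTUNISTIC -> GUARANTEED.
    UpdateContainerRequest promote = UpdateContainerRequest.newInstance(
        0, containerId, ContainerUpdateType.PROMOTE_EXECUTION_TYPE,
        null, ExecutionType.GUARANTEED);

    System.out.println(increase);
    System.out.println(promote);
  }
}
{code}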



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7173) Container Update Backward compatibility fix for upgrades

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7173:
--
Attachment: YARN-7173.001.patch

Looks like this is required for 3.0.x / trunk as well.
Attaching trunk patch.

> Container Update Backward compatibility fix for upgrades
> 
>
> Key: YARN-7173
> URL: https://issues.apache.org/jira/browse/YARN-7173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7173.001.patch, YARN-7173-branch-2.001.patch
>
>
> This is based on discussions with [~leftnoteasy] in YARN-6979.
> In YARN-6979, the {{getContainersToDecrease()}} and 
> {{addAllContainersToDecrease()}} methods were removed from the 
> NodeHeartbeatResponse (although the actual protobuf fields were still 
> retained). We need to ensure that for clusters that upgrade from 2.8.x to 
> 2.9.0, the decreased containers are also sent to the NM.
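
Schematically, the fix follows the usual rolling-upgrade pattern. The type and 
method names below are hypothetical stand-ins, not the actual NodeHeartbeatResponse 
API: the RM keeps populating the retained legacy field alongside the new unified 
one, so a 2.8.x NM that only reads the legacy field still learns about decreases:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class HeartbeatCompatSketch {
  static class Update {
    final String containerId;
    final boolean isDecrease;
    Update(String containerId, boolean isDecrease) {
      this.containerId = containerId;
      this.isDecrease = isDecrease;
    }
  }

  static class Response {
    final List<Update> containersToUpdate = new ArrayList<>();   // new field
    final List<Update> containersToDecrease = new ArrayList<>(); // retained 2.8.x field
  }

  static Response buildResponse(List<Update> updates) {
    Response response = new Response();
    for (Update u : updates) {
      response.containersToUpdate.add(u);
      if (u.isDecrease) {
        // Duplicate decreases into the legacy field for old NMs.
        response.containersToDecrease.add(u);
      }
    }
    return response;
  }
}
{code}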



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3417) AM to be able to exit with a request saying "restart me with these (possibly updated) resource requirements"

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-3417:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-4726)

> AM to be able to exit with a request saying "restart me with these (possibly 
> updated) resource requirements"
> 
>
> Key: YARN-3417
> URL: https://issues.apache.org/jira/browse/YARN-3417
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
>
> If an AM wants to reconfigure itself or restart with new resources, there's 
> no way to do this without the active participation of a client.
> It can call System.exit and rely on YARN to restart it, but that counts as a 
> failure and may lose the entire app. Furthermore, that doesn't allow the AM 
> to resize itself.
> A simple exit code to be interpreted as restart-without-failure could handle 
> the first case; an explicit call to indicate restart, including potentially 
> new resource/label requirements, would be more reliable, and certainly more 
> flexible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1040) De-link container life cycle from an Allocation

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-1040:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: YARN-4726)

> De-link container life cycle from an Allocation
> ---
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4876) Decoupled Init / Destroy of Containers from Start / Stop

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: YARN-4726)

> Decoupled Init / Destroy of Containers from Start / Stop
> 
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, nodemanager
>Reporter: Arun Suresh
>Assignee: Marco Rabozzi
>  Labels: oct16-hard
> Attachments: YARN-4876.002.patch, YARN-4876.003.patch, 
> YARN-4876.004.patch, YARN-4876.01.patch, YARN-4876-design-doc.pdf
>
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* and decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated with the 
> initialize (and the cleanup with the destroy), this can also be used by 
> applications to upgrade a container by *re-initializing* it with a new 
> *ContainerLaunchContext*.
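
For context, here is a hedged sketch of the re-initialization flow this decoupling 
enables, assuming the reInitializeContainer / commit / rollback methods that the 
container-upgrade line of work added to NMClient:

{code:java}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.util.Records;

public class ReInitSketch {
  // Assumes a started NMClient and a running container owned by this AM.
  static void upgrade(NMClient nmClient, ContainerId containerId)
      throws Exception {
    ContainerLaunchContext newCtx =
        Records.newRecord(ContainerLaunchContext.class);
    // ... populate newCtx with the upgraded launch command / resources ...

    // Re-initialize without losing the allocation; autoCommit=false keeps
    // the old version around until we decide.
    nmClient.reInitializeContainer(containerId, newCtx, false);

    // On success, keep the new version; on failure we could instead call
    // nmClient.rollbackLastReInitialization(containerId).
    nmClient.commitLastReInitialization(containerId);
  }
}
{code}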



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7173) Container Update Backward compatibility fix for upgrades

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7173:
--
Summary: Container Update Backward compatibility fix for upgrades  (was: 
Container Update Backward compatibility fix for upgrades from 2.8.x)

> Container Update Backward compatibility fix for upgrades
> 
>
> Key: YARN-7173
> URL: https://issues.apache.org/jira/browse/YARN-7173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7173-branch-2.001.patch
>
>
> This is based on discussions with [~leftnoteasy] in YARN-6979.
> In YARN-6979, the {{getContainersToDecrease()}} and 
> {{addAllContainersToDecrease()}} methods were removed from the 
> NodeHeartbeatResponse (although the actual protobuf fields were still 
> retained). We need to ensure that for clusters that upgrade from 2.8.x to 
> 2.9.0, the decreased containers are also sent to the NM.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7173) Container Update Backward compatibility fix for upgrades

2017-09-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7173:
--
Target Version/s: 2.9.0, 3.0.0-beta1  (was: 2.9.0)

> Container Update Backward compatibility fix for upgrades
> 
>
> Key: YARN-7173
> URL: https://issues.apache.org/jira/browse/YARN-7173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7173.001.patch, YARN-7173-branch-2.001.patch
>
>
> This is based on discussions with [~leftnoteasy] in YARN-6979.
> In YARN-6979, the {{getContainersToDecrease()}} and 
> {{addAllContainersToDecrease()}} methods were removed from the 
> NodeHeartbeatResponse (although the actual protobuf fields were still 
> retained). We need to ensure that for clusters that upgrade from 2.8.x to 
> 2.9.0, the decreased containers are also sent to the NM.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7178) Add documentation for Container Update API

2017-09-08 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-7178:
-

 Summary: Add documentation for Container Update API
 Key: YARN-7178
 URL: https://issues.apache.org/jira/browse/YARN-7178
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh
Assignee: Arun Suresh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7173) Container Update Backward compatibility fix for upgrades

2017-09-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159345#comment-16159345
 ] 

Arun Suresh commented on YARN-7173:
---

The test failures are unrelated.
I also ran TestIncreaseAllocationExpirer, 
TestOpportunisticContainerAllocatorAMService and TestContainerResizing locally 
to verify that things are fine.

> Container Update Backward compatibility fix for upgrades
> 
>
> Key: YARN-7173
> URL: https://issues.apache.org/jira/browse/YARN-7173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7173.001.patch, YARN-7173-branch-2.001.patch
>
>
> This is based on discussions with [~leftnoteasy] in YARN-6979.
> In YARN-6979, the {{getContainersToDecrease()}} and 
> {{addAllContainersToDecrease()}} methods were removed from the 
> NodeHeartbeatResponse (although the actual protobuf fields were still 
> retained). We need to ensure that for clusters that upgrade from 2.8.x to 
> 2.9.0, the decreased containers are also sent to the NM.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping

2017-08-29 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145990#comment-16145990
 ] 

Arun Suresh commented on YARN-7117:
---

cc [~subru] and [~curino]

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue 
> Mapping
> --
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Currently the Capacity Scheduler doesn't support auto-creation of queues when 
> doing queue mapping. We are seeing more and more use cases that have complex 
> queue-mapping policies configured to handle application-to-queue mapping. 
> The most common use case of CapacityScheduler queue mapping is to create one 
> queue for each user/group. However, updating {{capacity-scheduler.xml}} and 
> running {{RMAdmin:refreshQueues}} is needed whenever a new user/group onboards. 
> One option to solve the problem is to automatically create queues when a new 
> user/group arrives.
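
For reference, here is a sketch of the existing mapping knobs this proposal builds 
on, using only property names that already exist; the auto-creation behavior itself 
is what this JIRA proposes, so no flag for it is shown:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class QueueMappingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing CapacityScheduler mapping syntax: route each user to a leaf
    // queue named after the user. Today each such queue must already be
    // declared in capacity-scheduler.xml before the mapping can take effect.
    conf.set("yarn.scheduler.capacity.queue-mappings", "u:%user:%user");
    conf.setBoolean(
        "yarn.scheduler.capacity.queue-mappings-override.enable", true);
    System.out.println(conf.get("yarn.scheduler.capacity.queue-mappings"));
  }
}
{code}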



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Closed] (YARN-6692) Delay pause when container is localizing

2017-09-10 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh closed YARN-6692.
-

> Delay pause when container is localizing
> 
>
> Key: YARN-6692
> URL: https://issues.apache.org/jira/browse/YARN-6692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jose Miguel Arreola
>Assignee: Jose Miguel Arreola
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> If a container receives a Pause event while localizing, allow the container 
> to finish localizing and then pause it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6059) Update paused container state in the state store

2017-09-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-6059:
-

Assignee: Arun Suresh  (was: Hitesh Sharma)

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Arun Suresh
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-09-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6059:
--
Priority: Blocker  (was: Major)

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-09-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6059:
--
Fix Version/s: 3.0.0
   2.9.0

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-09-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6059:
--
Attachment: YARN-6059-YARN-5972.011.patch

Cleaning up and re-attaching last patch to rebased YARN-5972 branch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Arun Suresh
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch, 
> YARN-6059-YARN-5972.011.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6692) Delay pause when container is localizing

2017-09-10 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh resolved YARN-6692.
---
Resolution: Invalid

Closing this, since it is not a valid scenario currently.

> Delay pause when container is localizing
> 
>
> Key: YARN-6692
> URL: https://issues.apache.org/jira/browse/YARN-6692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jose Miguel Arreola
>Assignee: Jose Miguel Arreola
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> If a container receives a Pause event while localizing, allow the container 
> to finish localizing and then pause it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6059) Update paused container state in the state store

2017-09-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-6059:
-

Assignee: Hitesh Sharma  (was: Arun Suresh)

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch, 
> YARN-6059-YARN-5972.011.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7126) Create introductory site documentation for YARN native services

2017-09-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154291#comment-16154291
 ] 

Arun Suresh commented on YARN-7126:
---

It looks like some of the TBDs must be removed.
Also, I think we need some more formal documentation, especially with some 
sample applications/services and instructions for setting up the DNS/registry, etc.

> Create introductory site documentation for YARN native services
> ---
>
> Key: YARN-7126
> URL: https://issues.apache.org/jira/browse/YARN-7126
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
> Attachments: YARN-7126-yarn-native-services.001.patch, 
> YARN-7126-yarn-native-services.002.patch, 
> YARN-7126-yarn-native-services.003.patch, 
> YARN-7126-yarn-native-services.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6849) NMContainerStatus should have the Container ExecutionType.

2017-09-06 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-6849:
-

Assignee: kartheek muthyala  (was: Atri Sharma)

> NMContainerStatus should have the Container ExecutionType.
> --
>
> Key: YARN-6849
> URL: https://issues.apache.org/jira/browse/YARN-6849
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
>
> Currently only the ContainerState is sent to the RM in the NMContainerStatus. 
> This lets the restarted RM know whether the container is queued or not, but it 
> won't know the ExecutionType.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6849) NMContainerStatus should have the Container ExecutionType.

2017-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157071#comment-16157071
 ] 

Arun Suresh commented on YARN-6849:
---

Thanks for the patch [~kartheek]

Can we also add a GUARANTEED container to the test case?
Also, we need to verify that resources are not incremented for the queue and node 
for the opportunistic containers.

> NMContainerStatus should have the Container ExecutionType.
> --
>
> Key: YARN-6849
> URL: https://issues.apache.org/jira/browse/YARN-6849
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Attachments: YARN-6849.001.patch
>
>
> Currently only the ContainerState is sent to the RM in the NMContainerStatus. 
> This lets the restarted RM know whether the container is queued or not, but it 
> won't know the ExecutionType.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6978) Add updateContainer API to NMClient.

2017-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157007#comment-16157007
 ] 

Arun Suresh commented on YARN-6978:
---

Thanks for the updated patch, [~kartheek].

It looks mostly good to me.
+1 pending Jenkins / checkstyle and javac fixes

> Add updateContainer API to NMClient.
> 
>
> Key: YARN-6978
> URL: https://issues.apache.org/jira/browse/YARN-6978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Attachments: YARN-6978.001.patch, YARN-6978.002.patch
>
>
> This is to track the addition of updateContainer API to the {{NMClient}} and 
> {{NMClientAsync}}
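
A sketch of the AM-side usage, assuming the API added here landed as 
updateContainerResource(Container): once the RM grants an update (visible to the 
AM via AllocateResponse#getUpdatedContainers), the AM hands the updated container 
token to the NM:

{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.NMClient;

public class NMUpdateSketch {
  // Assumes a started NMClient and a Container record carrying the updated
  // token that the RM returned in the allocate response.
  static void applyUpdate(NMClient nmClient, Container updatedContainer)
      throws Exception {
    nmClient.updateContainerResource(updatedContainer);
  }
}
{code}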



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7173) Container Update Backward compatibility fix for upgrades from 2.8.x

2017-09-07 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-7173:
-

 Summary: Container Update Backward compatibility fix for upgrades 
from 2.8.x
 Key: YARN-7173
 URL: https://issues.apache.org/jira/browse/YARN-7173
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh
Assignee: Arun Suresh


This is based on discussions with [~leftnoteasy] in YARN-6979.

In YARN-6979, the {{getContainersToDecrease()}} and 
{{addAllContainersToDecrease()}} methods were removed from the 
NodeHeartbeatResponse (although the actual protobuf fields were still 
retained). We need to ensure that for clusters that upgrade from 2.8.x to 
2.9.0, the decreased containers are also sent to the NM.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6978) Add updateContainer API to NMClient.

2017-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157365#comment-16157365
 ] 

Arun Suresh commented on YARN-6978:
---

+1
Committing this shortly (will fix the minor indentation checkstyle before I 
commit).

> Add updateContainer API to NMClient.
> 
>
> Key: YARN-6978
> URL: https://issues.apache.org/jira/browse/YARN-6978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: kartheek muthyala
> Attachments: YARN-6978.001.patch, YARN-6978.002.patch, 
> YARN-6978.003.patch
>
>
> This is to track the addition of updateContainer API to the {{NMClient}} and 
> {{NMClientAsync}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


