[jira] [Commented] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725564#comment-15725564
 ] 

Sunil G commented on YARN-5970:
---

I think point 3 is handled in YARN-5965. Please skip my third comment.

> Validate update timeout request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5970.0.patch
>
>
> AppTimeoutInfo is exposed to REST clients. This object's request parameters 
> should be validated before the request is passed on to ClientRMService.
> This also handles a couple of minor issues in the REST services.






[jira] [Created] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-5972:
-

 Summary: Add Support for Pausing/Freezing of containers
 Key: YARN-5972
 URL: https://issues.apache.org/jira/browse/YARN-5972
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun Suresh









[jira] [Updated] (YARN-5292) NM Container lifecycle and state transitions to support for PAUSED container state.

2016-12-06 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5292:
--
Summary: NM Container lifecycle and state transitions to support for PAUSED 
container state.  (was: Support for PAUSED container state for OPPORTUNISTIC 
containers)

> NM Container lifecycle and state transitions to support for PAUSED container 
> state.
> ---
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, YARN-5292.005.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node, at which point the preempted 
> container can resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Updated] (YARN-5292) NM Container lifecycle and state transitions to support for PAUSED container state.

2016-12-06 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5292:
--
Issue Type: Sub-task  (was: New Feature)
Parent: YARN-5972

> NM Container lifecycle and state transitions to support for PAUSED container 
> state.
> ---
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, YARN-5292.005.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node, at which point the preempted 
> container can resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725848#comment-15725848
 ] 

Arun Suresh commented on YARN-5972:
---

As per Subru's request in YARN-5292, I have moved this to an umbrella JIRA to 
foster further discussion and to provide a place to post the design doc.

> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources are freed up on the node, at 
> which point the preempted container can resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the Docker pause functionality. Windows also has 
> OS-level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt call would 
> pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  
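
As a small illustration of the fallback behaviour described above, here is a 
sketch of a preempt call that pauses when the runtime supports it and otherwise 
kills the container; the interface and method names are placeholders, not the 
actual NodeManager code:

{code}
// Sketch only: preemption semantics for PAUSED containers, pausing if the
// container's runtime supports it and falling back to killing it otherwise.
public final class PreemptionSketch {
  interface ContainerRuntime {
    boolean supportsPause();
    void pause();
    void kill();
  }

  static void preempt(ContainerRuntime runtime) {
    if (runtime.supportsPause()) {
      runtime.pause();   // container moves to the PAUSED state
    } else {
      runtime.kill();    // default behaviour when pausing is unsupported
    }
  }
}
{code}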






[jira] [Commented] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725588#comment-15725588
 ] 

Rohith Sharma K S commented on YARN-5970:
-

bq. Since we do not know the error offset, do we need to set it to -1?
The offset is mandatory for ParseException. However, since a null value is being 
passed, the offset remains -1.

bq. Could we append "seconds" as well at the end.
This value cannot be in seconds; it is an invalid ISO8601 value, so we need to 
print the invalid configured value, which could be anything. 

bq. I think point 3 is handled in YARN-5965
Yes, YARN-5965 handles comment 3.
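
To make the validation being discussed concrete, here is a minimal illustrative 
sketch (not the actual patch); it assumes the ISO8601 pattern and error message 
quoted elsewhere in this thread, and the class and method names are placeholders:

{code}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch only: validate an ISO8601 expiry string from a REST update-timeout
// request before handing the request on to ClientRMService.
public final class TimeoutValidatorSketch {
  private static final String ISO8601 = "yyyy-MM-dd'T'HH:mm:ss.SSSZ";

  public static Date validateExpireTime(String value) throws ParseException {
    if (value == null || value.isEmpty()) {
      // The error offset is unknown here, so -1 is used as a placeholder.
      throw new ParseException("Invalid input.", -1);
    }
    try {
      return new SimpleDateFormat(ISO8601).parse(value);
    } catch (ParseException e) {
      throw new ParseException("Expire time is not in ISO8601 format. ISO8601 supported "
          + "format is " + ISO8601 + ". Configured timeout value is " + value, -1);
    }
  }
}
{code}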

> Validate update timeout request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5970.0.patch
>
>
> AppTimeoutInfo is exposed to REST clients. This object's request parameters 
> should be validated before the request is passed on to ClientRMService.
> This also handles a couple of minor issues in the REST services.






[jira] [Commented] (YARN-5965) Revisit ApplicationReport #getApplicationTimeouts

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725641#comment-15725641
 ] 

Hadoop QA commented on YARN-5965:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
59s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  5m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 53s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 319 unchanged - 0 fixed = 322 total (was 319) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
44s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m  7s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 
19s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}111m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5965 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12841945/YARN-5965.0.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux 6109507ea2ca 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git 

[jira] [Commented] (YARN-5965) Revisit ApplicationReport #getApplicationTimeouts

2016-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725677#comment-15725677
 ] 

Sunil G commented on YARN-5965:
---

Thanks [~rohithsharma]

A few minor nits:

{noformat}
  optional string appNodeLabelExpression = 24;
  optional string amNodeLabelExpression = 25;
  repeated AppTimeoutMapProto application_timeouts = 26;
}

message AppTimeoutMapProto {
  optional ApplicationTimeoutTypeProto application_timeout_type = 1;
  optional ApplicationTimeoutProto application_timeouts = 2;
}
{noformat}
1. {{AppTimeoutMapProto application_timeouts}}: could it be something like 
{{appTimeouts}} and {{AppTimeoutsMapProto}}?
2. AppTimeoutMapProto -> AppTimeoutsMapProto
3. application_timeouts -> application_timeout



> Revisit ApplicationReport #getApplicationTimeouts
> -
>
> Key: YARN-5965
> URL: https://issues.apache.org/jira/browse/YARN-5965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Jian He
>Assignee: Rohith Sharma K S
> Attachments: YARN-5965.0.patch
>
>
> Currently it returns a list of ApplicationTimeout objects; to get a 
> particular timeout, the caller code needs to iterate the list and compare the 
> timeoutType to get the corresponding value. Would a map data structure be 
> easier for user code? 
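
To illustrate the list-versus-map question above, here is a small sketch; the 
record and accessor names stand in for the real ApplicationTimeout types and may 
not match the final API exactly:

{code}
import java.util.List;
import java.util.Map;

// Sketch only: looking up one timeout with a list-valued vs. a map-valued
// ApplicationReport#getApplicationTimeouts(). "Timeout"/"TimeoutType" are
// placeholders for the actual ApplicationTimeout/ApplicationTimeoutType records.
public final class TimeoutLookupSketch {
  enum TimeoutType { LIFETIME }

  static final class Timeout {
    final TimeoutType type;
    final String expiryTime;
    Timeout(TimeoutType type, String expiryTime) {
      this.type = type;
      this.expiryTime = expiryTime;
    }
  }

  // With a list, the caller iterates and compares the timeout type:
  static Timeout fromList(List<Timeout> timeouts) {
    for (Timeout t : timeouts) {
      if (t.type == TimeoutType.LIFETIME) {
        return t;
      }
    }
    return null;
  }

  // With a map keyed by timeout type, it is a single lookup:
  static Timeout fromMap(Map<TimeoutType, Timeout> timeouts) {
    return timeouts.get(TimeoutType.LIFETIME);
  }
}
{code}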






[jira] [Updated] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5972:
--
Description: 
YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
capability to customize how OPPORTUNISTIC containers get preempted.

In this JIRA we propose introducing a PAUSED container state.
Instead of preempting a running container, the container can be moved to a 
PAUSED state, where it remains until resources are freed up on the node, at 
which point the preempted container can resume to the running state.

Note that process freezing is already supported by the 'cgroups freezer', which 
is used internally by the Docker pause functionality. Windows also has OS-level 
support of a similar nature.

One scenario where this capability is useful is work preservation. How 
preemption is done, and whether the container supports it, is implementation 
specific.

For instance, if the container is a virtual machine, then preempt would pause 
the VM and resume would restore it back to the running state.
If the container doesn't support preemption, then preempt would default to 
killing the container. 
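
As a minimal sketch of the freezer mechanism mentioned above: this uses the 
kernel cgroups v1 freezer interface that {{docker pause}} builds on, it is not 
the proposed YARN implementation, and the cgroup path is a made-up example:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: pause/resume every process in a freezer cgroup by writing
// FROZEN/THAWED to its freezer.state file.
public final class FreezerSketch {
  // Hypothetical cgroup created for one container; the real layout is up to
  // the resource handler configuration.
  private static final Path CGROUP =
      Paths.get("/sys/fs/cgroup/freezer/hadoop-yarn/container_0001_01_000002");

  static void pause() throws IOException {
    Files.write(CGROUP.resolve("freezer.state"),
        "FROZEN".getBytes(StandardCharsets.UTF_8));
  }

  static void resume() throws IOException {
    Files.write(CGROUP.resolve("freezer.state"),
        "THAWED".getBytes(StandardCharsets.UTF_8));
  }
}
{code}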
 

> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources are freed up on the node, at 
> which point the preempted container can resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the Docker pause functionality. Windows also has 
> OS-level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Commented] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725550#comment-15725550
 ] 

Sunil G commented on YARN-5970:
---

Thanks [~rohithsharma]

1)
bq.{{throw new ParseException("Invalid input.", -1);}}
Since we do not know the error offset, do we need to set it to -1?

2)
bq.
{noformat}
"Expire time is not in ISO8601 format. ISO8601 supported "
 + "format is -MM-dd'T'HH:mm:ss.SSSZ. Configured "
 + "timeout value is " + timeout.getValue();
{noformat}

Could we append "seconds" as well at the end?

3) When the app is finished before the timeout, could we validate that? I think 
we can club that validation in here as well. Thoughts?


> Validate update timeout request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5970.0.patch
>
>
> AppTimeoutInfo is exposed to REST clients. This object request parameters 
> should be validated before processing it to clientRMService.
> And also handles couple of minor issues in REST services.






[jira] [Comment Edited] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725844#comment-15725844
 ] 

Arun Suresh edited comment on YARN-5972 at 12/6/16 3:40 PM:


Created YARN-5972 branch for this work


was (Author: asuresh):
Create YARN-5972 branch for this work

> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources are freed up on the node, at 
> which point the preempted container can resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the Docker pause functionality. Windows also has 
> OS-level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt call would 
> pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  






[jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725844#comment-15725844
 ] 

Arun Suresh commented on YARN-5972:
---

Create YARN-5972 branch for this work

> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources are freed up on the node, at 
> which point the preempted container can resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the Docker pause functionality. Windows also has 
> OS-level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt call would 
> pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  






[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726372#comment-15726372
 ] 

Sangjin Lee commented on YARN-4675:
---

I agree that we need to have a firm conclusion one way or another on this 
before the next release.

There are two aspects to this: (1) reorganize {{TimelineClientImpl}} into v.1 
specific code and v.2 specific code, and (2) create separate interfaces (APIs) 
for v.1 and v.2.

IMO, (1) is not very controversial and that's what's captured in the current 
patch. I think we should do it as it's more of an internal "implementation 
detail". For the most part, clients should *not* use {{TimelineClientImpl}} or 
other impl classes directly but rather stick with {{TimelineClient}}. I do see 
some references to {{TimelineClientImpl}} in MR and distributed shell and what 
not, and it might be good to address them to avoid using {{TimelineClientImpl}} 
as much as possible.

On the other hand, (2) is a somewhat bigger issue as it affects how clients will 
implement against the timeline service. The initial desire was to stick with a 
single interface ({{TimelineClient}}) for both v.1 and v.2 because we wanted to 
minimize the impact on client code. However, since we have a new entity API, we 
still had to have v.1-specific methods and v.2-specific methods. Therefore, 
clients still need to do things like:
{code}
if (v.1) {
  client.putEntities(...); // call the v.1 method
} else {
  client.putEntities(...); // call the v.2 method
}
{code}
So retaining a single interface doesn't really help them much. There is also 
the aspect of preparing the entities in a different way, depending on v.1 or 
v.2.

So, in that sense, it doesn't seem that bad to separate the interface itself. 
This is still a much more significant change because it would impact all the 
client code (MR, DS, Tez, etc.). What are your thoughts on both aspects?
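
To make option (2) a bit more concrete, here is a rough sketch of what separate 
version-specific client interfaces could look like; all names below are 
illustrative and not the actual Hadoop API:

{code}
import java.io.IOException;

// Sketch only: version-specific timeline client interfaces, so a caller picks
// the interface (and entity model) matching the timeline service version it
// writes to, instead of branching inside a single TimelineClient.
public final class TimelineClientSplitSketch {

  /** v.1-specific writer, taking the v.1 entity type. */
  interface TimelineV1Client {
    void putEntities(
        org.apache.hadoop.yarn.api.records.timeline.TimelineEntity... entities)
        throws IOException;
  }

  /** v.2-specific writer, taking the new v.2 entity type. */
  interface TimelineV2Client {
    void putEntities(
        org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity... entities)
        throws IOException;
  }
}
{code}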

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it is clear which part 
> of the code belongs to which version and is thus better maintainable.






[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726187#comment-15726187
 ] 

Sangjin Lee commented on YARN-5739:
---

I have committed the patch to YARN-5355 and YARN-5355-branch-2. I had to cherry 
pick YARN-4765 to YARN-5355-branch-2 before cherry picking this one because 
YARN-4765, on which this patch is based, was missing on YARN-5355-branch-2.

Thanks [~gtCarrera9] for your contribution, and [~varun_saxena] for your review!

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch, YARN-5739-YARN-5355.007.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to provide an 
> "entity browser" for each YARN application. Actually, simply dumping out 
> available timeline entities (with proper pagination, of course) would be 
> pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe 
> the only thing missing is to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given this kind of call is relatively rare (compared to 
> writes and updates) we can perform some scanning at read time. 






[jira] [Commented] (YARN-5932) Retrospect moveApplicationToQueue in align with YARN-5611

2016-12-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726220#comment-15726220
 ] 

Jian He commented on YARN-5932:
---

bq. Could such cases be handled in next allocation / preemption cycle itself.
The capacity could even go over max-capacity, which violates capacity 
guarantees. Anyway, this is a different question. 

The patch itself looks good to me.

> Retrospect moveApplicationToQueue in align with YARN-5611
> -
>
> Key: YARN-5932
> URL: https://issues.apache.org/jira/browse/YARN-5932
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5932.0001.patch, YARN-5932.0002.patch, 
> YARN-5932.0003.patch, YARN-5932.v0.patch, YARN-5932.v1.patch
>
>
> All dynamic APIs for an application's state change could follow a general 
> design approach. Currently, priority and app timeouts follow this 
> approach for all corner cases.
> *Steps*
> - Do a pre-validate check to ensure that changes are fine.
> - Update this information to state-store
> - Perform real move operation and update in-memory data structures.
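
As a small illustration of the three steps listed above (the interface and 
method names here are placeholders, not the actual RMAppManager/scheduler code):

{code}
// Sketch only: the pre-validate -> persist -> apply-in-memory pattern that the
// dynamic "move application to queue" API is meant to follow.
public final class MoveFlowSketch {

  interface Scheduler {
    void preValidateMove(String appId, String targetQueue) throws Exception;
    void moveApplication(String appId, String targetQueue);
  }

  interface StateStore {
    void updateApplicationQueue(String appId, String targetQueue) throws Exception;
  }

  static void moveApplicationAcrossQueues(Scheduler scheduler, StateStore store,
      String appId, String targetQueue) throws Exception {
    // 1. Pre-validate so that an invalid request fails before any state changes.
    scheduler.preValidateMove(appId, targetQueue);
    // 2. Persist the new queue to the state store, so that an RM restart
    //    recovers the application into the intended queue.
    store.updateApplicationQueue(appId, targetQueue);
    // 3. Perform the real move and update the in-memory data structures.
    scheduler.moveApplication(appId, targetQueue);
  }
}
{code}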






[jira] [Commented] (YARN-5709) Cleanup Curator-based leader election code

2016-12-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726301#comment-15726301
 ] 

Karthik Kambatla commented on YARN-5709:


[~djp] - I was really hoping to clean up the configs in 2.8.0. The current 
naming is very confusing, at least to me. It has been a while since I looked at it.

I will follow up on this with clearer explanations and likely a patch this 
week. [~templedf] - okay if I pick this up? 

> Cleanup Curator-based leader election code
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Critical
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 
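
A minimal sketch of item 4 above (lazily creating and caching the Curator 
framework); the field and method names are illustrative rather than the actual 
RMContext code, and the retry policy is just an example:

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

// Sketch only: lazily create, start, and cache a CuratorFramework instance so
// that all RM components share one client.
class RMContextCuratorSketch {
  private CuratorFramework curator;

  synchronized CuratorFramework getCurator(String zkHostPort) {
    if (curator == null) {
      curator = CuratorFrameworkFactory.newClient(zkHostPort, new RetryNTimes(3, 1000));
      curator.start();
    }
    return curator;
  }
}
{code}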






[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens

2016-12-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726204#comment-15726204
 ] 

Allen Wittenauer commented on YARN-5910:


bq.  How to maintain such an unknown list is a non-trivial task in the first 
place.

Yup... and you haven't even gotten to the part where you try to use the service 
for your application.  This is why DNS support would be extremely useful here.  
Ask it where uri://haservice is located, then query the responding host for the 
details of that service.

In any case, this isn't a YARN problem.  This is a HADOOP problem.

> Support for multi-cluster delegation tokens
> ---
>
> Key: YARN-5910
> URL: https://issues.apache.org/jira/browse/YARN-5910
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: security
>Reporter: Clay B.
>Priority: Minor
>
> As an administrator running many secure (kerberized) clusters, some which 
> have peer clusters managed by other teams, I am looking for a way to run jobs 
> which may require services running on other clusters. Particular cases where 
> this rears itself are running something as core as a distcp between two 
> kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp 
> hdfs://LOCALCLUSTER/user/user292/test.out 
> hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while, but if the delegation token for 
> the remote cluster needs renewal the job will fail[1]. One can pre-configure 
> the {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes 
> available, but that requires coordination that is not always feasible, 
> especially as a cluster's peers grow into the tens of clusters or across 
> management teams. Ideally, core systems could be configured this way, 
> but jobs could also specify their own token handling and management when 
> needed?
> [1]: Example stack trace when the RM is unaware of a remote service:
> 
> {code}
> 2016-03-23 14:59:50,528 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  application_1458441356031_3317 found existing hdfs token Kind: 
> HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: 
> (HDFS_DELEGATION_TOKEN token
>  10927 for user292)
> 2016-03-23 14:59:50,557 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, 
> Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for 
> user292)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 
> 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a 
> failover proxy provider configured.
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
> at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
> ... 6 more
> {code}




[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726389#comment-15726389
 ] 

Sangjin Lee commented on YARN-4675:
---

For your reference, things that are referencing {{TimelineClientImpl}} directly 
today:
- JobHistoryFileReplayMapperV1 (MR)
- SimpleEntityWriterV1 (MR)
- TestDistributedShell (DS)
- TestDSAppMaster (DS)
- TestNMTimelinePublisher (node manager)
- TestTimelineWebServicesWithSSL (AHS)

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it is clear which part 
> of the code belongs to which version and is thus better maintainable.






[jira] [Commented] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727922#comment-15727922
 ] 

Sunil G commented on YARN-5970:
---

Patch looks good to me. I will commit the patch later today if there are no 
objections.

> Validate update timeout request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5970.0.patch
>
>
> AppTimeoutInfo is exposed to REST clients. This object's request parameters 
> should be validated before the request is passed on to ClientRMService.
> This also handles a couple of minor issues in the REST services.






[jira] [Commented] (YARN-5085) Add support for change of container ExecutionType

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728054#comment-15728054
 ] 

Hadoop QA commented on YARN-5085:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
55s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m  
2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  3s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 35 new + 1582 unchanged - 16 fixed = 1617 total (was 1598) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
50s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
18s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
32s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
44s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 46s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 32s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common

[jira] [Commented] (YARN-5932) Retrospect moveApplicationToQueue in align with YARN-5611

2016-12-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727733#comment-15727733
 ] 

Rohith Sharma K S commented on YARN-5932:
-

Committed to trunk. The patch does not apply to branch-2. [~sunilg], could you 
provide a patch for branch-2?

> Retrospect moveApplicationToQueue in align with YARN-5611
> -
>
> Key: YARN-5932
> URL: https://issues.apache.org/jira/browse/YARN-5932
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5932.0001.patch, YARN-5932.0002.patch, 
> YARN-5932.0003.patch, YARN-5932.v0.patch, YARN-5932.v1.patch
>
>
> All dynamic APIs for an application's state change could follow a general 
> design approach. Currently, priority and app timeouts follow this 
> approach for all corner cases.
> *Steps*
> - Do a pre-validate check to ensure that changes are fine.
> - Update this information to state-store
> - Perform real move operation and update in-memory data structures.






[jira] [Updated] (YARN-5965) Revisit ApplicationReport #getApplicationTimeouts

2016-12-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-5965:

Attachment: YARN-5965.1.patch

Updated the patch addressing the comments.

> Revisit ApplicationReport #getApplicationTimeouts
> -
>
> Key: YARN-5965
> URL: https://issues.apache.org/jira/browse/YARN-5965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Jian He
>Assignee: Rohith Sharma K S
> Attachments: YARN-5965.0.patch, YARN-5965.1.patch
>
>
> Currently it returns a list of ApplicationTimeout objects; to get a 
> particular timeout, the caller code needs to iterate the list and compare the 
> timeoutType to get the corresponding value. Would a map data structure be 
> easier for user code? 






[jira] [Commented] (YARN-5932) Retrospect moveApplicationToQueue in align with YARN-5611

2016-12-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727784#comment-15727784
 ] 

Hudson commented on YARN-5932:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10958 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10958/])
YARN-5932. Retrospect moveApplicationToQueue in align with YARN-5611. 
(rohithsharmaks: rev 563480dccd0136d82730f4228f1df9ed5822)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMoveEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Retrospect moveApplicationToQueue in align with YARN-5611
> -
>
> Key: YARN-5932
> URL: https://issues.apache.org/jira/browse/YARN-5932
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5932.0001.patch, YARN-5932.0002.patch, 
> YARN-5932.0003.patch, YARN-5932.v0.patch, YARN-5932.v1.patch
>
>
> All dynamic api's of an application's state change could follow a general 
> design approach. Currently priority and app timeouts are following this 
> approach all corner cases.
> *Steps*
> - Do a pre-validate check to ensure that changes are fine.
> - Update this information to state-store
> - Perform real move operation and update in-memory data structures.






[jira] [Commented] (YARN-5611) Provide an API to update lifetime of an application.

2016-12-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727783#comment-15727783
 ] 

Hudson commented on YARN-5611:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10958 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10958/])
YARN-5932. Retrospect moveApplicationToQueue in align with YARN-5611. 
(rohithsharmaks: rev 563480dccd0136d82730f4228f1df9ed5822)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestMoveApplication.java
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMoveEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Provide an API to update lifetime of an application.
> 
>
> Key: YARN-5611
> URL: https://issues.apache.org/jira/browse/YARN-5611
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: oct16-hard
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: 0001-YARN-5611.patch, 0002-YARN-5611.patch, 
> 0003-YARN-5611.patch, YARN-5611.0004.patch, YARN-5611.0005.patch, 
> YARN-5611.0006.patch, YARN-5611.0007.patch, YARN-5611.0008.patch, 
> YARN-5611.0009.patch, YARN-5611.0010.patch, YARN-5611.v0.patch
>
>
> YARN-4205 monitors the lifetime of an application if required. 
> Add a client API to update the lifetime of an application. 






[jira] [Commented] (YARN-5673) [Umbrella] Re-write container-executor to improve security, extensibility, and portability

2016-12-06 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727953#comment-15727953
 ] 

Varun Vasudev commented on YARN-5673:
-

Thanks for the feedback [~miklos.szeg...@cloudera.com]!

{quote}
What is the common functionality that all modules need, and why don't we simply 
split them into separate executables instead of loading them as modules into a 
core container-executor process?
linux-container-executor
docker-container-executor
mount-cgroups
tc-executor

2. I also have a concern about maintenance. File system privilege setup is what 
system administrators are most familiar with. We could let them use that 
instead of modifying proprietary configuration files that describe how to locate and load 
modules into a linux-container-executor binary. Any common functionality like 
auditing, logging, and setup checking can be statically linked into each executable. 
It also has the advantage of setting suid bits separately. An example: 
/usr/bin/sudo and /usr/bin/passwd tell in their names what they do and what you 
get if you set the suid permission on them. An administrator would only set it on 
the executables that are needed. Setting suid on the posix-container-executor, 
on the other hand, means that it is allowed to load modules in a controlled but 
advanced way. Separate executables would make administrators' lives much easier, I 
think. Each of these binaries can have its own configuration file, just like the 
modules you proposed.
{quote}

All of these binaries will require the setuid bit to be set, which means 
administrators will have to set permissions on and manage 4 binaries. We also have 
to worry about 4 binaries that can have privilege escalation as opposed to one 
- any hot fixes, for example, will require all 4 binaries to be updated as 
opposed to just one. Interestingly, you feel that the administrator overhead of 
managing 4 binaries is worth it, whereas some folks would prefer it the other 
way round :). Do other folks feel that the multiple binaries approach is the 
way to go?

{quote}
1. I have a concern about native modules potentially being loaded into the same process. 
Even if communication between modules is not allowed, this is a native binary, 
where all native code has access to the whole memory. Just like the 
current executor, when more features run in the same process with admin 
privileges, a faulty or malicious module may cause security issues by loading 
and accessing other modules. Even if we can add protection, the protection code would 
add more complexity.
{quote}

Fair point. The idea here is that -
(1) Administrators will not add arbitrary modules to the module list.
(2) The posix-container-executor will give up all privileges before loading the 
modules which don't require administrator privileges.
(3) Administrators will have an option to turn off modules that require 
administrator privileges.
Would these help mitigate your concerns? The issue with the current setup is 
that there is no clean way to enable/disable functionality that administrators 
do not want enabled on their cluster.

{quote}
A. One more separate issue that I wanted to ask your opinion about is the time it 
takes to launch a container. First the container executor is executed, then a shell 
script that runs the actual container process such as Java. Wouldn't it be faster to 
launch the container executor just once, keep it running as long as the node manager 
is running, and communicate launch commands through standard input or a named pipe?
{quote}
It probably would be, but container launch time hasn't been something people 
have complained about. Do you have some scenarios where container launch time 
has been an issue? The security aspects of a long-running process versus one 
that is invoked on demand are different as well.

> [Umbrella] Re-write container-executor to improve security, extensibility, 
> and portability
> --
>
> Key: YARN-5673
> URL: https://issues.apache.org/jira/browse/YARN-5673
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: container-executor Re-write Design Document.pdf
>
>
> As YARN adds support for new features that require administrator 
> privileges (such as support for network throttling and docker), we’ve had to 
> add new capabilities to the container-executor. This has led to a recognition 
> that the current container-executor security features as well as the code 
> could be improved. The current code is fragile and it’s hard to add new 
> features without causing regressions. Some of the improvements that need to 
> be made are -
> *Security*
> Currently the container-executor has limited security features. It relies 
> primarily on the permissions set on 

[jira] [Commented] (YARN-5965) Revisit ApplicationReport #getApplicationTimeouts

2016-12-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728033#comment-15728033
 ] 

Jian He commented on YARN-5965:
---

lgtm

> Revisit ApplicationReport #getApplicationTimeouts
> -
>
> Key: YARN-5965
> URL: https://issues.apache.org/jira/browse/YARN-5965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Jian He
>Assignee: Rohith Sharma K S
> Attachments: YARN-5965.0.patch, YARN-5965.1.patch
>
>
> Currently it returns a list of ApplicationTimeout objects; to get a 
> particular timeout, the caller code needs to iterate the list and compare the 
> timeoutType to get the corresponding value. Would a map data structure be easier 
> for user code? 
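To make the difference concrete, a small sketch of the two call patterns (the map-returning variant is the proposal, not an existing method):

{code}
// Current list-based API (assumed shape): iterate and match the type.
ApplicationTimeout lifetime = null;
for (ApplicationTimeout timeout : report.getApplicationTimeouts()) {
  if (timeout.getTimeoutType() == ApplicationTimeoutType.LIFETIME) {
    lifetime = timeout;
    break;
  }
}

// Proposed map keyed by ApplicationTimeoutType: a direct lookup.
Map<ApplicationTimeoutType, ApplicationTimeout> timeouts =
    report.getApplicationTimeouts();
ApplicationTimeout lifetimeFromMap = timeouts.get(ApplicationTimeoutType.LIFETIME);
{code}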



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5877) Allow all nm-whitelist-env to get overridden during launch

2016-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727893#comment-15727893
 ] 

Sunil G commented on YARN-5877:
---

We have checked various cases for this issue in detail and also had some offline 
discussions with [~vvasudev] and [~bibinchundatt].

Now coming to the issue mentioned: MAPREDUCE-6704 reached a consensus that 
HADOOP_MAPRED_HOME could be published via the NM whitelist so that, even if this 
variable is not present in the env, we can take the configured value. This will 
solve the container launch problem mentioned there.

As of today, if we publish HADOOP_MAPRED_HOME through nm-whitelist-env, it will 
look like the below in launch_container.sh.
{{export HADOOP_MAPRED_HOME="/home/hadoopbuild"}}

As mentioned above, this will be fine in the case of a normal container launch. But 
in the Docker scenario, the path value given may not be valid. In that case, we 
will see the failure described in this jira. Ideally, if this environment 
variable were published as a whitelisted variable, the container would try to take 
it from the env itself first.

{code}
export HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/home/hadoopbuild"}
{code}

Here the container will first try to get the value from *HADOOP_MAPRED_HOME* if it is 
available in the env. If not, it will take the value given as 
"/home/hadoopbuild". In the Docker case, this value was set in the image's system env 
itself (I think Bibin used the SequenceIQ Docker image here, which has this value).
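A small sketch of how the NodeManager could emit a whitelisted variable in that form when writing launch_container.sh (the helper below is illustrative, not the actual ContainerLaunch code):

{code}
// Sketch only: prefer the value already present in the container's environment
// and fall back to the NodeManager's configured default.
static String whitelistedExport(String name, String nmDefault) {
  return String.format("export %s=${%s:-\"%s\"}%n", name, name, nmDefault);
}

// whitelistedExport("HADOOP_MAPRED_HOME", "/home/hadoopbuild") would produce:
// export HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/home/hadoopbuild"}
{code}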

So in conclusion, all variables mentioned in {{yarn.nodemanager.env-whitelist}} 
could be whitelisted in this way, which solves the problem. If a Docker image doesn't 
have this env variable set, and the default path to 
HADOOP_MAPRED_HOME is also invalid, then the user has to set the correct mapred home 
path in {{yarn.app.mapreduce.am.env}}.

I think YARN-5877.0002.patch is a good one to go with. [~bibinchundatt], could you 
rebase the patch and upload it with a new version number so it compiles against trunk? If 
possible, could you also please share test results for this latest patch in 
Docker?


> Allow all nm-whitelist-env to get overridden during launch
> --
>
> Key: YARN-5877
> URL: https://issues.apache.org/jira/browse/YARN-5877
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: Dockerfile, YARN-5877.0001.patch, YARN-5877.0002.patch, 
> YARN-5877.0003.patch, bootstrap.sh, yarn-site.xml
>
>
> As per {{yarn.nodemanager.env-whitelist}}, containers may override the 
> configured values rather than use the NodeManager's defaults.
> {code}
>   
> Environment variables that containers may override rather 
> than use NodeManager's default.
> yarn.nodemanager.env-whitelist
> 
> JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME
>   
> {code}
> But only the following variables can actually be overridden by containers:
> {code}
> whitelist.add(ApplicationConstants.Environment.HADOOP_YARN_HOME.name());
> whitelist.add(ApplicationConstants.Environment.HADOOP_COMMON_HOME.name());
> whitelist.add(ApplicationConstants.Environment.HADOOP_HDFS_HOME.name());
> whitelist.add(ApplicationConstants.Environment.HADOOP_CONF_DIR.name());
> whitelist.add(ApplicationConstants.Environment.JAVA_HOME.name());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Attachment: yarn-5709.1.patch

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 
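Point 4 could be sketched roughly as follows (a minimal sketch assuming a CuratorFramework cached inside RMContext; placement and names are illustrative, not the actual patch):

{code}
// Sketch only: lazily create and cache one CuratorFramework so the elector and
// the ZK-based state store can share a single client.
private volatile CuratorFramework curator;

public CuratorFramework getCurator(Configuration conf) {
  if (curator == null) {
    synchronized (this) {
      if (curator == null) {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            conf.get(YarnConfiguration.RM_ZK_ADDRESS),
            new ExponentialBackoffRetry(1000, 3));
        client.start();
        curator = client;
      }
    }
  }
  return curator;
}
{code}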



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Attachment: (was: yarn-5709-1.patch)

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-06 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727669#comment-15727669
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

The main point is that the {{ClientRMService}} does not have direct access to 
the Scheduler. All access checks run through the {{QueueACLsManager}} or the 
{{ApplicationACLsManager}}, so any change must go through those. In this case 
a new method was introduced because the current method does not have the 
destination queue available. We need to check the destination queue; the 
originating queue is already checked earlier by calling the existing method. 
The passed-in application has not been moved yet and thus still has the 
original queue. Updating the application is not possible because that would 
pre-empt the fact that the application can and will be moved.

The target queue is checked because it comes out of the move request 
and has not been validated at the time the access check is performed. To be able 
to distinguish between an access-denied case and a queue that does not exist, a log 
message was added when the queue returned is empty. Without that check and the 
log entries, we would not be able to trace back that difference.

I looked at folding the two methods into one to remove some code duplication 
but stopped. The small but important differences between the two 
methods required a number of {{if ... else ...}} constructs which made the code 
really difficult to read and understand.
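For readers following along, a rough sketch of the target-queue check described above (the exact {{QueueACLsManager}} signature added by the patch may differ):

{code}
// Sketch only: the source queue is verified through the existing access check;
// the *target* queue comes from the move request and must be checked separately.
String targetQueue = request.getTargetQueue();
boolean allowed = queueACLsManager.checkAccess(callerUGI,
    QueueACL.SUBMIT_APPLICATIONS, application, remoteAddress,
    forwardedAddresses, targetQueue);
if (!allowed) {
  // An empty queue name returned here can mean either "access denied" or
  // "queue does not exist"; the patch logs which case it was.
  throw new YarnException("User " + callerUGI.getShortUserName()
      + " cannot move " + applicationId + " to queue " + targetQueue);
}
{code}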





> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5922) Remove direct references of HBaseTimelineWriter/Reader in core ATS classes

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727101#comment-15727101
 ] 

Hadoop QA commented on YARN-5922:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 20m 
45s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
51s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5922 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12842040/YARN-5922.03.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 5d8ee26fd270 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a7288da |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14199/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 U: hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14199/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This 

[jira] [Assigned] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk

2016-12-06 Thread luhuichun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luhuichun reassigned YARN-5006:
---

Assignee: luhuichun

> ResourceManager quit due to ApplicationStateData exceed the limit  size of 
> znode in zk
> --
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.2
>Reporter: dongtingting
>Assignee: luhuichun
>Priority: Critical
>
> Client submits a job, and this job adds 1 file into the DistributedCache. When the 
> job is submitted, the ResourceManager stores ApplicationStateData into ZK. 
> The ApplicationStateData exceeds the znode size limit, and the RM exits with code 1.
> The related code in RMStateStore.java:
> {code}
>   private static class StoreAppTransition
>   implements SingleArcTransition {
> @Override
> public void transition(RMStateStore store, RMStateStoreEvent event) {
>   if (!(event instanceof RMStateStoreAppEvent)) {
> // should never happen
> LOG.error("Illegal event type: " + event.getClass());
> return;
>   }
>   ApplicationState appState = ((RMStateStoreAppEvent) 
> event).getAppState();
>   ApplicationId appId = appState.getAppId();
>   ApplicationStateData appStateData = ApplicationStateData
>   .newInstance(appState);
>   LOG.info("Storing info for app: " + appId);
>   try {  
> store.storeApplicationStateInternal(appId, appStateData);  //store 
> the appStateData
> store.notifyApplication(new RMAppEvent(appId,
>RMAppEventType.APP_NEW_SAVED));
>   } catch (Exception e) {
> LOG.error("Error storing app: " + appId, e);
> store.notifyStoreOperationFailed(e);   //handle fail event, system 
> exit 
>   }
> };
>   }
> {code}
> The Exception log:
> {code}
>  ...
> 2016-04-20 11:26:35,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore 
> AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> AsyncDispatcher event handler: Error storing app: 
> application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:860)
> at 
> 

[jira] [Assigned] (YARN-2338) service assemble so complex

2016-12-06 Thread luhuichun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luhuichun reassigned YARN-2338:
---

Assignee: luhuichun

> service assemble so complex
> ---
>
> Key: YARN-2338
> URL: https://issues.apache.org/jira/browse/YARN-2338
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: JiJun Tang
>Assignee: luhuichun
>
>   See ResourceManager
> protected void serviceInit(Configuration configuration) throws Exception 
> So many services are assembled into the ResourceManager.
> Use Guice or another service assembly framework to refactor this complex code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727160#comment-15727160
 ] 

Hadoop QA commented on YARN-4390:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 13 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
46s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
20s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 24s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_111
 with JDK v1.8.0_111 generated 1 new + 26 unchanged - 11 fixed = 27 total (was 
37) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 29s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121
 with JDK v1.7.0_121 generated 1 new + 26 unchanged - 11 fixed = 27 total (was 
37) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 30 new + 527 unchanged - 17 fixed = 557 total (was 544) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 40s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_111 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_121 Failed junit tests | 

[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727593#comment-15727593
 ] 

Hadoop QA commented on YARN-5709:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  5m  9s{color} 
| {color:red} hadoop-yarn-project_hadoop-yarn generated 2 new + 37 unchanged - 
0 fixed = 39 total (was 37) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 10 new + 339 unchanged - 7 fixed = 349 total (was 346) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
30s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 2 new + 913 unchanged - 0 fixed = 915 total (was 913) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 16s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 82m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.TestLeaderElectorService |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA |
|   | org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5709 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12842073/yarn-5709.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a36c94086a4e 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-5966) AMRMClient changes to support ExecutionType update

2016-12-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727307#comment-15727307
 ] 

Hitesh Sharma commented on YARN-5966:
-

Just a quick clarification: how is this different from [YARN-5087]? Is this an 
extension of that patch?

> AMRMClient changes to support ExecutionType update
> --
>
> Key: YARN-5966
> URL: https://issues.apache.org/jira/browse/YARN-5966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5966.wip.001.patch
>
>
> {{AMRMClient}} changes to support change of container ExecutionType



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727438#comment-15727438
 ] 

Karthik Kambatla commented on YARN-5709:


The patch posted here captures my intention. [~jianhe] - what do you think? 

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-558) Add ability to completely remove nodemanager from resourcemanager.

2016-12-06 Thread luhuichun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luhuichun reassigned YARN-558:
--

Assignee: luhuichun

> Add ability to completely remove nodemanager from resourcemanager.
> --
>
> Key: YARN-558
> URL: https://issues.apache.org/jira/browse/YARN-558
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Garth Goodson
>Assignee: luhuichun
>Priority: Minor
>  Labels: feature
>
> I would like to add the ability to completely remove a nodemanager from the 
> resourcemanager's state.
> I run a cloud service where I want to dynamically bring up nodes to act as 
> nodemanagers and then bring them down again when not needed.  These nodes 
> have dynamically assigned IPs, thus the alternative of decommissioning them 
> via an excludes file leads to a large (unbounded) list of decommissioned 
> nodes that may never be commissioned again. I would like the ability to move 
> a node from a decommissioned state to being completely removed from the 
> resource manager.
> I have thought of two ways of implementing this:
> 1) Add an optional timeout between entering the decommissioned state and being 
> removed from the resourcemanager.
> 2) Add an explicit RPC to remove a node that is decommissioned.
> Any additional thoughts/discussion are welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727256#comment-15727256
 ] 

Hitesh Sharma commented on YARN-5972:
-

Hi folks, thanks for opening this JIRA and the feedback. Much appreciated. 

{quote}
While notifying an AM of containers that are about to be preempted does allow 
the AM to check-point work, it does imply, as you pointed out, that AMs be 
modified to act on this input and make some decisions based on it.

Container pausing/freezing on the other hand, given OS/VM level support (also 
exposed via Docker and LXC) to actually freeze a process (agreed, their 
definition of freeze might vary), is actually AM/application independent. This 
can be useful, for applications and deployments that do not really want to 
check-point on its own but at the same time like the idea of container 
preemption with work preservations.
{quote}

Agree with [~asuresh] here. What container pausing/freezing offers is the 
ability to delegate to the underlying OS how the resources used by a container 
should be reclaimed, and to restart the container when resources free up again. 
The gains of doing so will vary with the container executor 
implementation. That said, it doesn't make PAUSE/RESUME 
the perfect solution for work preservation or a substitute for AM-specific 
checkpointing.

[YARN-5292] adds PAUSE/RESUME for opportunistic containers and doesn't target 
guaranteed containers. I can think of scenarios where it is good to have this 
functionality in guaranteed containers, but I would wait to see the need arise 
in the community.

Allowing the ContainerManager to initiate a pause/resume on an opportunistic 
container was considered, but we decided not to have that functionality. There 
are some edge cases around what happens if the CM initiates a RESUME on a 
paused container while the NM tries to PAUSE it ([YARN-5216]). I think [~subru] 
is also touching on these edge cases.

Overall I feel that the current design of allowing PAUSE/RESUME on 
opportunistic containers is a good starting point: an opportunistic container can 
be PAUSED in favor of a guaranteed one and RESUMED when resources free up 
([YARN-5216]). We should probably implement pauseContainer and 
resumeContainer for Docker-based container executors, since opportunistic 
containers running inside Docker containers can benefit from it. 
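For the Docker case, the underlying operations are already there; a minimal sketch of what a Docker-based runtime could delegate pause/resume to (this is not the actual DockerLinuxContainerRuntime code):

{code}
// Sketch only: Docker exposes freeze/thaw via the cgroups freezer, so a
// Docker-based executor could implement pause/resume roughly like this.
private static void runDocker(String subCommand, String containerName)
    throws IOException, InterruptedException {
  Process p = new ProcessBuilder("docker", subCommand, containerName)
      .inheritIO().start();
  if (p.waitFor() != 0) {
    throw new IOException("docker " + subCommand + " failed for " + containerName);
  }
}

// pauseContainer  -> runDocker("pause", name)    // freezes the container's processes
// resumeContainer -> runDocker("unpause", name)  // thaws them again
{code}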

If the community feels the need, we can extend the functionality to guaranteed 
containers. I personally think that may become more relevant as YARN containers 
become virtualized via Docker or virtual machines, but I would love to hear 
some scenarios before we do that.

> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources get freed up on the node; the 
> preempted container can then resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the docker pause functionality. Windows also has 
> OS-level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then the preempt call would 
> pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5966) AMRMClient changes to support ExecutionType update

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727336#comment-15727336
 ] 

Arun Suresh commented on YARN-5966:
---

[~hrsharma], YARN-5087 addresses changes in the {{ApplicationMasterProtocol}}, 
which was handled by YARN-5221. This JIRA proposes changes to the AMRMClient.

> AMRMClient changes to support ExecutionType update
> --
>
> Key: YARN-5966
> URL: https://issues.apache.org/jira/browse/YARN-5966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5966.wip.001.patch
>
>
> {{AMRMClient}} changes to support change of container ExecutionType



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Attachment: yarn-5709-1.patch

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: yarn-5709-1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Attachment: (was: yarn-5709.1.patch)

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Attachment: yarn-5709.1.patch

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-06 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727137#comment-15727137
 ] 

Jonathan Hung commented on YARN-5734:
-

Hi [~jianhe], thanks for the feedback.
bq. Does add/remove also support a fully qualified queue name, not just a 
hierarchical structure? I think supporting a single fully qualified queue name 
would be handy, especially for CLI add/remove
Sure, I think it makes sense to support both.
bq. The user may need to provide a new queue structure for initialization; then 
the xml file will conflict with what's in the config store.
I don't think I understand this part; can you explain why the user needs to 
provide a new queue structure?
Initialization will be done via xml even if the API-based approach is enabled. Then, 
on crash/restart, the config store will be honored. Basically, once the store is 
initialized, it will be used as the source of truth (and the xml is no longer 
useful).
bq. Is the implementation that the caller will block until the update is 
completed - both in store and memory ?
Yes, the plan is to block until the update is completed for both. This is to 
prevent the scenario where the client sends a configuration change, an event is 
queued, and the call returns, then RM crashes, at which point the configuration 
change is lost.
bq. IIUC, the EmbededDerbyDatabase is suitable for single RM only. Do you run 
RM HA in your cluster? Also, I guess Derby does not support fencing ? If so, we 
could potentially have two RMs writing together in a split-brain situation and 
cause data inconsistency. Therefore, I think ZKRMStateStore might be a better 
store option by default, especially because of RM HA.
Currently we are not running RM HA. The reason we have Derby as the default is 
because we currently have it running in production (and we don't have a working 
implementation which supports RM HA), so for single RM clusters we know it 
works well.
bq. Regarding PluggableConfigurationPolicy for authorization, has the 
implementation considered using YarnAuthorizationProvider ?
Took a look at this. I have a couple comments about it, let me know if it's not 
what you had in mind.
* Right now if I understand correctly it looks like YarnAuthorizationProvider 
only supports authorization based on queue ACL (submit/administer queue). We 
would need to extend the implementation to support things like fine-grained 
acls (e.g. acls by configuration key). In this case we would just extend 
YarnAuthorizationProvider with something like 
"SchedulerConfigurationAuthorizationProvider". If this is true, then each 
component using an authorization provider would need to configure its own 
implementation, since the SchedulerConfigurationAuthorizationProvider does not 
apply to all components (and it seems all components use the same provider 
determined by yarn.authorization-provider).
* We will probably still need the new pluggable configuration policy, at least 
for configuration change validation to make sure the proposed configuration 
changes make sense.

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf
>
>
> The current xml based configuration mechanism in CapacityScheduler makes it 
> very inconvenient to apply any changes to the queue configurations. We saw 2 
> main drawbacks in the file based configuration mechanism:
> # This makes it very inconvenient to automate queue configuration updates. 
> For example, in our cluster setup, we leverage the queue mapping feature from 
> YARN-2411 to route users to their dedicated organization queues. It could be 
> extremely cumbersome to keep updating the config file to manage the very 
> dynamic mapping between users to organizations.
> # Even a user has the admin permission on one specific queue, that user is 
> unable to make any queue configuration changes to resize the subqueues, 
> changing queue ACLs, or creating new queues. All these operations need to be 
> performed in a centralized manner by the cluster administrators.
> With these current limitations, we realized the need of a more flexible 
> configuration mechanism that allows queue configurations to be stored and 
> managed more dynamically. We developed the feature internally at LinkedIn 
> which introduces the concept of MutableConfigurationProvider. What it 
> essentially does is to provide a set of configuration mutation APIs that 
> allows queue configurations to be updated externally with a set of REST APIs. 
> When performing the queue configuration changes, the queue ACLs will 

[jira] [Commented] (YARN-5959) RM changes to support change of container ExecutionType

2016-12-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727312#comment-15727312
 ] 

Hitesh Sharma commented on YARN-5959:
-

Hello [~asuresh], can you share some design notes around how the NM handles the 
change in execution type? I will look at the patch more closely, but having that 
context in mind will help.

Thanks a lot!

> RM changes to support change of container ExecutionType
> ---
>
> Key: YARN-5959
> URL: https://issues.apache.org/jira/browse/YARN-5959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5959.combined.001.patch, YARN-5959.wip.002.patch, 
> YARN-5959.wip.patch
>
>
> RM side changes to allow an AM to ask for change of ExecutionType.
> Currently, there are two cases:
> # *Promotion* : OPPORTUNISTIC to GUARANTEED.
> # *Demotion* : GUARANTEED to OPPORTUNISTIC.
> This is similar in YARN-1197 which allows for change in Container resources. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725959#comment-15725959
 ] 

Eric Payne commented on YARN-4390:
--

The failing unit tests {{TestAMAuthorization, TestCapacityScheduler, 
TestCapacitySchedulerSurgicalPreemption, TestClientRMTokens}} usually pass for 
me. I am concerned about {{TestCapacitySchedulerSurgicalPreemption}} because 
this doesn't always pass and is affected by race conditions that prevent 
containers from being allocated fast enough or preempted before the assertion 
checks are made.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726063#comment-15726063
 ] 

Varun Saxena commented on YARN-5647:


Please note that the patch in YARN-5648 is on top of YARN-5647.

> [Security] Collector and reader side changes for loading auth filters and 
> principals
> 
>
> Key: YARN-5647
> URL: https://issues.apache.org/jira/browse/YARN-5647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: oct16-hard
> Attachments: YARN-5647-YARN-5355.wip.002.patch, 
> YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726060#comment-15726060
 ] 

Varun Saxena commented on YARN-5739:


[~sjlee0], should I go ahead and commit it or do you want to commit it ?

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch, YARN-5739-YARN-5355.007.patch
>
>
> Right now we only show a part of available timeline entity data in the new 
> YARN UI. However, some data (especially library specific data) are not 
> possible to be queried out by the web UI. It will be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given this kind of call is relatively rare (compare to 
> writes and updates) we can perform some scanning during the read time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5975) Remove the agent - slider AM ssl related code

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727237#comment-15727237
 ] 

Hadoop QA commented on YARN-5975:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
58s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} yarn-native-services passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
6s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core
 in yarn-native-services has 268 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
32s{color} | {color:red} hadoop-yarn-slider-core in yarn-native-services 
failed. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 27s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core
 generated 8 new + 29 unchanged - 7 fixed = 37 total (was 36) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core:
 The patch generated 3 new + 732 unchanged - 158 fixed = 735 total (was 890) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core
 generated 0 new + 262 unchanged - 6 fixed = 262 total (was 268) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
27s{color} | {color:red} hadoop-yarn-slider-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} 
|
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5975 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12842041/YARN-5975-yarn-native-services.01.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux d7297058f3bc 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | yarn-native-services / 69283ee |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 

[jira] [Commented] (YARN-5959) RM changes to support change of container ExecutionType

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727348#comment-15727348
 ] 

Arun Suresh commented on YARN-5959:
---

[~hrsharma], this patch addresses only the RM-side changes. The NM-side changes 
will be handled in YARN-5977 and YARN-5978.

> RM changes to support change of container ExecutionType
> ---
>
> Key: YARN-5959
> URL: https://issues.apache.org/jira/browse/YARN-5959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5959.combined.001.patch, YARN-5959.wip.002.patch, 
> YARN-5959.wip.patch
>
>
> RM side changes to allow an AM to ask for change of ExecutionType.
> Currently, there are two cases:
> # *Promotion* : OPPORTUNISTIC to GUARANTEED.
> # *Demotion* : GUARANTEED to OPPORTUNISTIC.
> This is similar in YARN-1197 which allows for change in Container resources. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Summary: Cleanup leader election configs and pluggability  (was: Cleanup 
leader election related configuration mess)

> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727172#comment-15727172
 ] 

Jian He commented on YARN-5554:
---

Oh, I think it was because CapacityScheduler#checkAccess is actually not needed 
for CS now; it cannot be removed because it's a public interface. And CS 
requires additional parameters like remoteAddress and forwardedAddresses, which 
are not appropriate to add to the public interface, because FS implements the 
same interface and does not require these parameters.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5977) ContainerManagementProtocol changes to support change of container ExecutionType

2016-12-06 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-5977:
-

 Summary: ContainerManagementProtocol changes to support change of 
container ExecutionType
 Key: YARN-5977
 URL: https://issues.apache.org/jira/browse/YARN-5977
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5978) ContainerScheduler and Container state machine changes to support ExecType update

2016-12-06 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-5978:
-

 Summary: ContainerScheduler and Container state machine changes to 
support ExecType update
 Key: YARN-5978
 URL: https://issues.apache.org/jira/browse/YARN-5978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-5709:
--

Assignee: Karthik Kambatla  (was: Daniel Templeton)

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-5709.1.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5922) Remove direct references of HBaseTimelineWriter/Reader in core ATS classes

2016-12-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727426#comment-15727426
 ] 

Sangjin Lee commented on YARN-5922:
---

I cherry-picked that commit into YARN-5355 and YARN-5355-branch-2 respectively.

The latest trunk patch LGTM. Please add the YARN-5355 patch if different, and 
I'll commit them once the CI turns green.

> Remove direct references of HBaseTimelineWriter/Reader in core ATS classes
> --
>
> Key: YARN-5922
> URL: https://issues.apache.org/jira/browse/YARN-5922
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5922-YARN-5355.01.patch, 
> YARN-5922-YARN-5355.02.patch, YARN-5922.01.patch, YARN-5922.02.patch, 
> YARN-5922.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5120) Metric for RM async dispatcher queue size

2016-12-06 Thread luhuichun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luhuichun reassigned YARN-5120:
---

Assignee: luhuichun

> Metric for RM async dispatcher queue size
> -
>
> Key: YARN-5120
> URL: https://issues.apache.org/jira/browse/YARN-5120
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: luhuichun
>Priority: Minor
>
> It is difficult to identify the health of the RM AsyncDispatcher. 
> Solution: Add a metric for the AsyncDispatcher queue size. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Summary: Cleanup leader election related configuration mess  (was: Cleanup 
Curator-based leader election code)

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5709) Cleanup Curator-based leader election code

2016-12-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5709:
---
Priority: Blocker  (was: Critical)

> Cleanup Curator-based leader election code
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726446#comment-15726446
 ] 

Wangda Tan commented on YARN-4390:
--

Thanks [~eepayne], if the {{TestCapacitySchedulerSurgicalPreemption}} passes 
locally for you, please open a JIRA to track the UT failure and unblock the 
backport.

And the javac / javadoc warnings look related to the change; could you 
update the patch accordingly?

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5685) Non-embedded HA failover is broken

2016-12-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726603#comment-15726603
 ] 

Karthik Kambatla commented on YARN-5685:


I am uncomfortable with deprecating the config for embedded, unless we make an 
explicit decision that we will likely not add an out-of-process leader elector 
in the future.

Other changes seem okay, but I would think the code has to be updated based on 
what we decide on YARN-5709. 

> Non-embedded HA failover is broken
> --
>
> Key: YARN-5685
> URL: https://issues.apache.org/jira/browse/YARN-5685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-hard
> Attachments: YARN-5685.001.patch, YARN-5685.002.patch
>
>
> If HA is enabled with automatic failover enabled and embedded failover 
> disabled, all RMs all come up in standby state.  To make one of them active, 
> the {{--forcemanual}} flag must be used when manually triggering the state 
> change.  Should the active go down, the standby will not become active and 
> must be manually transitioned with the {{--forcemanual}} flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4390:
-
Attachment: YARN-4390.branch-2.8.002.patch

Thanks [~leftnoteasy].
{quote}
Thanks Eric Payne, if the TestCapacitySchedulerSurgicalPreemption passes 
locally for you, please open a JIRA to track the UT failure and unblock the 
backport.
{quote}
YARN-5973
{quote}
And javac warnings / javadocs warnings looks related to the change, could you 
update the patch accordingly?
{quote}
Attaching patch {{YARN-4390.branch-2.8.002.patch}}

I have backported these to branch 2.8 in my local repo. If you like, I can push 
them once you approve of the patch.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch, YARN-4390.branch-2.8.002.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726913#comment-15726913
 ] 

Arun Suresh commented on YARN-5972:
---

[~subru], thanks for taking a look. w.r.t your 
([comment|https://issues.apache.org/jira/browse/YARN-5292?focusedCommentId=15685939])
 on YARN-5292, [~hrsharma], please feel free to chime in:
Based on my understanding of MAPREDUCE-4584, I feel that feature is actually 
quite orthogonal to this, and I don't think one approach is necessarily 
better/worse than the other. They can possibly be mixed and matched based on 
the use case.

While notifying an AM of containers that are about to be preempted does allow 
the AM to check-point work, it does imply, as you pointed out, that AMs be 
modified to act on this input and make some decisions based on it.

Container pausing/freezing, on the other hand, given OS/VM-level support (also 
exposed via Docker and LXC) to actually freeze a process (agreed, their 
definitions of freeze might vary), is AM/application independent. This can be 
useful for applications and deployments that do not really want to check-point 
on their own, but at the same time like the idea of container preemption with 
work preservation.

Also, the NM container lifecycle and API changes in the ContainerExecutor 
should ideally not take the execution type of the containers into account. The 
trigger can come either from the ContainerScheduler (in the case of YARN-1011 
and YARN-2877, when it decides resources are required for a guaranteed 
container) or from an AM (the AM wants to play nice and relinquish resources so 
that some opportunistic containers queued on the node can run).

Even though this is currently targeted for opportunistic containers, I don't 
really see any problems exposing this to AMs via the 
ContainerManagementProtocol (though the devil is in the details)
bq. We cannot guarantee RESUME unless we block the allocation for the Container 
which IMHO defeats the purpose
Not sure I completely agree. If an AM pauses a guaranteed container, yes, the 
allocation is blocked, but this is no different from an AM starting a container 
running a never-ending sleep job, except that this has the advantage that the 
NM is aware of it and can use the relinquished resources to start any queued 
opportunistic containers. Since the container is guaranteed, resume is ensured: 
any opportunistic container that was running thanks to the graciousness of the 
AM would immediately be preempted.
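For context on what pausing could look like underneath: the cgroups freezer 
mentioned in the issue description below boils down to writing a state keyword 
into the container's freezer cgroup. A minimal sketch, assuming a cgroup v1 
freezer hierarchy mounted at the usual location and an illustrative 
per-container cgroup path; the actual NM ContainerExecutor wiring is 
intentionally not shown.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only (Linux, cgroup v1 freezer). The cgroup layout is an assumption.
public class FreezerSketch {

  private static final Path FREEZER_ROOT = Paths.get("/sys/fs/cgroup/freezer");

  private static void setState(String containerCgroup, String state)
      throws IOException {
    Path stateFile =
        FREEZER_ROOT.resolve(containerCgroup).resolve("freezer.state");
    Files.write(stateFile, state.getBytes(StandardCharsets.UTF_8));
  }

  /** Stop scheduling all tasks in the container's cgroup. */
  public static void pause(String containerCgroup) throws IOException {
    setState(containerCgroup, "FROZEN");
  }

  /** Let the frozen tasks continue from where they stopped. */
  public static void resume(String containerCgroup) throws IOException {
    setState(containerCgroup, "THAWED");
  }
}
{code}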




> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources get freed up on the node then 
> the preempted container can resume to the running state.
> Note that process freezing this is already supported by 'cgroups freezer' 
> which is used internally by the docker pause functionality. Windows also has 
> OS level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt call would 
> pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3409) Add constraint node labels

2016-12-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726911#comment-15726911
 ] 

Wangda Tan commented on YARN-3409:
--

Thanks [~naganarasimha...@apache.org], my responses:

bq. the above example x.y.z-build which you gave doesnt fit in long array right 
? suppose we some how come with a pattern to match these stuff with a internal 
structure, how many such patterns we need to support and how to verify patterns 
against other patterns ?
This is what I want to avoid, actually: we should only support a limited set of 
version formats, for example a.b.c.d and a.b.c-d. In my mind, this is just a 
node attribute rather than a new programming language, so we should make it as 
easy to use as we can. I would prefer not to add custom-defined types here. 
(Extensibility in our internal code implementation is different; we should make 
implementations easy to change, but not visible to the user.)
If users have a different version format (such as r0009 or ~a.b.c), they can 
either reformat it or treat it as a string.

bq. all these scenarios captured are of non consumeable resources types, hence 
cannot be addressed as another resource type (so as to solve through resource 
profiles).
No, they're actually all consumable resources. For example, if an app wants 
more disk space or bandwidth, that should be part of the resource request 
(YARN-3926) instead of telling YARN "run me on a node with 12 disks". The 12 
disks mean nothing to the app because the node is a shared environment.
Similarly, a network interface should either be a constraint (e.g. whether the 
node supports multi-homing) or a resource (how many network interfaces you want).
MAX_MEM should not be decided by the client; this should be planned by the 
scheduler: why should an app that only needs 2G of memory ask to run on a node 
with 128G? This could make cluster fragmentation worse.

bq. how to capture internal type as long/ string or any other custom type ? 
We could have a preprocessing stage that sets the long[] for a given constraint 
value when adding the constraint.

bq. if its inherent logic and if it doesnt match what user intended then none 
of the resource requests matches or lead to undesired allocations.
As you mentioned in the above comment, if a comparison op is applied to two 
objects, one a string and one a long[], it will never match. If we compare two 
long[] of different lengths, we can pad the shorter array.

bq. Agree that Boolean can be acheived with String itself and for DATE might 
not be immediately required but consider for date different formats of parsing 
to be supported. 
That is what I want to avoid: user A wants MM-DD-YY, user B wants YYYY-MM-DD, 
and the same goes for versions. So we could end up with a lot of customized type 
implementations, and then we have to manage those implementations; it is very 
natural for a user to add new types and remove old ones, so we may need to 
provide an API to do that, etc.
The simplest way is to let users handle all of this themselves. String+regex by 
itself should be able to handle most of the DATE-related use cases.

bq. And also consider the scenario where in we want to schedule on the node 
where load average for past (1/5/15) mins is less than a particular value? 
IMHO, that is an abuse of node constraints; node constraints should target 
static properties of a node, and dynamic properties shouldn't be visible to the 
end user. Supporting things like "the node where the load average for the past 
(1/5/15) mins is less than a particular value" will be very hard to optimize. 
It should be a pluggable scheduling policy instead of a hard-coded constraint.

bq. All these kinds non consumables resources are supported in one way or the 
other in other schedulers
Could you add some references here? I checked a couple of schedulers but I 
didn't find a popular scheduler that supports customized types.

bq. Well is it heavy in terms of implementation or usage? As we keep string as 
the default type in usage front its pretty simple and implementation and 
maintainence front i dont see too much of a problem other than changes which is 
required on the existing labels.
Both, but the impact on usage is more serious to me: it requires the user to 
specify the type of a label upfront, and all constraints with the same name are 
forced to have the same type. That creates new problems such as:
- Changing the type of a constraint
- Removing/adding a constraint (removing a constraint needs to check whether it 
is associated with other nodes, etc.)
If it doesn't have a type, we can simply remove the constraint when its ref# 
becomes 0.
And unavoidably, if customized types are available to users, we have to 
maintain the public constraint-type interface in the future.

bq. IMHO I would like it to be flexible enough for future scenarios too hence i 
would still prefer for Constraint Type.
I don't see a must-have future scenario that requires this.
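To illustrate the long[] padding comparison described above, here is a minimal 
sketch. It assumes only simple numeric {{a.b.c}} / {{a.b.c-d}} formats; anything 
else would be treated as a plain string, which is not shown here.

{code:java}
import java.util.Arrays;

// Sketch only: padding the shorter array with zeros makes 2.8 compare like 2.8.0.
public class VersionConstraintSketch {

  static long[] parse(String version) {
    String[] parts = version.split("[.\\-]");
    long[] nums = new long[parts.length];
    for (int i = 0; i < parts.length; i++) {
      nums[i] = Long.parseLong(parts[i]);  // non-numeric components are not handled
    }
    return nums;
  }

  static int compare(long[] a, long[] b) {
    int len = Math.max(a.length, b.length);
    long[] pa = Arrays.copyOf(a, len);     // missing components default to 0
    long[] pb = Arrays.copyOf(b, len);
    for (int i = 0; i < len; i++) {
      int c = Long.compare(pa[i], pb[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    System.out.println(compare(parse("2.7.3"), parse("2.8")) < 0);  // true
  }
}
{code}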

> Add constraint node labels
> --
>
> Key: 

[jira] [Commented] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726920#comment-15726920
 ] 

Jian He commented on YARN-5709:
---

bq. yarn.resourcemanager.ha.automatic-failover.embedded
For the original hadoop-common's elector, I agree we should keep using this 
config, otherwise, it's incompatible.
But for the curator-based implementation, I feel we can avoid this config. One 
less config makes HA one step easier to configure. This config looks like a 
placeholder for a future implementation only, and it can never be set to false 
as of now. IMHO, a ZKFC-based-elector config can instead be added later if such 
an elector is really implemented in the future.
bq. Any reason we are not replacing the implementation? Do we just want to be 
safe and have a workaround in case the curator-based elector turns out to 
broken? 
Yeah, just intended to be safe.
bq. If that is the case, the config that determines the implementation should 
be removed in a subsequent release and accordingly be called out @Unstable. We 
should likely not list it in yarn-default.xml.
agree
bq.  If we do decide on moving the initialization, we should move it for both 
implementations and not just one. 
Agree we can move both. 
bq. However, if we were to implement a ZKFC-based elector, that would have to 
depend on the AdminService to affect any transitions at all. I believe Bikas 
has recommended we keep the same code path irrespective of whether leader 
election is embedded. I see merit to that argument.
I didn't actually get the argument about keeping the same code path... If 
EmbeddedElectorService is moved outside of AdminService, it still uses the same 
code path: EmbeddedElectorService -> AdminService -> transitionToActive/Standby. 
For a ZKFC-based elector, it's the same. IIUC, whether EmbeddedElectorService 
is initialized inside AdminService or outside does not affect reusing the same 
code path. And conceptually, EmbeddedElectorService does not look like a 
sub-service of AdminService.

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5922) Remove direct references of HBaseTimelineWriter/Reader in core ATS classes

2016-12-06 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5922:
-
Attachment: YARN-5922.03.patch

> Remove direct references of HBaseTimelineWriter/Reader in core ATS classes
> --
>
> Key: YARN-5922
> URL: https://issues.apache.org/jira/browse/YARN-5922
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5922-YARN-5355.01.patch, 
> YARN-5922-YARN-5355.02.patch, YARN-5922.01.patch, YARN-5922.02.patch, 
> YARN-5922.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5922) Remove direct references of HBaseTimelineWriter/Reader in core ATS classes

2016-12-06 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726932#comment-15726932
 ] 

Haibo Chen commented on YARN-5922:
--

Uploaded a new patch for trunk branch now that HADOOP-13859 has been committed. 
[~sjlee0] Can you please cherry-pick hadoop-13859 into YARN-5355? I can then 
upload a patch for branch YARN-5355.

> Remove direct references of HBaseTimelineWriter/Reader in core ATS classes
> --
>
> Key: YARN-5922
> URL: https://issues.apache.org/jira/browse/YARN-5922
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5922-YARN-5355.01.patch, 
> YARN-5922-YARN-5355.02.patch, YARN-5922.01.patch, YARN-5922.02.patch, 
> YARN-5922.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5975) Remove the agent - sldier M ssl related code

2016-12-06 Thread Jian He (JIRA)
Jian He created YARN-5975:
-

 Summary: Remove the agent - sldier M ssl related code
 Key: YARN-5975
 URL: https://issues.apache.org/jira/browse/YARN-5975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He


Now that the agent doesn't exist, this piece of code is not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5975) Remove the agent - sldier M ssl related code

2016-12-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5975:
--
Attachment: YARN-5975-yarn-native-services.01.patch

> Remove the agent - sldier M ssl related code
> 
>
> Key: YARN-5975
> URL: https://issues.apache.org/jira/browse/YARN-5975
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
> Attachments: YARN-5975-yarn-native-services.01.patch
>
>
> Now that agent doesn't exists, this piece of code is not needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5975) Remove the agent - sldier AM ssl related code

2016-12-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-5975:
-

Assignee: Jian He

> Remove the agent - sldier AM ssl related code
> -
>
> Key: YARN-5975
> URL: https://issues.apache.org/jira/browse/YARN-5975
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5975-yarn-native-services.01.patch
>
>
> Now that agent doesn't exists, this piece of code is not needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726943#comment-15726943
 ] 

Karthik Kambatla commented on YARN-5709:


bq. For the original hadoop-common's elector, I agree we should keep using this 
config, otherwise, it's incompatible.

{{yarn.resourcemanager.ha.automatic-failover.embedded}} only says the leader 
election is embedded. It does not say anything about the actual implementation 
of it. It just happened to be ActiveStandbyElector. Now, we are replacing it 
with the curator-based one. No? 

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5975) Remove the agent - sldier AM ssl related code

2016-12-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5975:
--
Summary: Remove the agent - sldier AM ssl related code  (was: Remove the 
agent - sldier M ssl related code)

> Remove the agent - sldier AM ssl related code
> -
>
> Key: YARN-5975
> URL: https://issues.apache.org/jira/browse/YARN-5975
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
> Attachments: YARN-5975-yarn-native-services.01.patch
>
>
> Now that agent doesn't exists, this piece of code is not needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5975) Remove the agent - slider AM ssl related code

2016-12-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5975:
--
Summary: Remove the agent - slider AM ssl related code  (was: Remove the 
agent - sldier AM ssl related code)

> Remove the agent - slider AM ssl related code
> -
>
> Key: YARN-5975
> URL: https://issues.apache.org/jira/browse/YARN-5975
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5975-yarn-native-services.01.patch
>
>
> Now that agent doesn't exists, this piece of code is not needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-06 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726946#comment-15726946
 ] 

Daniel Templeton commented on YARN-5554:


bq. The change to not use the checkAccess() of the scheduler for the capacity 
scheduler was made as part of YARN-4571.

I'm looking at YARN-4571, and I can't see any reason why {{checkAccess()}} was 
modified to special case the capacity scheduler.  The only difference I see 
from the code is that the new {{checkAccess()}} plows ahead even if the queue 
is null.  Maybe there were differences at the time that have since disappeared? 
 ([~jianhe], can you shed any light?)

Looking at your new {{checkAccess()}} method, I also don't see where it does 
anything different from the schedulers' {{checkAccess()}} methods.  What 
problem is it that you're solving with the new method?

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5970) Validate AppTimeoutInfo request parameters

2016-12-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-5970:
---

Assignee: Rohith Sharma K S

> Validate AppTimeoutInfo request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> AppTimeoutInfo is exposed to REST clients. This object request parameters 
> should be validated before processing it to clientRMService.
> And also handles couple of minor issues in REST services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5970) Validate AppTimeoutInfo request parameters

2016-12-06 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-5970:
---

 Summary: Validate AppTimeoutInfo request parameters
 Key: YARN-5970
 URL: https://issues.apache.org/jira/browse/YARN-5970
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Rohith Sharma K S


AppTimeoutInfo is exposed to REST clients. This object request parameters 
should be validated before processing it to clientRMService.

And also handles couple of minor issues in REST services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5971) All Events processed by one dispatcher in rm

2016-12-06 Thread Zhaofei Meng (JIRA)
Zhaofei Meng created YARN-5971:
--

 Summary: All Events processed by one dispatcher in rm
 Key: YARN-5971
 URL: https://issues.apache.org/jira/browse/YARN-5971
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhaofei Meng


All events are processed by one dispatcher in the RM. Is there a way to divide 
the various events among multiple dispatchers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5965) Revisit ApplicationReport #getApplicationTimeouts

2016-12-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-5965:

Attachment: YARN-5965.0.patch

Updated the patch, changing the return type to Map.
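For reference, a small sketch contrasting the list-based lookup with the 
map-based one. The type names follow the discussion, but the fields and 
signatures here are assumed stand-ins, not the actual API.

{code:java}
import java.util.List;
import java.util.Map;

// Illustrative only: ApplicationTimeout/ApplicationTimeoutType are simplified stand-ins.
class TimeoutLookupSketch {

  enum ApplicationTimeoutType { LIFETIME }

  static class ApplicationTimeout {
    ApplicationTimeoutType type;
    long remainingTimeInSec;
  }

  // With a List return type, the caller has to scan and compare the timeout type:
  static Long remainingFromList(List<ApplicationTimeout> timeouts) {
    for (ApplicationTimeout t : timeouts) {
      if (t.type == ApplicationTimeoutType.LIFETIME) {
        return t.remainingTimeInSec;
      }
    }
    return null;
  }

  // With a Map keyed by timeout type, it is a single lookup:
  static Long remainingFromMap(Map<ApplicationTimeoutType, ApplicationTimeout> timeouts) {
    ApplicationTimeout t = timeouts.get(ApplicationTimeoutType.LIFETIME);
    return t == null ? null : t.remainingTimeInSec;
  }
}
{code}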

> Revisit ApplicationReport #getApplicationTimeouts
> -
>
> Key: YARN-5965
> URL: https://issues.apache.org/jira/browse/YARN-5965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Jian He
>Assignee: Rohith Sharma K S
> Attachments: YARN-5965.0.patch
>
>
> Currently it returns a list of ApplicationTimeout objects,  to get a 
> particular timeout, the caller code needs to iterate the list and compare the 
> timeoutType to get the corresponding value. Is a map data structure easier 
> for use code? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-5970:

Attachment: YARN-5970.0.patch

Updating the patch with the following changes (a small illustration of the new GET 
resource follows the list).
# Validate that expiryTime is not null.
# As per the comment from Gour, change the REST URI for GET requests to 
/apps/$appid/*timeouts*/$type.
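
A rough sketch of the shape of such a GET resource in JAX-RS terms; the class name, 
method name, base path, and returned payload are assumptions for illustration, not 
the actual RMWebServices code:
{code}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/ws/v1/cluster")
public class AppTimeoutResourceSketch {

  @GET
  @Path("/apps/{appid}/timeouts/{type}")
  @Produces(MediaType.APPLICATION_JSON)
  public String getAppTimeout(@PathParam("appid") String appId,
                              @PathParam("type") String timeoutType) {
    // Validate appId and timeoutType first, then look up the timeout of the
    // requested type for the given application and return it.
    return "{\"type\":\"" + timeoutType + "\",\"app\":\"" + appId + "\"}";
  }
}
{code}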

> Validate update timeout request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5970.0.patch
>
>
> AppTimeoutInfo is exposed to REST clients. This object request parameters 
> should be validated before processing it to clientRMService.
> And also handles couple of minor issues in REST services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724926#comment-15724926
 ] 

Varun Saxena commented on YARN-5647:


I was actually attempting to remove setupOptions but soon realised that, due to the 
mismatch between the hadoop-common versions used here and by HBase, there would be 
runtime errors. I changed it to addFilters because we were only adding the timeline 
filter in this method, so that seemed a more suitable name.

bq. I assume that you're still working on the unit tests for this? 
I have added an E2E UT in YARN-5648 which includes the client side changes (most 
changes already exist).

bq. Also, can you please fix the checkstyle issues? Seems straightforward 
enough.
Sure. If the patch is fine overall, I think we need to move the Timeline auth 
filter and related classes to hadoop-yarn-server-common; referring to them from 
hadoop-yarn-applicationhistoryservice does not seem correct.
I thought of fixing checkstyle along with that refactoring.

> [Security] Collector and reader side changes for loading auth filters and 
> principals
> 
>
> Key: YARN-5647
> URL: https://issues.apache.org/jira/browse/YARN-5647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: oct16-hard
> Attachments: YARN-5647-YARN-5355.wip.002.patch, 
> YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724926#comment-15724926
 ] 

Varun Saxena edited comment on YARN-5647 at 12/6/16 9:33 AM:
-

I was actually attempting to remove setupOptions but soon realised that, due to the 
mismatch between the hadoop-common versions used here and by HBase, there would be 
runtime errors. I changed it to addFilters because we were only adding the timeline 
filter in this method, so that seemed a more suitable name.

bq. I assume that you're still working on the unit tests for this? 
I have added an E2E UT in YARN-5648 which includes some client side changes (most 
of the changes already existed because we were already reusing the relevant code 
written during ATSv1).

bq. Also, can you please fix the checkstyle issues? Seems straightforward 
enough.
Sure. If the patch is fine overall, I think we need to move the Timeline auth 
filter and related classes to hadoop-yarn-server-common; referring to them from 
hadoop-yarn-applicationhistoryservice does not seem correct.
I thought of fixing checkstyle along with that refactoring.


was (Author: varun_saxena):
I was actually attempting to remove setupOptions but soon realised that due to 
mismatch in hadoop-common versions with HBase, there will be runtime errors. 
Changed it to addFilters because we were only adding timeline filter in this 
method so thought this might be a more suitable name.

bq. I assume that you're still working on the unit tests for this? 
I have added a E2E UT in YARN-5648 which includes client side changes (most 
changes already exist).

bq. Also, can you please fix the checkstyle issues? Seems straightforward 
enough.
Sure. If the patch overall is fine, I think we need to move Timeline auth 
filter and related classes to hadoop-yarn-server-common. Referring it from 
hadoop-yarn-applicationhistoryservice does not seem correct.
Thought of fixing checkstyle alongwith that refactoring.

> [Security] Collector and reader side changes for loading auth filters and 
> principals
> 
>
> Key: YARN-5647
> URL: https://issues.apache.org/jira/browse/YARN-5647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: oct16-hard
> Attachments: YARN-5647-YARN-5355.wip.002.patch, 
> YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725389#comment-15725389
 ] 

Hadoop QA commented on YARN-5970:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 57s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5970 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12841938/YARN-5970.0.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 12a53cb9c630 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7b988e8 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14195/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14195/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: hadoop-yarn-project/hadoop-yarn |
| 

[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724916#comment-15724916
 ] 

Varun Saxena commented on YARN-5739:


Checkstyle is unrelated.
+1 LGTM.

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch, YARN-5739-YARN-5355.007.patch
>
>
> Right now we only show a part of available timeline entity data in the new 
> YARN UI. However, some data (especially library specific data) are not 
> possible to be queried out by the web UI. It will be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given this kind of call is relatively rare (compare to 
> writes and updates) we can perform some scanning during the read time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5935) TestAppRunnability.testMoveRunnableApp fails intermittently

2016-12-06 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725050#comment-15725050
 ] 

Fei Hui commented on YARN-5935:
---

Which version did you test against?

> TestAppRunnability.testMoveRunnableApp fails intermittently
> ---
>
> Key: YARN-5935
> URL: https://issues.apache.org/jira/browse/YARN-5935
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Priority: Minor
>
> {noformat}
> java.lang.AssertionError: expected:<> but 
> was:<>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability.testMoveRunnableApp(TestAppRunnability.java:242)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5970) Validate update timeout request parameters

2016-12-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-5970:

Summary: Validate update timeout request parameters  (was: Validate 
AppTimeoutInfo request parameters)

> Validate update timeout request parameters
> --
>
> Key: YARN-5970
> URL: https://issues.apache.org/jira/browse/YARN-5970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5970.0.patch
>
>
> AppTimeoutInfo is exposed to REST clients. This object request parameters 
> should be validated before processing it to clientRMService.
> And also handles couple of minor issues in REST services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5673) [Umbrella] Re-write container-executor to improve security, extensibility, and portability

2016-12-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726489#comment-15726489
 ] 

Miklos Szegedi commented on YARN-5673:
--

Thank you [~vvasudev] for the design proposal! I like the design approach.

I was missing some discussion of another approach and why it was not considered:
what is the common functionality that all modules need, and why don't we simply 
split them into separate executables instead of loading them as modules into a core 
container-executor process?
linux-container-executor
docker-container-executor
mount-cgroups
tc-executor
...

1. I have a concern about native modules potentially being loaded into the same 
process. Even if communication between modules is not allowed, this is a native 
binary where all native code has access to the whole address space. Just as with 
the current executor, when more features run in the same process with admin 
privileges, a faulty or malicious module may cause security issues, even by loading 
and accessing the others. Even if we can add protection, the protection code would 
add more complexity.

2. I also have a concern about maintenance. File system privilege setup is what 
system administrators are most familiar with. We could let them use that instead of 
modifying proprietary configuration files that describe which modules to load into 
a linux-container-executor binary. Any common functionality like auditing, logging, 
and setup checking can be statically linked into each executable. It also has the 
advantage that suid bits can be set separately. An example: /usr/bin/sudo and 
/usr/bin/passwd tell you in their names what they do and what you get if you set 
the suid permission on them. An administrator would only set it on the executables 
that are needed. Setting suid on the posix-container-executor, on the other hand, 
means that it is allowed to load modules in a controlled but advanced way. Separate 
executables would make the life of administrators much easier, I think. Each of 
these binaries can have its own configuration file, just like the modules you 
proposed.

3. Separate executables may help a little with debugging, too.



A. One more, separate issue that I wanted to ask your opinion about is the time it 
takes to launch a container. First the container-executor is executed, then a shell 
script that runs the actual container process, such as a JVM. Wouldn't it be faster 
to launch the container-executor just once, keep it running as long as the node 
manager is running, and communicate launch commands to it through standard input or 
a named pipe?

> [Umbrella] Re-write container-executor to improve security, extensibility, 
> and portability
> --
>
> Key: YARN-5673
> URL: https://issues.apache.org/jira/browse/YARN-5673
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: container-executor Re-write Design Document.pdf
>
>
> As YARN adds support for new features that require administrator 
> privileges(such as support for network throttling and docker), we’ve had to 
> add new capabilities to the container-executor. This has led to a recognition 
> that the current container-executor security features as well as the code 
> could be improved. The current code is fragile and it’s hard to add new 
> features without causing regressions. Some of the improvements that need to 
> be made are -
> *Security*
> Currently the container-executor has limited security features. It relies 
> primarily on the permissions set on the binary but does little additional 
> security beyond that. There are few outstanding issues today -
> - No audit log
> - No way to disable features - network throttling and docker support are 
> built in and there’s no way to turn them off at a container-executor level
> - Code can be improved - a lot of the code switches users back and forth in 
> an arbitrary manner
> - No input validation - the paths, and files provided at invocation are not 
> validated or required to be in some specific location
> - No signing functionality - there is no way to enforce that the binary was 
> invoked by the NM and not by any other process
> *Code Issues*
> The code layout and implementation themselves can be improved. Some issues 
> there are -
> - No support for log levels - everything is logged and this can’t be turned 
> on or off
> - Extremely long set of invocation parameters(specifically during container 
> launch) which makes turning features on or off complicated
> - Poor test coverage - it’s easy to introduce regressions today due to the 
> lack of a proper test setup
> - Duplicate functionality - there is some amount of code duplication
> - Hard to make improvements or add new features due to the issues raised above
> 

[jira] [Comment Edited] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726506#comment-15726506
 ] 

Karthik Kambatla edited comment on YARN-5709 at 12/6/16 7:50 PM:
-

My primary concern with the current 2.8.0 code is confusing configs. The reason 
we called the existing leader election and the corresponding config embedded was 
that we wanted to highlight that it runs embedded in the RM. The plan, at the 
time, was to add ZKFC-based leader election as well. We should likely leave that 
config ({{yarn.resourcemanager.ha.automatic-failover.embedded}}) alone unless we 
decide that we will not add a ZKFC-type of leader election that runs in a 
different process.

bq. the ultimate goal is to remove the old EmbeddedElectorService
We are in agreement here. I am comfortable with ripping out the current 
implementation of EmbeddedElectorService in 2.8.0 and replacing it with the 
curator-based implementation.

Any reason we are not replacing the implementation? Do we just want to be safe 
and have a workaround in case the curator-based elector turns out to be broken? 
If that is the case, the config that determines the implementation should be 
removed in a subsequent release and accordingly be marked @Unstable. We should 
likely not list it in yarn-default.xml. 

bq.  don't think it needs to be the case even for the old 
EmbeddedElectorService too, if you look at the implementation, there's no 
dependency between the EmbeddedElectorService and AdminService at all.

I see your point. 

On the dependency front, EmbeddedElectorService does not depend on AdminService. 
However, if we were to implement a ZKFC-based elector, that would have to depend 
on the AdminService to effect any transitions at all. I believe Bikas has 
recommended we keep the same code path irrespective of whether leader election 
is embedded, and I see merit to that argument. 

If we do decide on moving the initialization, we should move it for both 
implementations and not just one. Having implementation-specific initialization 
points is confusing for anyone new looking at the code, and even for those 
coming back to it after a while (like me). 



was (Author: kasha):
My primary concern with the current 2.8.0 code is confusing configs. The reason 
we called it existing leader election and the corresponding config embedded was 
because we wanted to highlight it is embedded in the RM. The plan, at the time, 
was to add ZKFC-based leader election as well. We should likely leave that 
config ({{yarn.resourcemanager.ha.automatic-failover.embedded}}) alone unless 
we make a decision that we will not add ZKFC-type of leader election that runs 
in a different process.

bq. the ultimate goal is to remove the old EmbeddedElectorService
We are in agreement here. I am comfortable with ripping out the current 
implementation of EmbeddedElectorService in 2.8.0 and replacing it with the 
curator-based implementation.

Any reason we are not replacing the implementation? Do we just want to be safe 
and have a workaround in case the curator-based elector turns out to broken? If 
that is the case, the config that determines the implementation should be 
removed in a subsequent release and accordingly be called out @Unstable. We 
should likely not list it in yarn-default.xml. 

bq.  don't think it needs to be the case even for the old 
EmbeddedElectorService too, if you look at the implementation, there's no 
dependency between the EmbeddedElectorService and AdminService at all.

I see your point. 

On the dependency front, EmbeddedElectorService does not depend on 
AdminService. However, if we were to implement a ZKFC-based elector, that would 
have to depend on the AdminService to affect any transitions at all. I believe 
Bikas has recommended we keep the same code path irrespective of whether leader 
election is embedded. I see merit to that argument. 

If we do decide on moving the initialization, we should move it for both 
implementations and not just one. Implementation-based initialization points is 
confusing for any new person looking at the code. Even for those looking at it 
after a while (like me). 


> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. 

[jira] [Commented] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726506#comment-15726506
 ] 

Karthik Kambatla commented on YARN-5709:


My primary concern with the current 2.8.0 code is confusing configs. The reason 
we called it the existing leader election, and the corresponding config embedded, 
was that we wanted to highlight that it runs embedded in the RM. The plan, at the 
time, was to add ZKFC-based leader election as well. We should likely leave that 
config ({{yarn.resourcemanager.ha.automatic-failover.embedded}}) alone unless we 
decide that we will not add a ZKFC-type of leader election that runs in a 
different process.

bq. the ultimate goal is to remove the old EmbeddedElectorService
We are in agreement here. I am comfortable with ripping out the current 
implementation of EmbeddedElectorService in 2.8.0 and replacing it with the 
curator-based implementation.

Any reason we are not replacing the implementation? Do we just want to be safe 
and have a workaround in case the curator-based elector turns out to be broken? 
If that is the case, the config that determines the implementation should be 
removed in a subsequent release and accordingly be marked @Unstable. We should 
likely not list it in yarn-default.xml. 

bq.  don't think it needs to be the case even for the old 
EmbeddedElectorService too, if you look at the implementation, there's no 
dependency between the EmbeddedElectorService and AdminService at all.

I see your point. 

On the dependency front, EmbeddedElectorService does not depend on AdminService. 
However, if we were to implement a ZKFC-based elector, that would have to depend 
on the AdminService to effect any transitions at all. I believe Bikas has 
recommended we keep the same code path irrespective of whether leader election 
is embedded, and I see merit to that argument. 

If we do decide on moving the initialization, we should move it for both 
implementations and not just one. Having implementation-specific initialization 
points is confusing for anyone new looking at the code, and even for those 
coming back to it after a while (like me). 


> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5973) TestCapacitySchedulerSurgicalPreemption sometimes fails

2016-12-06 Thread Eric Payne (JIRA)
Eric Payne created YARN-5973:


 Summary: TestCapacitySchedulerSurgicalPreemption sometimes fails
 Key: YARN-5973
 URL: https://issues.apache.org/jira/browse/YARN-5973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, scheduler preemption
Affects Versions: 2.8.0
Reporter: Eric Payne
Priority: Minor


The tests in {{TestCapacitySchedulerSurgicalPreemption}} appear to be racy. 
They often pass, but  the following errors sometimes occur:
{noformat}
testSimpleSurgicalPreemption(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)
  Time elapsed: 14.671 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.fail(Assert.java:95)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerPreemptionTestBase.waitNumberOfLiveContainersFromApp(CapacitySchedulerPreemptionTestBase.java:110)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSimpleSurgicalPreemption(TestCapacitySchedulerSurgicalPreemption.java:143)
{noformat}
{noformat}
testSurgicalPreemptionWithAvailableResource(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)
  Time elapsed: 9.503 sec  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSurgicalPreemptionWithAvailableResource(TestCapacitySchedulerSurgicalPreemption.java:220)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reopened YARN-4390:
--

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch, YARN-4390.branch-2.8.002.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726731#comment-15726731
 ] 

Wangda Tan commented on YARN-4390:
--

[~eepayne], thanks. Reopened the ticket to trigger Jenkins.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch, YARN-4390.branch-2.8.002.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance

2016-12-06 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5969:
---
Assignee: zhangshilong

> FairShareComparator getResourceUsage poor performance
> -
>
> Key: YARN-5969
> URL: https://issues.apache.org/jira/browse/YARN-5969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhangshilong
>Assignee: zhangshilong
> Attachments: 20161206.patch
>
>
> In the FairShareComparator class, the performance of the getResourceUsage() 
> function is very poor: it can be executed more than 100,000,000 times per second.
> In our scenario, it takes 20 seconds out of every minute.  
> A simple solution is to reduce the number of calls to the function.
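
For illustration only, a minimal sketch of that idea with simplified stand-in types 
(not the actual FairScheduler code): snapshot each schedulable's usage once before 
sorting, so the expensive accessor is called N times rather than on every comparison:
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CachedUsageSortSketch {

  interface Schedulable {
    long getResourceUsage();   // assumed to be the expensive call
    String getName();
  }

  static List<Schedulable> sortByUsage(List<Schedulable> schedulables) {
    // Pair each schedulable's index with a cached usage value.
    List<long[]> idxAndUsage = new ArrayList<>();
    for (int i = 0; i < schedulables.size(); i++) {
      idxAndUsage.add(new long[] {i, schedulables.get(i).getResourceUsage()});
    }
    // Sort on the cached values instead of re-reading usage on every comparison.
    idxAndUsage.sort(Comparator.comparingLong(a -> a[1]));

    List<Schedulable> sorted = new ArrayList<>();
    for (long[] entry : idxAndUsage) {
      sorted.add(schedulables.get((int) entry[0]));
    }
    return sorted;
  }
}
{code}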



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726776#comment-15726776
 ] 

Li Lu commented on YARN-4675:
-

I'm fine with separating the v1 and v2 interfaces. Right now, mixing the v1 and v2 
interfaces in one interface looks pretty confusing to me. Since we decided at the 
very beginning that timeline v2 is not backward compatible, I think it's fine to 
let users choose between TimelineClient v1 and v2. 

bq. things that are referencing TimelineClientImpl directly today
Yes, we should not refer to TimelineClientImpl directly in downstream usages. 
Shall I open a JIRA and remove all of them? 

bq.  the facility for getting delegation token and renewing it would be common 
to both the clients. We would not want to repeat such large amounts of code in 
both V1 and V2 client implementations. 
That's certainly a valid concern, and addressing it may bring up a lot of 
discussion on security itself. My bottom line here is: let's *assume* the security 
facilities do not exist in timeline v2 and start the design from scratch. We can 
then think about how to merge and reuse the code afterwards. For now, let's not 
try to maximize code reuse between timeline v1 and v2, especially for security. 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl ,  
> TimeClientV2Impl and if required a base class, so that its clear which part 
> of the code belongs to which version and thus better maintainable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726783#comment-15726783
 ] 

Li Lu commented on YARN-4675:
-

Created YARN-5974 for removing unnecessary references to TimelineClientImpl. 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl ,  
> TimeClientV2Impl and if required a base class, so that its clear which part 
> of the code belongs to which version and thus better maintainable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726699#comment-15726699
 ] 

Varun Saxena commented on YARN-4675:


At a cursory glance, the posted patch seems, at a high level, pretty close to what 
we had in mind when this was raised. 

I agree that we do not necessarily need a single interface; however, if we go that 
way, we would probably pull the connection-related code (URL connection factory, 
retry, etc.) out into a separate class. Also, the facility for getting the 
delegation token and renewing it would be common to both clients, and we would not 
want to repeat such a large amount of code in both the V1 and V2 client 
implementations. Maybe we can have a separate class for all this common stuff and 
keep the method for posting entities in an interface; implementations can then 
probably extend from this class. 
But TimelineClient extends AbstractService, and we would not want to break that. 
Maybe instead of extending the class containing the common code, we can just use 
it as a helper class, with an object of it in each implementation (a rough sketch 
of that composition approach follows below).

Thoughts ?
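
A rough sketch of the composition idea, assuming illustrative names only (these are 
not the actual YARN classes): each version-specific implementation holds a shared 
helper for connection/token plumbing instead of inheriting it, so TimelineClient 
can keep extending AbstractService:
{code}
public class TimelineClientCompositionSketch {

  // Common plumbing (URL connection factory, retries, delegation-token renewal)
  // would live in one place and be reused via composition.
  static class TimelineConnector {
    String post(String path, String entityJson) {
      return "posted " + entityJson + " to " + path;
    }
  }

  static class TimelineClientV1Impl {
    private final TimelineConnector connector = new TimelineConnector();
    String putEntities(String entityJson) {
      return connector.post("/ws/v1/timeline", entityJson);
    }
  }

  static class TimelineClientV2Impl {
    private final TimelineConnector connector = new TimelineConnector();
    String putEntities(String entityJson) {
      return connector.post("/ws/v2/timeline", entityJson);
    }
  }

  public static void main(String[] args) {
    System.out.println(new TimelineClientV1Impl().putEntities("{\"id\":\"e1\"}"));
    System.out.println(new TimelineClientV2Impl().putEntities("{\"id\":\"e1\"}"));
  }
}
{code}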



> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl ,  
> TimeClientV2Impl and if required a base class, so that its clear which part 
> of the code belongs to which version and thus better maintainable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726699#comment-15726699
 ] 

Varun Saxena edited comment on YARN-4675 at 12/6/16 9:02 PM:
-

At a cursory glance, the posted patch seems, at a high level, pretty close to what 
we had in mind when this was raised. 

I agree, though, that we do not necessarily need a single interface; however, if we 
go that way, we would probably pull the connection-related code (URL connection 
factory, retry, etc.) out into a separate class. Also, the facility for getting the 
delegation token and renewing it would be common to both clients, and we would not 
want to repeat such a large amount of code in both the V1 and V2 client 
implementations. Maybe we can have a separate class for all this common stuff and 
keep the method for posting entities in an interface; implementations can then 
probably extend from this class. 
But TimelineClient extends AbstractService, and we would not want to break that. 
Maybe instead of extending the class containing the common code, we can just use 
it as a helper class, with an object of it in each implementation.

Thoughts ?




was (Author: varun_saxena):
At a cursory glance, the patch posted seems to be at a high level, pretty near 
to what we had in mind when this was raised. 

I agree that we do not necessarily need a single interface however if we do 
that we would probably pull out connection related (URL connection factory, 
retry, etc.) code into a separate class. Also, the facility for getting 
delegation token and renewing it would be common to both the clients. We would 
not want to repeat such large amounts of code in both V1 and V2 client 
implementations. Maybe we can have a separate class for all this common stuff 
and method for posting entities in an interface. Implementations then can 
probably extend from this class. 
But then TimelineClient extends AbstractService and we would not want to break 
that. Maybe instead of extending the class containing the common code we can 
just use it as a helper class with an object of it in each implementation.

Thoughts ?



> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl ,  
> TimeClientV2Impl and if required a base class, so that its clear which part 
> of the code belongs to which version and thus better maintainable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5756) Add state-machine implementation for queues

2016-12-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726717#comment-15726717
 ] 

Wangda Tan commented on YARN-5756:
--

Thanks [~xgong], some comments/questions.

General:
- Could you add some descriptions to YARN-5755 and YARN-5724, which are mentioned 
by QueueStateManager?

1) QueueStateManager: 
- queueContext -> queueManager
- Move AbstractCSQueue#initializeQueueState to QueueStateManager? (That way we can 
protect all state transitions under the same sync lock.)
- appFinished needs a sync lock.

2) AbstractCSQueue
{code}
  private ReentrantReadWriteLock.ReadLock queueStateReadLock;
  private ReentrantReadWriteLock.WriteLock queueStateWriteLock;
{code}
These are not necessary since {{state}} is volatile (see the sketch after this list).

3) Implementation of initializeQueueState
- I'm not sure whether initializeQueueState should be called when the admin 
refreshes the queue config. If we plan to do so, initializeQueueState should be 
invoked when the queue is reinitialized, so you may need to move it to 
{{setupQueueConfigs}}.
- In addition, if initializeQueueState is called when the admin refreshes the queue 
config, we need to handle state-in-memory versus state-from-config, for 
example, the draining state.
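
A minimal illustration of the volatile point above, with simplified names (this is 
not the actual AbstractCSQueue code): if the queue state is a single volatile field 
and each access is one plain read or write, a ReentrantReadWriteLock adds no extra 
safety.
{code}
public class VolatileQueueStateSketch {
  enum QueueState { RUNNING, DRAINING, STOPPED }

  private volatile QueueState state = QueueState.RUNNING;

  QueueState getState() {
    return state;              // volatile read, always sees the latest write
  }

  void updateState(QueueState newState) {
    state = newState;          // volatile write, visible to all readers
  }
  // Compound check-then-act transitions would still need a lock or a state machine,
  // which is what a central QueueStateManager is meant to provide.
}
{code}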

> Add state-machine implementation for queues
> ---
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, 
> YARN-5756.4.patch, YARN-5756.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5974) Remove direct reference to TimelineClientImpl

2016-12-06 Thread Li Lu (JIRA)
Li Lu created YARN-5974:
---

 Summary: Remove direct reference to TimelineClientImpl
 Key: YARN-5974
 URL: https://issues.apache.org/jira/browse/YARN-5974
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-5355
Reporter: Li Lu


[~sjlee0]'s quick audit shows the following things referencing 
TimelineClientImpl directly today:

JobHistoryFileReplayMapperV1 (MR)
SimpleEntityWriterV1 (MR)
TestDistributedShell (DS)
TestDSAppMaster (DS)
TestNMTimelinePublisher (node manager)
TestTimelineWebServicesWithSSL (AHS)

This is not the right way to use TimelineClient, and we should avoid direct 
references to TimelineClientImpl as much as possible. 
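
A minimal sketch of the preferred usage, assuming the TimelineClient static factory 
and the standard service lifecycle (error handling and entity building omitted):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineClientUsageSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Obtain the client through the factory and program against the abstract class,
    // so callers never name TimelineClientImpl directly.
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      // ... build and put timeline entities via the TimelineClient API ...
    } finally {
      client.stop();
    }
  }
}
{code}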

Any newcomers to the community are more than welcome to take this. If this 
remains unassigned for ~24hrs I'll jump in and do a quick fix. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726913#comment-15726913
 ] 

Arun Suresh edited comment on YARN-5972 at 12/6/16 10:49 PM:
-

[~subru], thanks for taking a look. w.r.t. your 
([comment|https://issues.apache.org/jira/browse/YARN-5292?focusedCommentId=15685939])
 on YARN-5292 ([~hrsharma], please feel free to chime in):
Based on my understanding of MAPREDUCE-4584, I feel that feature is actually 
quite orthogonal to this one, and I don't think either approach is necessarily 
better or worse than the other. They can possibly be mixed and matched based on 
the use case.

While notifying an AM of containers that are about to be preempted does allow 
the AM to check-point work, it does imply, as you pointed out, that AMs must be 
modified to act on this input and make decisions based on it.

Container pausing/freezing, on the other hand, given OS/VM level support (also 
exposed via Docker and LXC) to actually freeze a process (agreed, their 
definitions of freeze might vary), is AM/application independent. This can be 
useful for applications and deployments that do not really want to check-point 
on their own but at the same time like the idea of container preemption with 
work preservation.

Also, the NM container lifecycle and the API changes in the Container Executor 
should ideally not take the execution type of the containers into account. The 
trigger can come either from the ContainerScheduler (in the case of YARN-1011 and 
YARN-2877, when it decides resources are required for a guaranteed container) 
or from an AM (when the AM wants to play nice and relinquish resources so that 
some opportunistic containers queued on the node can run).

Even though this is currently targeted at opportunistic containers, I don't 
really see any problem exposing this to AMs via the 
ContainerManagementProtocol (though the devil is in the details).
bq. We cannot guarantee RESUME unless we block the allocation for the Container 
which IMHO defeats the purpose
Not sure I completely agree. If an AM pauses a guaranteed container, yes, the 
allocation is blocked, but this is no different from an AM starting a container 
that runs a never-ending sleep job, except that pausing has the advantage that 
the NM is aware of it and can use the relinquished resources to start any queued 
opportunistic containers. Since the container is guaranteed, resume is ensured, 
because any opportunistic container that was running thanks to the graciousness 
of the AM would immediately be preempted. I do agree, though, that in the case of 
opportunistic containers, if an AM explicitly asks to pause one, it cannot 
expect the container to be resumed as soon as it asks.





was (Author: asuresh):
[~subru], thanks for taking a look. w.r.t your 
([comment|https://issues.apache.org/jira/browse/YARN-5292?focusedCommentId=15685939])
 on YARN-5292, [~hrsharma], please feel free to chime in:
Based on my understanding of MAPREDUCE-4584, I feel that feature is actually 
quite orthogonal to this, and I don't think one approach is necessarily 
better/worse than the other. They can possibly be mixed and matched based on 
the use case.

While notifying an AM of containers that are about to be preempted does allow 
the AM to check-point work, it does imply, as you pointed out, that AMs be 
modified to act on this input and make some decisions based on it.

Container pausing/freezing on the other hand, given OS/VM level support (also 
exposed via Docker and LXC) to actually freeze a process (agreed, their 
definition of freeze might vary), is actually AM/application independent. This 
can be useful, for applications and deployments that do not really want to 
check-point on its own but at the same time like the idea of container 
preemption with work preservations.

Also, the NM container lifecycle and API changes in the Container Executor 
should not ideally take into account the execution type of the containers. The 
trigger can either be from the ContainerScheduler (in case of YARN-1011 and 
YARN-2877, when it decides resources are required for a guaranteed container) 
or from an AM (the AM wants to play nice and relinquish resources so that some 
opportunistic containers running on the node to run)

Even though this is currently targeted for opportunistic containers, I don't 
really see any problems exposing this to AMs via the 
ContainerManagementProtocol (though the devil is in the details)
bq. We cannot guarantee RESUME unless we block the allocation for the Container 
which IMHO defeats the purpose
Not sure I completely agree. If an AM pauses a guaranteed container, yes, the 
allocation is blocked, but this is no different from an AM starting a container 
running never-ending sleep job, except this has the advantage that the NM is 
aware of it and can use the relinquished resources to start any queued 
opportunistic containers. Since the container 

[jira] [Commented] (YARN-5709) Cleanup leader election related configuration mess

2016-12-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726972#comment-15726972
 ] 

Jian He commented on YARN-5709:
---

yeah... I mean, if we already have an explicit config 
(curator-based-elector-enabled), then we don't require a separate 
{{yarn.resourcemanager.ha.automatic-failover.embedded}} config, as 
curator-based-elector-enabled implies it is embedded. 

> Cleanup leader election related configuration mess
> --
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Daniel Templeton
>Priority: Blocker
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better of having {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


