[jira] [Commented] (YARN-5602) Utils for Federation State and Policy Store

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486347#comment-15486347
 ] 

Jian He commented on YARN-5602:
---

A few minor comments: 
- Why this? {{// Capability and HeartBeat are not necessary to compare the 
information}} 
- FederationStateStoreException: are the getErrorCode and getErrorMsg methods 
needed? They can be retrieved from getCode (see the sketch below).
- FederationStateStoreUtils: some param names in the method comments are wrong, 
e.g. LOG, errMsg
- Should the Exception-related classes be in an exception package, rather than util?
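
For the FederationStateStoreException point, something along these lines is 
what I mean (a rough sketch; the enum values, field names and accessors are 
illustrative, not the actual patch):

{code}
// Illustrative only: with getCode() exposing the error-code enum, callers can
// derive both the numeric id and the message, so separate getErrorCode()/
// getErrorMsg() accessors on the exception may be redundant.
enum FederationStateStoreErrorCode {
  // value is made up for illustration
  MEMBERSHIP_INSERT_FAIL(1101, "Fail to insert a tuple into Membership table.");

  private final int id;
  private final String msg;

  FederationStateStoreErrorCode(int id, String msg) {
    this.id = id;
    this.msg = msg;
  }

  public int getId() { return id; }
  public String getMsg() { return msg; }
}

public class FederationStateStoreException extends Exception {
  private final FederationStateStoreErrorCode code;

  public FederationStateStoreException(FederationStateStoreErrorCode code) {
    super(code.getMsg());
    this.code = code;
  }

  public FederationStateStoreErrorCode getCode() {
    return code;
  }
}

// Caller side:
//   int id = e.getCode().getId();
//   String msg = e.getCode().getMsg();
{code}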

> Utils for Federation State and Policy Store
> ---
>
> Key: YARN-5602
> URL: https://issues.apache.org/jira/browse/YARN-5602
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-5602-YARN-2915.v1.patch, 
> YARN-5602-YARN-2915.v2.patch
>
>
> This JIRA tracks the creation of utils for Federation State and Policy Store 
> such as Error Codes, Exceptions...






[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-09-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486205#comment-15486205
 ] 

Bibin A Chundatt commented on YARN-5545:


Thank you [~leftnoteasy] for looking into the issue.

{quote}
So queue will split maximum-application-number according to ratio of their 
total configured resource across partitions
{quote}

{noformat}
Approach
Consider the average of the absolute capacity across all partitions, not the
absolute capacity per partition; e.g. label1 could be 10% of a 20 GB partition
while the default partition is 50% of 100 GB.

Get the queue's overall percentage capacity as
[ sum of queue A's resources across all partitions (X) / total cluster
resource (Y) ] = absolute percentage over the whole cluster (Z).
max applications of queue = Z * maxclusterapplication
The max-applications value has to be updated on every NODE registration and
removal.
{noformat}
This was the initial approach we thought about. During discussion we came 
across the scenario where the RM is restarted and an NM has not yet registered: 
submissions might get rejected. Any thoughts on how we should handle that 
scenario? A worked example of the computation is below.
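
A quick worked example of the computation above (illustrative numbers only):

{code}
// Illustrative only: queue A has 10% of a 20 GB partition (label1) and
// 50% of the 100 GB default partition; total cluster resource is 120 GB.
double queueResourceAcrossPartitions = 0.10 * 20 + 0.50 * 100; // X = 52 GB
double totalClusterResource = 20 + 100;                        // Y = 120 GB
double absoluteCapacity =
    queueResourceAcrossPartitions / totalClusterResource;      // Z ~= 0.433

int maxClusterApplications = 10000;                            // assumed config
int maxQueueApplications =
    (int) (absoluteCapacity * maxClusterApplications);         // ~4333
{code}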



> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, 
> YARN-5545.0003.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 

[jira] [Updated] (YARN-5539) TimelineClient failed to retry on "java.net.SocketTimeoutException: Read timed out"

2016-09-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5539:
-
Summary: TimelineClient failed to retry on 
"java.net.SocketTimeoutException: Read timed out"  (was: AM fails due to 
"java.net.SocketTimeoutException: Read timed out")

> TimelineClient failed to retry on "java.net.SocketTimeoutException: Read 
> timed out"
> ---
>
> Key: YARN-5539
> URL: https://issues.apache.org/jira/browse/YARN-5539
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
>
> AM fails with the following exception
> {code}
> FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
> Caused by: java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:170)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:472)
>   at 
> 

[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486002#comment-15486002
 ] 

Hadoop QA commented on YARN-5605:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 57s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 44s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 5 
new + 135 unchanged - 122 fixed = 140 total (was 257) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 0 new + 937 unchanged - 4 fixed = 937 total (was 941) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 29s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 52s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-5605 |
| GITHUB PR | https://github.com/apache/hadoop/pull/124 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 78d6b884cc47 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 72dfb04 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 

[jira] [Reopened] (YARN-5539) AM fails due to "java.net.SocketTimeoutException: Read timed out"

2016-09-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reopened YARN-5539:
--

> AM fails due to "java.net.SocketTimeoutException: Read timed out"
> -
>
> Key: YARN-5539
> URL: https://issues.apache.org/jira/browse/YARN-5539
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
>
> AM fails with the following exception
> {code}
> FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
> Caused by: java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:170)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:472)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
>

[jira] [Commented] (YARN-5539) AM fails due to "java.net.SocketTimeoutException: Read timed out"

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486000#comment-15486000
 ] 

Junping Du commented on YARN-5539:
--

I think this exception hints that our TimelineClient retry logic leaks the 
exception in the SocketTimeoutException case, because it only retries on 
ConnectException.
{noformat}
public boolean shouldRetryOn(Exception e) {
  // Only retry on connection exceptions
  return (e instanceof ClientHandlerException)
      && (e.getCause() instanceof ConnectException);
}
{noformat}
This is a valid issue, but it only shows up in very occasional cases.
Reopening this issue to address the corner case. Will put up a patch soon!
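
One possible direction is simply widening the retry condition (a minimal 
sketch, not the actual patch):

{code}
// Illustrative only: also retry when the cause is a read timeout,
// not only a connection failure.
public boolean shouldRetryOn(Exception e) {
  return (e instanceof ClientHandlerException)
      && (e.getCause() instanceof ConnectException
          || e.getCause() instanceof java.net.SocketTimeoutException);
}
{code}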

> AM fails due to "java.net.SocketTimeoutException: Read timed out"
> -
>
> Key: YARN-5539
> URL: https://issues.apache.org/jira/browse/YARN-5539
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
>
> AM fails with the following exception
> {code}
> FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
> Caused by: java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:170)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
>   at 
> 

[jira] [Updated] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5637:
--
Attachment: YARN-5637.001.patch

Uploading an initial patch based on the framework laid out in YARN-5620.
I haven't submitted the patch yet since it depends on YARN-5620; will do once 
that goes in.

[~jianhe], [~vvasudev], do let me know what you think.

Some assumptions (a sketch of the implied behaviour follows the list):
* _autoCommit_ is set to _true_ by default.
* If _retryFailurePolicy_ is NOT specified, the Container is terminated 
immediately if the upgrade fails.
* If _retryFailurePolicy_ is specified, the upgraded Container is 
retried/relaunched as per the policy, and if that still fails, the Container is 
rolled back to the previous state.
* If _autoCommit_ is *false* and the user has called the _commitUpgrade_ API, 
no rollback is allowed after that.
* If _autoCommit_ is *true*, no explicit or implicit (upgraded process fails to 
start) rollback is allowed.
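
To make the assumptions concrete, a rough sketch of the failure handling they 
imply (all type and method names are hypothetical, not the patch's API):

{code}
// Illustrative only: behaviour when the upgraded process fails to start.
void onUpgradeFailure(boolean autoCommit, RetryPolicy retryFailurePolicy,
    Container container) {
  if (retryFailurePolicy == null) {
    container.terminate();                       // no policy: terminate immediately
    return;
  }
  if (retryFailurePolicy.shouldRetry()) {
    container.relaunchWithUpgradedContext();     // retry/relaunch per policy
  } else if (!autoCommit) {
    container.rollbackToPreviousLaunchContext(); // implicit rollback allowed
  } else {
    container.terminate();                       // autoCommit == true: no rollback
  }
}
{code}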
 

> Changes in NodeManager to support Container upgrade and rollback/commit
> ---
>
> Key: YARN-5637
> URL: https://issues.apache.org/jira/browse/YARN-5637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5637.001.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new 
> launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent 
> rollback or commit of the upgrade.






[jira] [Commented] (YARN-1558) After apps are moved across queues, store new queue info in the RM state store

2016-09-12 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485925#comment-15485925
 ] 

Xianyin Xin commented on YARN-1558:
---

is there any update about this?

> After apps are moved across queues, store new queue info in the RM state store
> --
>
> Key: YARN-1558
> URL: https://issues.apache.org/jira/browse/YARN-1558
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Sandy Ryza
>
> The result of moving an app to a new queue should persist across RM restarts. 
>  This will require updating the ApplicationSubmissionContext, the single 
> source of truth upon state recovery, with the new queue info.
> There will be a brief window after the move completes before the move is 
> stored.  If the RM dies during this window, the recovered RM will include the 
> old queue info.  Schedulers should be resilient to this situation.






[jira] [Updated] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5605:
---
Attachment: yarn-5605-3.patch

> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch, yarn-5605-3.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485847#comment-15485847
 ] 

Li Lu commented on YARN-3359:
-

Thanks [~jdu]! Actually, keeping the copies in the NMs would do the job. The 
only challenge is when two or more collectors for the same application get 
launched (because of a cluster partition, for example). Therefore the RM needs 
to keep a version number for collectors, so that when rebuilding the 
app-to-collector mappings it knows which collectors are stale and which one is 
active (see the sketch below).
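
Roughly the bookkeeping I have in mind on the RM side (hypothetical names, just 
for illustration):

{code}
// Illustrative only: keep the highest-version collector per application and
// ignore reports about stale ones when rebuilding the mapping after failover.
private final Map<ApplicationId, CollectorInfo> appToCollector =
    new ConcurrentHashMap<>();

void onCollectorReport(ApplicationId appId, String address, long version) {
  appToCollector.merge(appId, new CollectorInfo(address, version),
      (current, reported) ->
          reported.version > current.version ? reported : current);
}

static final class CollectorInfo {
  final String address;
  final long version;
  CollectorInfo(String address, long version) {
    this.address = address;
    this.version = version;
  }
}
{code}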

bq. btw, the app's attempt id shouldn't be used here, as the collector is 
designed to be independent of the AM lifecycle - it also means an AM failure 
doesn't imply the collector needs to be killed/restarted. Am I missing anything?

You're right. We should not do it... 

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.






[jira] [Commented] (YARN-5540) scheduler spends too much time looking at empty priorities

2016-09-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485794#comment-15485794
 ] 

Wangda Tan commented on YARN-5540:
--

Thanks [~jlowe], the patch generally looks good; only a few minor comments: 
1) It might be better to rename add/removeSchedulerKeyReference to 
increase/decreaseSchedulerKeyReference, since the schedulerKey ref will not be 
removed unless its value == 0 (a quick sketch of the counting is at the end of 
this comment). 
2) Do you think we should remove:
bq. // TODO: Shouldn't we activate even if numContainers = 0?
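
For context on 1), this is the kind of counting the rename refers to - the key 
is only removed once its count reaches zero (illustrative sketch, not the patch 
code):

{code}
// Illustrative only.
private final Map<SchedulerRequestKey, Integer> schedulerKeyRefs = new HashMap<>();

void increaseSchedulerKeyReference(SchedulerRequestKey key) {
  schedulerKeyRefs.merge(key, 1, Integer::sum);
}

void decreaseSchedulerKeyReference(SchedulerRequestKey key) {
  Integer count = schedulerKeyRefs.get(key);
  if (count == null) {
    return;
  }
  if (count == 1) {
    schedulerKeyRefs.remove(key);   // removed only when the count drops to zero
  } else {
    schedulerKeyRefs.put(key, count - 1);
  }
}
{code}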

> scheduler spends too much time looking at empty priorities
> --
>
> Key: YARN-5540
> URL: https://issues.apache.org/jira/browse/YARN-5540
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, fairscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
> Attachments: YARN-5540.001.patch, YARN-5540.002.patch
>
>
> We're starting to see the capacity scheduler run out of scheduling horsepower 
> when running 500-1000 applications on clusters with 4K nodes or so.
> This seems to be amplified by TEZ applications. TEZ applications have many 
> more priorities (sometimes in the hundreds) than typical MR applications and 
> therefore the loop in the scheduler which examines every priority within 
> every running application, starts to be a hotspot. The priorities appear to 
> stay around forever, even when there is no remaining resource request at that 
> priority causing us to spend a lot of time looking at nothing.
> jstack snippet:
> {noformat}
> "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x7fc2d453e800 
> nid=0x22f3 runnable [0x7fc2a8be2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
> - eliminated <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
> - locked <0x0005e73e5dc0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
> - locked <0x0003006fcf60> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
> - locked <0x0003001b22f8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
> - locked <0x000300041e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
> {noformat}






[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485786#comment-15485786
 ] 

Junping Du commented on YARN-3359:
--

btw, the app's attempt id shouldn't be used here, as the collector is designed 
to be independent of the AM lifecycle - it also means an AM failure doesn't 
imply the collector needs to be killed/restarted. Am I missing anything?

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.






[jira] [Commented] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485778#comment-15485778
 ] 

Junping Du commented on YARN-5638:
--

Why does the collector's lifecycle need to be bound to the AM attempt? An AM 
failure doesn't mean the collector has to be killed/restarted.

> Introduce a collector Id to uniquely identify collectors and their creation 
> order
> -
>
> Key: YARN-5638
> URL: https://issues.apache.org/jira/browse/YARN-5638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>
> As discussed in YARN-3359, we need to further identify timeline collectors 
> and their creation order for better service discovery and resource isolation. 
> This JIRA proposes to use  to accurately identify 
> each timeline collector. 






[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485770#comment-15485770
 ] 

Junping Du commented on YARN-3359:
--

Thanks for bringing up many options here. Rather than the RM state store, I 
think there is another much simpler and more lightweight solution: because the 
NM knows exactly which collectors are running on it (each collector uses an 
NM-local RPC to notify the NM beforehand), the NM can simply report them to the 
RM when registering with a new or restarted RM. What do you think? A rough 
sketch is below.
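
A rough sketch of the idea, with hypothetical types (not an existing request 
field):

{code}
// Illustrative only: the NM reports its locally running collectors when it
// (re)registers, and the RM rebuilds the app-to-collector mapping from that.
class NMRegistrationReport {
  // appId -> collector service address, taken from the NM's local context
  Map<ApplicationId, String> runningCollectors;
}

private final Map<ApplicationId, String> appToCollectorAddress =
    new ConcurrentHashMap<>();

void onNodeManagerRegistered(NMRegistrationReport report) {
  report.runningCollectors.forEach(appToCollectorAddress::putIfAbsent);
}
{code}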

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.






[jira] [Commented] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485752#comment-15485752
 ] 

Subru Krishnan commented on YARN-5324:
--

Thanks [~curino] for the patch. I looked at it and have a few comments below.

Can we move *SubClusterIdInfo* --> federation.store.records.dao, as it can 
potentially be used elsewhere, like in future Federation Admin REST APIs.

In *WeightedPolicyInfo*:
  * Rename _routerWeights_ --> _routerPolicyWeights_.
  * Rename _amrmWeights_  --> _AMRMPolicyWeights_
  * Add Javadocs for the getter/setters of the weights as they are not very 
intuitive.
  * Why are we iterating through the weight maps for every get 
(_getRouterWeights/getAmrmWeights_)? Either we should avoid this or do this 
once at initialization.
  * We should move the JSONJAXBContext and the marshaller/unmarshaller to 
class variables and initialize them at declaration, as they are expensive ops.
  * Can we give some examples of alpha values and their effect, to provide 
more context?
  * To improve readability, can we split the if into two - for routerWeights & 
amrmWeights respectively (see the sketch at the end of this comment);
{code}
if(otherAMRMWeights != null && amrmWeights != null &&
otherRouterWeights != null && routerWeights != null) {
 return CollectionUtils.isEqualCollection(otherAMRMWeights.entrySet(),
amrmWeights.entrySet()) && CollectionUtils.isEqualCollection(
  (otherRouterWeights.entrySet()), routerWeights.entrySet());
{code}
  * For Hashcode - {code} 31 * amrmWeights + routerWeights {code} should be a 
better option.

In *BaseWeightedRouterPolicy*:

  * Something seems off in the sentence formation of class Javadocs.
  * _getRand()_ seems to be used only by *WeightedRandomRouterPolicy*, so it 
can be moved there as a local variable, as is done in *UniformRandomRouterPolicy*.
  * Shouldn't {code} if (policyInfo != null && 
policyInfo.equals(newPolicyInfo)) { {code} be {code} if (policyInfo == null || 
policyInfo.equals(newPolicyInfo)) { {code}
  * Why are we not checking _amrmWeights_ in:
{code}
if (newWeights == null || newWeights.size() < 1) {
{code}
  * I think it'll be good if we move all the validations to either a validator 
or a separate validate method.
  * We should have the check for active sub-clusters here as now every policy 
repeats it.
  * I feel we should define the {{protected SubClusterId 
selectSubCluster(Map activeSubclusters, 
Map routerPolicyWeights)}} in the base class and 
implementing policies can override it accordingly.

In *LoadBasedRouterPolicy*:

  * We should invoke _getAvailableMemory_ only if weight is 1.
  * Do we have to do this for every invocation of _getHomeSubcluster_? It 
seems like it, but I just want to make sure.

In *OrderedRouterPolicy*:

  * Typo in Javadoc; _Heights_ should be _Highest_.
  * I feel it should round robin between the active sub-clusters as current 
policy will pick the same sub-cluster every time (assuming that entire 
sub-cluster downtime is rare, more so with RM HA). This feels more like a 
*PriorityRouterPolicy*.

In *WeightedRandomRouterPolicy*:

  * {code} if (getPolicyInfo() == null) { {code} isn't the check redundant?
  * Can we add some code comments to clarify how exactly the selection is done.
  * 
{code} 
chosen = id;
if (lookupValue <= 0) {
break;
}
{code} 
should be 
{code} 
if (lookupValue <= 0) {
   return id;
}
{code}

In *MockPolicyManager*:
  * Might be better to return "default" rather than null in _getQueue_?

Common across tests:
  * Am I missing something? All the tests call it out, but I don't see 
scenarios covering multiple invocations or clusters going inactive.
  * We should have a {{BaseRouterPolicyTest}} and move the tests there and 
override only the policy context in individual policy test. Refer to 
{{FederationStateStoreBaseTest}}.
  *  How is  
*FederationPoliciesTestUtil::createResourceRequests/createResourceRequest* 
used? Why can't we use *ResourceRequest::newInstance* for the latter?
  * Nit: variables can be declared outside the loop in _setUp_

In *TestLoadBasedRouterPolicy*: 

  * Typo in class Javadoc: _weighiting_ --> _weighting_
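
To illustrate the equals/hashCode suggestions for *WeightedPolicyInfo* above (a 
sketch only; the field and getter names are assumed, not the actual patch):

{code}
// Illustrative only: split the equality check per weight map and derive
// hashCode from both maps, as suggested above.
@Override
public boolean equals(Object obj) {
  if (!(obj instanceof WeightedPolicyInfo)) {
    return false;
  }
  WeightedPolicyInfo other = (WeightedPolicyInfo) obj;
  boolean routerWeightsEqual = CollectionUtils.isEqualCollection(
      routerWeights.entrySet(), other.getRouterWeights().entrySet());
  boolean amrmWeightsEqual = CollectionUtils.isEqualCollection(
      amrmWeights.entrySet(), other.getAmrmWeights().entrySet());
  return routerWeightsEqual && amrmWeightsEqual;
}

@Override
public int hashCode() {
  return 31 * amrmWeights.hashCode() + routerWeights.hashCode();
}
{code}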


> Stateless router policies implementation
> 
>
> Key: YARN-5324
> URL: https://issues.apache.org/jira/browse/YARN-5324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5324-YARN-2915.06.patch, 
> YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, 
> YARN-5324-YARN-2915.09.patch, YARN-5324.01.patch, YARN-5324.02.patch, 
> YARN-5324.03.patch, YARN-5324.04.patch, YARN-5324.05.patch
>
>
> These are policies at the Router that do not require maintaining state across 
> choices (e.g., weighted random).

[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485753#comment-15485753
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r78478634
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() {
 getFairShare());
   }
 
-  /**
-   * Is a queue being starved for its min share.
-   */
-  @VisibleForTesting
-  boolean isStarvedForMinShare() {
-return isStarved(getMinShare());
+  private Resource minShareStarvation() {
+Resource desiredShare = Resources.min(policy.getResourceCalculator(),
+scheduler.getClusterResource(), getMinShare(), getDemand());
+
+Resource starvation = Resources.subtract(desiredShare, 
getResourceUsage());
+boolean starved = Resources.greaterThan(policy.getResourceCalculator(),
+scheduler.getClusterResource(), starvation, none());
+
+long now = scheduler.getClock().getTime();
+if (!starved) {
+  setLastTimeAtMinShare(now);
+}
+
+if (starved &&
+(now - lastTimeAtMinShare > getMinSharePreemptionTimeout())) {
+  return starvation;
+} else {
+  return Resources.clone(Resources.none());
--- End diff --

Actually, this can be simplified further. Improvement in updated patch. 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Commented] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.

2016-09-12 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485549#comment-15485549
 ] 

Daniel Templeton commented on YARN-1468:


For what it's worth, I'm seeing the same issue as [~mitdesai].  I'm going to 
take a look this week.

> TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
> 
>
> Key: YARN-1468
> URL: https://issues.apache.org/jira/browse/YARN-1468
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> Log is as following:
> {code}
> Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
>   Time elapsed: 44.197 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: AppAttempt state is not correct 
> (timedout) expected: but was:
> at junit.framework.Assert.fail(Assert.java:50)
> at junit.framework.Assert.failNotEquals(Assert.java:287)
> at junit.framework.Assert.assertEquals(Assert.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464)
> {code}






[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485470#comment-15485470
 ] 

Vinod Kumar Vavilapalli commented on YARN-4205:
---

bq. Added ApplicationTimeouts class that contains lifetime values. This class 
can be used in the future to support any other timeouts such as queue_timeout 
or statestore_timeout.
Makes sense.

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> 0003-YARN-4205.patch, YARN-4205_01.patch, YARN-4205_02.patch, 
> YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor applications for which a lifetime is configured. 
> If an application runs beyond its lifetime, it will be killed. 
> The lifetime will be considered from the submit time.
> The thread monitoring interval is configurable.






[jira] [Commented] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order

2016-09-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485366#comment-15485366
 ] 

Li Lu commented on YARN-5638:
-

Having given this some more thought, I think adding the attempt ID as part of 
identifying collectors may not be a good idea. Collectors belong to 
applications, not attempts, so it is possible that we map multiple app attempts 
to one collector (if the collector is run in a separate container). It is also 
possible to have several collectors within one application, though: for 
example, if a node gets separated from the other nodes, all applications 
running on it will get relaunched. Thus, I think it makes sense to preserve the 
old mapping when facing users, but internally we need to handle the corner case 
where multiple collectors (one active, the others stale) are mapped to the same 
application. 

> Introduce a collector Id to uniquely identify collectors and their creation 
> order
> -
>
> Key: YARN-5638
> URL: https://issues.apache.org/jira/browse/YARN-5638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>
> As discussed in YARN-3359, we need to further identify timeline collectors 
> and their creation order for better service discovery and resource isolation. 
> This JIRA proposes to use  to accurately identify 
> each timeline collector. 






[jira] [Commented] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485329#comment-15485329
 ] 

Hadoop QA commented on YARN-5324:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
39s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: 
The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 34s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828108/YARN-5324-YARN-2915.09.patch
 |
| JIRA Issue | YARN-5324 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 411c7df3a9a1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / 302d206 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13091/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13091/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13091/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Stateless router policies implementation
> 
>
> Key: YARN-5324
> URL: https://issues.apache.org/jira/browse/YARN-5324

[jira] [Commented] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485287#comment-15485287
 ] 

Carlo Curino commented on YARN-5324:


More cleanups. I think the remaining checkstyle warning (>7 params) is OK as is 
(the method is used only by tests to create a {{ResourceRequest}}).

> Stateless router policies implementation
> 
>
> Key: YARN-5324
> URL: https://issues.apache.org/jira/browse/YARN-5324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5324-YARN-2915.06.patch, 
> YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, 
> YARN-5324-YARN-2915.09.patch, YARN-5324.01.patch, YARN-5324.02.patch, 
> YARN-5324.03.patch, YARN-5324.04.patch, YARN-5324.05.patch
>
>
> These are policies at the Router that do not require maintaining state across 
> choices (e.g., weighted random).






[jira] [Updated] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5324:
---
Attachment: YARN-5324-YARN-2915.09.patch

> Stateless router policies implementation
> 
>
> Key: YARN-5324
> URL: https://issues.apache.org/jira/browse/YARN-5324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5324-YARN-2915.06.patch, 
> YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, 
> YARN-5324-YARN-2915.09.patch, YARN-5324.01.patch, YARN-5324.02.patch, 
> YARN-5324.03.patch, YARN-5324.04.patch, YARN-5324.05.patch
>
>
> These are policies at the Router that do not require maintaining state across 
> choices (e.g., weighted random).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485264#comment-15485264
 ] 

Li Lu commented on YARN-3359:
-

Thanks [~vinodkv]! Creating YARN-5638 to track the collector-ID-related changes 
separately. 

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order

2016-09-12 Thread Li Lu (JIRA)
Li Lu created YARN-5638:
---

 Summary: Introduce a collector Id to uniquely identify collectors 
and their creation order
 Key: YARN-5638
 URL: https://issues.apache.org/jira/browse/YARN-5638
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu


As discussed in YARN-3359, we need to further identify timeline collectors and 
their creation order for better service discovery and resource isolation. This 
JIRA proposes to use  to accurately identify each 
timeline collector. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5561) [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and entities via REST

2016-09-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485249#comment-15485249
 ] 

Li Lu commented on YARN-5561:
-

Thanks [~rohithsharma], the two proposed APIs make the app - attempt - 
container APIs more comprehensive. LGTM. 

bq. are looking for a complete applications page where all applications which 
were running/completed had to be listed.
I'm not sure this is something we'd want to support at the timeline level. 
This information is already available on the RM side via the state store. Why 
would we want to encourage an expensive operation on the timeline store for 
this data? 
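
As a purely illustrative aside (not part of the patch under discussion), a 
client or UI could exercise the proposed 
{{/ws/v2/timeline/apps/\{app-id\}/appattempts}} endpoint roughly as below once 
it exists. The timeline reader address (localhost:8188), the application id, 
and the JSON handling are all assumptions here.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimelineAppsClientSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical application id and timeline reader address.
    String appId = "application_1473638400000_0001";
    String base = "http://localhost:8188/ws/v2/timeline";

    URL url = new URL(base + "/apps/" + appId + "/appattempts");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // raw JSON list of app attempts
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}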

> [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and 
> entities via REST
> ---
>
> Key: YARN-5561
> URL: https://issues.apache.org/jira/browse/YARN-5561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5561.patch, YARN-5561.v0.patch
>
>
> The ATSv2 model lacks retrieval of {{list-of-all-apps}}, 
> {{list-of-all-app-attempts}} and {{list-of-all-containers-per-attempt}} via 
> REST APIs. It is also required to know about all the entities in an 
> application. These URLs are pretty much required for the Web UI.
> New REST URL would be 
> # GET {{/ws/v2/timeline/apps}}
> # GET {{/ws/v2/timeline/apps/\{app-id\}/appattempts}}.
> # GET 
> {{/ws/v2/timeline/apps/\{app-id\}/appattempts/\{attempt-id\}/containers}}
> # GET {{/ws/v2/timeline/apps/\{app id\}/entities}} should display list of 
> entities that can be queried.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485242#comment-15485242
 ] 

Vinod Kumar Vavilapalli commented on YARN-3359:
---

bq. Right now each collector is mapped with an app ID, but to handle the state 
recover case, we need to associate each collector with an attempt ID (and 
ideally a time stamp to further distinguish collectors).
+1 in general. We can simply have the RM generate a collector-id to be 
(app-attempt-ID + another ID identifying the collector). RM *has* to know about 
collectors for scheduling later when the collector runs in its own container, 
so letting it generate the IDs is reasonable.
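
A minimal sketch of what such a composite id could look like; the class and 
field names are illustrative only, not an existing YARN API:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

/** Illustrative only: app-attempt id plus an RM-generated sequence number. */
public final class CollectorIdSketch {
  private final ApplicationAttemptId attemptId; // attempt that owns the collector
  private final long sequenceNumber;            // RM-generated, orders collectors

  public CollectorIdSketch(ApplicationAttemptId attemptId, long sequenceNumber) {
    this.attemptId = attemptId;
    this.sequenceNumber = sequenceNumber;
  }

  public ApplicationAttemptId getAttemptId() {
    return attemptId;
  }

  public long getSequenceNumber() {
    return sequenceNumber;
  }

  @Override
  public String toString() {
    return attemptId + "_" + sequenceNumber;
  }
}
{code}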

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.

2016-09-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-:
-
Fix Version/s: (was: 3.0.0-alpha2)
   (was: 2.9.0)
   2.8.0

Thanks [~varun_saxena]. I have backported this to 2.8.0

> Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically 
> nested.
> 
>
> Key: YARN-
> URL: https://issues.apache.org/jira/browse/YARN-
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch
>
>
> If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, 
> {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section 
> of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct 
> child of {{root}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485225#comment-15485225
 ] 

Li Lu commented on YARN-3359:
-

I've had some offline discussion with [~vinodkv] about this issue. We cannot 
simply preserve collector states in the RM state store, since this state is not 
final and updating it frequently would block the RM. A natural alternative is 
the NM state store. That is to say, we can rebuild the RM's collector table by 
getting updates from the NMs. In summary, we need to do the following things:

For NMs: 
1. on collector launching, preserve collector address in its state store. 
2. on removing collectors, remove the related item from state store. 
3. on start up, recover collector addresses from state store. 
4. on resync, send current collector address mapping to the RM. 

For RMs, the only change needed is to rebuild the collector/address mapping 
upon restart. This actually involves a pretty messy corner case: when one 
application has two different attempts running (due to some network problems, 
for example) and the RM is trying to rebuild collector status, the RM needs to 
know which collector is for the latest app attempt and which one is for the 
stale attempt. This requires some changes to collector IDs. Right now each 
collector is mapped to an app ID, but to handle the state recovery case, we 
need to associate each collector with an attempt ID (and ideally a timestamp 
to further distinguish collectors). 

Not sure if we missed some critical points in this design. Thoughts? 
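
To make the NM-side bookkeeping concrete, here is a very rough sketch with a 
plain map standing in for the real NM state store; the class and method names 
are assumptions, not the actual NMStateStoreService API:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;

public class CollectorAddressStoreSketch {
  // appId -> collector address ("host:port"); the real store would persist this.
  private final Map<ApplicationId, String> store = new ConcurrentHashMap<>();

  // 1. On collector launch, remember where the collector runs.
  public void storeCollectorAddress(ApplicationId appId, String address) {
    store.put(appId, address);
  }

  // 2. On collector removal, drop the entry.
  public void removeCollectorAddress(ApplicationId appId) {
    store.remove(appId);
  }

  // 3. On NM start-up, recover all known collector addresses.
  // 4. On resync, this recovered map is what the NM would report to the RM.
  public Map<ApplicationId, String> recoverCollectorAddresses() {
    return new ConcurrentHashMap<>(store);
  }
}
{code}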

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485188#comment-15485188
 ] 

Li Lu commented on YARN-3359:
-

Thanks [~djp], I'm taking it over. Thanks for letting me know about YARN-4758, 
which may conflict with this JIRA. Let me first explore possible options for 
persisting the collector state here. Once the plan is settled we can decide how 
to proceed. 

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5613) Fair Scheduler can assign containers from blacklisted nodes

2016-09-12 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-5613.

Resolution: Invalid

Turns out the issue I was seeing is coincidentally resolved by YARN-3547.

> Fair Scheduler can assign containers from blacklisted nodes
> ---
>
> Key: YARN-5613
> URL: https://issues.apache.org/jira/browse/YARN-5613
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-5613.001.patch
>
>
> The {{FairScheduler.allocate()}} makes its resource request before it updates 
> the blacklist.  If the scheduler processes the resource request before the 
> allocating thread updates the blacklist, the scheduler can assign containers 
> that are on nodes in the blacklist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4404) Typo in comment in SchedulerUtils

2016-09-12 Thread Manjunath Ballur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485177#comment-15485177
 ] 

Manjunath Ballur commented on YARN-4404:


There are many places where this typo exists. Can I take this up? If yes, 
please assign it to me.

> Typo in comment in SchedulerUtils
> -
>
> Key: YARN-4404
> URL: https://issues.apache.org/jira/browse/YARN-4404
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Devon Michaels
>Priority: Trivial
>  Labels: newbie
>
> The comment starting on line 254 says:
> {code}
>   /**
>* Utility method to validate a resource request, by insuring that the
>* requested memory/vcore is non-negative and not greater than max
>* 
>* @throws InvalidResourceRequestException when there is invalid request
>*/
> {code}
> "Insuring" should be "ensuring."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-3359:
---

Assignee: Li Lu  (was: Junping Du)

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485097#comment-15485097
 ] 

Hadoop QA commented on YARN-5324:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
6s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: 
The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 52s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
 |
|  |  org.apache.hadoop.yarn.server.federation.policies.dao.WeightedPolicyInfo 
defines equals and uses Object.hashCode()  At 
WeightedPolicyInfo.java:Object.hashCode()  At WeightedPolicyInfo.java:[lines 
140-157] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828092/YARN-5324-YARN-2915.08.patch
 |
| JIRA Issue | YARN-5324 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 3d93d9d04bd3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / 302d206 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13090/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/13090/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.html
 |
|  Test 

[jira] [Commented] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485039#comment-15485039
 ] 

Carlo Curino commented on YARN-5324:


Standard pass on fixing checkstyle/javadoc/findbugs/asflicense issues.

> Stateless router policies implementation
> 
>
> Key: YARN-5324
> URL: https://issues.apache.org/jira/browse/YARN-5324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5324-YARN-2915.06.patch, 
> YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, 
> YARN-5324.01.patch, YARN-5324.02.patch, YARN-5324.03.patch, 
> YARN-5324.04.patch, YARN-5324.05.patch
>
>
> These are policies at the Router that do not require maintaining state across 
> choices (e.g., weighted random).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5324) Stateless router policies implementation

2016-09-12 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5324:
---
Attachment: YARN-5324-YARN-2915.08.patch

> Stateless router policies implementation
> 
>
> Key: YARN-5324
> URL: https://issues.apache.org/jira/browse/YARN-5324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5324-YARN-2915.06.patch, 
> YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, 
> YARN-5324.01.patch, YARN-5324.02.patch, YARN-5324.03.patch, 
> YARN-5324.04.patch, YARN-5324.05.patch
>
>
> These are policies at the Router that do not require maintaining state across 
> choices (e.g., weighted random).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485010#comment-15485010
 ] 

Hadoop QA commented on YARN-2571:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 14 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
51s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 13s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn generated 1 new + 34 unchanged - 
1 fixed = 35 total (was 35) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 47s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 15 
new + 787 unchanged - 168 fixed = 802 total (was 955) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch 4 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry 
generated 0 new + 17 unchanged - 35 fixed = 17 total (was 52) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s 
{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 41s 
{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| 

[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484875#comment-15484875
 ] 

Hadoop QA commented on YARN-5605:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 10 
new + 133 unchanged - 122 fixed = 143 total (was 255) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 1 new + 937 unchanged - 4 fixed = 938 total (was 941) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 44m 8s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.liveContainers;
 locked 94% of time  Unsynchronized access at FSAppAttempt.java:94% of time  
Unsynchronized access at FSAppAttempt.java:[line 500] |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSStarvedApps$StarvationComparator
 implements Comparator but not Serializable  At FSStarvedApps.java:Serializable 
 At FSStarvedApps.java:[lines 51-55] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-5605 |
| GITHUB PR | https://github.com/apache/hadoop/pull/124 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  

[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-09-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484852#comment-15484852
 ] 

Wangda Tan commented on YARN-5545:
--

[~bibinchundatt], [~sunilg], [~Naganarasimha].

Thanks for discussion,

I think for this issue, what we should do:
- Don't split maximum-application-number per partition: we already have 
am-resource-percent-per-partition, and adding more per-partition configuration 
will confuse users.
- Also, you cannot say one app belongs to one partition; you can only say one 
AM belongs to one partition.
- So queues will split maximum-application-number according to the ratio of 
their total configured resources across partitions. For example, 
{code}
Cluster maximum-application = 100, 
queueA configured partitionX = 10G, partitionY = 20G; 
queueB configured partitionX = 20G, partitionY = 50G;
{code}
So queueA's maximum-application is 100 * (10 + 20) / (10 + 20 + 20 + 50) = 30, 
and queueB's maximum-application is 100 * (20 + 50) / (10 + 20 + 20 + 50) = 70.
- Please note that the maximum-applications of queues will be updated when the 
CS configuration is updated (queue refresh) and when the cluster resource is 
updated, so we need to update it inside CSQueue#updateClusterResource.

Thoughts?
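
For what it's worth, a small standalone sketch of the ratio computation above 
(names and the surrounding plumbing in CSQueue#updateClusterResource are 
simplified; the resource units only need to be consistent across queues):

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class MaxApplicationsSplitSketch {
  /** configuredPerQueue: queue -> configured resource summed over partitions. */
  public static Map<String, Integer> split(int clusterMaxApplications,
      Map<String, Long> configuredPerQueue) {
    long clusterTotal = 0;
    for (long v : configuredPerQueue.values()) {
      clusterTotal += v;
    }
    Map<String, Integer> result = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : configuredPerQueue.entrySet()) {
      // queue max-apps = cluster max-apps * (queue share of configured resource)
      result.put(e.getKey(),
          (int) (clusterMaxApplications * e.getValue() / (double) clusterTotal));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> configured = new LinkedHashMap<>();
    configured.put("queueA", 10L + 20L); // partitionX + partitionY (GB)
    configured.put("queueB", 20L + 50L);
    // Prints {queueA=30, queueB=70} for a cluster maximum of 100 applications.
    System.out.println(split(100, configured));
  }
}
{code}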

> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, 
> YARN-5545.0003.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> 

[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484820#comment-15484820
 ] 

Allen Wittenauer commented on YARN-5567:


bq. would you prefer this be a config setting to choose the behavior?

The history of the health check script is interesting, but long.  Not trusting 
the exit code was one of the key lessons the ops team learned from the HOD 
experience. The script fails a lot more often than people realize, mainly due 
to users doing crazy things, especially on insecure systems.

This is one of those times where it's going to be extremely difficult to 
convince me otherwise.  I can't think of a reason to ever trust the exit code 
enough to bring down the NodeManager.  In this particular environment, the 
script can fail for many reasons that may be temporary or pointless.

Now it could be argued that those temporary failures should cause the NM to 
come down, but then you get into a race condition between heartbeats and actual 
issues.  HDFS worked around it by basically saying "it has to fail for X long". 
Ignoring the exit code avoids that problem because one can be sure that "ERROR 
-" really did come from the script.

bq. Alternatively, would you be okay with standardizing on a specific error 
code for "detected bad Node" vs "bad script"?

If by error code you specifically mean the value the NM reports back to the RM, 
yes that makes sense.  It just can't fail the node.  
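
To spell out the behaviour being argued for (a sketch only, not 
NodeHealthScriptRunner's actual code): only an explicit ERROR line from the 
script marks the node unhealthy, while a bad exit code is merely reported:

{code}
public class HealthScriptOutcomeSketch {
  private static final String ERROR_PREFIX = "ERROR";

  public static final class HealthReport {
    final boolean healthy;
    final String report;

    HealthReport(boolean healthy, String report) {
      this.healthy = healthy;
      this.report = report;
    }
  }

  public static HealthReport classify(int exitCode, String scriptOutput) {
    if (scriptOutput != null && scriptOutput.startsWith(ERROR_PREFIX)) {
      // The script itself asserted that the node is bad.
      return new HealthReport(false, scriptOutput);
    }
    if (exitCode != 0) {
      // Script failure (timeout, syntax error, transient hiccup, ...):
      // surface it to the RM, but keep the node in service.
      return new HealthReport(true, "health script exited with code " + exitCode);
    }
    return new HealthReport(true, "");
  }
}
{code}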

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
> {code}
> should be 
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(false, "", now);
> break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-016.patch

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, 
> YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch, 
> YARN-2571-016.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl

2016-09-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484776#comment-15484776
 ] 

Wangda Tan commented on YARN-5296:
--

Already closed it, thanks for the explanation!

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---
>
> Key: YARN-5296
> URL: https://issues.apache.org/jira/browse/YARN-5296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karam Singh
>Assignee: Junping Du
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5296-v2.1.patch, YARN-5296-v2.patch, 
> YARN-5296.patch, after v2 fix.png, before v2 fix.png
>
>
> Ran tests in the following manner:
> 1. Run GridMix of 768 sequentially around 17 times to execute about 12.9K 
> apps.
> 2. After 4-5 hrs, check the NM heap using Memory Analyser. It reports around 
> 96% of the heap being used by ContainerMetrics.
> 3. Run 7 more GridMix runs to reach around 18.2K apps in total. Checking the 
> NM heap with Memory Analyser again shows 96% of the heap being used by 
> ContainerMetrics. 
> 4. Start one more GridMix run; while it is going on, with around 18.7K+ apps 
> run, NMs started going down with OOM. Analysing the NM heap with Memory 
> Analyser shows the OOM was caused by ContainerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5190) Registering/unregistering container metrics triggered by ContainerEvent and ContainersMonitorEvent are conflict which cause uncaught exception in ContainerMonitorImpl

2016-09-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484772#comment-15484772
 ] 

Wangda Tan commented on YARN-5190:
--

[~djp], HADOOP-13362 should be able to fix the issue, thanks for pointing me 
to it. Closing this ticket.

> Registering/unregistering container metrics triggered by ContainerEvent and 
> ContainersMonitorEvent are conflict which cause uncaught exception in 
> ContainerMonitorImpl
> --
>
> Key: YARN-5190
> URL: https://issues.apache.org/jira/browse/YARN-5190
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-5190-branch-2.7.001.patch, YARN-5190-v2.patch, 
> YARN-5190.patch
>
>
> The exception stack is as following:
> {noformat}
> 310735 2016-05-22 01:50:04,554 [Container Monitor] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container 
> Monitor,5,main] threw an Exception.
> 310736 org.apache.hadoop.metrics2.MetricsException: Metrics source 
> ContainerResource_container_1463840817638_14484_01_10 already exists!
> 310737 at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
> 310738 at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
> 310739 at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> 310740 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212)
> 310741 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198)
> 310742 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385)
> {noformat}
> After YARN-4906, we have multiple places that get the ContainerMetrics for a 
> particular container, which can cause a race condition when different threads 
> register the same container metrics with DefaultMetricsSystem. Without proper 
> handling of the MetricsException that can get thrown, the exception could 
> bring down the ContainerMonitorImpl daemon or even the whole NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484683#comment-15484683
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r78414618
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (Resources.equals(Resources.none(), appStarvation))  {
+break;
+  } else {
--- End diff --

It might be clearer if you swapped the _if_ and _else_.
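
For reference (a sketch only, not the committed code), the swapped form could 
look roughly like this; the remainder of the starved branch is not visible in 
the diff above, so it is elided here too:

    for (FSAppAttempt app : appsWithDemand) {
      Resource appStarvation = app.fairShareStarvation();
      if (!Resources.equals(Resources.none(), appStarvation)) {
        context.getStarvedApps().addStarvedApp(app);
        // ... accumulate appStarvation into fairShareStarvation ...
      } else {
        break;
      }
    }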


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484679#comment-15484679
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r78414451
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
 ---
@@ -535,6 +535,23 @@ public synchronized Resource 
getResource(SchedulerRequestKey schedulerKey) {
   }
 
   /**
+   * Method to return the next resource request to be serviced.
+   *
+   * In the initial implementation, we just pick any {@link 
ResourceRequest}
+   * corresponding to the highest priority.
+   *
+   * @return next {@link ResourceRequest} to allocate resources for.
+   */
+  @Unstable
+  public synchronized ResourceRequest getNextResourceRequest() {
+for (ResourceRequest rr:
+resourceRequestMap.get(schedulerKeys.first()).values()) {
+  return rr;
--- End diff --

It's stylistic, and there's no guideline about multiple exit points that I 
know of, so I won't push it.  I don't think this form is very future-safe, 
though.
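
For comparison, a single-exit variant could look roughly like this (a sketch 
only, with the collection type inferred from the loop above):

    ResourceRequest next = null;
    Collection<ResourceRequest> requests =
        resourceRequestMap.get(schedulerKeys.first()).values();
    if (!requests.isEmpty()) {
      next = requests.iterator().next();
    }
    return next;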


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484668#comment-15484668
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r78413869
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (Resources.equals(Resources.none(), appStarvation))  {
+break;
+  } else {
+context.getStarvedApps().addStarvedApp(app);
--- End diff --

I think you should do whatever makes the code cleanest and easiest to 
maintain.  I don't think making the context a glorified hash map helps you in 
any notable way here.


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484660#comment-15484660
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r78413426
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Timer;
+import java.util.TimerTask;
+
+/**
+ * Thread that handles FairScheduler preemption
+ */
+public class FSPreemptionThread extends Thread {
+  private static final Log LOG = 
LogFactory.getLog(FSPreemptionThread.class);
+  private final FSContext context;
+  private final FairScheduler scheduler;
+  private final long warnTimeBeforeKill;
+  private final Timer preemptionTimer;
+
+  public FSPreemptionThread(FairScheduler scheduler) {
+this.scheduler = scheduler;
+this.context = scheduler.getContext();
+FairSchedulerConfiguration fsConf = scheduler.getConf();
+context.setPreemptionEnabled();
+context.setPreemptionUtilizationThreshold(
+fsConf.getPreemptionUtilizationThreshold());
+warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill();
+preemptionTimer = new Timer("Preemption Timer", true);
+
+setDaemon(true);
+setName("FSPreemptionThread");
+  }
+
+  public void run() {
+while (!Thread.interrupted()) {
+  FSAppAttempt starvedApp;
+  try{
+starvedApp = context.getStarvedApps().take();
+if (Resources.none().equals(starvedApp.getStarvation())) {
+  continue;
+}
+  } catch (InterruptedException e) {
+LOG.info("Preemption thread interrupted! Exiting.");
+return;
--- End diff --

I think I was a little confused.  How about this:


  try{
starvedApp = context.getStarvedApps().take();
if (!Resources.none().equals(starvedApp.getStarvation())) {
  List<RMContainer> containers = 
identifyContainersToPreempt(starvedApp);
  if (containers != null) {
preemptContainers(containers);
  }
}
  } catch (InterruptedException e) {
LOG.info("Preemption thread interrupted! Exiting.");
interrupt();
  }


It does some extra work inside the _try_, but the logic is much simpler.
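
For reference, a minimal sketch of how that block would sit inside the full {{run()}} loop, assuming the {{identifyContainersToPreempt}} and {{preemptContainers}} helpers keep the shapes shown in this diff. Note that calling {{interrupt()}} in the catch re-sets the interrupt flag, so the {{while (!Thread.interrupted())}} check ends the loop on the next pass:

{code}
  public void run() {
    while (!Thread.interrupted()) {
      try {
        // Blocks until some application is marked starved.
        FSAppAttempt starvedApp = context.getStarvedApps().take();
        if (!Resources.none().equals(starvedApp.getStarvation())) {
          List<RMContainer> containers =
              identifyContainersToPreempt(starvedApp);
          if (containers != null) {
            preemptContainers(containers);
          }
        }
      } catch (InterruptedException e) {
        LOG.info("Preemption thread interrupted! Exiting.");
        // Restore the interrupt flag; the while condition then exits the loop.
        interrupt();
      }
    }
  }
{code}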


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>

[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484643#comment-15484643
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r78411782
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() {
 getFairShare());
   }
 
-  /**
-   * Is a queue being starved for its min share.
-   */
-  @VisibleForTesting
-  boolean isStarvedForMinShare() {
-return isStarved(getMinShare());
+  private Resource minShareStarvation() {
+Resource desiredShare = Resources.min(policy.getResourceCalculator(),
+scheduler.getClusterResource(), getMinShare(), getDemand());
+
+Resource starvation = Resources.subtract(desiredShare, getResourceUsage());
+boolean starved = Resources.greaterThan(policy.getResourceCalculator(),
+scheduler.getClusterResource(), starvation, none());
+
+long now = scheduler.getClock().getTime();
+if (!starved) {
+  setLastTimeAtMinShare(now);
+}
+
+if (starved &&
+(now - lastTimeAtMinShare > getMinSharePreemptionTimeout())) {
+  return starvation;
+} else {
+  return Resources.clone(Resources.none());
--- End diff --

You can make it:


if (!starved ||
(now - lastTimeAtMinShare < getMinSharePreemptionTimeout())) {
  starvation = Resources.clone(Resources.none());

  if (!starved) {
setLastTimeAtMinShare(now);
  }
}

return starvation;
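
Putting the two pieces together, the whole method would read roughly as below. This is only a sketch assembled from the diff and the suggestion above, not the committed patch:

{code}
  private Resource minShareStarvation() {
    // Desired share is capped by both the configured min share and the demand.
    Resource desiredShare = Resources.min(policy.getResourceCalculator(),
        scheduler.getClusterResource(), getMinShare(), getDemand());

    Resource starvation = Resources.subtract(desiredShare, getResourceUsage());
    boolean starved = Resources.greaterThan(policy.getResourceCalculator(),
        scheduler.getClusterResource(), starvation, none());

    long now = scheduler.getClock().getTime();
    if (!starved ||
        (now - lastTimeAtMinShare < getMinSharePreemptionTimeout())) {
      // Either not starved, or not starved long enough to act on it yet.
      starvation = Resources.clone(Resources.none());

      if (!starved) {
        setLastTimeAtMinShare(now);
      }
    }

    return starvation;
  }
{code}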



> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5605:
---
Attachment: yarn-5605-2.patch

> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch, yarn-5605-2.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484619#comment-15484619
 ] 

Ray Chiang edited comment on YARN-5567 at 9/12/16 4:58 PM:
---

[~aw], would you prefer this be a config setting to choose the behavior?  
Alternatively, would you be okay with standardizing on a specific error code 
for "detected bad Node" vs "bad script"?


was (Author: rchiang):
[~aw], would you prefer this be a config setting to choose the behavior?

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
> {code}
> should be 
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(false, "", now);
> break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484627#comment-15484627
 ] 

Hadoop QA commented on YARN-2571:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 14 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
53s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 47s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn generated 1 new + 34 unchanged - 
1 fixed = 35 total (was 35) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 15 
new + 788 unchanged - 168 fixed = 803 total (was 956) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch 4 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry 
generated 0 new + 17 unchanged - 35 fixed = 17 total (was 52) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 8s 
{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s 
{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| 

[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage

2016-09-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484623#comment-15484623
 ] 

Karthik Kambatla commented on YARN-4767:


Ping [~vinodkv]

> Network issues can cause persistent RM UI outage
> 
>
> Key: YARN-4767
> URL: https://issues.apache.org/jira/browse/YARN-4767
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.7.2
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4767.001.patch, YARN-4767.002.patch, 
> YARN-4767.003.patch, YARN-4767.004.patch, YARN-4767.005.patch, 
> YARN-4767.006.patch, YARN-4767.007.patch, YARN-4767.008.patch, 
> YARN-4767.009.patch, YARN-4767.010.patch
>
>
> If a network issue causes an AM web app to resolve the RM proxy's address to 
> something other than what's listed in the allowed proxies list, the 
> AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy.  
> The RM proxy will then consume all available handler threads connecting to 
> itself over and over, resulting in an outage of the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484619#comment-15484619
 ] 

Ray Chiang commented on YARN-5567:
--

[~aw], would you prefer this be a config setting to choose the behavior?

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
> {code}
> should be 
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(false, "", now);
> break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support re-initialization of Containers with new launchContext

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484558#comment-15484558
 ] 

Hadoop QA commented on YARN-5620:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 10 new + 515 unchanged - 4 fixed = 525 total (was 519) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 34s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 53s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828059/YARN-5620.012.patch |
| JIRA Issue | YARN-5620 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f4886f9afe87 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9faccd1 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13087/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13087/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/13087/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 

[jira] [Created] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit

2016-09-12 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-5637:
-

 Summary: Changes in NodeManager to support Container upgrade and 
rollback/commit
 Key: YARN-5637
 URL: https://issues.apache.org/jira/browse/YARN-5637
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun Suresh
Assignee: Arun Suresh


YARN-5620 added support for re-initialization of Containers using a new launch 
Context.
This JIRA proposes to use the above feature to support upgrade and subsequent 
rollback or commit of the upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5620) Core changes in NodeManager to support re-initialization of Containers with new launchContext

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
--
Summary: Core changes in NodeManager to support re-initialization of 
Containers with new launchContext  (was: Core changes in NodeManager to support 
for re-initialization of Containers)

> Core changes in NodeManager to support re-initialization of Containers with 
> new launchContext
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch, 
> YARN-5620.012.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for re-initialization of Containers

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
--
Summary: Core changes in NodeManager to support for re-initialization of 
Containers  (was: Core changes in NodeManager to support for upgrade and 
rollback of Containers)

> Core changes in NodeManager to support for re-initialization of Containers
> --
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch, 
> YARN-5620.012.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
--
Attachment: YARN-5620.012.patch

Done.. Thanks [~jianhe]..

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch, 
> YARN-5620.012.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7

2016-09-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484465#comment-15484465
 ] 

Jason Lowe commented on YARN-5630:
--

Thanks for the +1, Arun!  I'll commit this later today if there are no 
objections.

Regarding the key removal, yes, if we read the keys out of the store itself then 
that totally makes sense.  Actually, I was assuming that during the rollback, 
when the old software saw an ignored key, it would just remove it as it 
iterated the database.  I suppose we could have the entry flag whether the 
ignored key should be preserved as long as the container is active, in case 
there is a subsequent roll-forward.  Then there is no need for a 
prepare-for-rollback once support for the key table is in the old software 
version, and all we need to do is delete keys.  Of course, if there is more to 
do than just delete keys or fail containers, then something needs to be run on 
the new software before downgrading to the old one.

> NM fails to start after downgrade from 2.8 to 2.7
> -
>
> Key: YARN-5630
> URL: https://issues.apache.org/jira/browse/YARN-5630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-5630.001.patch, YARN-5630.002.patch
>
>
> A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an 
> unrecognized "version" container key on startup.  This breaks downgrades from 
> 2.8 to 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-015.patch

YARN-2571 patch 015: remove a test which was obsolete but did not show up as 
broken. Fix up more javadocs.

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, 
> YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7

2016-09-12 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484432#comment-15484432
 ] 

Arun Suresh commented on YARN-5630:
---

In that case... +1 for the latest patch..

bq. it adds another user-visible phase to the rollback procedure and places the 
burden on admins, requiring them to either know what keys are valid/appropriate 
to specify ..
Not necessarily. We can check in a rollback file that lists, for every 
compatible CURRENT_VERSION_INFO, the new keys that can be scrubbed safely 
before rollback.
{code}

1.3
/version

1.2
/logDir
/workDir

1.1
/queued
...
{code}
Release 2.8 corresponds to version info 1.3 and release 2.7 corresponds to 1.0. 
The _--prepare-for-rollback_ flag just takes a target version_info argument (in 
this case 1.0), and now we have a list of keys to scrub without any deep 
involvement from the admin. (Additionally, we can choose to key the file with 
the release version rather than the CURRENT_VERSION_INFO to make it even easier.)
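
To make the idea concrete, below is a rough sketch of what the scrub step could look like, assuming a plain leveldb-backed store and that the rollback file has already been parsed into the set of key suffixes introduced after the target version. The class and method names ({{RollbackScrubber}}, {{scrub}}) and the suffix-matching are purely illustrative; the real NM state store has its own key layout and would implement this inside the state-store service itself:

{code}
import static org.fusesource.leveldbjni.JniDBFactory.asString;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;
import org.iq80.leveldb.Options;

/** Illustrative only: drops store keys introduced after the rollback target. */
public class RollbackScrubber {

  public static void scrub(File storePath, Set<String> suffixesToDrop)
      throws IOException {
    try (DB db = factory.open(storePath, new Options().createIfMissing(false));
         DBIterator iter = db.iterator()) {
      iter.seekToFirst();
      while (iter.hasNext()) {
        Map.Entry<byte[], byte[]> entry = iter.next();
        String key = asString(entry.getKey());
        for (String suffix : suffixesToDrop) {
          if (key.endsWith(suffix)) {
            // Safe while iterating: the leveldb iterator reads from a snapshot.
            db.delete(bytes(key));
            break;
          }
        }
      }
    }
  }
}
{code}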

> NM fails to start after downgrade from 2.8 to 2.7
> -
>
> Key: YARN-5630
> URL: https://issues.apache.org/jira/browse/YARN-5630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-5630.001.patch, YARN-5630.002.patch
>
>
> A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an 
> unrecognized "version" container key on startup.  This breaks downgrades from 
> 2.8 to 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484429#comment-15484429
 ] 

Allen Wittenauer commented on YARN-5567:


bq. Should we think of some other state which could warn the admin about 
this(which is captured in webui/Rest)?

Probably. The key problem is going to be putting it some place that admins will 
actually notice it. (Hint: most folks in ops that I know don't actually look at 
the web UIs...)

If folks want to pursue that, they'll need to do it in another JIRA since this 
one has been in a release. :(

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
> {code}
> should be 
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(false, "", now);
> break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-09-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484371#comment-15484371
 ] 

Sunil G commented on YARN-4855:
---

Hi [~Tao Jie], I think we do not have to modify the existing APIs in 
{{RmAdminProtocol}}; we could add a new API instead. If there is a clear 
use case, I think there is no issue in adding an API. But it is better we 
discuss this more. Looping in [~naganarasimha...@apache.org] and [~leftnoteasy].

> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch, YARN-4855.002.patch, 
> YARN-4855.003.patch, YARN-4855.004.patch, YARN-4855.005.patch, 
> YARN-4855.006.patch, YARN-4855.007.patch, YARN-4855.008.patch
>
>
> Today when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode --fail-on-unkown-nodes 
> "node1=label1"*, it would be denied if the node is unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5148) [YARN-3368] Add page to new YARN UI to view server side configurations/logs/JVM-metrics

2016-09-12 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484297#comment-15484297
 ] 

Kai Sasaki commented on YARN-5148:
--

{quote}
1. Could we categorize with labels like YARN, MR etc?
{quote}
Sure. Since each property represents an Ember model, we can categorize by 
property name.

{quote}
Could you please share a screen shot. Similarly for JMX.
{quote}
Currently the JMX and logs pages redirect to the RM web UI 
(http://localhost:8088/jmx and http://localhost:8088/logs), so they look the 
same as the current UI.


> [YARN-3368] Add page to new YARN UI to view server side 
> configurations/logs/JVM-metrics
> ---
>
> Key: YARN-5148
> URL: https://issues.apache.org/jira/browse/YARN-5148
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Kai Sasaki
> Attachments: Screen Shot 2016-09-11 at 23.28.31.png, 
> YARN-5148-YARN-3368.01.patch, YARN-5148-YARN-3368.02.patch, yarn-conf.png, 
> yarn-tools.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484308#comment-15484308
 ] 

Jian He commented on YARN-5620:
---

Arun, thank you very much for the prompt response.
I think you forgot to remove the unused checkAndUpdatePending method. Given 
that we need one more patch anyway, a few nits:
- reInitContext does not need to be volatile
- remove unused imports in ContainerLocalizationRequestEvent

The latest patch looks good to me.
[~vvasudev], want to take a look?

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484303#comment-15484303
 ] 

Hadoop QA commented on YARN-5620:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 12 new + 514 unchanged - 4 fixed = 526 total (was 518) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 5s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828048/YARN-5620.011.patch |
| JIRA Issue | YARN-5620 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux dc2101f05586 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / cc01ed70 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13085/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13085/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/13085/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 

[jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7

2016-09-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484295#comment-15484295
 ] 

Jason Lowe commented on YARN-5630:
--

I'm not a fan of the "prepare for rollback" approach if we can avoid it.  It 
adds another user-visible phase to the rollback procedure and places the burden 
on admins, requiring them to either know what keys are valid/appropriate to 
specify for the command or that they need to run a special script which embeds 
this knowledge.  Also simply removing the keys from the database is not going 
to be a proper downgrade procedure.  Those keys represent state that is 
important to preserve on a restart, and if we ignore it then we are dropping a 
user request for a container.  That's not going to be OK in the general case, 
as that may prevent a container from launching properly or having the proper 
properties when it is launched.  Depending upon the nature of the feature that 
added the new store keys, we may not be able to support the downgrade at all 
short of failing the container because we can't execute it as requested.

In the short term I think we should commit something similar to this patch to 
unblock the 2.8 release.  IMHO we should be OK if we support downgrades from 
2.8 to 2.7 if the user does not leverage the new features in 2.8 (i.e.: 
container increase/decrease, queuing, etc.).  Once those features are used then 
a downgrade may not work.  This mirrors what was done for the epoch number in 
container IDs between 2.5 and 2.6.  Downgrades worked as long as the new 
work-preserving RM restart wasn't performed after upgrading to 2.6.  In general 
if we are careful only to use new store keys when they are absolutely necessary 
then we can support rollbacks as long as users don't use the new features added 
in the new release.  

After unblocking 2.8 we can then work on the data-driven key ignoring in 
YARN-5547.  That will help cover another set of features where a simple delete 
of the keys is sufficient to perform the downgrade.  That would then leave the 
features where we can't just ignore keys, and we'll have to come up with some 
other approach or state to users that downgrades do not necessarily work once 
that new feature is being used.


> NM fails to start after downgrade from 2.8 to 2.7
> -
>
> Key: YARN-5630
> URL: https://issues.apache.org/jira/browse/YARN-5630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-5630.001.patch, YARN-5630.002.patch
>
>
> A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an 
> unrecognized "version" container key on startup.  This breaks downgrades from 
> 2.8 to 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484271#comment-15484271
 ] 

Hadoop QA commented on YARN-2571:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 13 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 59s 
{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 59s {color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 52 
new + 695 unchanged - 10 fixed = 747 total (was 705) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 35s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 12s 
{color} | {color:red} hadoop-yarn-registry in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s 
{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 13s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-2571 |
| GITHUB PR | https://github.com/apache/hadoop/pull/66 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 4e62ed6fc545 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / cc01ed70 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| mvninstall | 

[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484242#comment-15484242
 ] 

Hadoop QA commented on YARN-4205:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 25 
new + 503 unchanged - 3 fixed = 528 total (was 506) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api generated 
1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 36s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 45m 34s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 17s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
|  |  Result of integer multiplication cast to long in 
org.apache.hadoop.yarn.api.records.ApplicationTimeouts.hashCode()  At 
ApplicationTimeouts.java:to long in 
org.apache.hadoop.yarn.api.records.ApplicationTimeouts.hashCode()  At 
ApplicationTimeouts.java:[line 77] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828034/0003-YARN-4205.patch |
| JIRA Issue | 

[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
--
Attachment: YARN-5620.011.patch

Thanks [~jianhe].. Uploading patch (v011) with the changes.

I left CLEANUP_CONTAINER_FOR_REINIT there, even though it does the same thing 
as CLEANUP_CONTAINER. Since it is sent by a different source, it can be useful 
for debugging, etc.

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-013.patch

Patch 013

* fix compile problem against new yarn test
* do as much as possible to shut up checkstyle and javadoc

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, 
> YARN-2571-012.patch, YARN-2571-013.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484032#comment-15484032
 ] 

Hadoop QA commented on YARN-2571:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 13 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 28s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 39s 
{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 39s {color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 46s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 52 
new + 696 unchanged - 10 fixed = 748 total (was 706) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 29s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 11s 
{color} | {color:red} hadoop-yarn-registry in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 17s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-2571 |
| GITHUB PR | https://github.com/apache/hadoop/pull/66 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux b4ae3cd62fff 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / cc01ed70 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| mvninstall | 

[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2016-09-12 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4205:

Attachment: 0003-YARN-4205.patch

Updated the patch fixing the review comments. This patch has the following changes 
from the previous one.
# Added an ApplicationTimeouts class that contains lifetime values. This class can 
be used in the future to support any other timeouts, such as queue_timeout or 
statestore_timeout.
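
For readers skimming the thread, a minimal sketch of what such a container class could look like is below; the field and method names are illustrative guesses, not taken from the attached patch.

{code}
// Hedged sketch only -- the real class in the patch may differ.
public class ApplicationTimeouts {
  /** Application lifetime in seconds, counted from submit time; -1 means not configured. */
  private long lifetimeSeconds = -1;

  public long getLifetimeSeconds() {
    return lifetimeSeconds;
  }

  public void setLifetimeSeconds(long lifetimeSeconds) {
    this.lifetimeSeconds = lifetimeSeconds;
  }

  // Future timeout kinds (e.g. a queue timeout or a state-store timeout) could be
  // added here as extra fields without changing callers that only read the lifetime.
}
{code}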

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> 0003-YARN-4205.patch, YARN-4205_01.patch, YARN-4205_02.patch, 
> YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If an application is running beyond its lifetime, it will be killed. 
> The lifetime is counted from the submit time.
> The monitoring thread's interval is configurable.
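
To make the monitoring idea above concrete, here is a rough, hedged sketch of the loop such a service could run; it is not the code from the attached patches, and the class and method names are made up for illustration.

{code}
import java.util.Map;

// Illustrative sketch only: kill applications that have run past their configured lifetime.
class LifetimeMonitorSketch implements Runnable {
  private final Map<String, Long> submitTimeMs;   // appId -> submit time (ms)
  private final Map<String, Long> lifetimeMs;     // appId -> configured lifetime (ms)
  private final long monitorIntervalMs;           // configurable monitoring interval

  LifetimeMonitorSketch(Map<String, Long> submitTimeMs,
                        Map<String, Long> lifetimeMs,
                        long monitorIntervalMs) {
    this.submitTimeMs = submitTimeMs;
    this.lifetimeMs = lifetimeMs;
    this.monitorIntervalMs = monitorIntervalMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long now = System.currentTimeMillis();
      for (Map.Entry<String, Long> e : lifetimeMs.entrySet()) {
        Long submitted = submitTimeMs.get(e.getKey());
        // Lifetime is counted from the submit time; anything past it gets killed.
        if (submitted != null && now - submitted > e.getValue()) {
          killApplication(e.getKey());
        }
      }
      try {
        Thread.sleep(monitorIntervalMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private void killApplication(String appId) {
    // A real service would go through the RM's kill-application path here.
    System.out.println("Killing application past its lifetime: " + appId);
  }
}
{code}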



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service

2016-09-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483977#comment-15483977
 ] 

Steve Loughran commented on YARN-679:
-

Dan, I've broken things down; all that happens is parts of the patch get 
ignored. Well, ignored more than this. What we have here is self-contained 
and, being derived from what we've been using in Slider, not far off what's 
been used elsewhere. So I'm afraid not.

But: thank you very, very much for the reviews. I'm going to take a break from S3 
coding to address them.

-steve

> add an entry point that can start any Yarn service
> --
>
> Key: YARN-679
> URL: https://issues.apache.org/jira/browse/YARN-679
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-679-001.patch, YARN-679-002.patch, 
> YARN-679-002.patch, YARN-679-003.patch, YARN-679-004.patch, 
> YARN-679-005.patch, YARN-679-006.patch, YARN-679-007.patch, 
> YARN-679-008.patch, YARN-679-009.patch, YARN-679-010.patch, 
> YARN-679-011.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf
>
>  Time Spent: 72h
>  Remaining Estimate: 0h
>
> There's no need to write separate .main classes for every Yarn service, given 
> that the startup mechanism should be identical: create, init, start, wait for 
> stopped - with an interrupt handler to trigger a clean shutdown on a control-C 
> interrupt.
> Provide one that takes any classname and a list of config files/options.
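
Purely as a hedged illustration of that idea (this is not the patch, and not Hadoop's eventual ServiceLauncher), a minimal reflective entry point over the public Service API might look like the sketch below; it assumes the target class implements Service and has a no-arg constructor.

{code}
import java.util.concurrent.CountDownLatch;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.service.Service;

// Illustrative sketch only: create, init, start, wait for stopped,
// with a shutdown hook for a clean Ctrl-C stop.
public final class GenericServiceMainSketch {
  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: GenericServiceMainSketch <service-classname> [config.xml ...]");
      System.exit(2);
    }
    Configuration conf = new Configuration();
    for (int i = 1; i < args.length; i++) {
      conf.addResource(new Path(args[i]));     // extra config files from the command line
    }
    Service service =
        (Service) Class.forName(args[0]).getDeclaredConstructor().newInstance();

    CountDownLatch stopped = new CountDownLatch(1);
    service.registerServiceListener(s -> {
      if (s.isInState(Service.STATE.STOPPED)) {
        stopped.countDown();
      }
    });
    // Ctrl-C triggers a clean stop via a JVM shutdown hook.
    Runtime.getRuntime().addShutdownHook(new Thread(service::stop));

    service.init(conf);
    service.start();
    stopped.await();                           // block until the service stops
  }
}
{code}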



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-012.patch

Patch 012: rebase to trunk.

This patch is coming up on its second birthday. Can someone please look at it? 
Thanks

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, 
> YARN-2571-012.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2016-09-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Labels:   (was: BB2015-05-TBR)

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5587) Add support for resource profiles

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483623#comment-15483623
 ] 

Hadoop QA commented on YARN-5587:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 4m 5s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
18s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
42s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 1s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
28s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
37s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 12s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 7m 12s {color} 
| {color:red} root generated 3 new + 713 unchanged - 0 fixed = 716 total (was 
713) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 39s 
{color} | {color:red} root: The patch generated 65 new + 1029 unchanged - 2 
fixed = 1094 total (was 1031) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api generated 
4 new + 156 unchanged - 0 fixed = 160 total (was 156) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 25s {color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 20s {color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 4s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 55s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 5s {color} | 
{color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 37s {color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s 
{color} | {color:red} The patch generated 5 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483620#comment-15483620
 ] 

Junping Du commented on YARN-3359:
--

Hi [~gtCarrera9], please feel free to take this over, as I don't have a short-term 
plan for it. However, as you may know, I am currently working on AM address 
discovery (for the AM restart case), which may consolidate with our previous 
collector address discovery work, and I will publish a design soon (on YARN-4758). 
I would suggest waiting until that effort is clearer before continuing this one, 
if it is not a top item for us right now. What do you think?

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: YARN-5355
>
> Per the discussion in YARN-3039, split the recovery work from RMStateStore into a 
> separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483565#comment-15483565
 ] 

Jian He edited comment on YARN-5620 at 9/12/16 9:18 AM:


bq. It is also possible that the an admin logs into the NM and does a 'kill -9' 
which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST 
but it wont be in KILLING state.. right ?
I guess in this case it’s also fine to do the upgrade, because the upgrade 
API does accept it and it’s hard to distinguish which one should go first. It's 
also likely the reverse can happen, because it's transient: if 
setKillForReInitialization is called first and then the container process is 
killed, it will be considered a re-init even though it was killed by an external 
signal. So keep it consistent?
bq. Actually if you look at the prepareContainerUpgrade() function,
Ah, yes, I misread that. Thank you!
bq. The problem with getAllResourcesByVisibility, is it gets all resources. I 
just need the pending resources
In this case, the pendingResources is the same as 
getAllResourcesByVisibility, right? Basically, I meant something like the code below, 
and the newly added methods may not be needed.
{code}
Map
    pendingResources = ((ContainerReInitEvent) event).getResourceSet()
    .getAllResourcesByVisibility();
if (!pendingResources.isEmpty()) {
  container.dispatcher.getEventHandler().handle(
      new ContainerLocalizationRequestEvent(container, pendingResources));
} else {
{code}
- Forgot to say: similarly, is the change in 
ResourceLocalizedWhileRunningTransition required, as the symlinks are also 
already distinct?


was (Author: jianhe):
bq. It is also possible that the an admin logs into the NM and does a 'kill -9' 
which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST 
but it wont be in KILLING state.. right ?
I guess in this case,  it’s also fine to do the upgrade… because the upgrade 
API does accept it, it’s hard to distinguish which one should go first.. It's 
also likely the reverse can also happen because it's transient, if the 
setKillForReInitialization is called first, then the container process is 
killed, It will be considered as re-init, even though it is killed by external 
signal.  so keep it consistent ?
bq. Actually if you look at the prepareContainerUpgrade() function,
ah, yes,  mislooked . thank you !
bq. The problem with getAllResourcesByVisibility, is it gets all resources. I 
just need the pending resources
In this case, the pendingResources in the same as the 
getAllResourcesByVisibility, right? basically, I meant like below.. and the 
newly added methods could be not needed.
{code}
Map
    pendingResources = ((ContainerReInitEvent) event).getResourceSet()
    .getAllResourcesByVisibility();
if (!pendingResources.isEmpty()) {
  container.dispatcher.getEventHandler().handle(
      new ContainerLocalizationRequestEvent(container, pendingResources));
} else {
{code}

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3115) [Collector wireup] Work-preserving restarting of per-node timeline collector

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483580#comment-15483580
 ] 

Junping Du commented on YARN-3115:
--

Hi [~varun_saxena], please feel free to take this up if you have a plan to do it 
in the short term. I can help review it.

> [Collector wireup] Work-preserving restarting of per-node timeline collector
> 
>
> Key: YARN-3115
> URL: https://issues.apache.org/jira/browse/YARN-3115
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
>  Labels: YARN-5355
>
> YARN-3030 makes the per-node aggregator work as an aux service of the NM. It 
> contains the states of the per-app aggregators corresponding to the running 
> AM containers on that NM. When the NM is restarted in work-preserving mode, this 
> per-node aggregator information needs to be carried over across the restart too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483569#comment-15483569
 ] 

Junping Du edited comment on YARN-5296 at 9/12/16 9:04 AM:
---

[~leftnoteasy], this is actually not needed for branch-2.7 - as I discussed 
with Jason on HADOOP-13362, it was just a misunderstanding caused by the container 
being removed in different places in branch-2.7 and branch-2. Just forget about my 
comment above. 
Also, I noticed you reopened YARN-5190 for branch-2.7, which seems to duplicate 
HADOOP-13362. Can you double-check and close it? Thanks!


was (Author: djp):
[~leftnoteasy], this is actually no need for branch-2.7 - as I discussed this 
with Jason on HADOOP-13362, this is just a misunderstand caused by different 
container remove places between branch-2.7 and branch-2. Just forget about the 
comment. 
Also, I noticed you reopen YARN-5190 for branch-2.7 which seems duplicated with 
HADOOP-13362. Can you double check and close it? Thx!

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---
>
> Key: YARN-5296
> URL: https://issues.apache.org/jira/browse/YARN-5296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karam Singh
>Assignee: Junping Du
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5296-v2.1.patch, YARN-5296-v2.patch, 
> YARN-5296.patch, after v2 fix.png, before v2 fix.png
>
>
> Ran tests in the following manner:
> 1. Run GridMix of 768 sequentially around 17 times to execute about 12.9K 
> apps.
> 2. After 4-5 hrs, check the NM heap using Memory Analyser. It reports that around 
> 96% of the heap is being used by ContainerMetrics.
> 3. Run 7 more GridMix runs to have around 18.2K apps run in total. Again check the 
> NM heap using Memory Analyser; again 96% of the heap is being used by 
> ContainerMetrics. 
> 4. Start one more GridMix run. While the run is going on, NMs started going down 
> with OOM at around 18.7K+ running apps. On analysing the NM heap using Memory 
> Analyser, the OOM was caused by ContainerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483569#comment-15483569
 ] 

Junping Du commented on YARN-5296:
--

[~leftnoteasy], this is actually not needed for branch-2.7 - as I discussed 
with Jason on HADOOP-13362, it was just a misunderstanding caused by the container 
being removed in different places in branch-2.7 and branch-2. Just forget about the 
comment. 
Also, I noticed you reopened YARN-5190 for branch-2.7, which seems to duplicate 
HADOOP-13362. Can you double-check and close it? Thanks!

> NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
> ---
>
> Key: YARN-5296
> URL: https://issues.apache.org/jira/browse/YARN-5296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karam Singh
>Assignee: Junping Du
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5296-v2.1.patch, YARN-5296-v2.patch, 
> YARN-5296.patch, after v2 fix.png, before v2 fix.png
>
>
> Ran tests in the following manner:
> 1. Run GridMix of 768 sequentially around 17 times to execute about 12.9K 
> apps.
> 2. After 4-5 hrs, check the NM heap using Memory Analyser. It reports that around 
> 96% of the heap is being used by ContainerMetrics.
> 3. Run 7 more GridMix runs to have around 18.2K apps run in total. Again check the 
> NM heap using Memory Analyser; again 96% of the heap is being used by 
> ContainerMetrics. 
> 4. Start one more GridMix run. While the run is going on, NMs started going down 
> with OOM at around 18.7K+ running apps. On analysing the NM heap using Memory 
> Analyser, the OOM was caused by ContainerMetrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483565#comment-15483565
 ] 

Jian He commented on YARN-5620:
---

bq. It is also possible that the an admin logs into the NM and does a 'kill -9' 
which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST 
but it wont be in KILLING state.. right ?
I guess in this case it’s also fine to do the upgrade, because the upgrade 
API does accept it and it’s hard to distinguish which one should go first. It's 
also likely the reverse can happen, because it's transient: if 
setKillForReInitialization is called first and then the container process is 
killed, it will be considered a re-init even though it was killed by an external 
signal. So keep it consistent?
bq. Actually if you look at the prepareContainerUpgrade() function,
Ah, yes, I misread that. Thank you!
bq. The problem with getAllResourcesByVisibility, is it gets all resources. I 
just need the pending resources
In this case, the pendingResources is the same as 
getAllResourcesByVisibility, right? Basically, I meant something like the code below, 
and the newly added methods may not be needed.
{code}
Map
    pendingResources = ((ContainerReInitEvent) event).getResourceSet()
    .getAllResourcesByVisibility();
if (!pendingResources.isEmpty()) {
  container.dispatcher.getEventHandler().handle(
      new ContainerLocalizationRequestEvent(container, pendingResources));
} else {
{code}

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5190) Registering/unregistering container metrics triggered by ContainerEvent and ContainersMonitorEvent are conflict which cause uncaught exception in ContainerMonitorImpl

2016-09-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483553#comment-15483553
 ] 

Junping Du commented on YARN-5190:
--

Hi [~leftnoteasy], HADOOP-13362 was proposed to fix this issue for branch-2.7 
and has already been checked in. Is there anything more to fix here?

> Registering/unregistering container metrics triggered by ContainerEvent and 
> ContainersMonitorEvent are conflict which cause uncaught exception in 
> ContainerMonitorImpl
> --
>
> Key: YARN-5190
> URL: https://issues.apache.org/jira/browse/YARN-5190
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-5190-branch-2.7.001.patch, YARN-5190-v2.patch, 
> YARN-5190.patch
>
>
> The exception stack is as following:
> {noformat}
> 310735 2016-05-22 01:50:04,554 [Container Monitor] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container 
> Monitor,5,main] threw an Exception.
> 310736 org.apache.hadoop.metrics2.MetricsException: Metrics source 
> ContainerResource_container_1463840817638_14484_01_10 already exists!
> 310737 at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
> 310738 at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
> 310739 at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> 310740 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212)
> 310741 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198)
> 310742 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385)
> {noformat}
> After YARN-4906, we have multiple places that get the ContainerMetrics for a 
> particular container, which can cause a race condition where different threads 
> register the same container metrics source with DefaultMetricsSystem. Without 
> proper handling of the MetricsException that can get thrown, the exception 
> can bring down the ContainerMonitorImpl daemon or even the whole NM.
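
The kind of defensive handling the description above is asking for might look roughly like the hedged sketch below; the Registrar interface is hypothetical and this is not the committed fix, it only shows the pattern of not letting a duplicate-source MetricsException escape the monitoring thread.

{code}
import org.apache.hadoop.metrics2.MetricsException;

// Illustrative sketch only: swallow the "source already exists" failure so a
// lost registration race cannot take down the Container Monitor thread.
public class RaceSafeMetricsRegistrationSketch {

  /** Hypothetical wrapper around the real ContainerMetrics registration call. */
  interface Registrar {
    void register(String containerId);
  }

  public static void registerQuietly(Registrar registrar, String containerId) {
    try {
      registrar.register(containerId);
    } catch (MetricsException e) {
      // Another thread registered the same source first; log and keep monitoring.
      System.err.println("Metrics source for " + containerId
          + " already registered, reusing it: " + e.getMessage());
    }
  }
}
{code}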



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483521#comment-15483521
 ] 

Hadoop QA commented on YARN-5620:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 22 new + 575 unchanged - 4 fixed = 597 total (was 579) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 7s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 30s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827991/YARN-5620.010.patch |
| JIRA Issue | YARN-5620 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d596c40c7af7 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / cc01ed70 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13081/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13081/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/13081/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 

[jira] [Comment Edited] (YARN-5610) Initial code for native services REST API

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483469#comment-15483469
 ] 

Jian He edited comment on YARN-5610 at 9/12/16 8:26 AM:


Thanks, Gour.
bq.  So according to an app-owner, it is STARTED but not RUNNING yet.
I would prefer a rename such as STARTED -> READY, or RUNNING -> READY. 
bq.  The swagger definition defines this.
Do you mean Swagger has such a date type in string format? Which one is it? 
I couldn't find it in the Swagger documentation. 
bq. why the changes needed in hadoop-project/pom.xml
I meant, what is this change used for?
{code}
  
http://git-wip-us.apache.org/repos/asf/hadoop.git

  scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git


  scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git

  
{code}
bq. appOptions and uniqueGlobalPropertyCache are required as appOptions is the 
way application configuration properties are injected into Slider client.
I see. Then the {{if (uniqueGlobalPropertyCache == null)}} condition is not 
needed, because uniqueGlobalPropertyCache is initialized to a non-null value.
{code}
private void addOptionsIfNotPresent(List options,
    Set uniqueGlobalPropertyCache, String key, String value) {
  if (uniqueGlobalPropertyCache == null) {
    options.addAll(Arrays.asList(key, value));
  }
{code}


bq. In case, of complex and nested applications some components will be by 
themselves full blown and independent applications itself. The APPLICATION type 
artifact refers to such external application definitions
Then, what are the other parameters in Component used for in this case, like 
number_of_containers, launch_command, resource, etc.?


was (Author: jianhe):
Thanks, Gour.
bq.  So according to an app-owner, it is STARTED but not RUNNING yet.
I would prefer rename as such pair STARTED -> READAY, or RUNNING -> READY 
bq.  The swagger definition defines this.
Do you mean swagger has such a date type in string format ?  which one is this? 
I couldn't find it in swagger documentation. 
bq. why the changes needed in hadoop-project/pom.xml
I meant what is this change used for ?
{code}
  
http://git-wip-us.apache.org/repos/asf/hadoop.git

  scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git


  scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git

  
{code}
bq. appOptions and uniqueGlobalPropertyCache are required as appOptions is the 
way application configuration properties are injected into Slider client.
I see.  Then the   {{if (uniqueGlobalPropertyCache == null) }} condition is not 
needed, because uniqueGlobalPropertyCache is initialized  as not null.
{code}
private void addOptionsIfNotPresent(List options,
    Set uniqueGlobalPropertyCache, String key, String value) {
  if (uniqueGlobalPropertyCache == null) {
    options.addAll(Arrays.asList(key, value));
  }
{code}


bq. In case, of complex and nested applications some components will be by 
themselves full blown and independent applications itself. The APPLICATION type 
artifact refers to such external application definitions
Then, what are the other parameters in Component used for in this case ? like 
number_of_containers, launch_command, resource etc.

> Initial code for native services REST API
> -
>
> Key: YARN-5610
> URL: https://issues.apache.org/jira/browse/YARN-5610
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Attachments: YARN-4793-yarn-native-services.001.patch, 
> YARN-5610-yarn-native-services.002.patch
>
>
> This task will be used to submit and review patches for the initial code drop 
> for the native services REST API 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5610) Initial code for native services REST API

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483469#comment-15483469
 ] 

Jian He commented on YARN-5610:
---

Thanks, Gour.
bq.  So according to an app-owner, it is STARTED but not RUNNING yet.
I would prefer a rename such as STARTED -> READY, or RUNNING -> READY. 
bq.  The swagger definition defines this.
Do you mean Swagger has such a date type in string format? Which one is it? 
I couldn't find it in the Swagger documentation. 
bq. why the changes needed in hadoop-project/pom.xml
I meant, what is this change used for?
{code}
  
http://git-wip-us.apache.org/repos/asf/hadoop.git

  scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git


  scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git

  
{code}
bq. appOptions and uniqueGlobalPropertyCache are required as appOptions is the 
way application configuration properties are injected into Slider client.
I see. Then the {{if (uniqueGlobalPropertyCache == null) }} condition is not 
needed, because uniqueGlobalPropertyCache is initialized to a non-null value.
{code}
private void addOptionsIfNotPresent(List options,
    Set uniqueGlobalPropertyCache, String key, String value) {
  if (uniqueGlobalPropertyCache == null) {
    options.addAll(Arrays.asList(key, value));
  }
{code}


bq. In case, of complex and nested applications some components will be by 
themselves full blown and independent applications itself. The APPLICATION type 
artifact refers to such external application definitions
Then, what are the other parameters in Component used for in this case, like 
number_of_containers, launch_command, resource, etc.?

> Initial code for native services REST API
> -
>
> Key: YARN-5610
> URL: https://issues.apache.org/jira/browse/YARN-5610
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Attachments: YARN-4793-yarn-native-services.001.patch, 
> YARN-5610-yarn-native-services.002.patch
>
>
> This task will be used to submit and review patches for the initial code drop 
> for the native services REST API 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
--
Attachment: YARN-5620.010.patch

Fixing the failed tests (the _TestDefaultContainerExecutor_ error seems to be 
unrelated) and some more checkstyle issues.

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch, YARN-5620.010.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483437#comment-15483437
 ] 

Hadoop QA commented on YARN-5620:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 25 new + 604 unchanged - 4 fixed = 629 total (was 608) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 32s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
|   | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRegression |
|   | hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827978/YARN-5620.009.patch |
| JIRA Issue | YARN-5620 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 761d12e98010 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / cc01ed70 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13080/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13080/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit test logs |  

[jira] [Commented] (YARN-5631) Missing refreshClusterMaxPriority usage in rmadmin help message

2016-09-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483408#comment-15483408
 ] 

Rohith Sharma K S commented on YARN-5631:
-

+1 LGTM, will commit it shortly

> Missing refreshClusterMaxPriority usage in rmadmin help message
> ---
>
> Key: YARN-5631
> URL: https://issues.apache.org/jira/browse/YARN-5631
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
> Attachments: YARN-5631.01.patch, YARN-5631.02.patch
>
>
> {{rmadmin -help}} does not show the {{-refreshClusterMaxPriority}} option in 
> the usage line.
> {code}
> $ bin/yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is:
> yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in 
> seconds] -client|server]] [-refreshNodesResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">] 
> [-removeFromClusterNodeLabels ] [-replaceLabelsOnNode 
> <"node1[:port]=label1,label2 node2[:port]=label1">] 
> [-directlyAccessNodeLabelStore] [-updateNodeResource [NodeID] [MemSize] 
> [vCores] ([OvercommitTimeout]) [-help [cmd]]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483382#comment-15483382
 ] 

Hadoop QA commented on YARN-3692:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 27s {color} | {color:red} root: The patch generated 3 new + 190 unchanged - 1 fixed = 193 total (was 191) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 24s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 37m 31s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 26s {color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 118m 12s {color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 221m 33s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12827958/0004-YARN-3692.patch |
| JIRA Issue | YARN-3692 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  cc  |
| uname | Linux 41200cc542b3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483362#comment-15483362
 ] 

Arun Suresh edited comment on YARN-5620 at 9/12/16 7:35 AM:


Updating patch.
* Addressing [~jianhe]'s latest comments
* some javadoc, checkstyle and javac fixes

bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and 
move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to 
the container at KILLING state..
It goes to the KILLING state only if the AM explicitly sends a kill signal or the 
RM asks the NM to kill. It is also possible that an admin logs into the NM and 
does a 'kill -9', which will also cause the ContainerLaunch to send 
CONTAINER_KILLED_ON_REQUEST, but the container won't be in the KILLING state.. right?

bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade 
resource, and verify the output is written into it, this verifies the part 
about the localization part as well.
Actually, if you look at the _prepareContainerUpgrade()_ function, we create a 
new script file *scriptFile_new*, which is passed into the 
_prepareContainerLaunchContext()_ function, which associates the new file with a 
new *dest_file_new* location. This should verify that the upgrade required a new 
localized resource. The output of the script is also written to a new 
*start_file_n.txt*, which we read and verify to check that the new process has 
actually started.
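
For readers following along, here is a minimal, self-contained sketch of that 
verification pattern. The file names ({{scriptFile_new.sh}}, {{start_file_n.txt}}) 
are borrowed from the description above; the code is illustrative only and is not 
the actual test in the patch:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Illustration only: mimics the verification pattern described above.
 * A "new" script (standing in for scriptFile_new) writes a marker into a
 * fresh output file (standing in for start_file_n.txt); reading that file
 * back confirms that the re-launched process really ran the new script.
 */
public class UpgradeVerificationSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    Path workDir = Files.createTempDirectory("upgrade-sketch");
    Path newScript = workDir.resolve("scriptFile_new.sh");
    Path newStartFile = workDir.resolve("start_file_n.txt");

    // The "upgraded" launch context points at a new script writing to a new file.
    Files.write(newScript,
        Arrays.asList("#!/bin/sh", "echo upgraded > " + newStartFile),
        StandardCharsets.UTF_8);
    newScript.toFile().setExecutable(true);

    // Re-launching the container is simulated by simply running the new script.
    new ProcessBuilder(newScript.toString()).start().waitFor();

    // Verification: the new output file contains the marker, so the new
    // (upgraded) process must have started.
    String marker = Files.readAllLines(newStartFile, StandardCharsets.UTF_8).get(0);
    if (!"upgraded".equals(marker.trim())) {
      throw new AssertionError("upgraded process did not start");
    }
    System.out.println("verified: " + marker);
  }
}
{code}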

Also by the way:

bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and 
so the getLocalPendingRequests method and the new constructor in 
ContainerLocalizationRequestEvent is not needed
The problem with getAllResourcesByVisibility is that it returns all resources, 
while I only need the pending ones. So if you are ok with it, I'd like to keep it 
as is.
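
As an aside, the distinction can be sketched with made-up types. This is not the 
real {{ResourceSet}} API, only an illustration of "all resources" versus 
"still-pending resources":

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: hypothetical stand-ins, not the NM's ResourceSet. */
class ResourceSetSketch {
  enum State { PENDING, LOCALIZED }

  private final Map<String, State> resources = new HashMap<>();

  void add(String resource, State state) {
    resources.put(resource, state);
  }

  /** Analogue of "get all resources": returns everything, localized or not. */
  List<String> getAllResources() {
    return new ArrayList<>(resources.keySet());
  }

  /** Analogue of "get pending resources": only what still needs localization. */
  List<String> getPendingResources() {
    List<String> pending = new ArrayList<>();
    for (Map.Entry<String, State> e : resources.entrySet()) {
      if (e.getValue() == State.PENDING) {
        pending.add(e.getKey());
      }
    }
    return pending;
  }

  public static void main(String[] args) {
    ResourceSetSketch rs = new ResourceSetSketch();
    rs.add("hdfs:///apps/app.jar", State.LOCALIZED);
    rs.add("hdfs:///apps/app-v2.jar", State.PENDING);
    // Re-localizing everything would be wasteful; only the pending entry matters here.
    System.out.println(rs.getPendingResources()); // [hdfs:///apps/app-v2.jar]
  }
}
{code}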





> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 






[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
--
Attachment: YARN-5620.009.patch

Updating patch.
* Addressing [~jianhe]'s latest comments
* some javadoc, checkstyle and javac fixes

bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and 
move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to 
the container at KILLING state..
It goes to the KILLING state only if the AM explicitly sends a kill signal or the 
RM asks the NM to kill. It is also possible that an admin logs into the NM and 
does a 'kill -9', which will also cause the ContainerLaunch to send 
CONTAINER_KILLED_ON_REQUEST, but the container won't be in the KILLING state.. right?

bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade 
resource, and verify the output is written into it, this verifies the part 
about the localization part as well.
Actually, if you look at the _prepareContainerUpgrade()_ function, we create a 
new script file *scriptFile_new*, which is passed into the 
_prepareContainerLaunchContext()_ function, which associates the new file with a 
new *dest_file_new* location. This should verify that the upgrade required a new 
localized resource. The output of the script is also written to a new 
*start_file_n.txt*, which we read and verify to check that the new process has 
actually started.

Also by the way:

bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and 
so the getLocalPendingRequests method and the new constructor in 
ContainerLocalizationRequestEvent is not needed
The problem with getAllResourcesByVisibility is that it returns all resources, 
while I only need the pending ones. So if you are ok with it, I'd like to keep it 
as is.



> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, 
> YARN-5620.009.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 






[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-09-12 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483364#comment-15483364
 ] 

Tao Jie commented on YARN-4855:
---

Thank you for comments, [~sunilg].
Doing the node check on the client side avoids modifying RmAdminProtocol, and we 
don't expect this operation to be performed frequently.
I looked at {{NodeId#compareTo}} more closely; it compares *both* host and port, 
while {{isNodeSame}} checks host and port only if the port is set, and otherwise 
checks the host alone.
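
Roughly, the two matching behaviours can be sketched as follows (standalone 
illustrative code, not the actual {{NodeId}} or {{isNodeSame}} implementations):

{code}
/** Illustration of the two matching behaviours discussed above (not YARN code). */
public class NodeMatchSketch {
  static final class Node {
    final String host;
    final int port; // 0 means "port not set"
    Node(String host, int port) { this.host = host; this.port = port; }
  }

  /** compareTo-style match: host and port must both be equal. */
  static boolean strictMatch(Node a, Node b) {
    return a.host.equals(b.host) && a.port == b.port;
  }

  /** isNodeSame-style match: port is compared only when the query specifies one. */
  static boolean lenientMatch(Node query, Node registered) {
    if (!query.host.equals(registered.host)) {
      return false;
    }
    return query.port == 0 || query.port == registered.port;
  }

  public static void main(String[] args) {
    Node registered = new Node("node1", 45454);
    Node hostOnlyQuery = new Node("node1", 0);
    System.out.println(strictMatch(hostOnlyQuery, registered));  // false
    System.out.println(lenientMatch(hostOnlyQuery, registered)); // true
  }
}
{code}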


> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch, YARN-4855.002.patch, 
> YARN-4855.003.patch, YARN-4855.004.patch, YARN-4855.005.patch, 
> YARN-4855.006.patch, YARN-4855.007.patch, YARN-4855.008.patch
>
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode --fail-on-unknown-nodes 
> "node1=label1"*, it would be denied if the node is unknown.






[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483313#comment-15483313
 ] 

Jian He commented on YARN-5620:
---

One more thing about the test: in testContainerUpgradeSuccess, could you make 
newStartFile a new upgrade resource and verify that the output is written into it? 
This verifies the localization part as well.

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 






[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483274#comment-15483274
 ] 

Jian He commented on YARN-5620:
---

bq. the container should be killable explicitly via an external signal.
IIUC, in this case the ContainerImpl will receive the KILL event first and 
move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to 
the container in the KILLING state.

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 






[jira] [Commented] (YARN-5631) Missing refreshClusterMaxPriority usage in rmadmin help message

2016-09-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483276#comment-15483276
 ] 

Sunil G commented on YARN-5631:
---

Looks good to me.

> Missing refreshClusterMaxPriority usage in rmadmin help message
> ---
>
> Key: YARN-5631
> URL: https://issues.apache.org/jira/browse/YARN-5631
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
> Attachments: YARN-5631.01.patch, YARN-5631.02.patch
>
>
> {{rmadmin -help}} does not show {{-refreshClusterMaxPriority}} option in 
> usage line.
> {code}
> $ bin/yarn rmadmin -help
> rmadmin is the command to execute YARN administrative commands.
> The full syntax is:
> yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in 
> seconds] -client|server]] [-refreshNodesResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">] 
> [-removeFromClusterNodeLabels ] [-replaceLabelsOnNode 
> <"node1[:port]=label1,label2 node2[:port]=label1">] 
> [-directlyAccessNodeLabelStore] [-updateNodeResource [NodeID] [MemSize] 
> [vCores] ([OvercommitTimeout]) [-help [cmd]]
> {code}






[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels

2016-09-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483269#comment-15483269
 ] 

Sunil G commented on YARN-4855:
---

Hi [~Tao Jie] and [~Naganarasimha Garla],

Thanks for the work on this item. I have a few doubts about the patch.

- Thinking out loud, {{replaceLabelsOnNodes}} seems a little complex to me. 
Could we add a server-side API to check whether a set of nodes is registered or 
not? {{ConcurrentMap getRMNodes()}} is available on the server 
side, so that would be a faster and better approach (see the sketch after this list).
- If we are sticking with the client-side implementation, then we could try to 
make use of {{NodeId#compareTo}} instead of {{isNodeSame}}.
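
A minimal sketch of the kind of server-side check meant in the first point above. 
The types here are hypothetical; a real implementation would sit behind the admin 
protocol and consult the RM's map of registered nodes:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustration only: checks requested node ids against a registered-node map. */
public class NodeRegistrationCheckSketch {
  // Stand-in for the RM's map of registered nodes, keyed here by "host:port".
  private final ConcurrentMap<String, Object> rmNodes = new ConcurrentHashMap<>();

  void register(String nodeId) {
    rmNodes.put(nodeId, new Object());
  }

  /** Returns the subset of requested nodes that the RM does not know about. */
  List<String> findUnknownNodes(Collection<String> requested) {
    List<String> unknown = new ArrayList<>();
    for (String nodeId : requested) {
      if (!rmNodes.containsKey(nodeId)) {
        unknown.add(nodeId);
      }
    }
    return unknown;
  }

  public static void main(String[] args) {
    NodeRegistrationCheckSketch check = new NodeRegistrationCheckSketch();
    check.register("node1:45454");
    // With a fail-on-unknown-nodes option, a non-empty result would reject the request.
    System.out.println(check.findUnknownNodes(
        new HashSet<>(Arrays.asList("node1:45454", "node2:45454"))));
  }
}
{code}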


> Should check if node exists when replace nodelabels
> ---
>
> Key: YARN-4855
> URL: https://issues.apache.org/jira/browse/YARN-4855
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: YARN-4855.001.patch, YARN-4855.002.patch, 
> YARN-4855.003.patch, YARN-4855.004.patch, YARN-4855.005.patch, 
> YARN-4855.006.patch, YARN-4855.007.patch, YARN-4855.008.patch
>
>
> Today, when we add node labels to nodes, the operation succeeds without any 
> message even if the nodes are not existing NodeManagers in the cluster.
> It could be like this:
> When we use *yarn rmadmin -replaceLabelsOnNode --fail-on-unknown-nodes 
> "node1=label1"*, it would be denied if the node is unknown.






[jira] [Updated] (YARN-5587) Add support for resource profiles

2016-09-12 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-5587:

Attachment: YARN-5587-YARN-3926.001.patch

> Add support for resource profiles
> -
>
> Key: YARN-5587
> URL: https://issues.apache.org/jira/browse/YARN-5587
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-5587-YARN-3926.001.patch
>
>
> Add support for resource profiles on the RM side to allow users to use 
> shorthands to specify resource requirements.






[jira] [Commented] (YARN-5561) [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and entities via REST

2016-09-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483258#comment-15483258
 ] 

Varun Saxena commented on YARN-5561:


bq.  are looking for a complete applications page where all applications which 
were running/completed had to be listed. For this purpose, I think we need the 
api as suggested by Rohith. Being said this, We will also be showing hierarchy 
from flows too. 
So you plan to have two app pages: one for a specific flow run and another listing 
all the apps in the cluster. Right?
[~rohithsharma], how do you plan to support fetching all apps within a cluster?
You could probably adopt the approach I had suggested, because otherwise it would 
lead to a full table scan.

bq. New API's required. Thoughts?
We should have them for the sake of completeness.

> [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and 
> entities via REST
> ---
>
> Key: YARN-5561
> URL: https://issues.apache.org/jira/browse/YARN-5561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5561.patch, YARN-5561.v0.patch
>
>
> The ATSv2 model lacks retrieval of {{list-of-all-apps}}, 
> {{list-of-all-app-attempts}} and {{list-of-all-containers-per-attempt}} via 
> REST APIs. It is also required to know about all the entities in an 
> application.
> These URLs are very much required for the Web UI.
> New REST URL would be 
> # GET {{/ws/v2/timeline/apps}}
> # GET {{/ws/v2/timeline/apps/\{app-id\}/appattempts}}.
> # GET 
> {{/ws/v2/timeline/apps/\{app-id\}/appattempts/\{attempt-id\}/containers}}
> # GET {{/ws/v2/timeline/apps/\{app id\}/entities}} should display list of 
> entities that can be queried.  






[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

2016-09-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483254#comment-15483254
 ] 

Bibin A Chundatt commented on YARN-5545:


[~Naganarasimha Garla] and [~sunilg]

{quote}
 Also consider the cases when the accessibility is * and new partitions are 
added without refreshing, this configuration will be wrong as its static.
{quote}
Thank you for pointing that out; I will check the same. But [~Naganarasimha Garla], 
whenever we reconfigure the capacity-scheduler xml these limits will also get 
refreshed.

{quote}
Would it be better to set the default value of 
yarn.scheduler.capacity.maximum-applications.accessible-node-labels. to 
that of yarn.scheduler.capacity.maximum-applications
{quote}
Will use  {{yarn.scheduler.capacity.maximum-applications}} itself.

{quote}
IIUC you seem to adopt the approach little different than what you mention in 
your comment, though we are having per partition level max app limit, we just 
sum up max limits of all partitions under a queue and check against 
ApplicationLimit.getAllMaxApplication()
{quote}
This was added because, IIUC, we can't use the per-partition application count for 
the app limit; we have to check the maximum applications for the queue across all 
partitions.

Documentation will be added for the same.
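
A rough sketch of the defaulting and summing described above, with hypothetical 
numbers and a plain map standing in for the scheduler configuration (this is not 
the CapacityScheduler code):

{code}
import java.util.HashMap;
import java.util.Map;

/** Illustration only: per-partition max-app limits combined into a queue-wide check. */
public class MaxAppLimitSketch {
  public static void main(String[] args) {
    // Assumed global default, i.e. yarn.scheduler.capacity.maximum-applications.
    int globalMaxApps = 10000;

    // Hypothetical per-partition limit configured for one queue.
    Map<String, Integer> perPartitionMax = new HashMap<>();
    perPartitionMax.put("labelx", 6000);

    // A partition without an explicit setting falls back to the global default.
    int defaultPartitionMax = perPartitionMax.getOrDefault("", globalMaxApps);
    int labelxMax = perPartitionMax.getOrDefault("labelx", globalMaxApps);

    // Per the discussion above, the queue-level check uses the sum of the
    // per-partition limits rather than enforcing each partition independently.
    int queueMaxApps = defaultPartitionMax + labelxMax;
    System.out.println("queue-wide max applications = " + queueMaxApps); // 16000
  }
}
{code}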





> App submit failure on queue with label when default queue partition capacity 
> is zero
> 
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, 
> YARN-5545.0003.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>   at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> 
