[jira] [Commented] (YARN-5602) Utils for Federation State and Policy Store
[ https://issues.apache.org/jira/browse/YARN-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486347#comment-15486347 ] Jian He commented on YARN-5602: --- A few minor comments: - Why this? {{// Capability and HeartBeat are not necessary to compare the information}} - FederationStateStoreException: are the getErrorCode and getErrorMsg methods needed? They can be retrieved from getCode. - FederationStateStoreUtils: some param names in the method comments are wrong, e.g. LOG, errMsg - Should the Exception-related classes be in an exception package, rather than util? > Utils for Federation State and Policy Store > --- > > Key: YARN-5602 > URL: https://issues.apache.org/jira/browse/YARN-5602 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-5602-YARN-2915.v1.patch, > YARN-5602-YARN-2915.v2.patch > > > This JIRA tracks the creation of utils for Federation State and Policy Store > such as Error Codes, Exceptions... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero
[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486205#comment-15486205 ] Bibin A Chundatt commented on YARN-5545: Thank you [~leftnoteasy] for looking into the issue. {quote} So queue will split maximum-application-number according to ratio of their total configured resource across partitions {quote} {noformat} Approach: Consider the average of the absolute percentage across all partitions, not the average of the absolute percentage per partition; e.g. label 1 can be 10% of 20 GB and the default partition can be 50% of 100 GB. Get the percentage capacity of the queue as [ sum of resources of queue A across all partitions (X) / total cluster resource (Y) ] = absolute percentage over the whole cluster (Z). max applications of queue = Z * maxclusterapplication. The max applications would have to be updated on every NODE registration and removal. {noformat} This was the initial approach we considered; during discussion we came across the scenario where the RM is restarted and an NM has not yet registered, so submissions might get rejected. Any thoughts on how we should handle that scenario? > App submit failure on queue with label when default queue partition capacity > is zero > > > Key: YARN-5545 > URL: https://issues.apache.org/jira/browse/YARN-5545 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, > YARN-5545.0003.patch, capacity-scheduler.xml > > > Configure capacity scheduler > yarn.scheduler.capacity.root.default.capacity=0 > yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50 > yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50 > Submit application as below > ./yarn jar > ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar > sleep -Dmapreduce.job.node-label-expression=labelx > -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1 > {noformat} > 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging > area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001 > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362) > at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at > org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136) > at > org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:239) > at org.apache.hadoop.util.RunJar.main(RunJar.java:153) > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit > application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application:
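[Editor's note] A self-contained sketch of the max-application arithmetic described in the YARN-5545 comment above. All class, method, and variable names here are illustrative assumptions, not part of the actual patch; the numbers reuse the example from the comment (label 1 = 10% of 20 GB, default partition = 50% of 100 GB).
{code}
// Illustrative arithmetic only; names and numbers are made up for the example.
public final class MaxAppsByAbsoluteCapacity {
  /**
   * Z = (sum of the queue's configured resource over all partitions)
   *     / (total cluster resource over all partitions).
   * max applications for the queue = Z * cluster-wide max applications.
   */
  static int maxAppsForQueue(long[] queueResourcePerPartitionMb,
                             long totalClusterResourceMb,
                             int maxClusterApplications) {
    long queueTotal = 0;
    for (long r : queueResourcePerPartitionMb) {
      queueTotal += r;
    }
    // Absolute percentage of the queue over the whole cluster (Z).
    double z = (double) queueTotal / totalClusterResourceMb;
    return (int) (z * maxClusterApplications);
  }

  public static void main(String[] args) {
    // label 1: 10% of 20 GB = 2 GB; default partition: 50% of 100 GB = 50 GB (in MB).
    long[] queueShares = {2_048L, 51_200L};
    long clusterMb = 122_880L; // 120 GB cluster, in MB
    System.out.println(maxAppsForQueue(queueShares, clusterMb, 10_000));
  }
}
{code}
Recomputing this on every NODE registration or removal, as the comment notes, is what makes the approach sensitive to NMs that have not yet re-registered after an RM restart.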
[jira] [Updated] (YARN-5539) TimelineClient failed to retry on "java.net.SocketTimeoutException: Read timed out"
[ https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-5539: - Summary: TimelineClient failed to retry on "java.net.SocketTimeoutException: Read timed out" (was: AM fails due to "java.net.SocketTimeoutException: Read timed out") > TimelineClient failed to retry on "java.net.SocketTimeoutException: Read > timed out" > --- > > Key: YARN-5539 > URL: https://issues.apache.org/jira/browse/YARN-5539 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Junping Du >Priority: Critical > > AM fails with the following exception > {code} > FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster > com.sun.jersey.api.client.ClientHandlerException: > java.net.SocketTimeoutException: Read timed out > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298) > Caused by: java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) > at > 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:472) > at >
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486002#comment-15486002 ] Hadoop QA commented on YARN-5605: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 57s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 44s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 135 unchanged - 122 fixed = 140 total (was 257) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 4 fixed = 937 total (was 941) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 29s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 52s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | YARN-5605 | | GITHUB PR | https://github.com/apache/hadoop/pull/124 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 78d6b884cc47 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 72dfb04 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle |
[jira] [Reopened] (YARN-5539) AM fails due to "java.net.SocketTimeoutException: Read timed out"
[ https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reopened YARN-5539: -- > AM fails due to "java.net.SocketTimeoutException: Read timed out" > - > > Key: YARN-5539 > URL: https://issues.apache.org/jira/browse/YARN-5539 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Junping Du >Priority: Critical > > AM fails with the following exception > {code} > FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster > com.sun.jersey.api.client.ClientHandlerException: > java.net.SocketTimeoutException: Read timed out > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298) > Caused by: java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:472) > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159) > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147) >
[jira] [Commented] (YARN-5539) AM fails due to "java.net.SocketTimeoutException: Read timed out"
[ https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486000#comment-15486000 ] Junping Du commented on YARN-5539: -- I think this exception hints that our TimelineClient retry logic leaks the exception in the SocketTimeout case, as opposed to the ConnectException case. {noformat} public boolean shouldRetryOn(Exception e) { // Only retry on connection exceptions return (e instanceof ClientHandlerException) && (e.getCause() instanceof ConnectException); } {noformat} This is a valid issue but can only be hit in very occasional cases. Reopening this issue to address the corner case. Will put up a patch soon! > AM fails due to "java.net.SocketTimeoutException: Read timed out" > - > > Key: YARN-5539 > URL: https://issues.apache.org/jira/browse/YARN-5539 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Junping Du >Priority: Critical > > AM fails with the following exception > {code} > FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster > com.sun.jersey.api.client.ClientHandlerException: > java.net.SocketTimeoutException: Read timed out > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298) > Caused by: java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at 
java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132) > at >
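[Editor's note] A hedged sketch of the change implied by the YARN-5539 comment above: treat a Jersey-wrapped read timeout as retriable in addition to connection failures. It mirrors the {{shouldRetryOn}} snippet quoted in the comment (a method inside the retry policy class, with java.net.ConnectException, java.net.SocketTimeoutException and com.sun.jersey.api.client.ClientHandlerException assumed to be imported); it is a sketch of the idea, not the committed fix.
{code}
public boolean shouldRetryOn(Exception e) {
  // Retry on connection failures and on read timeouts wrapped by Jersey,
  // so a transient SocketTimeoutException no longer escapes the retry loop.
  return (e instanceof ClientHandlerException)
      && (e.getCause() instanceof ConnectException
          || e.getCause() instanceof SocketTimeoutException);
}
{code}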
[jira] [Updated] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit
[ https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5637: -- Attachment: YARN-5637.001.patch Uploading initial patch based on the framework laid out in YARN-5620. I haven't submitted the patch yet since it depends on YARN-5620; will do once that goes in. [~jianhe], [~vvasudev]... do let me know what you think. Some assumptions: * AutoCommit is set to _true_ by default * If _retryFailurePolicy_ is NOT specified, then the Container will be terminated immediately if the upgrade fails. * If _retryFailurePolicy_ is specified, the upgraded Container will be retried/relaunched as per the policy, and if that fails, the Container will be rolled back to its previous state. * If _autoCommit_ is *false* and the user has called the _commitUpgrade_ API, then no rollback will be allowed after that. * If _autoCommit_ is *true* then no explicit or implicit (upgraded process fails to start) rollback will be allowed. > Changes in NodeManager to support Container upgrade and rollback/commit > --- > > Key: YARN-5637 > URL: https://issues.apache.org/jira/browse/YARN-5637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5637.001.patch > > > YARN-5620 added support for re-initialization of Containers using a new > launch Context. > This JIRA proposes to use the above feature to support upgrade and subsequent > rollback or commit of the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1558) After apps are moved across queues, store new queue info in the RM state store
[ https://issues.apache.org/jira/browse/YARN-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485925#comment-15485925 ] Xianyin Xin commented on YARN-1558: --- is there any update about this? > After apps are moved across queues, store new queue info in the RM state store > -- > > Key: YARN-1558 > URL: https://issues.apache.org/jira/browse/YARN-1558 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Sandy Ryza > > The result of moving an app to a new queue should persist across RM restarts. > This will require updating the ApplicationSubmissionContext, the single > source of truth upon state recovery, with the new queue info. > There will be a brief window after the move completes before the move is > stored. If the RM dies during this window, the recovered RM will include the > old queue info. Schedulers should be resilient to this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5605: --- Attachment: yarn-5605-3.patch > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch, yarn-5605-3.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485847#comment-15485847 ] Li Lu commented on YARN-3359: - Thanks [~jdu]! Actually, keeping the copies in the NMs will do the job. The only challenge is when two or more collectors for the same application get launched (because of a cluster partition, for example). Therefore the RM needs to keep a version number for collectors, so that when rebuilding app-to-collector mappings, it knows which collectors are stale and which one is active. bq. btw, the app's attempt id shouldn't be used here, as the collector is designed to be independent of the AM lifecycle - it also means an AM failure doesn't imply the collector needs to be killed/restarted. Am I missing anything? You're right. We should not do it... > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5540) scheduler spends too much time looking at empty priorities
[ https://issues.apache.org/jira/browse/YARN-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485794#comment-15485794 ] Wangda Tan commented on YARN-5540: -- Thanks [~jlowe], patch generally looks good, only a few minor comments: 1) It might be better to rename add/removeSchedulerKeyReference to increase/decreaseSchedulerKeyReference since schedulerKey ref will not be removed unless value == 0. 2) Do you think we should remove: bq. // TODO: Shouldn't we activate even if numContainers = 0? > scheduler spends too much time looking at empty priorities > -- > > Key: YARN-5540 > URL: https://issues.apache.org/jira/browse/YARN-5540 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, fairscheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Nathan Roberts >Assignee: Jason Lowe > Attachments: YARN-5540.001.patch, YARN-5540.002.patch > > > We're starting to see the capacity scheduler run out of scheduling horsepower > when running 500-1000 applications on clusters with 4K nodes or so. > This seems to be amplified by TEZ applications. TEZ applications have many > more priorities (sometimes in the hundreds) than typical MR applications and > therefore the loop in the scheduler which examines every priority within > every running application, starts to be a hotspot. The priorities appear to > stay around forever, even when there is no remaining resource request at that > priority causing us to spend a lot of time looking at nothing. > jstack snippet: > {noformat} > "ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x7fc2d453e800 > nid=0x22f3 runnable [0x7fc2a8be2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210) > - eliminated <0x0005e73e5dc0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852) > - locked <0x0005e73e5dc0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp) > - locked <0x0003006fcf60> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527) > - locked <0x0003001b22f8> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415) > - locked <0x0003001b22f8> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224) > - locked <0x000300041e40> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
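[Editor's note] A minimal, self-contained sketch of the reference-counting idea behind the rename suggested in the YARN-5540 comment above: a scheduler key stays registered only while outstanding requests reference it, so the scheduler's per-application loop never iterates over empty priorities. Class and method names are illustrative, not the actual patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SchedulerKeyRefCounter<K extends Comparable<K>> {
  private final Map<K, Integer> refCounts = new ConcurrentSkipListMap<>();

  /** Called when a resource request is added under this scheduler key. */
  public synchronized void increaseSchedulerKeyReference(K key) {
    refCounts.merge(key, 1, Integer::sum);
  }

  /** Called when a request under this key is satisfied or removed. */
  public synchronized void decreaseSchedulerKeyReference(K key) {
    Integer count = refCounts.get(key);
    if (count == null) {
      return;
    }
    if (count <= 1) {
      refCounts.remove(key); // key disappears only once nothing references it
    } else {
      refCounts.put(key, count - 1);
    }
  }

  /** Only keys with live requests remain, so iteration skips empty priorities. */
  public Iterable<K> activeKeys() {
    return refCounts.keySet();
  }
}
{code}
The "increase/decrease" naming reflects that a key's entry is only dropped when its count reaches zero, which is exactly the point raised in the review.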
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485786#comment-15485786 ] Junping Du commented on YARN-3359: -- btw, the app's attempt id shouldn't be used here, as the collector is designed to be independent of the AM lifecycle - it also means an AM failure doesn't imply the collector needs to be killed/restarted. Am I missing anything? > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order
[ https://issues.apache.org/jira/browse/YARN-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485778#comment-15485778 ] Junping Du commented on YARN-5638: -- Why does the collector's lifecycle need to be bound to an AM attempt? An AM failure doesn't mean the collector has to be killed/restarted. > Introduce a collector Id to uniquely identify collectors and their creation > order > - > > Key: YARN-5638 > URL: https://issues.apache.org/jira/browse/YARN-5638 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > > As discussed in YARN-3359, we need to further identify timeline collectors > and their creation order for better service discovery and resource isolation. > This JIRA proposes to use a collector ID to accurately identify > each timeline collector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485770#comment-15485770 ] Junping Du commented on YARN-3359: -- Thanks for bringing up many options here. Rather than the RM state store, I think there is another much simpler and more lightweight solution - because each NM knows exactly which collectors are running on it (the collector uses an NM-local RPC to notify the NM beforehand), the NM can simply report them to the RM when registering with a new or restarted RM. What do you think? > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485752#comment-15485752 ] Subru Krishnan commented on YARN-5324: -- Thanks [~curino] for the patch. I looked at it and have a few comments below. Can we move *SubClusterIdInfo* --> federation.store.records.dao, as it can potentially be used elsewhere, like in future Federation Admin REST APIs? In *WeightedPolicyInfo*: * Rename _routerWeights_ --> _routerPolicyWeights_. * Rename _amrmWeights_ --> _AMRMPolicyWeights_. * Add Javadocs for the getters/setters of the weights as they are not very intuitive. * Why are we iterating through the weight maps for every get (_getRouterWeights/getAmrmWeights_)? Either we should avoid this or do this once at initialization. * We should move the JSONJAXBContext and marshaller/unmarshaller to class variables and initialize them at declaration as they are expensive ops. * Can we give some examples of alpha values and their effect, to provide more context? * To improve readability, can we split the if into two - for routerWeights & amrmWeights respectively; {code} if(otherAMRMWeights != null && amrmWeights != null && otherRouterWeights != null && routerWeights != null) { return CollectionUtils.isEqualCollection(otherAMRMWeights.entrySet(), amrmWeights.entrySet()) && CollectionUtils.isEqualCollection( (otherRouterWeights.entrySet()), routerWeights.entrySet()); {code} * For Hashcode - {code} 31 * amrmWeights + routerWeights {code} would be a better option. In *BaseWeightedRouterPolicy*: * Something seems off in the sentence formation of the class Javadocs. * _getRand()_ seems to be used only by *WeightedRandomRouterPolicy* so it can be moved to a local variable, as is done in *UniformRandomRouterPolicy*. * Shouldn't {code} if (policyInfo != null && policyInfo.equals(newPolicyInfo)) { {code} be {code} if (policyInfo == null || policyInfo.equals(newPolicyInfo)) { {code} * Why are we not checking _amrmWeights_ in: {code} if (newWeights == null || newWeights.size() < 1) { {code} * I think it'll be good if we move all the validations to either a validator or a separate validate method. * We should have the check for active sub-clusters here, as now every policy repeats it. * I feel we should define the {{protected SubClusterId selectSubCluster(MapactiveSubclusters, Map routerPolicyWeights)}} in the base class and implementing policies can override it accordingly. In *LoadBasedRouterPolicy*: * We should invoke _getAvailableMemory_ only if the weight is 1. * Do we have to do this for every invocation of _getHomeSubcluster_? Seems like it, but I just want to make sure. In *OrderedRouterPolicy*: * Typo in Javadoc; _Heights_ should be _Highest_. * I feel it should round robin between the active sub-clusters, as the current policy will pick the same sub-cluster every time (assuming that entire sub-cluster downtime is rare, more so with RM HA). This feels more like a *PriorityRouterPolicy*. In *WeightedRandomRouterPolicy*: * {code} if (getPolicyInfo() == null) { {code} isn't the check redundant? * Can we add some code comments to clarify how exactly the selection is done? * {code} chosen = id; if (lookupValue <= 0) { break; } {code} should be {code} if (lookupValue <= 0) { return id; } {code} In *MockPolicyManager*: * Might it be better to return "default" rather than null in _getQueue_? Common across tests: * Am I missing something? All the tests call it out, but I don't see scenarios covering multiple invocations or clusters going inactive. 
* We should have a {{BaseRouterPolicyTest}} and move the tests there, overriding only the policy context in the individual policy tests. Refer to {{FederationStateStoreBaseTest}}. * How is *FederationPoliciesTestUtil::createResourceRequests/createResourceRequest* used? Why can't we use *ResourceRequest::newInstance* for the latter? * Nit: variables can be declared outside the loop in _setUp_. In *TestLoadBasedRouterPolicy*: * Typo in class Javadoc: _weighiting_ --> _weighting_ > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5324-YARN-2915.06.patch, > YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, > YARN-5324-YARN-2915.09.patch, YARN-5324.01.patch, YARN-5324.02.patch, > YARN-5324.03.patch, YARN-5324.04.patch, YARN-5324.05.patch > > > These are policies at the Router that do not require maintaining state across > choices (e.g., weighted random).
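[Editor's note] A self-contained sketch of the weighted-random selection discussed in the YARN-5324 review above, including the "return id as soon as lookupValue is exhausted" simplification suggested there. Types are deliberately simplified (String ids, double weights); the real policy operates on SubClusterId objects and filters for active sub-clusters first.
{code}
import java.util.Map;
import java.util.Random;

public final class WeightedRandomPick {
  private static final Random RAND = new Random();

  static String pick(Map<String, Double> weights) {
    double total = 0;
    for (double w : weights.values()) {
      total += w;
    }
    // Draw a point in [0, total) and walk the entries, returning the first
    // entry whose cumulative weight covers it.
    double lookupValue = RAND.nextDouble() * total;
    String last = null;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      lookupValue -= e.getValue();
      last = e.getKey();
      if (lookupValue <= 0) {
        return e.getKey();
      }
    }
    return last; // guard against floating-point rounding
  }

  public static void main(String[] args) {
    System.out.println(pick(Map.of("sc1", 0.7, "sc2", 0.2, "sc3", 0.1)));
  }
}
{code}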
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485753#comment-15485753 ] ASF GitHub Bot commented on YARN-5605: -- Github user kambatla commented on a diff in the pull request: https://github.com/apache/hadoop/pull/124#discussion_r78478634 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java --- @@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() { getFairShare()); } - /** - * Is a queue being starved for its min share. - */ - @VisibleForTesting - boolean isStarvedForMinShare() { -return isStarved(getMinShare()); + private Resource minShareStarvation() { +Resource desiredShare = Resources.min(policy.getResourceCalculator(), +scheduler.getClusterResource(), getMinShare(), getDemand()); + +Resource starvation = Resources.subtract(desiredShare, getResourceUsage()); +boolean starved = Resources.greaterThan(policy.getResourceCalculator(), +scheduler.getClusterResource(), starvation, none()); + +long now = scheduler.getClock().getTime(); +if (!starved) { + setLastTimeAtMinShare(now); +} + +if (starved && +(now - lastTimeAtMinShare > getMinSharePreemptionTimeout())) { + return starvation; +} else { + return Resources.clone(Resources.none()); --- End diff -- Actually, this can be simplified further. Improvement in updated patch. > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1468) TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed.
[ https://issues.apache.org/jira/browse/YARN-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485549#comment-15485549 ] Daniel Templeton commented on YARN-1468: For what it's worth, I'm seeing the same issue as [~mitdesai]. I'm going to take a look this week. > TestRMRestart.testRMRestartWaitForPreviousAMToFinish get failed. > > > Key: YARN-1468 > URL: https://issues.apache.org/jira/browse/YARN-1468 > Project: Hadoop YARN > Issue Type: Test > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > Log is as following: > {code} > Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.968 sec > <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 44.197 sec <<< FAILURE! > junit.framework.AssertionFailedError: AppAttempt state is not correct > (timedout) expected: but was: > at junit.framework.Assert.fail(Assert.java:50) > at junit.framework.Assert.failNotEquals(Assert.java:287) > at junit.framework.Assert.assertEquals(Assert.java:67) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:292) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:826) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:464) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485470#comment-15485470 ] Vinod Kumar Vavilapalli commented on YARN-4205: --- bq. Added ApplicationTimeouts class that contains lifetime values. This class can be used in the future to support any other timeouts such as queue_timeout or statestore_timeout. Makes sense. > Add a service for monitoring application life time out > -- > > Key: YARN-4205 > URL: https://issues.apache.org/jira/browse/YARN-4205 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: nijel >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, > 0003-YARN-4205.patch, YARN-4205_01.patch, YARN-4205_02.patch, > YARN-4205_03.patch > > > This JIRA intends to provide a lifetime monitor service. > The service will monitor the applications where a lifetime is configured. > If an application is running beyond its lifetime, it will be killed. > The lifetime will be counted from the submit time. > The thread monitoring interval is configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
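[Editor's note] A rough, self-contained sketch of the lifetime-monitor behavior described in the YARN-4205 summary above: a periodic check, with a configurable interval, that kills applications running past their configured lifetime measured from submit time. All names are illustrative assumptions, not the actual YARN-4205 patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AppLifetimeMonitor {
  /** appId -> {submitTimeMillis, lifetimeMillis} for apps with a lifetime configured. */
  private final Map<String, long[]> monitored = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void register(String appId, long submitTimeMillis, long lifetimeMillis) {
    monitored.put(appId, new long[] {submitTimeMillis, lifetimeMillis});
  }

  /** The monitoring interval is configurable, as the description states. */
  public void start(long monitorIntervalMillis) {
    scheduler.scheduleAtFixedRate(this::checkExpired,
        monitorIntervalMillis, monitorIntervalMillis, TimeUnit.MILLISECONDS);
  }

  private void checkExpired() {
    long now = System.currentTimeMillis();
    monitored.forEach((appId, t) -> {
      if (now - t[0] > t[1]) {   // running beyond submitTime + lifetime
        monitored.remove(appId);
        kill(appId);
      }
    });
  }

  private void kill(String appId) {
    // In the real service this would dispatch a kill event to the RM.
    System.out.println("Killing application " + appId + ": lifetime exceeded");
  }
}
{code}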
[jira] [Commented] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order
[ https://issues.apache.org/jira/browse/YARN-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485366#comment-15485366 ] Li Lu commented on YARN-5638: - Having put some more thought into this, I think adding the attempt ID as one part of identifying collectors may not be a good idea. Collectors belong to applications, not attempts. So it is possible that we map multiple app attempts to one collector (if the collector is run in a separate container). It is also possible to have several collectors within one application, though. For example, if a node gets separated from the other nodes, all applications running on it will get relaunched. Thus, I think it makes sense to preserve the old mapping when facing users, but internally we need to handle the corner case where multiple collectors (one active, others stale) are mapped to the same application. > Introduce a collector Id to uniquely identify collectors and their creation > order > - > > Key: YARN-5638 > URL: https://issues.apache.org/jira/browse/YARN-5638 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > > As discussed in YARN-3359, we need to further identify timeline collectors > and their creation order for better service discovery and resource isolation. > This JIRA proposes to use a collector ID to accurately identify > each timeline collector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485329#comment-15485329 ] Hadoop QA commented on YARN-5324: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 39s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 14m 34s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12828108/YARN-5324-YARN-2915.09.patch | | JIRA Issue | YARN-5324 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 411c7df3a9a1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-2915 / 302d206 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13091/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13091/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13091/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324
[jira] [Commented] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485287#comment-15485287 ] Carlo Curino commented on YARN-5324: More cleanups. I think the remaining checkstyle warning (>7 params) is ok as is (the method is used only by tests to create a {{ResourceRequest}} for testing). > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5324-YARN-2915.06.patch, > YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, > YARN-5324-YARN-2915.09.patch, YARN-5324.01.patch, YARN-5324.02.patch, > YARN-5324.03.patch, YARN-5324.04.patch, YARN-5324.05.patch > > > These are policies at the Router that do not require maintaining state across > choices (e.g., weighted random). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5324: --- Attachment: YARN-5324-YARN-2915.09.patch > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5324-YARN-2915.06.patch, > YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, > YARN-5324-YARN-2915.09.patch, YARN-5324.01.patch, YARN-5324.02.patch, > YARN-5324.03.patch, YARN-5324.04.patch, YARN-5324.05.patch > > > These are policies at the Router that do not require maintaining state across > choices (e.g., weighted random). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485264#comment-15485264 ] Li Lu commented on YARN-3359: - Thanks [~vinodkv]! Creating YARN-5638 to trace the collector ID related changes separately. > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order
Li Lu created YARN-5638: --- Summary: Introduce a collector Id to uniquely identify collectors and their creation order Key: YARN-5638 URL: https://issues.apache.org/jira/browse/YARN-5638 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu As discussed in YARN-3359, we need to further identify timeline collectors and their creation order for better service discovery and resource isolation. This JIRA proposes to use a collector Id to accurately identify each timeline collector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5561) [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and entities via REST
[ https://issues.apache.org/jira/browse/YARN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485249#comment-15485249 ] Li Lu commented on YARN-5561: - Thanks [~rohithsharma], the two proposed APIs make the app - attempt - container APIs more comprehensive. LGTM. bq. are looking for a complete applications page where all applications which were running/completed had to be listed. I'm not sure this is something we'd like to support at the timeline level. This information is pretty much available on the RM side via the state store. Why do we want to encourage an expensive operation on the timeline store for this data? > [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and > entities via REST > --- > > Key: YARN-5561 > URL: https://issues.apache.org/jira/browse/YARN-5561 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: YARN-5561.patch, YARN-5561.v0.patch > > > The ATSv2 model lacks retrieval of {{list-of-all-apps}}, > {{list-of-all-app-attempts}} and {{list-of-all-containers-per-attempt}} via > REST APIs. It is also required to know about all the entities in an > application. > These URLs are highly required for the Web UI. > The new REST URLs would be > # GET {{/ws/v2/timeline/apps}} > # GET {{/ws/v2/timeline/apps/\{app-id\}/appattempts}}. > # GET > {{/ws/v2/timeline/apps/\{app-id\}/appattempts/\{attempt-id\}/containers}} > # GET {{/ws/v2/timeline/apps/\{app id\}/entities}} should display the list of > entities that can be queried. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
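For illustration, a minimal sketch of how a client might hit the first of the proposed endpoints once it exists; the reader host and port are assumptions, the response handling is deliberately naive, and the URL path is only what this JIRA proposes.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TimelineAppsClient {
  public static void main(String[] args) throws Exception {
    // Hypothetical timeline reader address; /ws/v2/timeline/apps is only proposed in this JIRA.
    URL url = new URL("http://timelinereader.example.com:8188/ws/v2/timeline/apps");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try {
      conn.setRequestMethod("GET");
      conn.setRequestProperty("Accept", "application/json");
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = in.readLine()) != null) {
          System.out.println(line); // raw JSON list of applications
        }
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}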
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485242#comment-15485242 ] Vinod Kumar Vavilapalli commented on YARN-3359: --- bq. Right now each collector is mapped with an app ID, but to handle the state recover case, we need to associate each collector with an attempt ID (and ideally a time stamp to further distinguish collectors). +1 in general. We can simply have the RM generate a collector-id to be (app-attempt-ID + another ID identifying the collector). RM *has* to know about collectors for scheduling later when the collector runs in its own container, so letting it generate the IDs is reasonable. > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
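As a rough illustration of the scheme sketched in the comment above (not actual YARN code), a collector identifier could pair the app attempt ID with an RM-assigned sequence number; the class name, fields, and ordering rule below are assumptions.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

/**
 * Hypothetical sketch of an RM-generated collector identifier:
 * the app attempt ID plus a sequence number assigned by the RM.
 */
public final class CollectorId {
  private final ApplicationAttemptId attemptId;
  private final long sequenceNumber; // RM-assigned, increases with each collector created

  public CollectorId(ApplicationAttemptId attemptId, long sequenceNumber) {
    this.attemptId = attemptId;
    this.sequenceNumber = sequenceNumber;
  }

  public ApplicationAttemptId getAttemptId() {
    return attemptId;
  }

  /**
   * A collector for a later attempt of the same application, or with a larger
   * sequence number within the same attempt, supersedes this one.
   */
  public boolean isSupersededBy(CollectorId other) {
    if (!attemptId.getApplicationId().equals(other.attemptId.getApplicationId())) {
      return false;
    }
    if (attemptId.getAttemptId() != other.attemptId.getAttemptId()) {
      return other.attemptId.getAttemptId() > attemptId.getAttemptId();
    }
    return other.sequenceNumber > sequenceNumber;
  }

  @Override
  public String toString() {
    return attemptId + "_collector_" + sequenceNumber;
  }
}
{code}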
[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-: - Fix Version/s: (was: 3.0.0-alpha2) (was: 2.9.0) 2.8.0 Thanks [~varun_saxena]. I have backported this to 2.8.0 > Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically > nested. > > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Fix For: 2.8.0 > > Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch > > > If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, > {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section > of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct > child of {{root}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485225#comment-15485225 ] Li Lu commented on YARN-3359: - I've got some offline discussion with [~vinodkv] about this issue. We cannot simply preserve collector states in the RM state store since this state is not final, and updating this status frequently will block the RM. A natural replacement place for the state store is the NM state store. That is to say, we can rebuild RM's collector table by getting updates from the NMs. In summary, we need to do the following things: For NMs: 1. on collector launching, preserve collector address in its state store. 2. on removing collectors, remove the related item from state store. 3. on start up, recover collector addresses from state store. 4. on resync, send current collector address mapping to the RM. For RMs, the only change needed is to rebuild the collector/address mapping upon restart. This actually involves a pretty messy corner case: when one application has two different attempts running (due to some network problems, for example) and the RM is trying to rebuild collector status, the RM needs to know which collector is for the latest app attempt and which one is for the stale attempt. This requires some changes in collector IDs. Right now each collector is mapped with an app ID, but to handle the state recover case, we need to associate each collector with an attempt ID (and ideally a time stamp to further distinguish collectors). Not sure if we missed some critical points in this design. Thoughts? > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
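A minimal sketch of the NM-side bookkeeping listed above, assuming a simple in-memory map standing in for the NM state store; the class and method names are hypothetical and the durable writes are only indicated in comments.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;

/** Hypothetical sketch of persisting collector addresses on the NM side. */
public class CollectorAddressStore {
  // In-memory view; a real implementation would back this with the NM state store.
  private final Map<ApplicationId, String> collectorAddresses = new ConcurrentHashMap<>();

  /** 1. On collector launch, remember its address. */
  public void storeCollectorAddress(ApplicationId appId, String address) {
    collectorAddresses.put(appId, address);
    // persistToStateStore(appId, address);  // assumed durable write
  }

  /** 2. On collector removal, drop the entry. */
  public void removeCollectorAddress(ApplicationId appId) {
    collectorAddresses.remove(appId);
    // removeFromStateStore(appId);  // assumed durable delete
  }

  /** 3. On NM startup, recover the mapping from the durable store. */
  public void recover(Map<ApplicationId, String> persisted) {
    collectorAddresses.putAll(persisted);
  }

  /** 4. On resync, report the current mapping so the RM can rebuild its table. */
  public Map<ApplicationId, String> snapshotForResync() {
    return new ConcurrentHashMap<>(collectorAddresses);
  }
}
{code}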
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485188#comment-15485188 ] Li Lu commented on YARN-3359: - Thanks [~djp], I'm taking it over. Thanks for letting me know YARN-4758 which may be a potential conflict to this JIRA. Let me first explore possible options to persist the state of collectors here. After the plan is fixed here we can decide how to proceed. > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5613) Fair Scheduler can assign containers from blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton resolved YARN-5613. Resolution: Invalid Turns out the issue I was seeing is coincidentally resolved by YARN-3547. > Fair Scheduler can assign containers from blacklisted nodes > --- > > Key: YARN-5613 > URL: https://issues.apache.org/jira/browse/YARN-5613 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-5613.001.patch > > > The {{FairScheduler.allocate()}} makes its resource request before it updates > the blacklist. If the scheduler processes the resource request before the > allocating thread updates the blacklist, the scheduler can assign containers > that are on nodes in the blacklist. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4404) Typo in comment in SchedulerUtils
[ https://issues.apache.org/jira/browse/YARN-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485177#comment-15485177 ] Manjunath Ballur commented on YARN-4404: There are many places where this typo exists. Can I take this up? If yes, please assign this to me. > Typo in comment in SchedulerUtils > - > > Key: YARN-4404 > URL: https://issues.apache.org/jira/browse/YARN-4404 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Devon Michaels >Priority: Trivial > Labels: newbie > > The comment starting on line 254 says: > {code} > /** >* Utility method to validate a resource request, by insuring that the >* requested memory/vcore is non-negative and not greater than max >* >* @throws InvalidResourceRequestException when there is invalid request >*/ > {code} > "Insuring" should be "ensuring." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-3359: --- Assignee: Li Lu (was: Junping Du) > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Li Lu > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recover work from RMStateStore in a > separated JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485097#comment-15485097 ] Hadoop QA commented on YARN-5324: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 6s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 52s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 15m 51s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common | | | org.apache.hadoop.yarn.server.federation.policies.dao.WeightedPolicyInfo defines equals and uses Object.hashCode() At WeightedPolicyInfo.java:Object.hashCode() At WeightedPolicyInfo.java:[lines 140-157] | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12828092/YARN-5324-YARN-2915.08.patch | | JIRA Issue | YARN-5324 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 3d93d9d04bd3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-2915 / 302d206 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13090/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/13090/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.html | | Test
[jira] [Commented] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485039#comment-15485039 ] Carlo Curino commented on YARN-5324: Standard pass on fixing checkstyle/javadoc/findbugs/asflicense issues. > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5324-YARN-2915.06.patch, > YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, > YARN-5324.01.patch, YARN-5324.02.patch, YARN-5324.03.patch, > YARN-5324.04.patch, YARN-5324.05.patch > > > These are policies at the Router that do not require maintaing state across > choices (e.g., weighted random). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5324: --- Attachment: YARN-5324-YARN-2915.08.patch > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5324-YARN-2915.06.patch, > YARN-5324-YARN-2915.07.patch, YARN-5324-YARN-2915.08.patch, > YARN-5324.01.patch, YARN-5324.02.patch, YARN-5324.03.patch, > YARN-5324.04.patch, YARN-5324.05.patch > > > These are policies at the Router that do not require maintaing state across > choices (e.g., weighted random). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485010#comment-15485010 ] Hadoop QA commented on YARN-2571: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 14 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 13s {color} | {color:red} hadoop-yarn-project_hadoop-yarn generated 1 new + 34 unchanged - 1 fixed = 35 total (was 35) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 47s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 15 new + 787 unchanged - 168 fixed = 802 total (was 955) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch 4 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry generated 0 new + 17 unchanged - 35 fixed = 17 total (was 52) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s {color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 41s {color} | {color:green} hadoop-yarn-registry in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | |
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484875#comment-15484875 ] Hadoop QA commented on YARN-5605: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 31s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 10 new + 133 unchanged - 122 fixed = 143 total (was 255) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 1 new + 937 unchanged - 4 fixed = 938 total (was 941) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 44m 8s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 48s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.liveContainers; locked 94% of time Unsynchronized access at FSAppAttempt.java:94% of time Unsynchronized access at FSAppAttempt.java:[line 500] | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSStarvedApps$StarvationComparator implements Comparator but not Serializable At FSStarvedApps.java:Serializable At FSStarvedApps.java:[lines 51-55] | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | YARN-5605 | | GITHUB PR | https://github.com/apache/hadoop/pull/124 | | Optional Tests | asflicense compile javac javadoc mvninstall
[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero
[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484852#comment-15484852 ] Wangda Tan commented on YARN-5545: -- [~bibinchundatt], [~sunilg], [~Naganarasimha]. Thanks for discussion, I think for this issue, what we should do: - Don't split maximum-application-number to per-partition, as we already have am-resource-percent-per-partition, adding more per-partition configuration will confuse user - And also, you cannot say one app belongs to one partition, you can only say one AM belongs to one partition - So queue will split maximum-application-number according to ratio of their total configured resource across partitions. For example, {code} Cluster maximum-application = 100, queueA configured partitionX = 10G, partitionY = 20G; queueB configured partitionX = 20G, partitionY = 50G; {code} So queueA 's maximum-application is 100 * (10 + 20) / (10 + 20 + 20 + 50) = 30 And queueB's maximum-application is 100 * (20 + 50) / (10 + 20 + 20 + 50) = 70 - Please note that, the maximum-applications of queues will be updated when CS configuration updated (refresh queue), and cluster resource updated, so we need to update it inside CSQueue#updateClusterResource . Thoughts? > App submit failure on queue with label when default queue partition capacity > is zero > > > Key: YARN-5545 > URL: https://issues.apache.org/jira/browse/YARN-5545 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, > YARN-5545.0003.patch, capacity-scheduler.xml > > > Configure capacity scheduler > yarn.scheduler.capacity.root.default.capacity=0 > yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50 > yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50 > Submit application as below > ./yarn jar > ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar > sleep -Dmapreduce.job.node-label-expression=labelx > -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1 > {noformat} > 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging > area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001 > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362) > at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at > org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136) > at > org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:239) > at org.apache.hadoop.util.RunJar.main(RunJar.java:153) > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit > application_1471670113386_0001 to YARN : >
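A small sketch of the ratio computation described in [~leftnoteasy]'s comment above, reproducing its worked example; the class, method, and GB-based map layout are assumptions for illustration only.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class QueueMaxAppsCalculator {
  /**
   * Split the cluster-wide maximum-application count across queues in
   * proportion to each queue's total configured resource over all partitions.
   */
  static int maxAppsForQueue(int clusterMaxApps,
      long queueConfiguredTotalGB, long allQueuesConfiguredTotalGB) {
    return (int) (clusterMaxApps
        * queueConfiguredTotalGB / (double) allQueuesConfiguredTotalGB);
  }

  public static void main(String[] args) {
    // Example from the comment: queueA = 10 + 20 GB, queueB = 20 + 50 GB, cluster max = 100.
    Map<String, Long> totals = new LinkedHashMap<>();
    totals.put("queueA", 10L + 20L);
    totals.put("queueB", 20L + 50L);
    long clusterTotal = totals.values().stream().mapToLong(Long::longValue).sum();
    for (Map.Entry<String, Long> e : totals.entrySet()) {
      System.out.println(e.getKey() + " max-applications = "
          + maxAppsForQueue(100, e.getValue(), clusterTotal)); // queueA -> 30, queueB -> 70
    }
  }
}
{code}
As the comment notes, these values would have to be recomputed on queue refresh and whenever cluster resources change.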
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484820#comment-15484820 ] Allen Wittenauer commented on YARN-5567: bq. would you prefer this be a config setting to choose the behavior? The history of the health check script is interesting, but long. But not trusting the exit code was one of the key learnings by the ops team from the HOD experience. It fails a lot more often than people realize, mainly due to users doing crazy things, especially on insecure systems. This is one of those times where it's going to be extremely difficult to convince me otherwise. I can't think of a reason to ever trust the exit code enough to bring down the NodeManager. In this particular environment, the number of conditions that the script can fail for reasons which may be temporary/pointless are many. Now it could be argued that those temporary failures should cause the NM to come down, but then you get into a race condition between heartbeats and actual issues. HDFS worked around it by basically saying "it has to fail for X long". Ignoring the exit code avoids that problem because one can be sure that "ERROR -" really did come from the script. bq. Alternatively, would you be okay with standardizing on a specific error code for "detected bad Node" vs "bad script"? If by error code you specifically mean the value the NM reports back to the RM, yes that makes sense. It just can't fail the node. > Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus > -- > > Key: YARN-5567 > URL: https://issues.apache.org/jira/browse/YARN-5567 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 3.0.0-alpha1 > > Attachments: YARN-5567.001.patch > > > In case of FAILED_WITH_EXIT_CODE, health status should be false. > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(true, "", now); > break; > {code} > should be > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(false, "", now); > break; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
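To make the two positions in this thread concrete, here is a self-contained sketch, not the actual NodeHealthScriptRunner code: the executable FAILED_WITH_EXIT_CODE branch keeps the node healthy on a bare non-zero exit and only trusts an explicit ERROR report from the script, while the patch's behavior and the separate error-code idea are noted in comments. The enum and messages are assumptions.
{code}
/** Sketch only; not the actual NodeHealthScriptRunner. */
public class HealthStatusSketch {
  enum ExitStatus { SUCCESS, TIMED_OUT, FAILED_WITH_EXIT_CODE, FAILED }

  private boolean healthy = true;
  private String report = "";

  void setHealthStatus(boolean isHealthy, String msg) {
    healthy = isHealthy;
    report = msg;
  }

  void reportHealthStatus(ExitStatus status, String scriptOutput) {
    switch (status) {
      case SUCCESS:
        setHealthStatus(true, "");
        break;
      case TIMED_OUT:
        setHealthStatus(false, "health script timed out");
        break;
      case FAILED:
        // The script itself printed an "ERROR ..." line: trust that signal.
        setHealthStatus(false, scriptOutput);
        break;
      case FAILED_WITH_EXIT_CODE:
        // Current behavior, which the discussion above defends: a bare
        // non-zero exit code does not mark the node unhealthy.
        setHealthStatus(true, "");
        // The YARN-5567 patch as posted would instead do: setHealthStatus(false, "");
        // A middle ground raised above: keep the node healthy but surface the
        // failure to the RM through a separate status/error code (hypothetical).
        break;
      default:
        break;
    }
  }
}
{code}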
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Attachment: YARN-2571-016.patch > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, > YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch, > YARN-2571-016.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484776#comment-15484776 ] Wangda Tan commented on YARN-5296: -- Already closed it, thanks for explanations! > NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl > --- > > Key: YARN-5296 > URL: https://issues.apache.org/jira/browse/YARN-5296 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karam Singh >Assignee: Junping Du > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: YARN-5296-v2.1.patch, YARN-5296-v2.patch, > YARN-5296.patch, after v2 fix.png, before v2 fix.png > > > Ran tests in following manner, > 1. Run GridMix of 768 sequestionally around 17 times to execute about 12.9K > apps. > 2. After 4-5hrs take Check NM Heap using Memory Analyser. It report around > 96% Heap is being used my ContainerMetrics > 3. Run 7 more GridMix run for have around 18.2apps ran in total. Again check > NM heap using Memory Analyser again 96% heap is being used by > ContainerMetrics. > 4. Start one more grimdmix run, while run going on , NMs started going down > with OOM, around running 18.7K+, On analysing NM heap using Memory analyser, > OOM was caused by ContainerMetrics -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5190) Registering/unregistering container metrics triggered by ContainerEvent and ContainersMonitorEvent are conflict which cause uncaught exception in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484772#comment-15484772 ] Wangda Tan commented on YARN-5190: -- [~djp], HADOOP-13362 should be able to fix the issue, thanks for pointing me this. Closing this ticket. > Registering/unregistering container metrics triggered by ContainerEvent and > ContainersMonitorEvent are conflict which cause uncaught exception in > ContainerMonitorImpl > -- > > Key: YARN-5190 > URL: https://issues.apache.org/jira/browse/YARN-5190 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-5190-branch-2.7.001.patch, YARN-5190-v2.patch, > YARN-5190.patch > > > The exception stack is as following: > {noformat} > 310735 2016-05-22 01:50:04,554 [Container Monitor] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container > Monitor,5,main] threw an Exception. > 310736 org.apache.hadoop.metrics2.MetricsException: Metrics source > ContainerResource_container_1463840817638_14484_01_10 already exists! > 310737 at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > 310738 at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > 310739 at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > 310740 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212) > 310741 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198) > 310742 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385) > {noformat} > After YARN-4906, we have multiple places to get ContainerMetrics for a > particular container that could cause race condition in registering the same > container metrics to DefaultMetricsSystem by different threads. Lacking of > proper handling of MetricsException which could get thrown, the exception > will could bring down daemon of ContainerMonitorImpl or even whole NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
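For illustration only (the comment above points to HADOOP-13362 as the actual fix), a defensive registration wrapper of the kind the description hints at might look like this; the class and method names are assumptions.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.metrics2.MetricsException;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

/** Sketch only: tolerate the duplicate-registration race described above. */
public final class SafeMetricsRegistration {
  private static final Log LOG = LogFactory.getLog(SafeMetricsRegistration.class);

  private SafeMetricsRegistration() {
  }

  /** Register a metrics source, logging instead of crashing on "already exists" races. */
  public static <T> T registerQuietly(String name, String desc, T source) {
    try {
      return DefaultMetricsSystem.instance().register(name, desc, source);
    } catch (MetricsException e) {
      LOG.warn("Metrics source " + name + " already registered; reusing existing source", e);
      return source;
    }
  }
}
{code}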
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484683#comment-15484683 ] ASF GitHub Bot commented on YARN-5605: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/124#discussion_r78414618 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java --- @@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy) } super.policy = policy; } - + @Override - public void recomputeShares() { + public void updateInternal(boolean checkStarvation) { readLock.lock(); try { policy.computeShares(runnableApps, getFairShare()); + if (checkStarvation) { +identifyStarvedApplications(); + } } finally { readLock.unlock(); } } + /** + * Helper method to identify starved applications. This needs to be called + * ONLY from {@link #updateInternal}, after the application shares + * are updated. + * + * A queue can be starving due to fairshare or minshare. + * + * Minshare is defined only on the queue and not the applications. + * Fairshare is defined for both the queue and the applications. + * + * If this queue is starved due to minshare, we need to identify the most + * deserving apps if they themselves are not starved due to fairshare. + * + * If this queue is starving due to fairshare, there must be at least + * one application that is starved. And, even if the queue is not + * starved due to fairshare, there might still be starved applications. + */ + private void identifyStarvedApplications() { +// First identify starved applications and track total amount of +// starvation (in resources) +Resource fairShareStarvation = Resources.clone(none()); +TreeSet appsWithDemand = fetchAppsWithDemand(); +for (FSAppAttempt app : appsWithDemand) { + Resource appStarvation = app.fairShareStarvation(); + if (Resources.equals(Resources.none(), appStarvation)) { +break; + } else { --- End diff -- It might be clearer if you swapped the _if_ and _else_. > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484679#comment-15484679 ] ASF GitHub Bot commented on YARN-5605: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/124#discussion_r78414451 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java --- @@ -535,6 +535,23 @@ public synchronized Resource getResource(SchedulerRequestKey schedulerKey) { } /** + * Method to return the next resource request to be serviced. + * + * In the initial implementation, we just pick any {@link ResourceRequest} + * corresponding to the highest priority. + * + * @return next {@link ResourceRequest} to allocate resources for. + */ + @Unstable + public synchronized ResourceRequest getNextResourceRequest() { +for (ResourceRequest rr: +resourceRequestMap.get(schedulerKeys.first()).values()) { + return rr; --- End diff -- It's stylistic, and there's no guideline about multiple exit points that I know of, so I won't push it. I don't think this form is very future-safe, though. > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484668#comment-15484668 ] ASF GitHub Bot commented on YARN-5605: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/124#discussion_r78413869 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java --- @@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy) } super.policy = policy; } - + @Override - public void recomputeShares() { + public void updateInternal(boolean checkStarvation) { readLock.lock(); try { policy.computeShares(runnableApps, getFairShare()); + if (checkStarvation) { +identifyStarvedApplications(); + } } finally { readLock.unlock(); } } + /** + * Helper method to identify starved applications. This needs to be called + * ONLY from {@link #updateInternal}, after the application shares + * are updated. + * + * A queue can be starving due to fairshare or minshare. + * + * Minshare is defined only on the queue and not the applications. + * Fairshare is defined for both the queue and the applications. + * + * If this queue is starved due to minshare, we need to identify the most + * deserving apps if they themselves are not starved due to fairshare. + * + * If this queue is starving due to fairshare, there must be at least + * one application that is starved. And, even if the queue is not + * starved due to fairshare, there might still be starved applications. + */ + private void identifyStarvedApplications() { +// First identify starved applications and track total amount of +// starvation (in resources) +Resource fairShareStarvation = Resources.clone(none()); +TreeSet appsWithDemand = fetchAppsWithDemand(); +for (FSAppAttempt app : appsWithDemand) { + Resource appStarvation = app.fairShareStarvation(); + if (Resources.equals(Resources.none(), appStarvation)) { +break; + } else { +context.getStarvedApps().addStarvedApp(app); --- End diff -- I think you should do whatever makes the code cleanest and easiest to maintain. I don't think making the context a glorified hash map helps you in any notable way here. > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484660#comment-15484660 ] ASF GitHub Bot commented on YARN-5605: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/124#discussion_r78413426 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java --- @@ -0,0 +1,173 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.yarn.api.records.ApplicationAttemptId; +import org.apache.hadoop.yarn.api.records.ContainerStatus; +import org.apache.hadoop.yarn.api.records.Resource; +import org.apache.hadoop.yarn.api.records.ResourceRequest; +import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer; +import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType; +import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils; +import org.apache.hadoop.yarn.util.resource.Resources; + +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; +import java.util.Timer; +import java.util.TimerTask; + +/** + * Thread that handles FairScheduler preemption + */ +public class FSPreemptionThread extends Thread { + private static final Log LOG = LogFactory.getLog(FSPreemptionThread.class); + private final FSContext context; + private final FairScheduler scheduler; + private final long warnTimeBeforeKill; + private final Timer preemptionTimer; + + public FSPreemptionThread(FairScheduler scheduler) { +this.scheduler = scheduler; +this.context = scheduler.getContext(); +FairSchedulerConfiguration fsConf = scheduler.getConf(); +context.setPreemptionEnabled(); +context.setPreemptionUtilizationThreshold( +fsConf.getPreemptionUtilizationThreshold()); +warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill(); +preemptionTimer = new Timer("Preemption Timer", true); + +setDaemon(true); +setName("FSPreemptionThread"); + } + + public void run() { +while (!Thread.interrupted()) { + FSAppAttempt starvedApp; + try{ +starvedApp = context.getStarvedApps().take(); +if (Resources.none().equals(starvedApp.getStarvation())) { + continue; +} + } catch (InterruptedException e) { +LOG.info("Preemption thread interrupted! Exiting."); +return; --- End diff -- I think I was a little confused. 
How about this:
{code}
try {
  starvedApp = context.getStarvedApps().take();
  if (!Resources.none().equals(starvedApp.getStarvation())) {
    List containers = identifyContainersToPreempt(starvedApp);
    if (containers != null) {
      preemptContainers(containers);
    }
  }
} catch (InterruptedException e) {
  LOG.info("Preemption thread interrupted! Exiting.");
  interrupt();
}
{code}
It does some extra work inside the _try_, but the logic is much simpler. > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > >
[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484643#comment-15484643 ] ASF GitHub Bot commented on YARN-5605: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/124#discussion_r78411782 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java --- @@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() { getFairShare()); } - /** - * Is a queue being starved for its min share. - */ - @VisibleForTesting - boolean isStarvedForMinShare() { -return isStarved(getMinShare()); + private Resource minShareStarvation() { +Resource desiredShare = Resources.min(policy.getResourceCalculator(), +scheduler.getClusterResource(), getMinShare(), getDemand()); + +Resource starvation = Resources.subtract(desiredShare, getResourceUsage()); +boolean starved = Resources.greaterThan(policy.getResourceCalculator(), +scheduler.getClusterResource(), starvation, none()); + +long now = scheduler.getClock().getTime(); +if (!starved) { + setLastTimeAtMinShare(now); +} + +if (starved && +(now - lastTimeAtMinShare > getMinSharePreemptionTimeout())) { + return starvation; +} else { + return Resources.clone(Resources.none()); --- End diff -- You can make it: if (!starved || (now - lastTimeAtMinShare < getMinSharePreemptionTimeout())) { starvation = Resources.clone(Resources.none()); if (!starved) { setLastTimeAtMinShare(now); } } return starvation; > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications
[ https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5605: --- Attachment: yarn-5605-2.patch > Preempt containers (all on one node) to meet the requirement of starved > applications > > > Key: YARN-5605 > URL: https://issues.apache.org/jira/browse/YARN-5605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5605-1.patch, yarn-5605-2.patch > > > Required items: > # Identify starved applications > # Identify a node that has enough containers from applications over their > fairshare. > # Preempt those containers -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484619#comment-15484619 ] Ray Chiang edited comment on YARN-5567 at 9/12/16 4:58 PM: --- [~aw], would you prefer this be a config setting to choose the behavior? Alternatively, would you be okay with standardizing on a specific error code for "detected bad Node" vs "bad script"? was (Author: rchiang): [~aw], would you prefer this be a config setting to choose the behavior? > Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus > -- > > Key: YARN-5567 > URL: https://issues.apache.org/jira/browse/YARN-5567 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 3.0.0-alpha1 > > Attachments: YARN-5567.001.patch > > > In case of FAILED_WITH_EXIT_CODE, health status should be false. > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(true, "", now); > break; > {code} > should be > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(false, "", now); > break; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
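To make the "standardize on a specific error code" idea concrete, a hypothetical sketch is below. The {{UNHEALTHY_EXIT_CODE}} constant, its value, and the {{reportExitCode}} method are purely assumptions to illustrate the proposal; only {{setHealthStatus(boolean, String, long)}} is taken from the snippet in the issue description.
{code}
// Hypothetical convention (not in the patch): a health script exiting with
// UNHEALTHY_EXIT_CODE is explicitly reporting a bad node; any other non-zero
// exit code is treated as a broken script and does not fail the node.
private static final int UNHEALTHY_EXIT_CODE = 42; // assumed value

void reportExitCode(int exitCode, String output, long now) {
  if (exitCode == 0) {
    setHealthStatus(true, "", now);                    // node is healthy
  } else if (exitCode == UNHEALTHY_EXIT_CODE) {
    setHealthStatus(false, output, now);               // script detected a bad node
  } else {
    // Bad script: record the problem without marking the node unhealthy.
    setHealthStatus(true, "health script error (exit " + exitCode + ")", now);
  }
}
{code}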
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484627#comment-15484627 ] Hadoop QA commented on YARN-2571: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 14 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 53s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 47s {color} | {color:red} hadoop-yarn-project_hadoop-yarn generated 1 new + 34 unchanged - 1 fixed = 35 total (was 35) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 15 new + 788 unchanged - 168 fixed = 803 total (was 956) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch 4 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry generated 0 new + 17 unchanged - 35 fixed = 17 total (was 52) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 8s {color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s {color} | {color:green} hadoop-yarn-registry in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | |
[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage
[ https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484623#comment-15484623 ] Karthik Kambatla commented on YARN-4767: Ping [~vinodkv] > Network issues can cause persistent RM UI outage > > > Key: YARN-4767 > URL: https://issues.apache.org/jira/browse/YARN-4767 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.7.2 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-4767.001.patch, YARN-4767.002.patch, > YARN-4767.003.patch, YARN-4767.004.patch, YARN-4767.005.patch, > YARN-4767.006.patch, YARN-4767.007.patch, YARN-4767.008.patch, > YARN-4767.009.patch, YARN-4767.010.patch > > > If a network issue causes an AM web app to resolve the RM proxy's address to > something other than what's listed in the allowed proxies list, the > AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy. > The RM proxy will then consume all available handler threads connecting to > itself over and over, resulting in an outage of the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484619#comment-15484619 ] Ray Chiang commented on YARN-5567: -- [~aw], would you prefer this be a config setting to choose the behavior? > Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus > -- > > Key: YARN-5567 > URL: https://issues.apache.org/jira/browse/YARN-5567 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 3.0.0-alpha1 > > Attachments: YARN-5567.001.patch > > > In case of FAILED_WITH_EXIT_CODE, health status should be false. > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(true, "", now); > break; > {code} > should be > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(false, "", now); > break; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support re-initialization of Containers with new launchContext
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484558#comment-15484558 ] Hadoop QA commented on YARN-5620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 10 new + 515 unchanged - 4 fixed = 525 total (was 519) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 34s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 53s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12828059/YARN-5620.012.patch | | JIRA Issue | YARN-5620 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f4886f9afe87 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9faccd1 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13087/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13087/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/13087/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results |
[jira] [Created] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit
Arun Suresh created YARN-5637: - Summary: Changes in NodeManager to support Container upgrade and rollback/commit Key: YARN-5637 URL: https://issues.apache.org/jira/browse/YARN-5637 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun Suresh Assignee: Arun Suresh YARN-5620 added support for re-initialization of Containers using a new launch Context. This JIRA proposes to use the above feature to support upgrade and subsequent rollback or commit of the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5620) Core changes in NodeManager to support re-initialization of Containers with new launchContext
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5620: -- Summary: Core changes in NodeManager to support re-initialization of Containers with new launchContext (was: Core changes in NodeManager to support for re-initialization of Containers) > Core changes in NodeManager to support re-initialization of Containers with > new launchContext > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch, > YARN-5620.012.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for re-initialization of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5620: -- Summary: Core changes in NodeManager to support for re-initialization of Containers (was: Core changes in NodeManager to support for upgrade and rollback of Containers) > Core changes in NodeManager to support for re-initialization of Containers > -- > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch, > YARN-5620.012.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5620: -- Attachment: YARN-5620.012.patch Done.. Thanks [~jianhe].. > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch, > YARN-5620.012.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7
[ https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484465#comment-15484465 ] Jason Lowe commented on YARN-5630: -- Thanks for the +1, Arun! I'll commit this later today if there are no objections. Regarding the key removal, yes if we read the keys out of the store itself then that totally makes sense. Actually I was assuming that during the rollback when the old software saw a key that was ignored it would just remove it as it iterated the database. I suppose we could have the entry flag whether or not the ignored key should be preserved as long as the container is active in case there is a subsequent roll-forward. Then there's no need for a prepare-for-rollback once the support for the key table is in the old software version and all we need to do is delete keys. Of course if there's more to do than just delete keys or fail containers then something needs to be run on the new software before downgrading to the old one. > NM fails to start after downgrade from 2.8 to 2.7 > - > > Key: YARN-5630 > URL: https://issues.apache.org/jira/browse/YARN-5630 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-5630.001.patch, YARN-5630.002.patch > > > A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an > unrecognized "version" container key on startup. This breaks downgrades from > 2.8 to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Attachment: YARN-2571-015.patch YARN-2571 patch 015: remove a test which was obsolete, but which didn't show as it was broken. Fix up more javadocs > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, > YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7
[ https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484432#comment-15484432 ] Arun Suresh commented on YARN-5630: --- In that case... +1 for the latest patch.. bq. it adds another user-visible phase to the rollback procedure and places the burden on admins, requiring them to either know what keys are valid/appropriate to specify .. Not necessarily. We can check in a rollback file that lists, for each compatible CURRENT_VERSION_INFO, the new keys that can be scrubbed safely before rollback.
{code}
1.3 /version
1.2 /logDir /workDir
1.1 /queued
...
{code}
Release 2.8 corresponds to version info 1.3, and release 2.7 corresponds to 1.0. The _--prepare-for-rollback_ flag just takes a target version_info argument (in this case, 1.0). And now we have a list of keys to scrub without any deep involvement from the admin. (Additionally, we can choose to key the file with the release version rather than the CURRENT_VERSION_INFO to make it even easier.) > NM fails to start after downgrade from 2.8 to 2.7 > - > > Key: YARN-5630 > URL: https://issues.apache.org/jira/browse/YARN-5630 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-5630.001.patch, YARN-5630.002.patch > > > A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an > unrecognized "version" container key on startup. This breaks downgrades from > 2.8 to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
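A sketch of how such a rollback file could be consumed is below; the whitespace-separated file format, the {{loadScrubKeys}} helper name, and the use of plain Java I/O are assumptions to illustrate the idea, not NodeManager code.
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (sketch only): given a rollback file like the one
// sketched above and a target version info, collect every key introduced
// after that version. These are the keys --prepare-for-rollback would scrub.
public final class RollbackKeys {
  public static List<String> loadScrubKeys(String rollbackFile,
      double targetVersion) throws IOException {
    List<String> keysToScrub = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new FileReader(rollbackFile))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
          continue; // skip blank or malformed lines (e.g. the trailing "...")
        }
        if (Double.parseDouble(parts[0]) > targetVersion) {
          // Every key listed under a newer version info must be removed.
          for (int i = 1; i < parts.length; i++) {
            keysToScrub.add(parts[i]);
          }
        }
      }
    }
    return keysToScrub;
  }
}
{code}
For the example file above, {{loadScrubKeys("rollback.keys", 1.0)}} would return /version, /logDir, /workDir and /queued, i.e. every key introduced after version info 1.0.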
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484429#comment-15484429 ] Allen Wittenauer commented on YARN-5567: bq. Should we think of some other state which could warn the admin about this(which is captured in webui/Rest)? Probably. The key problem is going to be putting it some place that admins will actually notice it. (Hint: most folks in ops that I know don't actually look at the web UIs...) If folks want to pursue that, they'll need to do it in another JIRA since this one has been in a release. :( > Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus > -- > > Key: YARN-5567 > URL: https://issues.apache.org/jira/browse/YARN-5567 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 3.0.0-alpha1 > > Attachments: YARN-5567.001.patch > > > In case of FAILED_WITH_EXIT_CODE, health status should be false. > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(true, "", now); > break; > {code} > should be > {code} > case FAILED_WITH_EXIT_CODE: > setHealthStatus(false, "", now); > break; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484371#comment-15484371 ] Sunil G commented on YARN-4855: --- Hi [~Tao Jie], I don't think we need to modify the existing APIs in {{RmAdminProtocol}}; we could add a new API instead. If there is a clear use case, I think there is no issue in adding an API, but it is better to discuss that further. Looping [~naganarasimha...@apache.org] and [~leftnoteasy]. > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch, YARN-4855.002.patch, > YARN-4855.003.patch, YARN-4855.004.patch, YARN-4855.005.patch, > YARN-4855.006.patch, YARN-4855.007.patch, YARN-4855.008.patch > > > Today when we add node labels to nodes, the operation succeeds without any message > even if the nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode --fail-on-unkown-nodes > "node1=label1"* , the request would be denied if the node is unknown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5148) [YARN-3368] Add page to new YARN UI to view server side configurations/logs/JVM-metrics
[ https://issues.apache.org/jira/browse/YARN-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484297#comment-15484297 ] Kai Sasaki commented on YARN-5148: -- {quote} 1. Could we categorize with labels like YARN, MR etc? {quote} Sure. Since each property is represented as an Ember model, we can categorize by property name. {quote} Could you please share a screen shot. Similarly for JMX. {quote} Currently the JMX and logs pages redirect to the RM web UI (http://localhost:8088/jmx and http://localhost:8088/logs), so they look the same as the current UI. > [YARN-3368] Add page to new YARN UI to view server side > configurations/logs/JVM-metrics > --- > > Key: YARN-5148 > URL: https://issues.apache.org/jira/browse/YARN-5148 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Kai Sasaki > Attachments: Screen Shot 2016-09-11 at 23.28.31.png, > YARN-5148-YARN-3368.01.patch, YARN-5148-YARN-3368.02.patch, yarn-conf.png, > yarn-tools.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484308#comment-15484308 ] Jian He commented on YARN-5620: --- Arun, thank you very much for the prompt response.. I think you forgot to remove the unused checkAndUpdatePending method, given that we anyway need one more patch.. few nits: - reInitContext no need to be volatile - remove unused imports in ContainerLocalizationRequestEvent latest patch looks good to me. [~vvasudev], want to take a look ? > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484303#comment-15484303 ] Hadoop QA commented on YARN-5620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 12 new + 514 unchanged - 4 fixed = 526 total (was 518) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 5s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 23s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12828048/YARN-5620.011.patch | | JIRA Issue | YARN-5620 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux dc2101f05586 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cc01ed70 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13085/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13085/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/13085/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results |
[jira] [Commented] (YARN-5630) NM fails to start after downgrade from 2.8 to 2.7
[ https://issues.apache.org/jira/browse/YARN-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484295#comment-15484295 ] Jason Lowe commented on YARN-5630: -- I'm not a fan of the "prepare for rollback" approach if we can avoid it. It adds another user-visible phase to the rollback procedure and places the burden on admins, requiring them to either know what keys are valid/appropriate to specify for the command or that they need to run a special script which embeds this knowledge. Also simply removing the keys from the database is not going to be a proper downgrade procedure. Those keys represent state that is important to preserve on a restart, and if we ignore it then we are dropping a user request for a container. That's not going to be OK in the general case, as that may prevent a container from launching properly or having the proper properties when it is launched. Depending upon the nature of the feature that added the new store keys, we may not be able to support the downgrade at all short of failing the container because we can't execute it as requested. In the short term I think we should commit something similar to this patch to unblock the 2.8 release. IMHO we should be OK if we support downgrades from 2.8 to 2.7 if the user does not leverage the new features in 2.8 (i.e.: container increase/decrease, queuing, etc.). Once those features are used then a downgrade may not work. This mirrors what was done for the epoch number in container IDs between 2.5 and 2.6. Downgrades worked as long as the new work-preserving RM restart wasn't performed after upgrading to 2.6. In general if we are careful only to use new store keys when they are absolutely necessary then we can support rollbacks as long as users don't use the new features added in the new release. After unblocking 2.8 we can then work on the data-driven key ignoring in YARN-5547. That will help cover another set of features where a simple delete of the keys is sufficient to perform the downgrade. That would then leave the features where we can't just ignore keys, and we'll have to come up with some other approach or state to users that downgrades do not necessarily work once that new feature is being used. > NM fails to start after downgrade from 2.8 to 2.7 > - > > Key: YARN-5630 > URL: https://issues.apache.org/jira/browse/YARN-5630 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-5630.001.patch, YARN-5630.002.patch > > > A downgrade from 2.8 to 2.7 causes nodemanagers to fail to start due to an > unrecognized "version" container key on startup. This breaks downgrades from > 2.8 to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
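To make the "data-driven key ignoring" idea concrete, a hypothetical sketch is below. The whitelist contents, the method names, and the decision to fail recovery on truly unknown keys are assumptions based on this discussion, not the YARN-5547 implementation or the actual NM state store code.
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical recovery loop (sketch only): keys this release understands are
// applied, keys whitelisted as ignorable are skipped but preserved in the
// store in case of a later roll-forward, and anything else aborts recovery so
// the incompatibility is visible instead of state being silently dropped.
public final class ContainerKeyRecovery {
  // Assumption: key suffixes a newer release may write that are safe to skip.
  private static final Set<String> IGNORABLE_KEY_SUFFIXES =
      new HashSet<>(Arrays.asList("/version"));

  public static void recover(Map<String, byte[]> containerKeys) throws IOException {
    for (Map.Entry<String, byte[]> entry : containerKeys.entrySet()) {
      String key = entry.getKey();
      if (isKnownKey(key)) {
        applyKey(key, entry.getValue());
      } else if (isIgnorable(key)) {
        continue; // newer-release key: ignore but leave it in the store
      } else {
        throw new IOException("Unrecognized container key during recovery: " + key);
      }
    }
  }

  private static boolean isIgnorable(String key) {
    for (String suffix : IGNORABLE_KEY_SUFFIXES) {
      if (key.endsWith(suffix)) {
        return true;
      }
    }
    return false;
  }

  private static boolean isKnownKey(String key) {
    // Placeholder: real code would match the key suffixes this release writes.
    return key.endsWith("/diagnostics") || key.endsWith("/launched");
  }

  private static void applyKey(String key, byte[] value) {
    // Placeholder for rebuilding the recovered container state.
  }
}
{code}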
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484271#comment-15484271 ] Hadoop QA commented on YARN-2571: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 13 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 34s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 59s {color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 59s {color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 52 new + 695 unchanged - 10 fixed = 747 total (was 705) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 12s {color} | {color:red} hadoop-yarn-registry in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s {color} | {color:green} hadoop-yarn-registry in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 13s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | YARN-2571 | | GITHUB PR | https://github.com/apache/hadoop/pull/66 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 4e62ed6fc545 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cc01ed70 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | mvninstall |
[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484242#comment-15484242 ] Hadoop QA commented on YARN-4205: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 25 new + 503 unchanged - 3 fixed = 528 total (was 506) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 36s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 45m 34s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 77m 17s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api | | | Result of integer multiplication cast to long in org.apache.hadoop.yarn.api.records.ApplicationTimeouts.hashCode() At ApplicationTimeouts.java:to long in org.apache.hadoop.yarn.api.records.ApplicationTimeouts.hashCode() At ApplicationTimeouts.java:[line 77] | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12828034/0003-YARN-4205.patch | | JIRA Issue |
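The findbugs item above ("result of integer multiplication cast to long" in {{ApplicationTimeouts.hashCode()}}) refers to a common pattern; the snippet below is a generic illustration of the bug and its usual fix, not the actual ApplicationTimeouts code.
{code}
// Generic illustration of the "integer multiplication cast to long" findbugs
// warning (not the real ApplicationTimeouts class).
public class TimeoutHashExample {
  private int lifetimeSeconds;

  // Flagged: 31 * lifetimeSeconds is computed in int arithmetic and can
  // overflow before the widening cast takes effect.
  public long buggyValue() {
    return (long) (31 * lifetimeSeconds);
  }

  // Fix: make one operand a long so the multiplication happens in long.
  public long fixedValue() {
    return 31L * lifetimeSeconds;
  }
}
{code}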
[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5620: -- Attachment: YARN-5620.011.patch Thanks [~jianhe].. Uploading patch (v011) with the changes. I left the CLEANUP_CONTAINER_FOR_REINIT there, even though it does the same thing as CLEANUP_CONTAINER. Since it is sent by a different source, it can be used for debugging, etc. > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch, YARN-5620.011.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Attachment: YARN-2571-013.patch Patch 013 * fix compile problem against new yarn test * do as much as possible to shut up checkstyle and javadoc > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, > YARN-2571-012.patch, YARN-2571-013.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484032#comment-15484032 ] Hadoop QA commented on YARN-2571: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 13 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 39s {color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 39s {color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 46s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 52 new + 696 unchanged - 10 fixed = 748 total (was 706) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 11s {color} | {color:red} hadoop-yarn-registry in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s {color} | {color:green} hadoop-yarn-registry in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 17s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | YARN-2571 | | GITHUB PR | https://github.com/apache/hadoop/pull/66 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux b4ae3cd62fff 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cc01ed70 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | mvninstall |
[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4205: Attachment: 0003-YARN-4205.patch Updated the patch fixing the review comments. This patch has the following changes from the previous one. # Added an ApplicationTimeouts class that contains lifetime values. This class can be used in the future to support any other timeouts such as queue_timeout or statestore_timeout. > Add a service for monitoring application life time out > -- > > Key: YARN-4205 > URL: https://issues.apache.org/jira/browse/YARN-4205 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: nijel >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, > 0003-YARN-4205.patch, YARN-4205_01.patch, YARN-4205_02.patch, > YARN-4205_03.patch > > > This JIRA intends to provide a lifetime monitor service. > The service will monitor the applications for which a lifetime is configured. > If the application is running beyond the lifetime, it will be killed. > The lifetime will be considered from the submit time. > The thread monitoring interval is configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
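A minimal sketch of the lifetime-monitoring behaviour described above (kill applications that run past their configured lifetime, measured from submit time, on a configurable interval) is below. The {{AppLifetimeMonitor}} class, the kill callback, and the scheduled-executor approach are assumptions for illustration only, not the YARN-4205 patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical lifetime monitor: tracks (appId -> submit time, lifetime) and
// periodically kills any application that has exceeded its configured lifetime.
public final class AppLifetimeMonitor {
  private static final class Entry {
    final long submitTimeMs;
    final long lifetimeMs;
    Entry(long submitTimeMs, long lifetimeMs) {
      this.submitTimeMs = submitTimeMs;
      this.lifetimeMs = lifetimeMs;
    }
  }

  private final Map<String, Entry> apps = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Consumer<String> killApplication; // callback into the RM (assumed)

  public AppLifetimeMonitor(long checkIntervalMs, Consumer<String> killApplication) {
    this.killApplication = killApplication;
    // The monitoring interval is configurable, as the JIRA description states.
    scheduler.scheduleWithFixedDelay(this::expireApps,
        checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
  }

  /** Register an application whose lifetime is measured from its submit time. */
  public void register(String appId, long submitTimeMs, long lifetimeMs) {
    apps.put(appId, new Entry(submitTimeMs, lifetimeMs));
  }

  public void unregister(String appId) {
    apps.remove(appId);
  }

  private void expireApps() {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Entry> e : apps.entrySet()) {
      if (now - e.getValue().submitTimeMs > e.getValue().lifetimeMs) {
        killApplication.accept(e.getKey()); // ran past its lifetime
        apps.remove(e.getKey());
      }
    }
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}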
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483977#comment-15483977 ] Steve Loughran commented on YARN-679: - Dan, I've broken things down; all that happens is that parts of the patch get ignored. Well, ignored more than this. What we have here is self-contained, and, being derived from what we've been using in Slider, not far off what's been used elsewhere. So afraid not. But: thank you very, very much for the reviews; I'm going to take a break from s3 coding to address them -steve > add an entry point that can start any Yarn service > -- > > Key: YARN-679 > URL: https://issues.apache.org/jira/browse/YARN-679 > Project: Hadoop YARN > Issue Type: New Feature > Components: api >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-679-001.patch, YARN-679-002.patch, > YARN-679-002.patch, YARN-679-003.patch, YARN-679-004.patch, > YARN-679-005.patch, YARN-679-006.patch, YARN-679-007.patch, > YARN-679-008.patch, YARN-679-009.patch, YARN-679-010.patch, > YARN-679-011.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf > > Time Spent: 72h > Remaining Estimate: 0h > > There's no need to write separate .main classes for every Yarn service, given > that the startup mechanism should be identical: create, init, start, wait for > stopped -with an interrupt handler to trigger a clean shutdown on a control-c > interrupt. > Provide one that takes any classname, and a list of config files/options -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Attachment: YARN-2571-012.patch Patch 012: rebase to trunk. This patch is coming up on its second birthday. Can someone please look at it. Thanks > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, > YARN-2571-012.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
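A hedged sketch of the three integration points listed in the YARN-2571 description above, not the patch itself. RegistryClientStub and its method names are stand-ins invented for this illustration; the real patch works against the YARN registry API rather than this interface.

{code}
public final class RmRegistryHooksSketch {

  /** Minimal stand-in for whatever client the RM would use to talk to the registry. */
  public interface RegistryClientStub {
    void mkdirWithSystemAcls(String path);        // assumed: create a path owned by the yarn/hdfs principals
    void mkdirForUser(String path, String user);  // assumed: create a path the user may create child nodes under
    void deleteRecordsForId(String id);           // assumed: purge service records whose persistence/id match
  }

  private final RegistryClientStub registry;

  public RmRegistryHooksSketch(RegistryClientStub registry) {
    this.registry = registry;
  }

  /** RM startup: make sure the shared roots exist with system ACLs. */
  public void onStartup() {
    registry.mkdirWithSystemAcls("/services");
    registry.mkdirWithSystemAcls("/users");
  }

  /** App launch: give the submitting user a directory they can create service records under. */
  public void onAppLaunch(String username) {
    registry.mkdirForUser("/users/" + username, username);
  }

  /** Attempt, container or app completion: drop service records tied to the finished id. */
  public void onCompletion(String id) {
    registry.deleteRecordsForId(id);
  }
}
{code}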
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Labels: (was: BB2015-05-TBR) > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5587) Add support for resource profiles
[ https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483623#comment-15483623 ] Hadoop QA commented on YARN-5587: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 4m 5s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 1s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 28s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 37s {color} | {color:green} YARN-3926 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} YARN-3926 passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 7m 12s {color} | {color:red} root generated 3 new + 713 unchanged - 0 fixed = 716 total (was 713) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 39s {color} | {color:red} root: The patch generated 65 new + 1029 unchanged - 2 fixed = 1094 total (was 1031) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api generated 4 new + 156 unchanged - 0 fixed = 160 total (was 156) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 25s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 20s {color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 4s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 55s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 5s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 37s {color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s {color} | {color:red} The patch generated 5 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} |
[jira] [Commented] (YARN-3359) Recover collector list in RM failed over
[ https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483620#comment-15483620 ] Junping Du commented on YARN-3359: -- Hi [~gtCarrera9], please feel free to take this over as I don't have a short-term plan for this. However, as you may know, I am currently working on AM address discovery (for the AM restart case) that may consolidate with our previous collector address discovery work, and I will publish a design soon (on YARN-4758). I would suggest waiting until that effort is clear before continuing here, if this is not our top item right now. What do you think? > Recover collector list in RM failed over > > > Key: YARN-3359 > URL: https://issues.apache.org/jira/browse/YARN-3359 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du > Labels: YARN-5355 > > Per discussion in YARN-3039, split the recovery work from RMStateStore into a > separate JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483565#comment-15483565 ] Jian He edited comment on YARN-5620 at 9/12/16 9:18 AM: bq. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it won't be in KILLING state.. right ? I guess in this case, it’s also fine to do the upgrade… because the upgrade API does accept it, it’s hard to distinguish which one should go first.. It's also likely that the reverse can happen because it's transient: if setKillForReInitialization is called first and then the container process is killed, it will be considered a re-init, even though it was killed by an external signal. so keep it consistent ? bq. Actually if you look at the prepareContainerUpgrade() function, ah, yes, I misread it. thank you ! bq. The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources In this case, the pendingResources is the same as getAllResourcesByVisibility, right? basically, I meant like below.. and the newly added methods would not be needed. {code} Map pendingResources = ((ContainerReInitEvent) event).getResourceSet() .getAllResourcesByVisibility(); if (!pendingResources.isEmpty()) { container.dispatcher.getEventHandler().handle( new ContainerLocalizationRequestEvent(container, pendingResources)); } else { {code} - Forgot to say: similarly, is the change in ResourceLocalizedWhileRunningTransition required, as the symlinks are also already distinct? was (Author: jianhe): bq. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it won't be in KILLING state.. right ? I guess in this case, it’s also fine to do the upgrade… because the upgrade API does accept it, it’s hard to distinguish which one should go first.. It's also likely that the reverse can happen because it's transient: if setKillForReInitialization is called first and then the container process is killed, it will be considered a re-init, even though it was killed by an external signal. so keep it consistent ? bq. Actually if you look at the prepareContainerUpgrade() function, ah, yes, I misread it. thank you ! bq. The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources In this case, the pendingResources is the same as getAllResourcesByVisibility, right? basically, I meant like below.. and the newly added methods would not be needed. 
{code} Map pendingResources = ((ContainerReInitEvent) event).getResourceSet() .getAllResourcesByVisibility(); if (!pendingResources.isEmpty()) { container.dispatcher.getEventHandler().handle( new ContainerLocalizationRequestEvent(container, pendingResources)); } else { {code} > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3115) [Collector wireup] Work-preserving restarting of per-node timeline collector
[ https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483580#comment-15483580 ] Junping Du commented on YARN-3115: -- Hi [~varun_saxena], please feel free to take this up if you got plan to do it in the short term. I can help to review it. > [Collector wireup] Work-preserving restarting of per-node timeline collector > > > Key: YARN-3115 > URL: https://issues.apache.org/jira/browse/YARN-3115 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Junping Du > Labels: YARN-5355 > > YARN-3030 makes the per-node aggregator work as the aux service of a NM. It > contains the states of the per-app aggregators corresponding to the running > AM containers on this NM. While NM is restarted in work-preserving mode, this > information of per-node aggregator needs to be carried on over restarting too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483569#comment-15483569 ] Junping Du edited comment on YARN-5296 at 9/12/16 9:04 AM: --- [~leftnoteasy], this is actually not needed for branch-2.7 - as I discussed with Jason on HADOOP-13362, this is just a misunderstanding caused by the different places where containers are removed in branch-2.7 and branch-2. Just forget about my comment above. Also, I noticed you reopened YARN-5190 for branch-2.7, which seems to duplicate HADOOP-13362. Can you double check and close it? Thx! was (Author: djp): [~leftnoteasy], this is actually not needed for branch-2.7 - as I discussed with Jason on HADOOP-13362, this is just a misunderstanding caused by the different places where containers are removed in branch-2.7 and branch-2. Just forget about the comment. Also, I noticed you reopened YARN-5190 for branch-2.7, which seems to duplicate HADOOP-13362. Can you double check and close it? Thx! > NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl > --- > > Key: YARN-5296 > URL: https://issues.apache.org/jira/browse/YARN-5296 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karam Singh >Assignee: Junping Du > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: YARN-5296-v2.1.patch, YARN-5296-v2.patch, > YARN-5296.patch, after v2 fix.png, before v2 fix.png > > > Ran tests in the following manner, > 1. Run GridMix of 768 sequentially around 17 times to execute about 12.9K > apps. > 2. After 4-5 hrs check the NM heap using Memory Analyser. It reports around > 96% of the heap is being used by ContainerMetrics. > 3. Run 7 more GridMix runs to have around 18.2K apps run in total. Again check the > NM heap using Memory Analyser; again 96% of the heap is being used by > ContainerMetrics. > 4. Start one more GridMix run; while the run is going on, NMs started going down > with OOM at around 18.7K+ running apps. On analysing the NM heap using Memory Analyser, the > OOM was caused by ContainerMetrics -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483569#comment-15483569 ] Junping Du commented on YARN-5296: -- [~leftnoteasy], this is actually not needed for branch-2.7 - as I discussed with Jason on HADOOP-13362, this is just a misunderstanding caused by the different places where containers are removed in branch-2.7 and branch-2. Just forget about the comment. Also, I noticed you reopened YARN-5190 for branch-2.7, which seems to duplicate HADOOP-13362. Can you double check and close it? Thx! > NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl > --- > > Key: YARN-5296 > URL: https://issues.apache.org/jira/browse/YARN-5296 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karam Singh >Assignee: Junping Du > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: YARN-5296-v2.1.patch, YARN-5296-v2.patch, > YARN-5296.patch, after v2 fix.png, before v2 fix.png > > > Ran tests in the following manner, > 1. Run GridMix of 768 sequentially around 17 times to execute about 12.9K > apps. > 2. After 4-5 hrs check the NM heap using Memory Analyser. It reports around > 96% of the heap is being used by ContainerMetrics. > 3. Run 7 more GridMix runs to have around 18.2K apps run in total. Again check the > NM heap using Memory Analyser; again 96% of the heap is being used by > ContainerMetrics. > 4. Start one more GridMix run; while the run is going on, NMs started going down > with OOM at around 18.7K+ running apps. On analysing the NM heap using Memory Analyser, the > OOM was caused by ContainerMetrics -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483565#comment-15483565 ] Jian He commented on YARN-5620: --- bq. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it won't be in KILLING state.. right ? I guess in this case, it’s also fine to do the upgrade… because the upgrade API does accept it, it’s hard to distinguish which one should go first.. It's also likely that the reverse can happen because it's transient: if setKillForReInitialization is called first and then the container process is killed, it will be considered a re-init, even though it was killed by an external signal. so keep it consistent ? bq. Actually if you look at the prepareContainerUpgrade() function, ah, yes, I misread it. thank you ! bq. The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources In this case, the pendingResources is the same as getAllResourcesByVisibility, right? basically, I meant like below.. and the newly added methods would not be needed. {code} Map pendingResources = ((ContainerReInitEvent) event).getResourceSet() .getAllResourcesByVisibility(); if (!pendingResources.isEmpty()) { container.dispatcher.getEventHandler().handle( new ContainerLocalizationRequestEvent(container, pendingResources)); } else { {code} > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5190) Registering/unregistering container metrics triggered by ContainerEvent and ContainersMonitorEvent are conflict which cause uncaught exception in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483553#comment-15483553 ] Junping Du commented on YARN-5190: -- Hi [~leftnoteasy], HADOOP-13362 was proposed to fix this issue for branch-2.7 and has already been checked in. Anything more to fix here? > Registering/unregistering container metrics triggered by ContainerEvent and > ContainersMonitorEvent are conflict which cause uncaught exception in > ContainerMonitorImpl > -- > > Key: YARN-5190 > URL: https://issues.apache.org/jira/browse/YARN-5190 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-5190-branch-2.7.001.patch, YARN-5190-v2.patch, > YARN-5190.patch > > > The exception stack is as follows: > {noformat} > 310735 2016-05-22 01:50:04,554 [Container Monitor] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container > Monitor,5,main] threw an Exception. > 310736 org.apache.hadoop.metrics2.MetricsException: Metrics source > ContainerResource_container_1463840817638_14484_01_10 already exists! > 310737 at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > 310738 at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > 310739 at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > 310740 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212) > 310741 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198) > 310742 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385) > {noformat} > After YARN-4906, we have multiple places that get ContainerMetrics for a > particular container, which could cause a race condition in registering the same > container metrics to DefaultMetricsSystem from different threads. Lacking proper > handling of the MetricsException that can get thrown, the exception > could bring down the ContainerMonitorImpl daemon or even the whole NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
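A hedged sketch of one way to make the "get or register" step race-safe, which is the kind of conflict the YARN-5190 description above points at. This is not the committed fix; MetricsRegistryStub, ContainerMetricsStub and the source-name prefix are invented stand-ins rather than Hadoop's metrics2 classes.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ContainerMetricsCache {

  // Minimal stand-ins so the sketch compiles on its own (not Hadoop classes).
  public interface MetricsRegistryStub {
    void register(String sourceName, Object source);   // assumed to throw on duplicate names
    void unregister(String sourceName);
  }

  public static final class ContainerMetricsStub {
    final String containerId;
    ContainerMetricsStub(String containerId) { this.containerId = containerId; }
  }

  private final Map<String, ContainerMetricsStub> cache = new ConcurrentHashMap<>();
  private final MetricsRegistryStub registry;

  public ContainerMetricsCache(MetricsRegistryStub registry) {
    this.registry = registry;
  }

  /** Returns the metrics object for a container, registering it at most once. */
  public ContainerMetricsStub forContainer(String containerId) {
    // computeIfAbsent runs the factory at most once per key, so two threads
    // asking for the same container never both call registry.register(...),
    // and the duplicate-source exception cannot be raised by this path.
    return cache.computeIfAbsent(containerId, id -> {
      ContainerMetricsStub m = new ContainerMetricsStub(id);
      registry.register("ContainerResource_" + id, m);
      return m;
    });
  }

  /** Drops the cache entry and unregisters the source when the container finishes. */
  public void release(String containerId) {
    ContainerMetricsStub m = cache.remove(containerId);
    if (m != null) {
      registry.unregister("ContainerResource_" + containerId);
    }
  }
}
{code}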
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483521#comment-15483521 ] Hadoop QA commented on YARN-5620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 22 new + 575 unchanged - 4 fixed = 597 total (was 579) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 7s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 30s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12827991/YARN-5620.010.patch | | JIRA Issue | YARN-5620 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux d596c40c7af7 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cc01ed70 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13081/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13081/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/13081/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results |
[jira] [Comment Edited] (YARN-5610) Initial code for native services REST API
[ https://issues.apache.org/jira/browse/YARN-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483469#comment-15483469 ] Jian He edited comment on YARN-5610 at 9/12/16 8:26 AM: Thanks, Gour. bq. So according to an app-owner, it is STARTED but not RUNNING yet. I would prefer to rename the pair as STARTED -> READY, or RUNNING -> READY bq. The swagger definition defines this. Do you mean swagger has such a date type in string format ? which one is this? I couldn't find it in the swagger documentation. bq. why the changes needed in hadoop-project/pom.xml I meant what is this change used for ? {code} http://git-wip-us.apache.org/repos/asf/hadoop.git scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git {code} bq. appOptions and uniqueGlobalPropertyCache are required as appOptions is the way application configuration properties are injected into Slider client. I see. Then the {{if (uniqueGlobalPropertyCache == null)}} condition is not needed, because uniqueGlobalPropertyCache is initialized as not null. {code} private void addOptionsIfNotPresent(List options, Set uniqueGlobalPropertyCache, String key, String value) { if (uniqueGlobalPropertyCache == null) { options.addAll(Arrays.asList(key, value)); } {code} bq. In case of complex and nested applications, some components will be by themselves full-blown and independent applications. The APPLICATION type artifact refers to such external application definitions Then, what are the other parameters in Component used for in this case ? like number_of_containers, launch_command, resource etc. was (Author: jianhe): Thanks, Gour. bq. So according to an app-owner, it is STARTED but not RUNNING yet. I would prefer to rename the pair as STARTED -> READY, or RUNNING -> READY bq. The swagger definition defines this. Do you mean swagger has such a date type in string format ? which one is this? I couldn't find it in the swagger documentation. bq. why the changes needed in hadoop-project/pom.xml I meant what is this change used for ? {code} http://git-wip-us.apache.org/repos/asf/hadoop.git scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git {code} bq. appOptions and uniqueGlobalPropertyCache are required as appOptions is the way application configuration properties are injected into Slider client. I see. Then the {{if (uniqueGlobalPropertyCache == null) }} condition is not needed, because uniqueGlobalPropertyCache is initialized as not null. {code} private void addOptionsIfNotPresent(List options, Set uniqueGlobalPropertyCache, String key, String value) { if (uniqueGlobalPropertyCache == null) { options.addAll(Arrays.asList(key, value)); } {code} bq. In case of complex and nested applications, some components will be by themselves full-blown and independent applications. The APPLICATION type artifact refers to such external application definitions Then, what are the other parameters in Component used for in this case ? like number_of_containers, launch_command, resource etc. 
> Initial code for native services REST API > - > > Key: YARN-5610 > URL: https://issues.apache.org/jira/browse/YARN-5610 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Attachments: YARN-4793-yarn-native-services.001.patch, > YARN-5610-yarn-native-services.002.patch > > > This task will be used to submit and review patches for the initial code drop > for the native services REST API -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5610) Initial code for native services REST API
[ https://issues.apache.org/jira/browse/YARN-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483469#comment-15483469 ] Jian He commented on YARN-5610: --- Thanks, Gour. bq. So according to an app-owner, it is STARTED but not RUNNING yet. I would prefer to rename the pair as STARTED -> READY, or RUNNING -> READY bq. The swagger definition defines this. Do you mean swagger has such a date type in string format ? which one is this? I couldn't find it in the swagger documentation. bq. why the changes needed in hadoop-project/pom.xml I meant what is this change used for ? {code} http://git-wip-us.apache.org/repos/asf/hadoop.git scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git scm:git:http://git-wip-us.apache.org/repos/asf/hadoop.git {code} bq. appOptions and uniqueGlobalPropertyCache are required as appOptions is the way application configuration properties are injected into Slider client. I see. Then the {{if (uniqueGlobalPropertyCache == null) }} condition is not needed, because uniqueGlobalPropertyCache is initialized as not null. {code} private void addOptionsIfNotPresent(List options, Set uniqueGlobalPropertyCache, String key, String value) { if (uniqueGlobalPropertyCache == null) { options.addAll(Arrays.asList(key, value)); } {code} bq. In case of complex and nested applications, some components will be by themselves full-blown and independent applications. The APPLICATION type artifact refers to such external application definitions Then, what are the other parameters in Component used for in this case ? like number_of_containers, launch_command, resource etc. > Initial code for native services REST API > - > > Key: YARN-5610 > URL: https://issues.apache.org/jira/browse/YARN-5610 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha > Attachments: YARN-4793-yarn-native-services.001.patch, > YARN-5610-yarn-native-services.002.patch > > > This task will be used to submit and review patches for the initial code drop > for the native services REST API -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5620: -- Attachment: YARN-5620.010.patch Fixing failed tests (The _TestDefaultContainerExecutor_ error seems to be unrelated) and some more checkstyles. > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch, YARN-5620.010.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483437#comment-15483437 ] Hadoop QA commented on YARN-5620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 25 new + 604 unchanged - 4 fixed = 629 total (was 608) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 240 unchanged - 2 fixed = 240 total (was 242) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 32s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 30m 35s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRegression | | | hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12827978/YARN-5620.009.patch | | JIRA Issue | YARN-5620 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 761d12e98010 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cc01ed70 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13080/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13080/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit test logs |
[jira] [Commented] (YARN-5631) Missing refreshClusterMaxPriority usage in rmadmin help message
[ https://issues.apache.org/jira/browse/YARN-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483408#comment-15483408 ] Rohith Sharma K S commented on YARN-5631: - +1 LGTM, will commit it shortly > Missing refreshClusterMaxPriority usage in rmadmin help message > --- > > Key: YARN-5631 > URL: https://issues.apache.org/jira/browse/YARN-5631 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Minor > Attachments: YARN-5631.01.patch, YARN-5631.02.patch > > > {{rmadmin -help}} does not show {{-refreshClusterMaxPriority}} option in > usage line. > {code} > $ bin/yarn rmadmin -help > rmadmin is the command to execute YARN administrative commands. > The full syntax is: > yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in > seconds] -client|server]] [-refreshNodesResources] > [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] > [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] > [-addToClusterNodeLabels > <"label1(exclusive=true),label2(exclusive=false),label3">] > [-removeFromClusterNodeLabels] [-replaceLabelsOnNode > <"node1[:port]=label1,label2 node2[:port]=label1">] > [-directlyAccessNodeLabelStore] [-updateNodeResource [NodeID] [MemSize] > [vCores] ([OvercommitTimeout]) [-help [cmd]] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application
[ https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483382#comment-15483382 ] Hadoop QA commented on YARN-3692: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 27s {color} | {color:red} root: The patch generated 3 new + 190 unchanged - 1 fixed = 193 total (was 191) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 37m 31s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 26s {color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 118m 12s {color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 221m 33s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12827958/0004-YARN-3692.patch | | JIRA Issue | YARN-3692 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 41200cc542b3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483362#comment-15483362 ] Arun Suresh edited comment on YARN-5620 at 9/12/16 7:35 AM: Updating patch. * Addressing [~jianhe]'s latest comments * some javadoc, checkstyle and javac fixes bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING state.. It goes to the KILLING state only if the AM explicitly sends a kill signal or the RM asks the NM to kill. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it won't be in KILLING state.. right ? bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource, and verify the output is written into it, this verifies the part about the localization part as well. Actually if you look at the _prepareContainerUpgrade()_ function, we create a new script file *scriptFile_new* which is passed into the _prepareContainerLaunchContext()_ function which associates the new file with a new *dest_file_new* location.. this should verify that the upgrade needed a new localized resource. The output of the script is also written to a new *start_file_n.txt* which we read and verify to check if the new process has actually started. Also by the way: bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and so the getLocalPendingRequests method and the new constructor in ContainerLocalizationRequestEvent is not needed The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources... So if you are ok with it, I'd like to keep it as is.. was (Author: asuresh): Updating patch. * Addressing [~jianhe]'s latest comments * some javadoc, checkstyle and javac fixes bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING state.. It goes to the KILLING state only if the AM explicitly sends a kill signal or the RM asks the NM to kill. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it won't be in KILLING state.. right ? bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource, and verify the output is written into it, this verifies the part about the localization part as well. Actually if you look at the _prepareContainerUpgrade()_ function, we create a new script file *scriptFile_new* while passed into the _prepareContainerLaunchContext()_ function which associates the new file with a new *dest_file_new* location.. this should verify that the upgrade needed a new localized resource. The output of the script is also written to a new *start_file_n.txt* which we read and verify to check if the new process has actually started. Also by the way: bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and so the getLocalPendingRequests method and the new constructor in ContainerLocalizationRequestEvent is not needed The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources... So if you are ok with it, I'd like to keep it as is.. 
> Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5620: -- Attachment: YARN-5620.009.patch Updating patch. * Addressing [~jianhe]'s latest comments * some javadoc, checkstyle and javac fixes bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING state.. It goes to the KILLING state only if the AM explicitly sends a kill signal or the RM asks the NM to kill. It is also possible that an admin logs into the NM and does a 'kill -9', which will also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST but it won't be in KILLING state.. right ? bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource, and verify the output is written into it, this verifies the part about the localization part as well. Actually if you look at the _prepareContainerUpgrade()_ function, we create a new script file *scriptFile_new* while passed into the _prepareContainerLaunchContext()_ function which associates the new file with a new *dest_file_new* location.. this should verify that the upgrade needed a new localized resource. The output of the script is also written to a new *start_file_n.txt* which we read and verify to check if the new process has actually started. Also by the way: bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and so the getLocalPendingRequests method and the new constructor in ContainerLocalizationRequestEvent is not needed The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending resources... So if you are ok with it, I'd like to keep it as is.. > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, > YARN-5620.009.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483364#comment-15483364 ] Tao Jie commented on YARN-4855: --- Thank you for the comments, [~sunilg]. Doing the node check on the client side is to avoid modifying RmAdminProtocol, and we don't expect this operation to be done frequently. I looked at {{NodeId#compareTo}} more closely; it seems that {{NodeId#compareTo}} will compare *both* host and port, while {{isNodeSame}} will check host and port only if the port is set, otherwise only the host will be checked. > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch, YARN-4855.002.patch, > YARN-4855.003.patch, YARN-4855.004.patch, YARN-4855.005.patch, > YARN-4855.006.patch, YARN-4855.007.patch, YARN-4855.008.patch > > > Today when we add nodelabels to nodes, it would succeed even if the nodes are not > existing NodeManagers in the cluster, without any message. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode --fail-on-unkown-nodes > "node1=label1"* , it would be denied if the node is unknown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
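To make the distinction in the comment above concrete, a small illustration using plain host/port fields rather than the real NodeId class: a strict comparison that always looks at host and port, versus a lookup-style match that ignores the port when the query does not set one. The names strictEquals and matchesQuery and the ANY_PORT sentinel are invented for this sketch.

{code}
public final class NodeIdCompareSketch {

  static final int ANY_PORT = 0;   // assumed sentinel for "port not set"

  /** Mirrors a compareTo/equals-style check: host and port must both match. */
  static boolean strictEquals(String hostA, int portA, String hostB, int portB) {
    return hostA.equals(hostB) && portA == portB;
  }

  /** Mirrors an isNodeSame-style check: the port is only compared when the query sets it. */
  static boolean matchesQuery(String queryHost, int queryPort, String nodeHost, int nodePort) {
    if (!queryHost.equals(nodeHost)) {
      return false;
    }
    return queryPort == ANY_PORT || queryPort == nodePort;
  }

  public static void main(String[] args) {
    // "node1" with no port matches a running NM at "node1:45454" under the
    // lookup-style check, but not under the strict check.
    System.out.println(strictEquals("node1", ANY_PORT, "node1", 45454)); // false
    System.out.println(matchesQuery("node1", ANY_PORT, "node1", 45454)); // true
  }
}
{code}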
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483313#comment-15483313 ] Jian He commented on YARN-5620: --- One more thing about the test.. In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource, and verify the output is written into it, this verifies the localization part as well. > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483274#comment-15483274 ] Jian He commented on YARN-5620: --- bq. the container should be killable explicitly via an external signal. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING state. > Core changes in NodeManager to support for upgrade and rollback of Containers > - > > Key: YARN-5620 > URL: https://issues.apache.org/jira/browse/YARN-5620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5620.001.patch, YARN-5620.002.patch, > YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, > YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch > > > JIRA proposes to modify the ContainerManager (and other core classes) to > support upgrade of a running container with a new {{ContainerLaunchContext}} > as well as the ability to rollback the upgrade if the container is not able > to restart using the new launch Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5631) Missing refreshClusterMaxPriority usage in rmadmin help message
[ https://issues.apache.org/jira/browse/YARN-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483276#comment-15483276 ] Sunil G commented on YARN-5631: --- Looks good to me. > Missing refreshClusterMaxPriority usage in rmadmin help message > --- > > Key: YARN-5631 > URL: https://issues.apache.org/jira/browse/YARN-5631 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Minor > Attachments: YARN-5631.01.patch, YARN-5631.02.patch > > > {{rmadmin -help}} does not show the {{-refreshClusterMaxPriority}} option in the > usage line. > {code} > $ bin/yarn rmadmin -help > rmadmin is the command to execute YARN administrative commands. > The full syntax is: > yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in > seconds] -client|server]] [-refreshNodesResources] > [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] > [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] > [-addToClusterNodeLabels > <"label1(exclusive=true),label2(exclusive=false),label3">] > [-removeFromClusterNodeLabels] [-replaceLabelsOnNode > <"node1[:port]=label1,label2 node2[:port]=label1">] > [-directlyAccessNodeLabelStore] [-updateNodeResource [NodeID] [MemSize] > [vCores] ([OvercommitTimeout]) [-help [cmd]] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483269#comment-15483269 ] Sunil G commented on YARN-4855: --- Hi [~Tao Jie] and [~Naganarasimha Garla] Thanks for the work on this item. I have a few doubts on the patch. - Thinking out loud, {{replaceLabelsOnNodes}} seems a little complex to me. Could we add a server-side API to check whether the set of nodes is registered or not? {{ConcurrentMap<NodeId, RMNode> getRMNodes()}} is available on the server side, so it would be a faster and better approach. - If we are sticking with the client-side impl, then we could try to make use of {{NodeId#compareTo}} instead of {{isNodeSame}} > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch, YARN-4855.002.patch, > YARN-4855.003.patch, YARN-4855.004.patch, YARN-4855.005.patch, > YARN-4855.006.patch, YARN-4855.007.patch, YARN-4855.008.patch > > > Today when we add nodelabels to nodes, it succeeds without any message even if > the nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode --fail-on-unknown-nodes > "node1=label1"* , it would be denied if the node is unknown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
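A hedged sketch of the server-side check suggested above. Only {{getRMNodes()}} is taken from the comment; the helper name and surrounding class are assumptions for illustration, not an actual RM API.
{code}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

import org.apache.hadoop.yarn.api.records.NodeId;

class NodeExistenceCheckSketch {
  // rmNodes would come from the RM side, e.g. rmContext.getRMNodes().
  // Returns the subset of requested node ids that are not registered,
  // so the caller can reject the replaceLabelsOnNode request if any remain.
  static Set<NodeId> findUnknownNodes(Set<NodeId> requested,
      ConcurrentMap<NodeId, ?> rmNodes) {
    Set<NodeId> unknown = new HashSet<>();
    for (NodeId nodeId : requested) {
      if (!rmNodes.containsKey(nodeId)) {
        unknown.add(nodeId);
      }
    }
    return unknown;
  }
}
{code}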
[jira] [Updated] (YARN-5587) Add support for resource profiles
[ https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-5587: Attachment: YARN-5587-YARN-3926.001.patch > Add support for resource profiles > - > > Key: YARN-5587 > URL: https://issues.apache.org/jira/browse/YARN-5587 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-5587-YARN-3926.001.patch > > > Add support for resource profiles on the RM side to allow users to use > shorthands to specify resource requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5561) [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and entities via REST
[ https://issues.apache.org/jira/browse/YARN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483258#comment-15483258 ] Varun Saxena commented on YARN-5561: bq. are looking for a complete applications page where all applications which were running/completed had to be listed. For this purpose, I think we need the api as suggested by Rohith. Being said this, We will also be showing hierarchy from flows too. So you plan to have two app pages: one for a specific flow run and another listing all the apps in a cluster. Right? [~rohithsharma], how do you plan to support fetching all apps within a cluster? Probably you can adopt the approach I had suggested, because otherwise it would lead to a full table scan. bq. New API's required. Thoughts? We should have them for the sake of completeness. > [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and > entities via REST > --- > > Key: YARN-5561 > URL: https://issues.apache.org/jira/browse/YARN-5561 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: YARN-5561.patch, YARN-5561.v0.patch > > > ATSv2 model lacks retrieval of {{list-of-all-apps}}, > {{list-of-all-app-attempts}} and {{list-of-all-containers-per-attempt}} via > REST API's. It is also required to know about all the entities in an > application. > These URLs are pretty much required for the Web UI. > The new REST URLs would be > # GET {{/ws/v2/timeline/apps}} > # GET {{/ws/v2/timeline/apps/\{app-id\}/appattempts}}. > # GET > {{/ws/v2/timeline/apps/\{app-id\}/appattempts/\{attempt-id\}/containers}} > # GET {{/ws/v2/timeline/apps/\{app id\}/entities}} should display list of > entities that can be queried. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
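For illustration only, a JAX-RS sketch of the shape a cluster-wide {{/ws/v2/timeline/apps}} endpoint could take. The class name, parameter names and the paging scheme are assumptions, not the actual TimelineReaderWebServices code; the {{limit}}/{{fromid}} parameters are one way to keep the backend query bounded instead of scanning the whole table.
{code}
import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

// Illustrative resource class only; a real implementation would delegate to
// the timeline reader backend rather than returning a placeholder payload.
@Path("/ws/v2/timeline")
public class TimelineAppsEndpointSketch {

  @GET
  @Path("/apps")
  @Produces(MediaType.APPLICATION_JSON)
  public String getApps(
      @QueryParam("clusterid") String clusterId,
      // Paging keeps the query bounded rather than reading every application row.
      @QueryParam("limit") @DefaultValue("100") int limit,
      @QueryParam("fromid") String fromId) {
    return "{\"cluster\":\"" + clusterId + "\",\"apps\":[]}";
  }
}
{code}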
[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero
[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483254#comment-15483254 ] Bibin A Chundatt commented on YARN-5545: [~Naganarasimha Garla] and [~sunilg] {quote} Also consider the cases when the accessibility is * and new partitions are added without refreshing, this configuration will be wrong as its static. {quote} Thank you for pointing that out, I will check the same. But [~Naganarasimha Garla], whenever we reconfigure the capacity-scheduler xml these limits will also get refreshed. {quote} Would it be better to set the default value of yarn.scheduler.capacity.maximum-applications.accessible-node-labels. to that of yarn.scheduler.capacity.maximum-applications {quote} Will use {{yarn.scheduler.capacity.maximum-applications}} itself. {quote} IIUC you seem to adopt the approach little different than what you mention in your comment, though we are having per partition level max app limit, we just sum up max limits of all partitions under a queue and check against ApplicationLimit.getAllMaxApplication() {quote} This was added since, IIUC, the per-partition application count alone cannot be used as the app limit; we have to check the max apps for the queue across all partitions (a rough sketch of this calculation is included after this message). Will add documentation for the same. > App submit failure on queue with label when default queue partition capacity > is zero > > > Key: YARN-5545 > URL: https://issues.apache.org/jira/browse/YARN-5545 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, > YARN-5545.0003.patch, capacity-scheduler.xml > > > Configure capacity scheduler > yarn.scheduler.capacity.root.default.capacity=0 > yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50 > yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50 > Submit application as below > ./yarn jar > ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar > sleep -Dmapreduce.job.node-label-expression=labelx > -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1 > {noformat} > 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging > area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001 > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362) > at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at > org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136) > at > org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:239) > at org.apache.hadoop.util.RunJar.main(RunJar.java:153) > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit > application_1471670113386_0001 to YARN : >
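To make the summed per-partition limit discussed in the YARN-5545 comment above concrete, here is a rough sketch of that calculation. Method and parameter names, and the exact rounding, are assumptions for illustration, not the CapacityScheduler code: per-partition limits are derived from the queue's absolute capacity on each partition and summed into the value the queue is checked against at submission time.
{code}
import java.util.Map;

// Rough sketch only: sums per-partition max-application limits for a queue.
class QueueMaxAppsSketch {
  // maxClusterApps corresponds to yarn.scheduler.capacity.maximum-applications;
  // absCapacityByPartition maps partition name -> queue's absolute capacity (0..1).
  static int maxApplicationsForQueue(int maxClusterApps,
      Map<String, Float> absCapacityByPartition) {
    int total = 0;
    for (float absCapacity : absCapacityByPartition.values()) {
      total += (int) (maxClusterApps * absCapacity);
    }
    return total;
  }
}
{code}
With the configuration in this JIRA's example and, say, the default maximum-applications of 10000 (an assumption here), root.default would get 0 from the default partition plus 5000 from labelx, so a submission with a labelx expression would no longer be rejected just because the default partition capacity is zero.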