[jira] [Updated] (YARN-8591) [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error
[ https://issues.apache.org/jira/browse/YARN-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akhil PB updated YARN-8591:
---------------------------
    Description:

{code:java}
GET http://ctr-e138-1518143905142-417433-01-04.hwx.site:8198/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL&_=1532670071899
{code}

{code:java}
2018-07-27 05:32:03,468 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.handleException(TimelineReaderWebServices.java:196)
	at org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:624)
	at org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:474)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
	at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
	at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
	at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
	at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
	at org.apache.hadoop.yarn.server.timelineservice.reader.security.TimelineReaderWhitelistAuthorizationFilter.doFilter(TimelineReaderWhitelistAuthorizationFilter.java:85)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
	at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
{code}
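For reference, the failing request above can be reconstructed programmatically. This is an illustrative sketch only: the host, port, and application id are taken verbatim from the bug report, and the `/ws/v2/timeline/apps/{appId}/entities/{entityType}` path layout is the Timeline Service v2 reader REST convention; a client issuing a GET against this URI on the affected build would receive the HTTP 500 whose server-side NullPointerException is shown in the trace.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch: builds the ATSv2 reader request that triggered
 * the 500 Internal Server Error reported above. Only the URI is
 * constructed here; no network call is made.
 */
public class TimelineEntitiesUrl {

  /** Builds /ws/v2/timeline/apps/{appId}/entities/{entityType}?fields=... */
  static URI entitiesUri(String host, int port, String appId,
                         String entityType, String fields) {
    String path = String.format("/ws/v2/timeline/apps/%s/entities/%s",
        appId, entityType);
    String query = "fields=" + URLEncoder.encode(fields, StandardCharsets.UTF_8);
    return URI.create("http://" + host + ":" + port + path + "?" + query);
  }

  public static void main(String[] args) {
    // Host, port, and application id as reported in YARN-8591.
    URI uri = entitiesUri("ctr-e138-1518143905142-417433-01-04.hwx.site", 8198,
        "application_1532578985272_0002", "YARN_CONTAINER", "ALL");
    System.out.println(uri);
    // A GET on this URI returned HTTP 500; the server-side cause is the
    // NullPointerException in TimelineReaderWebServices.getEntities above.
  }
}
```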
[jira] [Updated] (YARN-8591) [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error
[ https://issues.apache.org/jira/browse/YARN-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akhil PB updated YARN-8591:
---------------------------
    Summary: [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error  (was: [ATSv2] Yarn container API throws 500 Internal Server Error)

> [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error
> -----------------------------------------------------------
>
>                 Key: YARN-8591
>                 URL: https://issues.apache.org/jira/browse/YARN-8591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelinereader, timelineserver
>            Reporter: Akhil PB
>            Assignee: Rohith Sharma K S
>            Priority: Major
>
> GET ctr-e138-1518143905142-417433-01-04.hwx.site:8198/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL&_=1532670071899
[jira] [Created] (YARN-8591) [ATSv2] Yarn container API throws 500 Internal Server Error
Akhil PB created YARN-8591:
-------------------------------
             Summary: [ATSv2] Yarn container API throws 500 Internal Server Error
                 Key: YARN-8591
                 URL: https://issues.apache.org/jira/browse/YARN-8591
             Project: Hadoop YARN
          Issue Type: Bug
          Components: timelinereader, timelineserver
            Reporter: Akhil PB
            Assignee: Rohith Sharma K S

GET ctr-e138-1518143905142-417433-01-04.hwx.site:8198/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL&_=1532670071899
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559242#comment-16559242 ]

genericqa commented on YARN-8579:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 2s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 40s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 49s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8579 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933296/YARN-8579.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5fa995b24f28 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d3c068 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21389/testReport/ |
| Max. process+thread count | 850 (vs. ulimit of 1) |
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-8579:
----------------------------
    Fix Version/s: 3.1.2
                   3.2.0

> New AM attempt could not retrieve previous attempt component data
> -----------------------------------------------------------------
>
>                 Key: YARN-8579
>                 URL: https://issues.apache.org/jira/browse/YARN-8579
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Gour Saha
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill AM
> 7) Wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
>
> Expected behavior:
> A new AM attempt should start, and the docker containers launched by the 1st attempt should be recovered by the new attempt.
>
> Actual behavior:
> A new AM attempt starts, but it cannot recover the 1st attempt's docker containers because it cannot read the component details from ZK. Thus, it starts new containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO service.ServiceScheduler - Registering appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into registry
> 2018-07-19 22:42:47,611 [main] INFO service.ServiceScheduler - Received 1 containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Handling container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Record not found in registry for container container_e08_1531977563978_0015_01_03 from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO component.Component - [INIT COMPONENT httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO component.Component - [COMPONENT httpd] Requesting for 2 container(s)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559134#comment-16559134 ]

Gour Saha commented on YARN-8429:
---------------------------------

Thanks [~eyang] for the commit. Can you please commit it to branch-3.1 as well, since it is also targeted for the 3.1.2 release?

> Improve diagnostic message when artifact is not set properly
> ------------------------------------------------------------
>
>                 Key: YARN-8429
>                 URL: https://issues.apache.org/jira/browse/YARN-8429
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Gour Saha
>            Priority: Major
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8429.001.patch, YARN-8429.002.patch, YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create a launch json file. Replace "artifact" with "artifacts"
> 2) Launch the yarn service app with the cli
> The application launch fails with the below error:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified incorrectly, the launch cmd should fail with a proper error. Here, the error message regarding Dest_file is misleading.
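The failure mode quoted above (a misspelled spec key silently ignored, surfacing later as an unrelated error) can be illustrated with a minimal sketch. This is not the actual YARN-8429 patch; `KNOWN_KEYS` is a hypothetical subset of the service-spec fields, used only to show how validating key names up front yields a direct diagnostic for "artifacts" instead of the misleading Dest_file message:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Sketch only (not the YARN-8429 fix): reject unrecognized top-level
 * keys in a service spec so a misspelling such as "artifacts" fails
 * fast with a clear message. KNOWN_KEYS is a hypothetical subset of
 * the real spec fields.
 */
public class SpecFieldCheck {
  static final Set<String> KNOWN_KEYS =
      Set.of("name", "version", "components", "artifact", "resource");

  /** Returns the spec keys that are not recognized spec fields. */
  static List<String> unknownKeys(Iterable<String> specKeys) {
    List<String> unknown = new ArrayList<>();
    for (String k : specKeys) {
      if (!KNOWN_KEYS.contains(k)) {
        unknown.add(k);
      }
    }
    return unknown;
  }

  public static void main(String[] args) {
    // "artifacts" is the misspelling from the reproduction steps above.
    for (String k : unknownKeys(List.of("name", "artifacts", "components"))) {
      System.out.println("Unrecognized field '" + k + "' in service spec");
    }
  }
}
```

The real fix in the patch may differ; the point is only that key-name validation at parse time turns a silent typo into an actionable error.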
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-8579:
----------------------------
    Attachment: YARN-8579.001.patch

> New AM attempt could not retrieve previous attempt component data
> -----------------------------------------------------------------
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559122#comment-16559122 ]

Gour Saha commented on YARN-8579:
---------------------------------

Uploading patch 001 with a fix that I successfully tested in my cluster.

> New AM attempt could not retrieve previous attempt component data
> -----------------------------------------------------------------
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559121#comment-16559121 ] Gour Saha commented on YARN-8579: - I investigated this issue and figured that the root cause is the missing NM tokens corresponding to the containers which were passed to the AM after registration via the onContainersReceivedFromPreviousAttempts callback. This is required with the change made in YARN-6168. Exception seen in AM log is as below - {code} 2018-07-26 23:22:31,373 [pool-5-thread-4] ERROR instance.ComponentInstance - [COMPINSTANCE httpd-proxy-0 : container_e15_1532637883791_0001_01_04] Failed to get container status on ctr-e138-1518143905142-412155-01-05.hwx.site:25454, will try again org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for ctr-e138-1518143905142-412155-01-05.hwx.site:25454 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:252) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137) at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.getContainerStatus(NMClientImpl.java:323) at org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStatusRetriever.run(ComponentInstance.java:596) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} > New AM attempt could not retrieve previous attempt component data > - > > Key: YARN-8579 > URL: https://issues.apache.org/jira/browse/YARN-8579 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Critical > > Steps: > 1) Launch httpd-docker > 2) Wait for app to be in STABLE state > 3) Run validation for app (It takes around 3 mins) > 4) Stop all Zks > 5) Wait 60 sec > 6) Kill AM > 7) wait for 30 sec > 8) Start all ZKs > 9) Wait for application to finish > 10) Validate expected containers of the app > Expected behavior: > New attempt of AM should start and docker containers launched by 1st attempt > should be recovered by new attempt. > Actual behavior: > New AM attempt starts. It can not recover 1st attempt docker containers. It > can not read component details from ZK. > Thus, it starts new attempt for all containers. > {code} > 2018-07-19 22:42:47,595 [main] INFO service.ServiceScheduler - Registering > appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into > registry > 2018-07-19 22:42:47,611 [main] INFO service.ServiceScheduler - Received 1 > containers from previous attempt. 
> 2018-07-19 22:42:47,642 [main] INFO service.ServiceScheduler - Could not > read component paths: > `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': > No such file or directory: KeeperErrorCode = NoNode for > /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components > 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Handling > container_e08_1531977563978_0015_01_03 from previous attempt > 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Record not > found in registry for container container_e08_1531977563978_0015_01_03 > from previous attempt, releasing > 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO > impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019 > 2018-07-19 22:42:47,651 [main] INFO service.ServiceScheduler - Triggering > initial evaluation of component httpd > 2018-07-19 22:42:47,652 [main] INFO component.Component - [INIT COMPONENT > httpd]: 2 instances. > 2018-07-19 22:42:47,652 [main] INFO component.Component - [COMPONENT httpd] > Requesting for 2 container(s){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
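The diagnosis above (the new AM attempt holds no NM tokens for containers handed over from the previous attempt, so every container-status probe fails) can be illustrated with a minimal stand-in for the AM-side token cache. This is a self-contained simulation, not the Hadoop API; the class and method names below are invented for the example, and the node address is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the AM-side NM token cache. In the real code path,
// ContainerManagementProtocolProxy fails with InvalidToken ("No NMToken
// sent for <node>") when the cache has no entry for the target node, which
// is the failure shown in the AM log above. All names are illustrative.
public class NMTokenCacheSketch {
  private final Map<String, String> tokensByNode = new HashMap<>();

  public void setNMToken(String nodeAddress, String token) {
    tokensByNode.put(nodeAddress, token);
  }

  public String getNMToken(String nodeAddress) {
    String token = tokensByNode.get(nodeAddress);
    if (token == null) {
      // Mirrors the "No NMToken sent for <node>" failure from the log.
      throw new IllegalStateException("No NMToken sent for " + nodeAddress);
    }
    return token;
  }

  public static void main(String[] args) {
    NMTokenCacheSketch cache = new NMTokenCacheSketch();
    String node = "nm-host.example.com:25454"; // hypothetical node address

    // Second AM attempt with an empty cache: the status probe fails,
    // as in the stack trace above.
    boolean failedWithoutToken = false;
    try {
      cache.getNMToken(node);
    } catch (IllegalStateException e) {
      failedWithoutToken = true;
    }
    System.out.println(failedWithoutToken); // prints "true"

    // The fix direction: have the RM forward previous-attempt NM tokens
    // at registration so the AM can seed its cache before probing.
    cache.setNMToken(node, "token-from-previous-attempt");
    System.out.println(cache.getNMToken(node)); // prints "token-from-previous-attempt"
  }
}
```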
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559088#comment-16559088 ] Hudson commented on YARN-8429: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14651 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14651/]) YARN-8429. Improve diagnostic message when artifact is not set properly. (eyang: rev 8d3c068e59f18e3f8260713fee83c458aa1d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/exceptions/RestApiErrorMessages.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/providers/TestDefaultClientProvider.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/providers/TestAbstractClientProvider.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/AbstractClientProvider.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/ServiceApiUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/defaultImpl/DefaultClientProvider.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/tarball/TarballClientProvider.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceApiUtil.java * 
(edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/docker/DockerClientProvider.java > Improve diagnostic message when artifact is not set properly > > > Key: YARN-8429 > URL: https://issues.apache.org/jira/browse/YARN-8429 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8429.001.patch, YARN-8429.002.patch, > YARN-8429.003.patch, YARN-8429.004.patch > > > Steps: > 1) Create launch json file. Replace "artifact" with "artifacts" > 2) launch yarn service app with cli > The application launch fails with below error > {code} > [xxx xxx]$ yarn app -launch test2-2 test.json > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition > from local FS: /xxx/test.json > 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms > 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be > absolute path: /xxx/xxx > {code} > artifact field is not mandatory. However, If that field is specified > incorrectly, launch cmd should fail with proper error. > Here, The error message regarding Dest file is misleading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559072#comment-16559072 ] genericqa commented on YARN-8509: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 42s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 8s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8509 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933273/YARN-8509.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 46cdc61b785a 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d70d845 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21386/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21386/testReport/ | | Max. process+thread count | 859 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559071#comment-16559071 ] Eric Yang commented on YARN-8429: - Thank you [~gsaha] for the patch. Thank you [~billie.rinaldi] for the review. +1 Patch 4 looks good to me. > Improve diagnostic message when artifact is not set properly > > > Key: YARN-8429 > URL: https://issues.apache.org/jira/browse/YARN-8429 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8429.001.patch, YARN-8429.002.patch, > YARN-8429.003.patch, YARN-8429.004.patch > > > Steps: > 1) Create launch json file. Replace "artifact" with "artifacts" > 2) launch yarn service app with cli > The application launch fails with below error > {code} > [xxx xxx]$ yarn app -launch test2-2 test.json > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition > from local FS: /xxx/test.json > 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms > 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be > absolute path: /xxx/xxx > {code} > artifact field is not mandatory. However, If that field is specified > incorrectly, launch cmd should fail with proper error. > Here, The error message regarding Dest file is misleading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8574) Allow dot in attribute values
[ https://issues.apache.org/jira/browse/YARN-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559060#comment-16559060 ] Naganarasimha G R edited comment on YARN-8574 at 7/27/18 12:11 AM: --- Hi [~bibinchundatt], You are correct i meant when we start using it as namespace later on. but anyway as we are supporting for internal its as well good to be supported for others too. My only doubt now is how was it NOT failing for earlier mappings for prefix? Have manually triggered the build too... was (Author: naganarasimha): Hi [~bibinchundatt], You are correct i meant when we start using it as namespace later on. but anyway as we are supporting for internal its as well good to be supported for others too. My ownly doubt now is how was it not failing for earlier mappings for prefix? > Allow dot in attribute values > -- > > Key: YARN-8574 > URL: https://issues.apache.org/jira/browse/YARN-8574 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-8574-YARN-3409.001.patch > > > Currently "." is considered as invalid value. Enable the same; -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8407) Container launch exception in AM log should be printed in ERROR level
[ https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559063#comment-16559063 ] Yesha Vora commented on YARN-8407: -- Thanks [~bibinchundatt] for review. Patch-2 is submitted. > Container launch exception in AM log should be printed in ERROR level > - > > Key: YARN-8407 > URL: https://issues.apache.org/jira/browse/YARN-8407 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8407.001.patch, YARN-8407.002.patch > > > when a container launch is failing due to docker image not available is > logged as INFO level in AM log. > Container launch failure should be logged as ERROR. > Steps: > launch httpd yarn-service application with invalid docker image > > {code:java} > 2018-06-07 01:51:32,966 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE httpd-0 : > container_e05_1528335963594_0001_01_02]: > container_e05_1528335963594_0001_01_02 completed. Reinsert back to > pending list and requested a new container. > exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from > container-launch. > Container id: container_e05_1528335963594_0001_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: Unable to find image 'xxx/httpd:0.1' locally > Trying to pull repository xxx/httpd ... > /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on > yyy: no such host. > See '/usr/bin/docker-current run --help'. > Shell output: main : command provided 4 > main : run as user is hbase > main : requested yarn user is hbase > Creating script paths... > Creating local dirs... > Getting exit code file... > Changing effective user to root... 
> Wrote the exit code 7 to > /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode > [2018-06-07 01:51:02.393]Diagnostic message from attempt : > [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last > 4096 bytes of stderr.txt : > [2018-06-07 01:51:32.428]Could not find > nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid > in any of the directories > 2018-06-07 01:51:32,966 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE httpd-0 : > container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT > on STOP event{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8407) Container launch exception in AM log should be printed in ERROR level
[ https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8407: - Attachment: YARN-8407.002.patch > Container launch exception in AM log should be printed in ERROR level > - > > Key: YARN-8407 > URL: https://issues.apache.org/jira/browse/YARN-8407 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8407.001.patch, YARN-8407.002.patch > > > when a container launch is failing due to docker image not available is > logged as INFO level in AM log. > Container launch failure should be logged as ERROR. > Steps: > launch httpd yarn-service application with invalid docker image > > {code:java} > 2018-06-07 01:51:32,966 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE httpd-0 : > container_e05_1528335963594_0001_01_02]: > container_e05_1528335963594_0001_01_02 completed. Reinsert back to > pending list and requested a new container. > exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from > container-launch. > Container id: container_e05_1528335963594_0001_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: Unable to find image 'xxx/httpd:0.1' locally > Trying to pull repository xxx/httpd ... > /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on > yyy: no such host. > See '/usr/bin/docker-current run --help'. > Shell output: main : command provided 4 > main : run as user is hbase > main : requested yarn user is hbase > Creating script paths... > Creating local dirs... > Getting exit code file... > Changing effective user to root... 
> Wrote the exit code 7 to > /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode > [2018-06-07 01:51:02.393]Diagnostic message from attempt : > [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last > 4096 bytes of stderr.txt : > [2018-06-07 01:51:32.428]Could not find > nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid > in any of the directories > 2018-06-07 01:51:32,966 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE httpd-0 : > container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT > on STOP event{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8574) Allow dot in attribute values
[ https://issues.apache.org/jira/browse/YARN-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559060#comment-16559060 ] Naganarasimha G R commented on YARN-8574: - Hi [~bibinchundatt], You are correct, I meant when we start using it as namespace later on. But anyway, as we are supporting it for internal use, it's as well good to be supported for others too. My only doubt now is how was it not failing for earlier mappings for prefix? > Allow dot in attribute values > -- > > Key: YARN-8574 > URL: https://issues.apache.org/jira/browse/YARN-8574 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-8574-YARN-3409.001.patch > > > Currently "." is considered as invalid value. Enable the same; -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8574) Allow dot in attribute values
[ https://issues.apache.org/jira/browse/YARN-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559053#comment-16559053 ] genericqa commented on YARN-8574: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 14m 2s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8574 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933017/YARN-8574-YARN-3409.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21387/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Allow dot in attribute values > -- > > Key: YARN-8574 > URL: https://issues.apache.org/jira/browse/YARN-8574 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-8574-YARN-3409.001.patch > > > Currently "." is considered as invalid value. Enable the same; -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8590) Fair scheduler promotion does not update container execution type and token
[ https://issues.apache.org/jira/browse/YARN-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559055#comment-16559055 ] genericqa commented on YARN-8590: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 8m 8s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8590 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933282/YARN-8590-YARN-1011.00.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21388/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fair scheduler promotion does not update container execution type and token > --- > > Key: YARN-8590 > URL: https://issues.apache.org/jira/browse/YARN-8590 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8590-YARN-1011.00.patch > > > Fair Scheduler promotion of opportunistic containers does not update > container execution type and token. This leads to incorrect resource > accounting when the promoted containers are released. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8590) Fair scheduler promotion does not update container execution type and token
[ https://issues.apache.org/jira/browse/YARN-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8590: - Attachment: YARN-8590-YARN-1011.00.patch > Fair scheduler promotion does not update container execution type and token > --- > > Key: YARN-8590 > URL: https://issues.apache.org/jira/browse/YARN-8590 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8590-YARN-1011.00.patch > > > Fair Scheduler promotion of opportunistic containers does not update > container execution type and token. This leads to incorrect resource > accounting when the promoted containers are released. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559043#comment-16559043 ] Chandni Singh commented on YARN-8509: - A couple of nits and questions:
1. This is a javadoc block, which should be above the method. I understand that this test was moved out of another test class, but this gives a good opportunity to fix it.
{code:java}
/**
 * Test case: Submit three applications (app1/app2/app3) to different
 * queues, queue structure:
 *
 *          Root
 *         /  | \  \
 *        a   b  c  d
 *       30  30 30 10
 */
{code}
2. Why is the log level explicitly set to debug in the code?
{code}
Logger.getRootLogger().setLevel(Level.DEBUG);
{code}
3. Can you explain the comment?
{code}
// We should release pending resource be capped at user limit, think about
// a user ask for 1maps. but cluster can run a max of 1000. In this
// case, as soon as each map finish, other one pending will get scheduled
// When not deduct reserved, total-pending = 3G (u1) + 20G (u2) = 23G
// deduct reserved, total-pending = 0G (u1) + 20G (u2) = 20G
{code}
> Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. 
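For the third question, the arithmetic in the quoted comment can be checked with a few lines of plain Java. This is illustrative only and is not the LeafQueue#getTotalPendingResourcesConsideringUserLimit implementation; the method and variable names are invented for the example.

```java
// Illustrative check of the numbers in the quoted patch comment: each
// user's pending resource, optionally reduced by that user's reserved
// resource, summed over the queue. Not the actual LeafQueue code.
public class PendingSketch {
  static long totalPendingGB(long[] pendingGB, long[] reservedGB, boolean deductReserved) {
    long total = 0;
    for (int i = 0; i < pendingGB.length; i++) {
      // Deducting reserved never drives a user's pending below zero.
      total += deductReserved ? Math.max(0, pendingGB[i] - reservedGB[i]) : pendingGB[i];
    }
    return total;
  }

  public static void main(String[] args) {
    long[] pendingGB  = {3, 20}; // u1 = 3G, u2 = 20G
    long[] reservedGB = {3, 0};  // u1 already holds 3G of reservations
    System.out.println(totalPendingGB(pendingGB, reservedGB, false)); // prints 23
    System.out.println(totalPendingGB(pendingGB, reservedGB, true));  // prints 20
  }
}
```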
[jira] [Created] (YARN-8590) Fair scheduler promotion does not update container execution type and token
Haibo Chen created YARN-8590: Summary: Fair scheduler promotion does not update container execution type and token Key: YARN-8590 URL: https://issues.apache.org/jira/browse/YARN-8590 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Affects Versions: YARN-1011 Reporter: Haibo Chen Assignee: Haibo Chen Fair Scheduler promotion of opportunistic containers does not update container execution type and token. This leads to incorrect resource accounting when the promoted containers are released. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559026#comment-16559026 ] Hudson commented on YARN-8545: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14649 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14649/]) YARN-8545. Return allocated resource to RM for failed container. (eyang: rev 40fad32824d2f8f960c779d78357e62103453da0) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstanceEvent.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/containerlaunch/ContainerLaunchService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/MockServiceAM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/TestComponent.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/instance/TestComponentInstance.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8545.001.patch > > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context prepare failed, AM still trying to > monitor container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe,
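The gap described in this issue — the container is allocated, launch-context preparation throws before anything reaches the NM, and the container is never handed back — can be simulated in a few lines. All names below are invented for illustration; this is not the hadoop-yarn-services code, just a sketch of the accounting problem and the direction of the fix.

```java
import java.util.HashSet;
import java.util.Set;

// Runnable simulation of the accounting gap: if building the launch
// context throws before startContainer() is ever called, the NM never
// learns about the container and so cannot report it failed -- only the
// AM can return it to the RM. Names are invented for illustration.
public class LaunchFailureSketch {
  final Set<String> liveInRM = new HashSet<>();

  void launch(String containerId, boolean prepThrows, boolean releaseOnFailure) {
    liveInRM.add(containerId); // the RM considers the container allocated
    try {
      if (prepThrows) {
        // e.g. FileNotFoundException while building the launch context,
        // as in the stack trace above
        throw new RuntimeException("buildContainerLaunchContext failed");
      }
      // startContainer(...) would run here; from this point on, the NM
      // itself reports launch failures to the RM.
    } catch (RuntimeException e) {
      if (releaseOnFailure) {
        liveInRM.remove(containerId); // the fix: return it to the RM
      }
      // else: the RM still thinks the container is live, and the AM keeps
      // probing readiness of a process that never started (the
      // "IP is not available yet" loop in the log).
    }
  }

  public static void main(String[] args) {
    LaunchFailureSketch s = new LaunchFailureSketch();
    s.launch("c1", true, false);
    System.out.println(s.liveInRM.contains("c1")); // prints "true" (leaked)
    s.launch("c2", true, true);
    System.out.println(s.liveInRM.contains("c2")); // prints "false" (released)
  }
}
```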
[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes
[ https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559023#comment-16559023 ] Chandni Singh commented on YARN-8584: - Looks good. We can also change the log statements to utilize slf4j instead of concatenating strings. For example {code:java} LOG.warn("rollingMonitorInterval should be more than or equal to " + MIN_LOG_ROLLING_INTERVAL + " seconds. Using " + MIN_LOG_ROLLING_INTERVAL + " seconds instead.");{code} to {code:java} LOG.warn("rollingMonitorInterval should be more than or equal to {} seconds. Using {} seconds instead.", MIN_LOG_ROLLING_INTERVAL, MIN_LOG_ROLLING_INTERVAL);{code} > Several typos in Log Aggregation related classes > > > Key: YARN-8584 > URL: https://issues.apache.org/jira/browse/YARN-8584 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8584.001.patch > > > There are typos in comments, log messages, method names, field names, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559018#comment-16559018 ] genericqa commented on YARN-8429: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 34s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8429 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933270/YARN-8429.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7cf8dd6e6bc0 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d70d845 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21385/testReport/ | | Max. process+thread count | 755 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21385/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Improve diagnostic
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558997#comment-16558997 ] Eric Yang commented on YARN-8545: - +1 looks good to me. Committing shortly. > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8545.001.patch > > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context preparation failed, the AM still tries to > monitor the container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8571) Validate service principal format prior to launching yarn service
[ https://issues.apache.org/jira/browse/YARN-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558975#comment-16558975 ] genericqa commented on YARN-8571: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 0s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 17s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8571 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933267/YARN-8571.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 518eed9d580a 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d70d845 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21383/testReport/ | | Max. process+thread count | 742 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21383/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Validate service
[jira] [Created] (YARN-8589) ATS TimelineACLsManager checkAccess is slow
Prabhu Joseph created YARN-8589: --- Summary: ATS TimelineACLsManager checkAccess is slow Key: YARN-8589 URL: https://issues.apache.org/jira/browse/YARN-8589 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.3 Reporter: Prabhu Joseph The ATS REST API is very slow when there are more than 100,000 (1 lakh) entries and yarn.acl.enable is set to true, because TimelineACLsManager has to check access for every entry. We can't disable yarn.acl.enable, as all the YARN ACLs use the same config. We can have a separate config to provide read access to the ATS entities. {code} curl http://:8188/ws/v1/timeline/HIVE_QUERY_ID {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
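One mitigation that does not require a new config is to memoize the ACL verdict: most timeline entities in a query share the same owner/domain, so the expensive check need not run once per entity. The sketch below is purely hypothetical — it does not use the real TimelineACLsManager API, and the names and the stand-in check are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the actual TimelineACLsManager API): when every
// returned entity triggers an ACL check, memoizing the verdict per
// (caller, domain) pair turns ~100,000 checks into one per distinct pair.
public class CachedAclCheck {
    public int realChecks = 0; // counts the expensive underlying evaluations

    private final Map<String, Boolean> cache = new HashMap<>();

    // Placeholder for the real (slow) ACL evaluation.
    private boolean doCheckAccess(String caller, String domain) {
        realChecks++;
        return caller.equals("admin") || domain.equals("public");
    }

    public boolean checkAccess(String caller, String domain) {
        String key = caller + "/" + domain;
        Boolean cached = cache.get(key);
        if (cached == null) {
            cached = doCheckAccess(caller, domain);
            cache.put(key, cached);
        }
        return cached;
    }
}
```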
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558965#comment-16558965 ] Zian Chen commented on YARN-8522: - [~sunilg], could you help review the latest patch? > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming apps simultaneously. Sometimes one of the > applications fails with the stack trace below. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
[jira] [Created] (YARN-8588) Logging improvements for better debuggability
Suma Shivaprasad created YARN-8588: -- Summary: Logging improvements for better debuggability Key: YARN-8588 URL: https://issues.apache.org/jira/browse/YARN-8588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Capacity allocations decided in GuaranteedCapacityOvertimePolicy are available via AutoCreatedLeafQueueConfig. However, this class lacks a toString implementation, and some additional DEBUG-level logs are needed for better debuggability. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
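The kind of change this asks for is straightforward: give the config holder a toString so DEBUG logs show the decided capacities instead of an object hash. The field names below are assumptions for illustration, not the real members of AutoCreatedLeafQueueConfig:

```java
// Illustrative only: a toString for a config holder along the lines of
// AutoCreatedLeafQueueConfig, so a DEBUG log of the object prints the
// decided capacities rather than "ClassName@1a2b3c".
public class QueueConfigToStringDemo {
    public static class AutoCreatedLeafQueueConfigSketch {
        final float capacity;     // assumed field, for illustration
        final float maxCapacity;  // assumed field, for illustration

        public AutoCreatedLeafQueueConfigSketch(float capacity, float maxCapacity) {
            this.capacity = capacity;
            this.maxCapacity = maxCapacity;
        }

        @Override
        public String toString() {
            return "AutoCreatedLeafQueueConfig{capacity=" + capacity
                + ", maxCapacity=" + maxCapacity + "}";
        }
    }

    public static void main(String[] args) {
        // A LOG.debug("{}", config) call would now emit the fields.
        System.out.println(new AutoCreatedLeafQueueConfigSketch(0.5f, 1.0f));
    }
}
```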
[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558960#comment-16558960 ] Suma Shivaprasad commented on YARN-8559: [~cheersyang] Thanks for the patch. Just a minor comment, otherwise the patch LGTM. Can you please replace the webservice endpoint "/schedulerconf" with the constant available in RMWebServices? > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558940#comment-16558940 ] genericqa commented on YARN-6966: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 12m 38s{color} | {color:red} Docker failed to build yetus/hadoop:20ca677. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6966 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933195/YARN-6966-branch-3.0.0.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21384/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > NodeManager metrics may return wrong negative values when NM restart > > > Key: YARN-6966 > URL: https://issues.apache.org/jira/browse/YARN-6966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-6966-branch-2.001.patch, > YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, > YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, > YARN-6966.005.patch, YARN-6966.006.patch > > > Just as YARN-6212. However, I think it is not a duplicate of YARN-3933. > The primary cause of negative values is that metrics do not recover properly > when NM restart. > AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores > in metrics also need to recover when NM restart. > This should be done in ContainerManagerImpl#recoverContainer. 
> The scenario can be reproduced by the following steps: > # Make sure > YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true > in NM > # Submit an application and keep it running > # Restart the NM > # Stop the application > # Now you get the negative values > {code} > /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics > {code} > {code} > { > name: "Hadoop:service=NodeManager,name=NodeManagerMetrics", > modelerType: "NodeManagerMetrics", > tag.Context: "yarn", > tag.Hostname: "hadoop.com", > ContainersLaunched: 0, > ContainersCompleted: 0, > ContainersFailed: 2, > ContainersKilled: 0, > ContainersIniting: 0, > ContainersRunning: 0, > AllocatedGB: 0, > AllocatedContainers: -2, > AvailableGB: 160, > AllocatedVCores: -11, > AvailableVCores: 3611, > ContainerLaunchDurationNumOps: 2, > ContainerLaunchDurationAvgTime: 6, > BadLocalDirs: 0, > BadLogDirs: 0, > GoodLocalDirsDiskUtilizationPerc: 2, > GoodLogDirsDiskUtilizationPerc: 2 > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
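The mechanics of the bug can be modeled in a few lines: after a restart the in-memory counters start at zero, so if recovery does not replay the live containers into the metrics, the eventual release path drives them negative. The method names below are simplified stand-ins, not the real NodeManagerMetrics API:

```java
// Minimal model of YARN-6966: allocation counters reset to 0 on NM restart,
// so unless container recovery re-increments them, the release on app stop
// decrements them below zero. Names are stand-ins, not the real API.
public class NmMetricsRecoveryDemo {
    public int allocatedContainers = 0;
    public int allocatedGB = 0;

    public void allocate(int gb) { allocatedContainers++; allocatedGB += gb; }
    public void release(int gb)  { allocatedContainers--; allocatedGB -= gb; }

    // What ContainerManagerImpl#recoverContainer should also do: replay each
    // recovered live container into the metrics.
    public void recoverContainer(int gb) { allocate(gb); }

    public static void main(String[] args) {
        NmMetricsRecoveryDemo metrics = new NmMetricsRecoveryDemo();
        // NM restarts while one 2 GB container is live; counters are back at 0.
        // Without the recovery step, metrics.release(2) alone would leave
        // allocatedContainers at -1 — the negative value seen in the JMX dump.
        metrics.recoverContainer(2);
        metrics.release(2); // application stopped after the restart
        System.out.println(metrics.allocatedContainers + " " + metrics.allocatedGB);
    }
}
```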
[jira] [Assigned] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad reassigned YARN-8488: -- Assignee: Suma Shivaprasad > Need to add "SUCCEED" state to YARN service > --- > > Key: YARN-8488 > URL: https://issues.apache.org/jira/browse/YARN-8488 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Major > > Existing YARN service has following states: > {code} > public enum ServiceState { > ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING, > UPGRADING_AUTO_FINALIZE; > } > {code} > Ideally we should add "SUCCEEDED" state in order to support long running > applications like Tensorflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
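The enum quoted in the issue makes the proposed change concrete: append a terminal SUCCEEDED state. A sketch of the extended enum (the new constant's position and any transition logic are assumptions; only the existing constants come from the issue text):

```java
// Sketch of the YARN-8488 proposal: the existing ServiceState constants from
// the issue description, plus a terminal SUCCEEDED state for services whose
// components finish on their own (e.g. a completed training job).
public class ServiceStateDemo {
    public enum ServiceState {
        ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
        UPGRADING_AUTO_FINALIZE,
        SUCCEEDED; // proposed addition
    }

    public static void main(String[] args) {
        // Clients can now distinguish "finished successfully" from STOPPED/FAILED.
        System.out.println(ServiceState.valueOf("SUCCEEDED"));
    }
}
```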
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558936#comment-16558936 ] Zian Chen commented on YARN-8509: - Updated patch fixing the failed UTs. > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit > pending and the actual pending. This prevents the queue from taking more > pending resource to achieve queue balance after all queues are satisfied with > their ideal allocation. > > We need to change the logic to let queue pending go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
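The capping described in the issue is simple arithmetic, which makes the problem easy to see: once a user's usage reaches its limit, the reported pending collapses to zero even when real demand remains. The method below is an illustrative reduction of that logic (names and the two-argument shape are assumptions, not the actual LeafQueue code):

```java
// Illustrative arithmetic only, not the real LeafQueue code: today's logic
// reports min(actual pending, user-limit headroom) per user, so a user at
// its limit contributes zero pending and preemption-to-balance stalls.
public class UserLimitPendingDemo {
    // Pending considered by getTotalPendingResourcesConsideringUserLimit-style
    // logic: capped at the user's remaining headroom, floored at zero.
    public static int cappedPending(int actualPending, int userLimit, int userUsed) {
        return Math.min(actualPending, Math.max(0, userLimit - userUsed));
    }

    public static void main(String[] args) {
        // User at its limit (used == limit): 8 units are genuinely pending,
        // but the capped value is 0 — the case the patch wants to relax.
        System.out.println(cappedPending(8, 10, 10)); // prints 0
        // User with headroom: pending is capped at limit - used.
        System.out.println(cappedPending(8, 10, 4));  // prints 6
    }
}
```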
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.002.patch > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit > pending and the actual pending. This prevents the queue from taking more > pending resource to achieve queue balance after all queues are satisfied with > their ideal allocation. > > We need to change the logic to let queue pending go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558931#comment-16558931 ] Gour Saha commented on YARN-8429: - Mistakenly had a test commented out in patch 003. Undoing that in patch 004. Thanks [~billie.rinaldi] for catching that. > Improve diagnostic message when artifact is not set properly > > > Key: YARN-8429 > URL: https://issues.apache.org/jira/browse/YARN-8429 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8429.001.patch, YARN-8429.002.patch, > YARN-8429.003.patch, YARN-8429.004.patch > > > Steps: > 1) Create launch json file. Replace "artifact" with "artifacts" > 2) launch yarn service app with cli > The application launch fails with below error > {code} > [xxx xxx]$ yarn app -launch test2-2 test.json > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition > from local FS: /xxx/test.json > 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms > 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be > absolute path: /xxx/xxx > {code} > artifact field is not mandatory. However, If that field is specified > incorrectly, launch cmd should fail with proper error. > Here, The error message regarding Dest file is misleading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558930#comment-16558930 ] genericqa commented on YARN-8522: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 11s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8522 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933265/YARN-8522.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4b93b11f816f 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d70d845 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21382/testReport/ | | Max. process+thread count | 397 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21382/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Application
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Attachment: YARN-8429.004.patch > Improve diagnostic message when artifact is not set properly > > > Key: YARN-8429 > URL: https://issues.apache.org/jira/browse/YARN-8429 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8429.001.patch, YARN-8429.002.patch, > YARN-8429.003.patch, YARN-8429.004.patch > > > Steps: > 1) Create a launch json file. Replace "artifact" with "artifacts" > 2) Launch the yarn service app with the CLI > The application launch fails with the below error > {code} > [xxx xxx]$ yarn app -launch test2-2 test.json > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition > from local FS: /xxx/test.json > 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms > 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be > absolute path: /xxx/xxx > {code} > The artifact field is not mandatory. However, if that field is specified > incorrectly, the launch command should fail with a proper error. > Here, the error message regarding Dest_file is misleading. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
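The misleading Dest_file error arises because the unknown "artifacts" key is silently ignored during spec parsing, so the failure only surfaces later in an unrelated code path. A minimal sketch of the kind of fail-fast check the patch aims for (hypothetical class and field names, not the actual YARN service code):

```java
import java.util.*;

// Hypothetical helper, not the YARN service implementation: flag unknown
// top-level keys in a service spec so a typo like "artifacts" fails fast
// with an actionable message instead of a misleading downstream error.
public class SpecFieldCheck {
    // Assumed set of recognized top-level fields for illustration only.
    private static final Set<String> KNOWN_FIELDS =
        new HashSet<>(Arrays.asList("name", "version", "components",
                                    "artifact", "kerberos_principal"));

    /** Returns a diagnostic for the first unknown field, or null if all are known. */
    public static String findUnknownField(Collection<String> specFields) {
        for (String field : specFields) {
            if (!KNOWN_FIELDS.contains(field)) {
                return "Unrecognized field '" + field
                    + "' in service definition; expected one of " + KNOWN_FIELDS;
            }
        }
        return null;
    }
}
```

With a check like this, the "artifacts" spec would be rejected at load time with a message naming the bad key, rather than failing later on Dest_file validation.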
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558907#comment-16558907 ] Eric Yang commented on YARN-8587: - There is a backward incompatibility concern with distributed shell, where we allow users to specify multiple unix commands and output redirection to a log file. Fixing this transient false positive changes the logging mechanism's behavior: stderr and stdout will contain the command output, while stderr.txt and stdout.txt will contain more information, including the command launched and docker errors. Hence, this can only be fixed if we agree that the incompatible change is negligible. > Delays are noticed to launch docker container > - > > Key: YARN-8587 > URL: https://issues.apache.org/jira/browse/YARN-8587 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Priority: Major > Labels: Docker > > Launch a dshell application. Wait for the application to go into the RUNNING state. > {code:java} > yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command > "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker > -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar > {code} > Find the container allocation. Run the docker inspect command for the docker > containers launched by the app. > Sometimes, the container is allocated to the NM but the docker PID is not up. > {code:java} > Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null > xxx "sudo su - -c \"docker ps -a | grep > container_e02_1531189225093_0003_01_02\" root" failed after 0 retries > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558905#comment-16558905 ] genericqa commented on YARN-8508: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 1s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 70m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8508 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933259/YARN-8505.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4ab7d00c07b9 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / be150a1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21381/testReport/ | | Max. process+thread count | 408 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21381/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > GPU does not get released even though the container is killed
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Description: In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total pending resource based on user-limit percent and user-limit factor, which caps the pending resource for each user at the minimum of the user-limit pending and the actual pending. This prevents the queue from taking more pending resource to achieve queue balance after all queues are satisfied with their ideal allocation. We need to change the logic to let the queue's pending resource go beyond the user limit. was: In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total pending resource based on user-limit percent and user-limit factor which will cap pending resource for each user to the minimum of user-limit pending and actual pending. This will prevent queue from taking more pending resource to achieve queue balance after all queue satisfied with its ideal allocation. We need to change the logic to let queue pending can reach at most (Queue_max_capacity - Queue_used_capacity). > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the > total pending resource based on user-limit percent and user-limit factor, which > caps the pending resource for each user at the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic to let the queue's pending resource go beyond the user limit. 
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558882#comment-16558882 ] Zian Chen commented on YARN-8509: - Talked with [~sunilg]. It can go beyond maxCap - usedCap, because a user can ask for 1maps but the cluster can run a max of 1000. In this case, as soon as each map finishes, another pending one will get scheduled. > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the > total pending resource based on user-limit percent and user-limit factor, which > caps the pending resource for each user at the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic to let queue pending reach at most > (Queue_max_capacity - Queue_used_capacity). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
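The capping being discussed can be sketched as plain arithmetic (illustrative only, with simplified long-valued resources, not the LeafQueue implementation):

```java
// Illustrative sketch, not LeafQueue code: the current calculation caps each
// user's counted pending resource at their user-limit headroom, which is what
// keeps the queue's total pending from growing toward queue balance once all
// queues have reached their ideal allocation.
public class PendingCapSketch {
    /** Current behavior: pending counted for a user never exceeds user-limit headroom. */
    public static long cappedPending(long userLimit, long userUsed, long userPending) {
        long headroom = Math.max(0, userLimit - userUsed);
        return Math.min(headroom, userPending);
    }

    /** Proposed relaxation: count the user's actual pending, even beyond the user limit. */
    public static long uncappedPending(long userPending) {
        return userPending;
    }
}
```

For example, a user at 90 of a 100 user limit with 50 pending is counted as only 10 pending under the current logic, even though 50 containers could eventually be scheduled as earlier ones finish.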
[jira] [Commented] (YARN-8571) Validate service principal format prior to launching yarn service
[ https://issues.apache.org/jira/browse/YARN-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558880#comment-16558880 ] Eric Yang commented on YARN-8571: - Patch 002 added error handling for NPE. > Validate service principal format prior to launching yarn service > - > > Key: YARN-8571 > URL: https://issues.apache.org/jira/browse/YARN-8571 > Project: Hadoop YARN > Issue Type: Bug > Components: security, yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8571.001.patch, YARN-8571.002.patch > > > Hadoop client and server interaction is designed to validate the service > principal before RPC request is permitted. In YARN service, the same > security model is enforced to prevent replay attack. However, end user > might submit JSON that looks like this to YARN service REST API: > {code} > { > "name": "sleeper-service", > "version": "1.0.0", > "components" : > [ > { > "name": "sleeper", > "number_of_containers": 2, > "launch_command": "sleep 90", > "resource": { > "cpus": 1, > "memory": "256" > } > } > ], > "kerberos_principal" : { > "principal_name" : "ambari...@example.com", > "keytab" : "file:///etc/security/keytabs/smokeuser.headless.keytab" > } > } > {code} > The kerberos principal is end user kerberos principal instead of service > principal. This does not work properly because YARN service application > master requires to run with a service principal to communicate with YARN CLI > client via Hadoop RPC. Without breaking Hadoop security design in this JIRA, > it might be in our best interest to validate principal_name during > submission, and report error message when someone tries to run YARN service > with user principal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8571) Validate service principal format prior to launching yarn service
[ https://issues.apache.org/jira/browse/YARN-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8571: Attachment: YARN-8571.002.patch > Validate service principal format prior to launching yarn service > - > > Key: YARN-8571 > URL: https://issues.apache.org/jira/browse/YARN-8571 > Project: Hadoop YARN > Issue Type: Bug > Components: security, yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8571.001.patch, YARN-8571.002.patch > > > Hadoop client and server interaction is designed to validate the service > principal before RPC request is permitted. In YARN service, the same > security model is enforced to prevent replay attack. However, end user > might submit JSON that looks like this to YARN service REST API: > {code} > { > "name": "sleeper-service", > "version": "1.0.0", > "components" : > [ > { > "name": "sleeper", > "number_of_containers": 2, > "launch_command": "sleep 90", > "resource": { > "cpus": 1, > "memory": "256" > } > } > ], > "kerberos_principal" : { > "principal_name" : "ambari...@example.com", > "keytab" : "file:///etc/security/keytabs/smokeuser.headless.keytab" > } > } > {code} > The kerberos principal is end user kerberos principal instead of service > principal. This does not work properly because YARN service application > master requires to run with a service principal to communicate with YARN CLI > client via Hadoop RPC. Without breaking Hadoop security design in this JIRA, > it might be in our best interest to validate principal_name during > submission, and report error message when someone tries to run YARN service > with user principal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
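A Kerberos service principal has the form primary/instance@REALM, while a user principal (primary@REALM) lacks the instance component. A minimal sketch of the submission-time format check this JIRA proposes (hypothetical class name, not the actual patch):

```java
// Hypothetical validation sketch: distinguish a service principal
// (primary/instance@REALM) from a user principal (primary@REALM) so that
// a user principal in kerberos_principal can be rejected at submission
// time with a clear error message.
public class PrincipalFormatCheck {
    public static boolean looksLikeServicePrincipal(String principal) {
        if (principal == null) return false;
        int at = principal.indexOf('@');
        if (at <= 0) return false;                   // must carry a realm
        String namePart = principal.substring(0, at);
        int slash = namePart.indexOf('/');
        // service principals have non-empty primary and instance components
        return slash > 0 && slash < namePart.length() - 1;
    }
}
```

Under this sketch, a submission carrying a user principal would fail validation immediately instead of producing an AM that cannot serve Hadoop RPC.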
[jira] [Updated] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8522: Attachment: YARN-8522.002.patch > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming apps simultaneously. Sometimes, one of the > applications fails with the below stack trace. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional
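The invariant behind the InvalidResourceRequestException above can be sketched as follows (illustrative helper operating on resource names only, not the RMAppManager code):

```java
import java.util.List;

// Minimal sketch of the validation that produces the error above: an
// application submission may carry at most one ResourceRequest whose
// resource name is the wildcard "*" (ResourceRequest.ANY).
public class AmRequestCheck {
    public static void validate(List<String> resourceNames) {
        int wildcards = 0;
        for (String name : resourceNames) {
            if ("*".equals(name)) {
                wildcards++;
            }
        }
        if (wildcards > 1) {
            throw new IllegalArgumentException(
                "Invalid resource request, only one resource request with * is allowed");
        }
    }
}
```

The intermittent failure suggests that under concurrent submission the request list occasionally ends up with a duplicate wildcard entry, tripping this check.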
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558870#comment-16558870 ] Zian Chen commented on YARN-8522: - Thanks for the suggestions, [~sunilg]. Updated patch 002. > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming apps simultaneously. Sometimes, one of the > applications fails with the below stack trace. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe,
[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558847#comment-16558847 ] genericqa commented on YARN-8508: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8508 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933248/YARN-8505.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 95544b2179d7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / be150a1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21380/testReport/ | | Max. process+thread count | 336 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21380/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT
[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558838#comment-16558838 ] Chandni Singh commented on YARN-8508: - [~shaneku...@gmail.com] [~eyang] could you please review patch 2? > GPU does not get released even though the container is killed > -- > > Key: YARN-8508 > URL: https://issues.apache.org/jira/browse/YARN-8508 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8505.001.patch, YARN-8505.002.patch > > > GPU failed to release even though the container using it is being killed > {Code} > 2018-07-06 05:22:26,201 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0006_01_01 transitioned from RUNNING to > KILLING > 2018-07-06 05:22:26,250 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0006_01_02 transitioned from RUNNING to > KILLING > 2018-07-06 05:22:26,251 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1530854311763_0006 transitioned from RUNNING to > FINISHING_CONTAINERS_WAIT > 2018-07-06 05:22:26,251 INFO launcher.ContainerLaunch > (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container > container_e20_1530854311763_0006_01_02 > 2018-07-06 05:22:31,358 INFO launcher.ContainerLaunch > (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for > container_e20_1530854311763_0006_01_02. Waited for 5000 ms. > 2018-07-06 05:22:31,358 WARN launcher.ContainerLaunch > (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid > file created container_e20_1530854311763_0006_01_02 > 2018-07-06 05:22:31,359 INFO launcher.ContainerLaunch > (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, > but docker container request detected. 
Attempting to reap container > container_e20_1530854311763_0006_01_02 > 2018-07-06 05:22:31,494 INFO nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : > /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh > 2018-07-06 05:22:31,500 INFO nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : > /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens > 2018-07-06 05:22:31,510 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0006_01_01 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > 2018-07-06 05:22:31,510 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0006_01_02 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > 2018-07-06 05:22:31,512 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0006_01_01 transitioned from > CONTAINER_CLEANEDUP_AFTER_KILL to DONE > 2018-07-06 05:22:31,513 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0006_01_02 transitioned from > CONTAINER_CLEANEDUP_AFTER_KILL to DONE > 2018-07-06 05:22:38,955 INFO container.ContainerImpl > (ContainerImpl.java:handle(2093)) - Container > container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED > {Code} > New container requesting for GPU fails to launch > {code} > 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - > ResourceHandlerChain.preStart() failed! 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: > Failed to find enough GPUs, > requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, > #availableGpus=1 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75) > at >
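The failure above (a request for 2 GPUs finding only 1 available even after the previous containers were killed) comes down to device-to-container bookkeeping that is not cleared on the kill path. The sketch below illustrates that bookkeeping; the class and method names are illustrative stand-ins, not the actual GpuResourceAllocator API:

```java
import java.util.*;

// Illustrative sketch of GPU bookkeeping: devices must be unassigned when a
// container completes, otherwise later requests see too few free GPUs.
public class GpuBookkeepingSketch {
    // device index -> owning container id, or null if the device is free
    private final Map<Integer, String> deviceToContainer = new HashMap<>();

    public GpuBookkeepingSketch(int numDevices) {
        for (int i = 0; i < numDevices; i++) {
            deviceToContainer.put(i, null);
        }
    }

    // Assign 'count' free devices to a container; throws if not enough remain.
    public synchronized List<Integer> assign(String containerId, int count) {
        List<Integer> picked = new ArrayList<>();
        for (Map.Entry<Integer, String> e : deviceToContainer.entrySet()) {
            if (e.getValue() == null && picked.size() < count) {
                picked.add(e.getKey());
            }
        }
        if (picked.size() < count) {
            throw new IllegalStateException("Failed to find enough GPUs:"
                + " requested=" + count + ", available=" + picked.size());
        }
        for (int d : picked) {
            deviceToContainer.put(d, containerId);
        }
        return picked;
    }

    // The cleanup step: if this is skipped on the KILLING path, devices leak.
    public synchronized void release(String containerId) {
        for (Map.Entry<Integer, String> e : deviceToContainer.entrySet()) {
            if (containerId.equals(e.getValue())) {
                e.setValue(null);
            }
        }
    }

    public synchronized int freeCount() {
        int n = 0;
        for (String owner : deviceToContainer.values()) {
            if (owner == null) n++;
        }
        return n;
    }
}
```

If release() is never invoked for a killed container, a later two-GPU assignment on a two-GPU node fails with exactly the kind of "Failed to find enough GPUs" error shown in the log.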
[jira] [Updated] (YARN-8508) GPU does not get released even though the container is killed
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8508: Attachment: YARN-8505.002.patch > GPU does not get released even though the container is killed

[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558835#comment-16558835 ] Chandni Singh commented on YARN-8545: - [~billie.rinaldi] [~eyang] Do you have any comments on patch 1? > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8545.001.patch > > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context prepare failed, AM still trying to > monitor container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
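The fix under discussion amounts to wrapping launch-context preparation in a try/catch that releases the container back to the RM on failure. A minimal sketch of that control flow, using illustrative stand-in interfaces rather than the actual YARN service and AMRMClient classes:

```java
// Sketch: return a container to the RM when launch-context preparation fails
// before anything reaches the NM. These interfaces are illustrative stand-ins
// for the service AM's actual classes, not the real YARN API.
interface LaunchContextBuilder {
    void buildContainerLaunchContext(String containerId) throws Exception;
}

interface ResourceManagerClient {
    void releaseAssignedContainer(String containerId);
}

public class ContainerLauncherSketch {
    private final LaunchContextBuilder builder;
    private final ResourceManagerClient rmClient;

    public ContainerLauncherSketch(LaunchContextBuilder builder,
                                   ResourceManagerClient rmClient) {
        this.builder = builder;
        this.rmClient = rmClient;
    }

    // Returns true if launch proceeded; on failure the container is released
    // so the RM can reschedule it instead of the allocation leaking.
    public boolean launch(String containerId) {
        try {
            builder.buildContainerLaunchContext(containerId);
            return true;
        } catch (Exception e) {
            // Without this release, the AM keeps the dead allocation and
            // continues probing readiness of a container that never started.
            rmClient.releaseAssignedContainer(containerId);
            return false;
        }
    }
}
```

Without the release call in the catch block, the allocation leaks and the AM keeps running readiness probes against a container that never started, as in the ServiceMonitor log above.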
[jira] [Commented] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information
[ https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558803#comment-16558803 ] genericqa commented on YARN-6906: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 50s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-6906 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933235/YARN-6906.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e89b3442cf1f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / be150a1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21378/testReport/ | | Max. process+thread count | 1309 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21378/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Cluster Node API and Cluster Nodes API should
[jira] [Updated] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8587: -- Labels: Docker (was: ) > Delays are noticed to launch docker container > - > > Key: YARN-8587 > URL: https://issues.apache.org/jira/browse/YARN-8587 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Priority: Major > Labels: Docker > > Launch dshell application. Wait for application to go in RUNNING state. > {code:java} > yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command > "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker > -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar > {code} > Find out container allocation. Run docker inspect command for docker > containers launched by app. > Sometimes, the container is allocated to NM but docker PID is not up. > {code:java} > Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null > xxx "sudo su - -c \"docker ps -a | grep > container_e02_1531189225093_0003_01_02\" root" failed after 0 retries > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8508) GPU does not get released even though the container is killed
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8508: Attachment: YARN-8505.001.patch > GPU does not get released even though the container is killed
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558735#comment-16558735 ] genericqa commented on YARN-7863: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 11s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-7863 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933244/YARN-7863-YARN-3409.003.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21379/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558728#comment-16558728 ] Sunil Govindan edited comment on YARN-7863 at 7/26/18 6:35 PM: --- Thanks [~Naganarasimha] and [~cheersyang] for the comments. A quick summary based on an offline discussion: {noformat} DS Example: bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar -shell_command Sleep -shell_args 10 -num_containers 2 -master_memory 200 -container_memory 200 -placement_spec OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev {noformat} For DS, I think it's better to stay aligned with placement_spec for now, for ease of review and API compatibility. A refactoring improvement could be done later to bring this in line with the native service approach. The proposal for a later JIRA is {{-placement_spec (python=true OR java!=1.8) AND (os!=centos) AND (env NOTIN (prod,dev))}} To expand on the approach in the patch, {noformat} -placement_spec OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev{noformat} we can read this as: *place me on a node that has node-attribute python=true* OR *do not place me on a node that has node-attribute java=1.8*, AND *place me on a node whose node-attribute os is not centos*, AND *do not place me on a node whose env attribute has value prod or dev*. Also attaching a patch based on this, incorporating Naga's comments. I'll add a test case in the next patch. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
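The prose reading of the example spec above is just a boolean predicate over a node's attributes. The sketch below captures only the semantics of this one example (plain string attributes, an illustrative matches helper, not the real PlacementConstraint API):

```java
import java.util.*;

// Illustrative evaluation of the example constraint read out in prose above:
// (python=true OR java!=1.8) AND os!=centos AND env NOTIN (prod, dev).
// Attribute names and values are plain strings; this is not the actual
// YARN PlacementConstraint API, only the boolean semantics of the example.
public class NodeAttributeMatchSketch {
    public static boolean matches(Map<String, String> nodeAttrs) {
        // IN,python=true : node must carry python=true ...
        boolean pythonOk = "true".equals(nodeAttrs.get("python"));
        // ... OR NOTIN,java=1.8 : node must not carry java=1.8
        boolean javaOk = !"1.8".equals(nodeAttrs.get("java"));
        // IN,os!=centos : node's os attribute must not be centos
        boolean osOk = !"centos".equals(nodeAttrs.get("os"));
        // NOTIN,env=prod,dev : env must be neither prod nor dev
        String env = nodeAttrs.get("env");
        boolean envOk = !"prod".equals(env) && !"dev".equals(env);
        return (pythonOk || javaOk) && osOk && envOk;
    }
}
```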
[jira] [Updated] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-7863: - Attachment: YARN-7863-YARN-3409.003.patch > Modify placement constraints to support node attributes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558728#comment-16558728 ] Sunil Govindan commented on YARN-7863: -- Thanks [~Naganarasimha] and [~cheersyang] for the comments. A quick summary based on an offline discussion: {noformat} DS Example: bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar -shell_command Sleep -shell_args 10 -num_containers 2 -master_memory 200 -container_memory 200 -placement_spec OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev {noformat} For DS, I think it's better to stay aligned with placement_spec for now, for ease of review and API compatibility. A refactoring improvement could be done later to bring this in line with the native service approach. The proposal for a later JIRA is {{-placement_spec (python=true OR java!=1.8) AND (os!=centos) AND (env NOTIN (prod,dev))}} To expand on the approach in the patch, {noformat} -placement_spec OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev{noformat} we can read this as: *place me on a node that has node-attribute python=true* OR *do not place me on a node that has node-attribute java=1.8*, AND *place me on a node whose node-attribute os is not centos*, AND *do not place me on a node whose env attribute has value prod or dev*. Also attaching a patch based on this, incorporating Naga's comments. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558720#comment-16558720 ] Eric Yang edited comment on YARN-8587 at 7/26/18 6:30 PM: -- This bug is the result of a detached docker run reporting exit_code 0 even though the process inside the container failed to run. For a brief period, the node manager reports the container as RUNNING, then fails it later. One possible solution is to change container-executor so that non-entry-point mode works more like entry-point mode: run docker run in the foreground, with the parent process retrying docker inspect to obtain the PID. This removes the possible false-positive report of the RUNNING state. The synthetic timeout approach may kill a container prematurely (or wait longer than necessary for a failing container) if the container takes more than 30 seconds (or the configured value) to start its first process. Do we want to make non-entry-point work like entry-point to prevent the false positive, or are we okay with the current state? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
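The retry-based approach described in this comment (poll docker inspect for a PID instead of trusting the detached docker run exit code) can be sketched as a small loop. The probe is injected here as an assumption for illustration; a real implementation would shell out to something like docker inspect --format '{{.State.Pid}}' <container> and sleep between attempts:

```java
import java.util.function.Supplier;

// Sketch of the retry loop proposed above: poll for the container PID
// instead of treating detached `docker run`'s exit code 0 as success.
public class DockerPidProbeSketch {
    // Retries the probe up to maxRetries times. Returns the PID, or -1 if
    // the container never came up (the "false positive RUNNING" case this
    // issue describes). The probe stands in for a `docker inspect` call.
    public static int waitForPid(Supplier<Integer> probe, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Integer pid = probe.get();
            if (pid != null && pid > 0) {
                return pid;
            }
            // a real implementation would sleep briefly between attempts
        }
        return -1;
    }
}
```

With this loop, the node manager only reports RUNNING once a PID is actually observed, at the cost of a bounded wait for containers that genuinely fail to start.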
[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
[ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558720#comment-16558720 ] Eric Yang commented on YARN-8587: - This bug is a result of docker run in detached mode reporting exit_code 0 even though the process inside the container fails to run. For a brief period of time, the node manager will report that the container is in the RUNNING state, then fail the container later. One possible solution is to change container-executor so that non-entry-point mode becomes more similar to entry-point mode: run docker run in the foreground, and have the parent process retry docker inspect to obtain the PID. This removes the possible false-positive reporting of the RUNNING state. The synthetic timeout approach may kill a container prematurely (or wait longer than necessary for a failing container) if the container takes more than 30 seconds (or the configured value) to start the first process in the container. > Delays are noticed to launch docker container > - > > Key: YARN-8587 > URL: https://issues.apache.org/jira/browse/YARN-8587 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Priority: Major > > Launch dshell application. Wait for application to go in RUNNING state. > {code:java} > yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command > "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker > -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar > {code} > Find out container allocation. Run docker inspect command for docker > containers launched by app. > Sometimes, the container is allocated to NM but docker PID is not up. 
> {code:java} > Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null > xxx "sudo su - -c \"docker ps -a | grep > container_e02_1531189225093_0003_01_02\" root" failed after 0 retries > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8587) Delays are noticed to launch docker container
Yesha Vora created YARN-8587: Summary: Delays are noticed to launch docker container Key: YARN-8587 URL: https://issues.apache.org/jira/browse/YARN-8587 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: Yesha Vora Launch dshell application. Wait for application to go in RUNNING state. {code:java} yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar {code} Find out container allocation. Run docker inspect command for docker containers launched by app. Sometimes, the container is allocated to NM but docker PID is not up. {code:java} Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null xxx "sudo su - -c \"docker ps -a | grep container_e02_1531189225093_0003_01_02\" root" failed after 0 retries {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach reassigned YARN-5464: Assignee: Antal Bálint Steinbach > Server-Side NM Graceful Decommissioning with RM HA > -- > > Key: YARN-5464 > URL: https://issues.apache.org/jira/browse/YARN-5464 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Robert Kanter >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-5464.wip.patch > > > Make sure to remove the note added by YARN-7094 about RM HA failover not > working right. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information
[ https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R reassigned YARN-6906: -- Assignee: Manikandan R > Cluster Node API and Cluster Nodes API should report resource types > information > --- > > Key: YARN-6906 > URL: https://issues.apache.org/jira/browse/YARN-6906 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Manikandan R >Priority: Major > Attachments: YARN-6906.001.patch > > > These endpoints currently report: > {noformat} > > /default-rack > RUNNING > localhost:51877 > localhost > localhost:8042 > 1501534150336 > 3.0.0-beta1-SNAPSHOT > > 4 > 5120 > 3072 > 4 > 0 > 0 > 0 > 0 > 0 > > 0 > 0 > 0.0 > 0 > 0 > 0.0 > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information
[ https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-6906: --- Attachment: YARN-6906.001.patch > Cluster Node API and Cluster Nodes API should report resource types > information > --- > > Key: YARN-6906 > URL: https://issues.apache.org/jira/browse/YARN-6906 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Priority: Major > Attachments: YARN-6906.001.patch > > > These endpoints currently report: > {noformat} > > /default-rack > RUNNING > localhost:51877 > localhost > localhost:8042 > 1501534150336 > 3.0.0-beta1-SNAPSHOT > > 4 > 5120 > 3072 > 4 > 0 > 0 > 0 > 0 > 0 > > 0 > 0 > 0.0 > 0 > 0 > 0.0 > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information
[ https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558632#comment-16558632 ] Manikandan R commented on YARN-6906: This request has been handled as part of YARN-7817. YARN-7817 introduced usedResource and availableResource nodes as part of NodeInfo, which contain ResourceInformations as well. Added unit test cases in the .001 patch to verify this. Another thing to note: even if the NM is not configured for a particular resource type that is configured on the RM side, that resource type will still appear in the NodeInfo API response, but with a value of 0. > Cluster Node API and Cluster Nodes API should report resource types > information > --- > > Key: YARN-6906 > URL: https://issues.apache.org/jira/browse/YARN-6906 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Priority: Major > > These endpoints currently report: > {noformat} > > /default-rack > RUNNING > localhost:51877 > localhost > localhost:8042 > 1501534150336 > 3.0.0-beta1-SNAPSHOT > > 4 > 5120 > 3072 > 4 > 0 > 0 > 0 > 0 > 0 > > 0 > 0 > 0.0 > 0 > 0 > 0.0 > > > {noformat}
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558628#comment-16558628 ] genericqa commented on YARN-8566: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 14s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}155m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8566 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933212/YARN-8566.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 92ede80f6b63 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a192295 | | maven | version: Apache Maven 3.3.9 | | Default Java |
[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery
[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558620#comment-16558620 ] genericqa commented on YARN-8242: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 53s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 9s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 78m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8242 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933220/YARN-8242.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7e26fe890f0b 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a192295 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21376/testReport/ | | Max. process+thread count | 301 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21376/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > YARN NM: OOM error while reading back the state store on
[jira] [Comment Edited] (YARN-8584) Several typos in Log Aggregation related classes
[ https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558609#comment-16558609 ] Bibin A Chundatt edited comment on YARN-8584 at 7/26/18 5:03 PM: - +1 LGTM. Will commit it by tomorrow, if no one objects. was (Author: bibinchundatt): +1 LGTM. Will commit it by tomorrow. > Several typos in Log Aggregation related classes > > > Key: YARN-8584 > URL: https://issues.apache.org/jira/browse/YARN-8584 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8584.001.patch > > > There are typos in comments, log messages, method names, field names, etc.
[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes
[ https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558609#comment-16558609 ] Bibin A Chundatt commented on YARN-8584: +1 LGTM. Will commit it by tomorrow. > Several typos in Log Aggregation related classes > > > Key: YARN-8584 > URL: https://issues.apache.org/jira/browse/YARN-8584 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8584.001.patch > > > There are typos in comments, log messages, method names, field names, etc.
[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558574#comment-16558574 ] genericqa commented on YARN-8517: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 34m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8517 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933223/YARN-8517.005.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 8662c046cb19 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a192295 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 410 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21377/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch, YARN-8517.004.patch, YARN-8517.005.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. 
[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558571#comment-16558571 ] Chandni Singh commented on YARN-8508: - This happens with a container that gets cleaned up before its pid file is created. To solve it, we need to release the resources at the end of \{{LinuxContainerExecutor.reapContainer()}} just like we do in \{{LinuxContainerExecutor.launchContainer()}}, \{{LinuxContainerExecutor.reLaunchContainer()}}, and \{{LinuxContainerExecutor.reacquireContainer()}}. Please see my explanation below: Refer to \{{container_e21_1532545600682_0001_01_02}} in yarn8505.nodemanager.log - 002 is launched but its pid file is not created {code} 2018-07-25 19:08:54,409 DEBUG util.ProcessIdFileReader (ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file /.../application_1532545600682_0001/container_e21_1532545600682_0001_01_02/container_e21_1532545600682_0001_01_02.pid 2018-07-25 19:08:54,409 DEBUG util.ProcessIdFileReader (ProcessIdFileReader.java:getProcessId(103)) - Got pid null from path /.../application_1532545600682_0001/container_e21_1532545600682_0001_01_02/container_e21_1532545600682_0001_01_02.pid {code} - Since the application is killed, 002 is killed by the ResourceManager {code} 2018-07-25 19:08:54,643 DEBUG container.ContainerImpl (ContainerImpl.java:handle(2080)) - Processing container_e21_1532545600682_0001_01_02 of type CONTAINER_KILLED_ON_REQUEST {code} - The above triggers \{{ContainerLaunch.cleanupContainer()}} for 002. 
This happens before the pid file is created {code} 2018-07-25 19:08:54,409 WARN launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid file created container_e21_1532545600682_0001_01_02 {code} - \{{cleanupContainer}} invokes \{{reapDockerContainerNoPid(user)}} {code} 2018-07-25 19:08:54,410 INFO launcher.ContainerLaunch (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, but docker container request detected. Attempting to reap container container_e21_1532545600682_0001_01_02 {code} - \{{reapDockerContainerNoPid(user)}} calls \{{exec.reapContainer(...)}} {code} 2018-07-25 19:08:54,412 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: inspect docker-command=inspect format=\{{.State.Status}} name=container_e21_1532545600682_0001_01_02 2018-07-25 19:08:54,412 DEBUG privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) - Privileged Execution Command Array: [/.../hadoop-yarn/bin/container-executor, --inspect-docker-container, --format=\{{.State.Status}}, container_e21_1532545600682_0001_01_02] 2018-07-25 19:08:54,530 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:getContainerStatus(160)) - Container Status: nonexistent ContainerId: container_e21_1532545600682_0001_01_02 2018-07-25 19:08:54,530 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:reapDockerContainerNoPid(948)) - Sent signal to docker container container_e21_1532545600682_0001_01_02 as user hrt_qa, result=success {code} - The problem is that the \{{reapContainer}} in \{{LinuxContainerExecutor}} doesn't release the resources assigned to the container. The below code snippet that performs these tasks after the container completes doesn't happen at this point. 
{code} resourcesHandler.postExecute(containerId); try { if (resourceHandlerChain != null) { LOG.info("{} POST Complete", containerId); resourceHandlerChain.postComplete(containerId); } } catch (ResourceHandlerException e) { LOG.warn("ResourceHandlerChain.postComplete failed for " + "containerId: " + containerId + ". Exception: " + e); } } {code} - The launch of the container fails after 4 minutes, and only then are the resources released. {code} 2018-07-25 19:12:09,999 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container container_e21_1532545600682_0001_01_02 is : 27 2018-07-25 19:12:10,000 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from container-launch with container ID: container_e21_1532545600682_0001_01_02 and exit code: 27 2018-07-25 19:12:10,000 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e21_1532545600682_0001_01_02 2018-07-25 19:12:10,003 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Docker inspect command: /usr/bin/docker inspect --format \{{.State.Pid}} container_e21_1532545600682_0001_01_02 2018-07-25 19:12:10,003 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Failed to write pid to file /cgroup/cpu/.../container_e21_1532545600682_0001_01_02/tasks - No such
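A minimal editor's sketch of the fix described in the comment above: make the reap path run the same post-complete resource release that the launch path already performs, so GPU/cgroup assignments are dropped even when the pid file was never written. {{ResourceTracker}} and {{SketchExecutor}} are stand-ins assumed for illustration, not the real \{{LinuxContainerExecutor}} / resource-handler APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for the resourcesHandler / resourceHandlerChain pair whose
// postExecute/postComplete calls are quoted above. Hypothetical names.
class ResourceTracker {
  private final Set<String> assigned = new HashSet<>();

  void assign(String containerId) {
    assigned.add(containerId);
  }

  // Mirrors resourcesHandler.postExecute(...) plus
  // resourceHandlerChain.postComplete(...): give back GPUs/cgroups.
  void postComplete(String containerId) {
    assigned.remove(containerId);
  }

  boolean isHeld(String containerId) {
    return assigned.contains(containerId);
  }
}

class SketchExecutor {
  private final ResourceTracker tracker;

  SketchExecutor(ResourceTracker tracker) {
    this.tracker = tracker;
  }

  // The proposed fix: reaping must release resources in a finally block,
  // so a container cleaned up before its pid file exists does not keep
  // its GPU assigned until the launch times out minutes later.
  void reapContainer(String containerId) {
    try {
      // ... signal and remove the docker container here ...
    } finally {
      tracker.postComplete(containerId); // the step missing today
    }
  }
}
```

With this shape, the release happens on every exit from reaping, matching what the launch/relaunch/reacquire paths already do after the container completes.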
[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558458#comment-16558458 ] Antal Bálint Steinbach commented on YARN-8517: -- Thanks [~rkanter]. Descriptions added. > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch, YARN-8517.004.patch, YARN-8517.005.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8517: - Attachment: YARN-8517.005.patch > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch, YARN-8517.004.patch, YARN-8517.005.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery
[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lurdh Pradeep Reddy Ambati updated YARN-8242: - Attachment: YARN-8242.004.patch
> YARN NM: OOM error while reading back the state store on recovery
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.6.0, 2.9.0, 2.6.5, 2.8.3, 3.1.0, 2.7.6, 3.0.2
> Reporter: Kanwaljeet Sachdev
> Priority: Critical
> Attachments: YARN-8242.001.patch, YARN-8242.002.patch, YARN-8242.003.patch, YARN-8242.004.patch
>
> On startup the NM reads its state store and builds a list of applications to process. If the number of applications in the state store is large and each has a lot of "state" attached to it, the NM can run out of memory and never reach the point where it can start processing the recovery.
> Since recovery never starts, there is no way for the NM to ever get past this point; the only workaround is to increase the heap size so the NM can start.
>
> The following is the stack trace:
> {code:java}
> at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
> at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
> at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
> at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
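The trace shows the OOM hits while NMLeveldbStateStoreService.loadContainersState materializes every serialized container record before recovery begins. A minimal sketch of the direction the report implies — hand each record to the recovery path as it is read, so peak heap is one record rather than the whole store. All names below are illustrative, not the actual state-store API:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Illustrative stand-in for iterating the leveldb state store and recovering
// containers one record at a time instead of building a full in-memory list.
public class StreamingRecoverySketch {
    static int recoverContainers(Map<String, byte[]> stateStore,
                                 Consumer<byte[]> recoverOne) {
        int recovered = 0;
        // Each record is decoded, handed off, and becomes garbage before the
        // next one is read; nothing accumulates across iterations.
        for (Map.Entry<String, byte[]> entry : stateStore.entrySet()) {
            recoverOne.accept(entry.getValue());
            recovered++;
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new TreeMap<>();
        store.put("container_1", new byte[]{1});
        store.put("container_2", new byte[]{2});
        System.out.println(recoverContainers(store, bytes -> { /* recover */ }));
    }
}
```

With a real leveldb iterator the same shape applies: decode, recover, discard, advance.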
[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes
[ https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558391#comment-16558391 ] genericqa commented on YARN-8584: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 12s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 10s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}102m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8584 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933202/YARN-8584.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c8a65fe2f6e5 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9089790 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558371#comment-16558371 ] Szilard Nemeth commented on YARN-8566: -- Uploaded new patch that fixes the UT failures. > Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8566.001.patch, YARN-8566.002.patch, > YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch, > YARN-8566.006.patch > > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. 
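The check the issue asks for can be sketched as follows. This is a hypothetical helper, not the real FairScheduler code, and the resource names are assumptions: when a queue caps a requested resource at 0, record why the application cannot be scheduled instead of leaving the diagnostics empty.

```java
import java.util.Map;

// Hypothetical sketch of building the proposed diagnostic string.
public class UnschedulableDiagnosticsSketch {
    static String diagnose(Map<String, Long> requested,
                           Map<String, Long> queueMaxResources) {
        StringBuilder diag = new StringBuilder();
        for (Map.Entry<String, Long> req : requested.entrySet()) {
            Long max = queueMaxResources.get(req.getKey());
            // A request for a resource whose queue maximum is 0 can never be
            // satisfied, so say so explicitly.
            if (max != null && max == 0L && req.getValue() > 0) {
                diag.append("Cannot schedule: queue maxResources is 0 for resource '")
                    .append(req.getKey()).append("'. ");
            }
        }
        return diag.toString();
    }
}
```

In the scenario from the description, a request for 1 GPU against a queue with 0 GPUs would yield a non-empty diagnostic instead of an indefinitely pending app with no explanation.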
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566: - Attachment: YARN-8566.006.patch > Add diagnostic message for unschedulable containers
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566: - Attachment: (was: YARN-8566.006.patch) > Add diagnostic message for unschedulable containers
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566: - Attachment: YARN-8566.006.patch > Add diagnostic message for unschedulable containers
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558307#comment-16558307 ] genericqa commented on YARN-8566: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 25s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8566 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933189/YARN-8566.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c8e0325e0694 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9089790 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Created] (YARN-8586) Extract log aggregation related fields and methods from RMAppImpl
Szilard Nemeth created YARN-8586: Summary: Extract log aggregation related fields and methods from RMAppImpl Key: YARN-8586 URL: https://issues.apache.org/jira/browse/YARN-8586 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth Assignee: Szilard Nemeth Given that RMAppImpl is already over 2,000 lines and very complex, a simple and straightforward first step is to extract all log-aggregation-related fields and methods into a new class. Clients of RMAppImpl could keep calling the same methods, and RMAppImpl would delegate those calls to the newly introduced class.
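The extraction described above might look like this delegation sketch. The class and method names are assumptions for illustration, not the actual RMAppImpl API:

```java
// Illustrative only: log-aggregation state moves into a dedicated class, and
// RMAppImpl keeps its public surface by forwarding to it.
class LogAggregationInfo {
    private String status = "NOT_START";
    String getStatus() { return status; }
    void setStatus(String newStatus) { status = newStatus; }
}

public class RMAppImplSketch {
    // One field replaces the cluster of log-aggregation fields; callers of
    // RMAppImpl see an unchanged interface.
    private final LogAggregationInfo logAggregation = new LogAggregationInfo();

    public String getLogAggregationStatus() { return logAggregation.getStatus(); }
    public void setLogAggregationStatus(String s) { logAggregation.setStatus(s); }
}
```

The delegation keeps the refactoring mechanical: callers are untouched, and the new class can grow its own unit tests independently of RMAppImpl.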
[jira] [Created] (YARN-8585) Add test class for DefaultAMSProcessor
Szilard Nemeth created YARN-8585: Summary: Add test class for DefaultAMSProcessor Key: YARN-8585 URL: https://issues.apache.org/jira/browse/YARN-8585 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth Assignee: Szilard Nemeth Since this class currently has no test coverage at all, adding a dedicated test class would be worthwhile.
[jira] [Updated] (YARN-8584) Several typos in Log Aggregation related classes
[ https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8584: - Attachment: YARN-8584.001.patch > Several typos in Log Aggregation related classes > Key: YARN-8584 > URL: https://issues.apache.org/jira/browse/YARN-8584 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Assignee: Szilard Nemeth > Priority: Minor > Attachments: YARN-8584.001.patch > > > There are typos in comments, log messages, method names, field names, etc.
[jira] [Created] (YARN-8584) Several typos in Log Aggregation related classes
Szilard Nemeth created YARN-8584: Summary: Several typos in Log Aggregation related classes Key: YARN-8584 URL: https://issues.apache.org/jira/browse/YARN-8584 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth Assignee: Szilard Nemeth There are typos in comments, log messages, method names, field names, etc.
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558250#comment-16558250 ] genericqa commented on YARN-6966: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 10s{color} | {color:red} Docker failed to build yetus/hadoop:20ca677. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6966 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933195/YARN-6966-branch-3.0.0.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21373/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > NodeManager metrics may return wrong negative values when NM restart > > > Key: YARN-6966 > URL: https://issues.apache.org/jira/browse/YARN-6966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-6966-branch-2.001.patch, > YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, > YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, > YARN-6966.005.patch, YARN-6966.006.patch > > > Just as YARN-6212. However, I think it is not a duplicate of YARN-3933. > The primary cause of negative values is that metrics do not recover properly > when NM restart. > AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores > in metrics also need to recover when NM restart. > This should be done in ContainerManagerImpl#recoverContainer. 
> The scenario can be reproduced by the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values:
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
>   name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
>   modelerType: "NodeManagerMetrics",
>   tag.Context: "yarn",
>   tag.Hostname: "hadoop.com",
>   ContainersLaunched: 0,
>   ContainersCompleted: 0,
>   ContainersFailed: 2,
>   ContainersKilled: 0,
>   ContainersIniting: 0,
>   ContainersRunning: 0,
>   AllocatedGB: 0,
>   AllocatedContainers: -2,
>   AvailableGB: 160,
>   AllocatedVCores: -11,
>   AvailableVCores: 3611,
>   ContainerLaunchDurationNumOps: 2,
>   ContainerLaunchDurationAvgTime: 6,
>   BadLocalDirs: 0,
>   BadLogDirs: 0,
>   GoodLocalDirsDiskUtilizationPerc: 2,
>   GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
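The fix the report points at (re-counting recovered containers in ContainerManagerImpl#recoverContainer) can be sketched with a toy metrics class. Fields and methods are illustrative, not the real NodeManagerMetrics API:

```java
// Hedged sketch of why counters go negative and how recovery fixes it: an NM
// restart wipes in-memory metrics, so a recovered container's eventual
// release is subtracted from counters that never saw its allocation.
public class NodeManagerMetricsSketch {
    int allocatedContainers;
    long allocatedGB;

    void allocateContainer(long gb) { allocatedContainers++; allocatedGB += gb; }
    void releaseContainer(long gb)  { allocatedContainers--; allocatedGB -= gb; }

    // What the report suggests recoverContainer should do: re-apply the
    // recovered container's allocation before any release arrives.
    void recoverContainer(long gb) { allocateContainer(gb); }
}
```

Without the recoverContainer step, stopping an application after an NM restart yields exactly the negative AllocatedContainers/AllocatedVCores values shown in the JMX output above.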
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558248#comment-16558248 ] Szilard Nemeth commented on YARN-6966: -- Hi [~haibochen]! Uploaded a patch for branch-3.0.0; I hope the patch was named correctly. Is there anything I should do with this jira at this point? Thanks! > NodeManager metrics may return wrong negative values when NM restart
[jira] [Updated] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-6966: - Attachment: YARN-6966-branch-3.0.0.001.patch > NodeManager metrics may return wrong negative values when NM restart
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566:
    Attachment: YARN-8566.005.patch

> Add diagnostic message for unschedulable containers
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch
>
> If a queue is configured with maxResources set to 0 for a resource, and an application that requests that resource is submitted to the queue, the application will remain pending until it is removed or moved to a different queue. This behavior can also be triggered without extended resources, but it is unlikely a user will create a queue that allows 0 memory or CPU. As the number of resource types in the system increases, this scenario will become more common, and it will become harder to recognize these cases. Therefore, the scheduler should indicate in the diagnostic string for an application if it was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml):
> {code:xml}
> <?xml version="1.0"?>
> <allocations>
>   <!-- 10 (enclosing element name lost in the mail archive) -->
>   <queue name="sample_queue">
>     <minResources>1 mb,2vcores</minResources>
>     <maxResources>9 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command:
> {code}
> yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is created, the diagnostic message of the AM should be updated.
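The diagnostic the issue asks for amounts to a simple comparison of the request against the queue's maxResources. A hypothetical sketch, assuming nothing about the actual patch (class name, method name, and message wording are mine):

```java
import java.util.Map;

// Hedged sketch of the check: if the queue caps a requested resource at 0,
// produce a diagnostic string instead of leaving the app pending silently.
public class UnschedulableCheck {

    // Returns a diagnostic message, or null if maxResources does not block
    // the request. Resource names and values are illustrative.
    static String diagnose(Map<String, Long> requested, Map<String, Long> queueMax) {
        for (Map.Entry<String, Long> e : requested.entrySet()) {
            Long max = queueMax.get(e.getKey());
            if (max != null && max == 0 && e.getValue() > 0) {
                return "Request for " + e.getValue() + " " + e.getKey()
                    + " exceeds queue maxResources (" + e.getKey() + "=0)";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the scenario above: the queue allows 0 gpu, the map task asks for 1.
        String d = diagnose(Map.of("gpu", 1L),
                            Map.of("memory-mb", 9L, "gpu", 0L));
        System.out.println(d); // prints "Request for 1 gpu exceeds queue maxResources (gpu=0)"
    }
}
```

In the real scheduler this string would be appended to the application's diagnostics so the hang is visible in the UI instead of the empty diagnostic info described above.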
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558163#comment-16558163 ] Szilard Nemeth commented on YARN-8566:

Hi [~rkanter]! Thanks for the quick review; see my new patch with the fixes.
1. Fixed.
2. I would leave this as it is: the exception is passed to LOG.warn, so the message will be printed anyway. Do you agree?
3. Good point. I reused the exception message in {{DefaultAMSProcessor.handleInvalidResourceException}}, but I would like to keep {{InvalidResourceType}} for deciding whether to update the diagnostics message. I only want to update the message when the {{InvalidResourceException}} was created because the resource was less than zero or greater than the maximum allocation. Since this exception is also created in other parts of the code for other reasons, I would not touch the diagnostic message in those cases. About {{SchedulerUtils.throwInvalidResourceException}}: I wanted to keep the details of how the {{InvalidResourceException}} is created in one place instead of having the callers provide the message, which is why the exception message is formatted in this method.
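The pattern described in point 3 of the comment above, an exception carrying a reason type so the handler can decide whether to touch the diagnostics, can be sketched as follows. The enum constants and method names here are assumptions for illustration; only the class names come from the discussion.

```java
// Hedged sketch of the review discussion: the real YARN classes differ,
// and the constant names below are assumed, not taken from the patch.
enum InvalidResourceType { LESS_THAN_ZERO, GREATER_THAN_MAX_ALLOCATION, UNKNOWN }

class InvalidResourceException extends Exception {
    private final InvalidResourceType type;

    InvalidResourceException(String msg, InvalidResourceType type) {
        super(msg);
        this.type = type;
    }

    InvalidResourceType getType() { return type; }
}

public class DiagnosticsUpdateDemo {
    // Only out-of-range resource requests should update the app diagnostics;
    // the same exception raised elsewhere for other reasons should not.
    static boolean shouldUpdateDiagnostics(InvalidResourceException e) {
        return e.getType() == InvalidResourceType.LESS_THAN_ZERO
            || e.getType() == InvalidResourceType.GREATER_THAN_MAX_ALLOCATION;
    }

    public static void main(String[] args) {
        InvalidResourceException tooBig = new InvalidResourceException(
            "Invalid resource request: gpu > maximum allocation",
            InvalidResourceType.GREATER_THAN_MAX_ALLOCATION);
        System.out.println(shouldUpdateDiagnostics(tooBig)); // true
    }
}
```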
[jira] [Resolved] (YARN-8252) Fix ServiceMaster main not found
[ https://issues.apache.org/jira/browse/YARN-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved YARN-8252.
    Resolution: Not A Problem

The problem is caused by missing transitive dependencies, which are put in place by "fastlaunch". A better way of reporting the error would still be an improvement.

> Fix ServiceMaster main not found
>
> Key: YARN-8252
> URL: https://issues.apache.org/jira/browse/YARN-8252
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Zoltan Haindrich
> Priority: Major
>
> I was looking into using YARN services; however, it seems that for some reason it is not possible to run the {{ServiceMaster}} class from the jar. I might be missing something fundamental, so I've put together a shell script to make it easy for anyone to check. I would be happy with any exception beyond "main not found".
> [ServiceMaster.main method|https://github.com/apache/hadoop/blob/67f239c42f676237290d18ddbbc9aec369267692/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java#L305]
> {code}
> #!/bin/bash
> set -e
> wget -O core.jar -nv http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-services-core/3.1.0/hadoop-yarn-services-core-3.1.0.jar
> unzip -qn core.jar
> cat > org/apache/hadoop/yarn/service/ServiceMaster2.java << EOF
> package org.apache.hadoop.yarn.service;
> public class ServiceMaster2 {
>   public static void main(String[] args) throws Exception {
>     System.out.println("asd!");
>   }
> }
> EOF
> javac org/apache/hadoop/yarn/service/ServiceMaster2.java
> jar -cf a1.jar org
> find org -name ServiceMaster*
> # this will print "asd!"
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster2
> # the following invocations result in:
> #   Error: Could not find or load main class org.apache.hadoop.yarn.service.ServiceMaster
> set +e
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster
> java -cp core.jar org.apache.hadoop.yarn.service.ServiceMaster
> {code}
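The "Could not find or load main class" error from the script is produced by the JVM launcher whenever it cannot resolve the requested class, either because the class file is absent from the classpath or because a class it statically depends on (such as its superclass, here a missing transitive dependency) cannot be loaded. A minimal standalone illustration of the resolution step, intended to be run without any Hadoop jars on the classpath:

```java
public class MainNotFoundDemo {
    public static void main(String[] args) {
        try {
            // Roughly what the launcher does internally: resolve the
            // requested main class by name on the application classpath.
            Class.forName("org.apache.hadoop.yarn.service.ServiceMaster");
            System.out.println("resolved");
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // An absent class and an unresolvable dependency both land here;
            // the launcher reports either as "Could not find or load main class".
            System.out.println("resolution failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Run on a bare classpath this prints the failure branch, matching the script's behavior; with the service's full dependency set (what "fastlaunch" assembles) resolution succeeds.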
[jira] [Commented] (YARN-8252) Fix ServiceMaster main not found
[ https://issues.apache.org/jira/browse/YARN-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556948#comment-16556948 ] Zoltan Haindrich commented on YARN-8252:

[~jmarhuen]: I've found out that this is "normal", but very misleading at first: you have to run {{yarn app -enableFastLaunch}} before launching the service. I think it would be better to either do this automatically on first launch, or give a better explanation of what is going wrong.