[jira] [Updated] (YARN-8591) [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error

2018-07-26 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-8591:
---
Description: 
{code:java}
GET 
http://ctr-e138-1518143905142-417433-01-04.hwx.site:8198/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL&_=1532670071899{code}
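
For reference, the failing call can be reproduced outside the UI with a plain Java HTTP client; a minimal sketch, with host, port and application id taken from the request above (they are otherwise environment-specific):
{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

public class TimelineReaderProbe {
  public static void main(String[] args) throws Exception {
    // ATSv2 reader endpoint from the report; adjust host, port and application id
    // for your own cluster.
    URL url = new URL("http://ctr-e138-1518143905142-417433-01-04.hwx.site:8198"
        + "/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    // A healthy reader answers 200 with a JSON list of YARN_CONTAINER entities;
    // the bug reported here yields 500 Internal Server Error instead.
    System.out.println("HTTP status: " + conn.getResponseCode());
    conn.disconnect();
  }
}
{code}
The 500 response corresponds to the server-side warning below: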
{code:java}
2018-07-27 05:32:03,468 WARN  webapp.GenericExceptionHandler 
(GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.handleException(TimelineReaderWebServices.java:196)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:624)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:474)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.security.TimelineReaderWhitelistAuthorizationFilter.doFilter(TimelineReaderWhitelistAuthorizationFilter.java:85)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 

[jira] [Updated] (YARN-8591) [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error

2018-07-26 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-8591:
---
Summary: [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error  (was: 
[ATSv2] Yarn container API throws 500 Internal Server Error)

> [ATSv2] YARN_CONTAINER API throws 500 Internal Server Error
> ---
>
> Key: YARN-8591
> URL: https://issues.apache.org/jira/browse/YARN-8591
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelinereader, timelineserver
>Reporter: Akhil PB
>Assignee: Rohith Sharma K S
>Priority: Major
>
> GET 
> ctr-e138-1518143905142-417433-01-04.hwx.site:8198/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL&_=1532670071899
> {code:java}
> 2018-07-27 05:32:03,468 WARN  webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.handleException(TimelineReaderWebServices.java:196)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:624)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
> at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.security.TimelineReaderWhitelistAuthorizationFilter.doFilter(TimelineReaderWhitelistAuthorizationFilter.java:85)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at 
> org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> 

[jira] [Created] (YARN-8591) [ATSv2] Yarn container API throws 500 Internal Server Error

2018-07-26 Thread Akhil PB (JIRA)
Akhil PB created YARN-8591:
--

 Summary: [ATSv2] Yarn container API throws 500 Internal Server 
Error
 Key: YARN-8591
 URL: https://issues.apache.org/jira/browse/YARN-8591
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelinereader, timelineserver
Reporter: Akhil PB
Assignee: Rohith Sharma K S


GET 
ctr-e138-1518143905142-417433-01-04.hwx.site:8198/ws/v2/timeline/apps/application_1532578985272_0002/entities/YARN_CONTAINER?fields=ALL&_=1532670071899
{code:java}
2018-07-27 05:32:03,468 WARN  webapp.GenericExceptionHandler 
(GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.handleException(TimelineReaderWebServices.java:196)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:624)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getEntities(TimelineReaderWebServices.java:474)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.security.TimelineReaderWhitelistAuthorizationFilter.doFilter(TimelineReaderWhitelistAuthorizationFilter.java:85)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 

[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559242#comment-16559242
 ] 

genericqa commented on YARN-8579:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m  
2s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
40s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8579 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933296/YARN-8579.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5fa995b24f28 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d3c068 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21389/testReport/ |
| Max. process+thread count | 850 (vs. ulimit of 1) |

[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Fix Version/s: 3.1.2
   3.2.0

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for the app to be in STABLE state
> 3) Run validation for the app (it takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill the AM
> 7) Wait 30 sec
> 8) Start all ZKs
> 9) Wait for the application to finish
> 10) Validate the expected containers of the app
> Expected behavior:
> A new AM attempt should start, and the docker containers launched by the 1st
> attempt should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts. It cannot recover the 1st attempt's docker containers
> and cannot read component details from ZK.
> Thus, it starts new containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559134#comment-16559134
 ] 

Gour Saha commented on YARN-8429:
-

Thanks [~eyang] for the commit. Can you please commit it to branch-3.1 as well, 
since it is also targeted for the 3.1.2 release?

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create a launch JSON file and replace "artifact" with "artifacts"
> 2) Launch the yarn service app with the CLI
> The application launch fails with the error below:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified 
> incorrectly, the launch command should fail with a proper error. 
> Here, the error message regarding Dest_file is misleading.






[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Attachment: YARN-8579.001.patch

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for the app to be in STABLE state
> 3) Run validation for the app (it takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill the AM
> 7) Wait 30 sec
> 8) Start all ZKs
> 9) Wait for the application to finish
> 10) Validate the expected containers of the app
> Expected behavior:
> A new AM attempt should start, and the docker containers launched by the 1st
> attempt should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts. It cannot recover the 1st attempt's docker containers
> and cannot read component details from ZK.
> Thus, it starts new containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559122#comment-16559122
 ] 

Gour Saha commented on YARN-8579:
-

Uploading patch 001 with a fix that I successfully tested in my cluster.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for the app to be in STABLE state
> 3) Run validation for the app (it takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill the AM
> 7) Wait 30 sec
> 8) Start all ZKs
> 9) Wait for the application to finish
> 10) Validate the expected containers of the app
> Expected behavior:
> A new AM attempt should start, and the docker containers launched by the 1st
> attempt should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts. It cannot recover the 1st attempt's docker containers
> and cannot read component details from ZK.
> Thus, it starts new containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559121#comment-16559121
 ] 

Gour Saha commented on YARN-8579:
-

I investigated this issue and found that the root cause is the missing NM 
tokens for the containers that are passed to the AM after registration via the 
onContainersReceivedFromPreviousAttempts callback. These tokens are required 
after the change made in YARN-6168. The exception seen in the AM log is below:

{code}
2018-07-26 23:22:31,373 [pool-5-thread-4] ERROR instance.ComponentInstance - 
[COMPINSTANCE httpd-proxy-0 : container_e15_1532637883791_0001_01_04] 
Failed to get container status on 
ctr-e138-1518143905142-412155-01-05.hwx.site:25454, will try again
org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
for ctr-e138-1518143905142-412155-01-05.hwx.site:25454
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262)
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:252)
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137)
at 
org.apache.hadoop.yarn.client.api.impl.NMClientImpl.getContainerStatus(NMClientImpl.java:323)
at 
org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStatusRetriever.run(ComponentInstance.java:596)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
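
For background, a minimal sketch of the mechanism being discussed: on re-registration the RM returns NM tokens for containers that survived from the previous attempt, and the AM must put them into its NMTokenCache before NMClient calls such as getContainerStatus can succeed. This is an illustration under those assumptions, not the YARN-8579 patch; the host name, port and tracking URL are placeholders.
{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.NMToken;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.NMTokenCache;

public class PreviousAttemptTokens {
  public static void registerAndCacheTokens(Configuration conf) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(conf);
    amrmClient.start();

    // The register response carries NM tokens for containers handed over from the
    // previous attempt; without caching them, NMClient calls against those nodes
    // fail with "No NMToken sent for <host:port>".
    RegisterApplicationMasterResponse response =
        amrmClient.registerApplicationMaster("am-host.example.com", 0, "");
    List<NMToken> previousTokens = response.getNMTokensFromPreviousAttempts();
    for (NMToken token : previousTokens) {
      NMTokenCache.getSingleton().setToken(token.getNodeId().toString(), token.getToken());
    }
  }
}
{code}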

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for the app to be in STABLE state
> 3) Run validation for the app (it takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill the AM
> 7) Wait 30 sec
> 8) Start all ZKs
> 9) Wait for the application to finish
> 10) Validate the expected containers of the app
> Expected behavior:
> A new AM attempt should start, and the docker containers launched by the 1st
> attempt should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts. It cannot recover the 1st attempt's docker containers
> and cannot read component details from ZK.
> Thus, it starts new containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}




[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559088#comment-16559088
 ] 

Hudson commented on YARN-8429:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14651 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14651/])
YARN-8429. Improve diagnostic message when artifact is not set properly. 
(eyang: rev 8d3c068e59f18e3f8260713fee83c458aa1d)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/exceptions/RestApiErrorMessages.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/providers/TestDefaultClientProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/providers/TestAbstractClientProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/AbstractClientProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/ServiceApiUtil.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/defaultImpl/DefaultClientProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/tarball/TarballClientProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceApiUtil.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/docker/DockerClientProvider.java


> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create a launch JSON file and replace "artifact" with "artifacts"
> 2) Launch the yarn service app with the CLI
> The application launch fails with the error below:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified 
> incorrectly, the launch command should fail with a proper error. 
> Here, the error message regarding Dest_file is misleading.
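
As an aside, the launch JSON is deserialized into the service API records, and the misleading error above suggests that the unknown "artifacts" key is simply ignored. A minimal sketch of strict parsing that would surface such a typo, assuming the record classes do not themselves suppress unknown properties (this is an illustration, not the committed fix, which improves the diagnostic message instead):
{code:java}
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import org.apache.hadoop.yarn.service.api.records.Service;

public class StrictServiceParse {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper()
        // With strict mapping, a typo such as "artifacts" instead of "artifact"
        // fails at parse time with a message naming the unrecognized field,
        // instead of surfacing later as an unrelated Dest_file error.
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, true);
    Service service = mapper.readValue(new File("test.json"), Service.class);
    System.out.println("Parsed service: " + service.getName());
  }
}
{code}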






[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559072#comment-16559072
 ] 

genericqa commented on YARN-8509:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 42s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8509 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933273/YARN-8509.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 46cdc61b785a 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d70d845 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21386/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21386/testReport/ |
| Max. process+thread count | 859 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559071#comment-16559071
 ] 

Eric Yang commented on YARN-8429:
-

Thank you [~gsaha] for the patch.
Thank you [~billie.rinaldi] for the review.

+1 Patch 4 looks good to me.

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create a launch JSON file and replace "artifact" with "artifacts"
> 2) Launch the yarn service app with the CLI
> The application launch fails with the error below:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified 
> incorrectly, the launch command should fail with a proper error. 
> Here, the error message regarding Dest_file is misleading.






[jira] [Comment Edited] (YARN-8574) Allow dot in attribute values

2018-07-26 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559060#comment-16559060
 ] 

Naganarasimha G R edited comment on YARN-8574 at 7/27/18 12:11 AM:
---

Hi [~bibinchundatt],

You are correct; I meant when we start using it as a namespace later on. But 
anyway, since we are supporting it for internal use, it is just as well to 
support it for others too.

My only doubt now is: how was it NOT failing for the earlier mappings for prefix?

I have manually triggered the build too...


was (Author: naganarasimha):
Hi [~bibinchundatt],

You are correct i meant when we start using it as namespace later on. but 
anyway as we are supporting for internal its as well good to be supported for 
others too.

My ownly doubt now is how was it not failing for earlier mappings for prefix?

 

> Allow dot in attribute values 
> --
>
> Key: YARN-8574
> URL: https://issues.apache.org/jira/browse/YARN-8574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8574-YARN-3409.001.patch
>
>
> Currently "." is considered an invalid value; it should be allowed.
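
A rough illustration of the relaxation being requested, using a hypothetical validation pattern (the actual node-attribute validation code may differ):
{code:java}
import java.util.regex.Pattern;

public class AttributeValueCheck {
  // Hypothetical rule: alphanumerics, '-', '_' and (newly) '.' are accepted.
  private static final Pattern VALID_VALUE =
      Pattern.compile("^[0-9a-zA-Z][0-9a-zA-Z._-]*$");

  static boolean isValidAttributeValue(String value) {
    return value != null && VALID_VALUE.matcher(value).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValidAttributeValue("centos7.4")); // true once '.' is allowed
    System.out.println(isValidAttributeValue("bad value")); // false: spaces rejected
  }
}
{code}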






[jira] [Commented] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-07-26 Thread Yesha Vora (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559063#comment-16559063
 ] 

Yesha Vora commented on YARN-8407:
--

Thanks [~bibinchundatt] for the review. Patch 2 is submitted.

> Container launch exception in AM log should be printed in ERROR level
> -
>
> Key: YARN-8407
> URL: https://issues.apache.org/jira/browse/YARN-8407
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8407.001.patch, YARN-8407.002.patch
>
>
> When a container launch fails because the docker image is not available, it is 
> logged at INFO level in the AM log. 
> Container launch failures should be logged at ERROR level.
> Steps:
> Launch an httpd yarn-service application with an invalid docker image
>  
> {code:java}
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02]: 
> container_e05_1528335963594_0001_01_02 completed. Reinsert back to 
> pending list and requested a new container.
> exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from 
> container-launch.
> Container id: container_e05_1528335963594_0001_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: Unable to find image 'xxx/httpd:0.1' locally
> Trying to pull repository xxx/httpd ...
> /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on 
> yyy: no such host.
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Wrote the exit code 7 to 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode
> [2018-06-07 01:51:02.393]Diagnostic message from attempt :
> [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 
> 4096 bytes of stderr.txt :
> [2018-06-07 01:51:32.428]Could not find 
> nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid
>  in any of the directories
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT 
> on STOP event{code}
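
A minimal sketch of the requested change (not the attached patch), assuming an slf4j logger and the standard ContainerStatus record:
{code:java}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LaunchFailureLogging {
  private static final Logger LOG = LoggerFactory.getLogger(LaunchFailureLogging.class);

  static void onContainerCompleted(ContainerStatus status) {
    if (status.getExitStatus() == ContainerExitStatus.SUCCESS) {
      LOG.info("Container {} completed successfully", status.getContainerId());
    } else {
      // Launch failures (e.g. a missing docker image) should stand out in the AM log.
      LOG.error("Container {} failed with exit status {}: {}", status.getContainerId(),
          status.getExitStatus(), status.getDiagnostics());
    }
  }
}
{code}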






[jira] [Updated] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-07-26 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8407:
-
Attachment: YARN-8407.002.patch

> Container launch exception in AM log should be printed in ERROR level
> -
>
> Key: YARN-8407
> URL: https://issues.apache.org/jira/browse/YARN-8407
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8407.001.patch, YARN-8407.002.patch
>
>
> When a container launch fails because the docker image is not available, it is 
> logged at INFO level in the AM log. 
> Container launch failures should be logged at ERROR level.
> Steps:
> Launch an httpd yarn-service application with an invalid docker image
>  
> {code:java}
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02]: 
> container_e05_1528335963594_0001_01_02 completed. Reinsert back to 
> pending list and requested a new container.
> exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from 
> container-launch.
> Container id: container_e05_1528335963594_0001_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: Unable to find image 'xxx/httpd:0.1' locally
> Trying to pull repository xxx/httpd ...
> /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on 
> yyy: no such host.
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Wrote the exit code 7 to 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode
> [2018-06-07 01:51:02.393]Diagnostic message from attempt :
> [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 
> 4096 bytes of stderr.txt :
> [2018-06-07 01:51:32.428]Could not find 
> nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid
>  in any of the directories
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT 
> on STOP event{code}






[jira] [Commented] (YARN-8574) Allow dot in attribute values

2018-07-26 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559060#comment-16559060
 ] 

Naganarasimha G R commented on YARN-8574:
-

Hi [~bibinchundatt],

You are correct; I meant when we start using it as a namespace later on. But 
anyway, since we are supporting it for internal use, it is just as well to 
support it for others too.

My only doubt now is how was it not failing for the earlier mappings for prefix?

 

> Allow dot in attribute values 
> --
>
> Key: YARN-8574
> URL: https://issues.apache.org/jira/browse/YARN-8574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8574-YARN-3409.001.patch
>
>
> Currently "." is considered an invalid value; it should be allowed.






[jira] [Commented] (YARN-8574) Allow dot in attribute values

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559053#comment-16559053
 ] 

genericqa commented on YARN-8574:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 14m  
2s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8574 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933017/YARN-8574-YARN-3409.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21387/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow dot in attribute values 
> --
>
> Key: YARN-8574
> URL: https://issues.apache.org/jira/browse/YARN-8574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8574-YARN-3409.001.patch
>
>
> Currently "." is considered an invalid value; it should be allowed.






[jira] [Commented] (YARN-8590) Fair scheduler promotion does not update container execution type and token

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559055#comment-16559055
 ] 

genericqa commented on YARN-8590:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  8m  
8s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8590 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933282/YARN-8590-YARN-1011.00.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21388/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fair scheduler promotion does not update container execution type and token
> ---
>
> Key: YARN-8590
> URL: https://issues.apache.org/jira/browse/YARN-8590
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8590-YARN-1011.00.patch
>
>
> Fair Scheduler promotion of opportunistic containers does not update 
> container execution type and token. This leads to incorrect resource 
> accounting when the promoted containers are released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8590) Fair scheduler promotion does not update container execution type and token

2018-07-26 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8590:
-
Attachment: YARN-8590-YARN-1011.00.patch

> Fair scheduler promotion does not update container execution type and token
> ---
>
> Key: YARN-8590
> URL: https://issues.apache.org/jira/browse/YARN-8590
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8590-YARN-1011.00.patch
>
>
> Fair Scheduler promotion of opportunistic containers does not update 
> container execution type and token. This leads to incorrect resource 
> accounting when the promoted containers are released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559043#comment-16559043
 ] 

Chandni Singh commented on YARN-8509:
-

A couple of nits and questions:

1. This is a javadoc block, which should be above the method. I understand that 
this test was moved out of another test class, but this is a good opportunity 
to fix it.
{code:java}
/**
 * Test case: Submit three applications (app1/app2/app3) to different
 * queues, queue structure:
 *
 * 
 *   Root
 *  /  |  \  \
 *   a   b   c  d
 *  30  30  30  10
 * 
 *
 */
{code}

2. Why is the log level explicitly set to debug in the code?
{code}
Logger.getRootLogger().setLevel(Level.DEBUG);
{code}

3. Can you explain the comment? 
{code}
   // We should release pending resource be capped at user limit, think about
// a user ask for 1maps. but cluster can run a max of 1000. In this
// case, as soon as each map finish, other one pending will get scheduled
// When not deduct reserved, total-pending = 3G (u1) + 20G (u2) = 23G
//  deduct reserved, total-pending = 0G (u1) + 20G (u2) = 20G
{code}


> Fix UserLimit calculation for preemption to balance scenario after queue 
> satisfied  
> 
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
> total pending resource based on the user-limit percent and user-limit factor, 
> which caps each user's pending resource at the minimum of the user-limit 
> pending and the actual pending. This prevents the queue from taking more 
> pending resource to achieve queue balance after all queues are satisfied with 
> their ideal allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit.
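
For illustration, a simplified sketch (not the actual LeafQueue code, all names 
made up) of the per-user capping described above, using the 3G/20G numbers from 
the quoted comment:
{code:java}
// Simplified, hypothetical sketch of the capping YARN-8509 wants to relax:
// each user's pending is capped at its remaining headroom under the user limit.
public final class PendingCapSketch {

  static long totalPendingWithUserLimitCap(long[] userPending, long[] userUsed,
      long userLimit) {
    long total = 0;
    for (int i = 0; i < userPending.length; i++) {
      long headroom = Math.max(0, userLimit - userUsed[i]);
      total += Math.min(userPending[i], headroom); // the per-user cap
    }
    return total;
  }

  public static void main(String[] args) {
    // User limit 20G. u1 uses 20G with 3G pending, u2 uses 0G with 20G pending.
    long[] pending = {3, 20};
    long[] used = {20, 0};
    // Prints 20 (u1's 3G is capped away), not the uncapped 23.
    System.out.println(totalPendingWithUserLimitCap(pending, used, 20));
  }
}
{code}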



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8590) Fair scheduler promotion does not update container execution type and token

2018-07-26 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8590:


 Summary: Fair scheduler promotion does not update container 
execution type and token
 Key: YARN-8590
 URL: https://issues.apache.org/jira/browse/YARN-8590
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen


Fair Scheduler promotion of opportunistic containers does not update container 
execution type and token. This leads to incorrect resource accounting when the 
promoted containers are released.
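
For context, a minimal sketch of how an AM asks the RM to promote an 
opportunistic container through the public client API (assuming the Hadoop 3.x 
records and an initialized AMRMClient); the fix itself is on the scheduler 
side, where the promoted RMContainer's execution type and token also need 
updating:
{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerUpdateType;
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.UpdateContainerRequest;
import org.apache.hadoop.yarn.client.api.AMRMClient;

class PromotionRequestSketch {
  // Ask the RM to promote an OPPORTUNISTIC container to GUARANTEED. The
  // scheduler must then update the container's execution type and token,
  // which is the part this issue reports as missing in the Fair Scheduler.
  static void requestPromotion(AMRMClient<AMRMClient.ContainerRequest> amRMClient,
      Container container) {
    UpdateContainerRequest promote = UpdateContainerRequest.newInstance(
        container.getVersion(), container.getId(),
        ContainerUpdateType.PROMOTE_EXECUTION_TYPE, null, ExecutionType.GUARANTEED);
    amRMClient.requestContainerUpdate(container, promote);
  }
}
{code}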



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559026#comment-16559026
 ] 

Hudson commented on YARN-8545:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14649 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14649/])
YARN-8545.  Return allocated resource to RM for failed container.
(eyang: rev 40fad32824d2f8f960c779d78357e62103453da0)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstanceEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/containerlaunch/ContainerLaunchService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/MockServiceAM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/TestComponent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/component/instance/TestComponentInstance.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java


> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but the container will not be 
> properly returned to the RM. 
> This can happen when the AM tries to prepare the container launch context but 
> fails without sending it to the NM (once the launch context is sent to the NM, 
> the NM will report the failed container to the RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context preparation failed, the AM still 
> tries to monitor the container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}
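
A minimal sketch of the fix idea (not the actual ServiceScheduler code; the 
helper name is hypothetical): if building the launch context throws before 
anything reaches the NM, release the allocation back to the RM:
{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

class LaunchSketch {
  private final AMRMClientAsync<?> amRMClient; // assumed initialized elsewhere

  LaunchSketch(AMRMClientAsync<?> amRMClient) {
    this.amRMClient = amRMClient;
  }

  void launch(Container container) {
    try {
      buildAndSendLaunchContext(container); // hypothetical helper
    } catch (Exception e) {
      // The launch context never reached the NM, so the NM will not report the
      // failure; release the allocation explicitly instead of leaking it.
      amRMClient.releaseAssignedContainer(container.getId());
    }
  }

  private void buildAndSendLaunchContext(Container container) throws Exception {
    // ... localize resources, build the ContainerLaunchContext, start the container ...
  }
}
{code}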



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, 

[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559023#comment-16559023
 ] 

Chandni Singh commented on YARN-8584:
-

Looks good.

We can also change the log statements to utilize slf4j instead of concatenating 
strings. 
For example
{code:java}
LOG.warn("rollingMonitorInterval should be more than or equal to " + 
MIN_LOG_ROLLING_INTERVAL + " seconds. Using " + MIN_LOG_ROLLING_INTERVAL + " 
seconds instead.");{code}
to 
{code:java}
LOG.warn("rollingMonitorInterval should be more than or equal to {} seconds. 
Using {} seconds instead.", MIN_LOG_ROLLING_INTERVAL, 
MIN_LOG_ROLLING_INTERVAL);{code}
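
A self-contained sketch of the parameterized form suggested above; the class 
and constant here are illustrative, not taken from the patch:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RollingIntervalCheck {
  private static final Logger LOG = LoggerFactory.getLogger(RollingIntervalCheck.class);
  private static final long MIN_LOG_ROLLING_INTERVAL = 3600; // illustrative value

  static void warnIfTooSmall(long rollingMonitorInterval) {
    if (rollingMonitorInterval < MIN_LOG_ROLLING_INTERVAL) {
      // Placeholders are only rendered when WARN is enabled; no eager concatenation.
      LOG.warn("rollingMonitorInterval should be more than or equal to {} seconds."
          + " Using {} seconds instead.", MIN_LOG_ROLLING_INTERVAL, MIN_LOG_ROLLING_INTERVAL);
    }
  }
}
{code}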
 

> Several typos in Log Aggregation related classes
> 
>
> Key: YARN-8584
> URL: https://issues.apache.org/jira/browse/YARN-8584
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8584.001.patch
>
>
> There are typos in comments, log messages, method names, field names, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559018#comment-16559018
 ] 

genericqa commented on YARN-8429:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
34s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8429 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933270/YARN-8429.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7cf8dd6e6bc0 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d70d845 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21385/testReport/ |
| Max. process+thread count | 755 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21385/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve diagnostic 

[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558997#comment-16558997
 ] 

Eric Yang commented on YARN-8545:
-

+1 looks good to me.  Committing shortly.

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but the container will not be 
> properly returned to the RM. 
> This can happen when the AM tries to prepare the container launch context but 
> fails without sending it to the NM (once the launch context is sent to the NM, 
> the NM will report the failed container to the RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context preparation failed, the AM still 
> tries to monitor the container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8571) Validate service principal format prior to launching yarn service

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558975#comment-16558975
 ] 

genericqa commented on YARN-8571:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
17s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8571 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933267/YARN-8571.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 518eed9d580a 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d70d845 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21383/testReport/ |
| Max. process+thread count | 742 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21383/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Validate service 

[jira] [Created] (YARN-8589) ATS TimelineACLsManager checkAccess is slow

2018-07-26 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-8589:
---

 Summary: ATS TimelineACLsManager checkAccess is slow
 Key: YARN-8589
 URL: https://issues.apache.org/jira/browse/YARN-8589
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


ATS rest api is very slow when there are more than 1lakh entries if 
yarn.acl.enable is set to true as TimelineACLsManager has to check access for 
every entries. We can;t disable yarn.acl.enable as all the YARN ACLs uses the 
same config. We can have a separate config to provide read access to the ATS 
Entries.

{code}
curl http://<timeline-server-host>:8188/ws/v1/timeline/HIVE_QUERY_ID
{code}
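
A hypothetical sketch of that separate read-ACL idea: a config consulted once 
per request instead of running checkAccess for every returned entity. The 
property name is invented for illustration:
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;

class TimelineReadAclSketch {
  // Hypothetical property name, for illustration only.
  static final String READ_ACL_PROP = "yarn.timeline-service.read.allowed-users";

  private final Set<String> allowedReaders;

  TimelineReadAclSketch(Configuration conf) {
    allowedReaders = new HashSet<>(Arrays.asList(conf.getTrimmedStrings(READ_ACL_PROP)));
  }

  /** True if the caller may read all entities without per-entity ACL checks. */
  boolean hasBulkReadAccess(String callerUser) {
    return allowedReaders.contains("*") || allowedReaders.contains(callerUser);
  }
}
{code}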



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-26 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558965#comment-16558965
 ] 

Zian Chen commented on YARN-8522:
-

[~sunilg], could you help review the latest patch?

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming apps simultaneously. Sometimes one of the 
> applications fails with the stack trace below.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Streaming Command Failed!{code}
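
For context, a sketch of what the RM appears to be enforcing here (not a fix): 
an AM submission may carry only one ResourceRequest for ResourceRequest.ANY 
("*"); two or more trigger the exception above. The usage below assumes the 
Hadoop 3.x records classes:
{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

class AmRequestSketch {
  static List<ResourceRequest> validAmRequests() {
    // A single "*" request: accepted by the RM's submission validation.
    return Arrays.asList(
        ResourceRequest.newInstance(Priority.newInstance(0), ResourceRequest.ANY,
            Resource.newInstance(1024, 1), 1));
  }

  static List<ResourceRequest> invalidAmRequests() {
    // Two "*" requests: triggers InvalidResourceRequestException
    // "only one resource request with * is allowed".
    return Arrays.asList(
        ResourceRequest.newInstance(Priority.newInstance(0), ResourceRequest.ANY,
            Resource.newInstance(1024, 1), 1),
        ResourceRequest.newInstance(Priority.newInstance(0), ResourceRequest.ANY,
            Resource.newInstance(2048, 1), 1));
  }
}
{code}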



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Created] (YARN-8588) Logging improvements for better debuggability

2018-07-26 Thread Suma Shivaprasad (JIRA)
Suma Shivaprasad created YARN-8588:
--

 Summary: Logging improvements for better debuggability
 Key: YARN-8588
 URL: https://issues.apache.org/jira/browse/YARN-8588
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad


Capacity allocations decided in GuaranteedCapacityOvertimePolicy are available 
via AutoCreatedLeafQueueConfig. However, this class lacks a toString, and some 
additional DEBUG-level logs are needed for better debuggability.
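
Illustrative only: the kind of toString the issue asks for, shown on a stand-in 
class with made-up fields (the real AutoCreatedLeafQueueConfig fields may 
differ):
{code:java}
class AutoCreatedLeafQueueConfigSketch {
  private final String queueName;
  private final float guaranteedCapacity;
  private final float maxCapacity;

  AutoCreatedLeafQueueConfigSketch(String queueName, float guaranteedCapacity,
      float maxCapacity) {
    this.queueName = queueName;
    this.guaranteedCapacity = guaranteedCapacity;
    this.maxCapacity = maxCapacity;
  }

  @Override
  public String toString() {
    // A readable one-line form makes DEBUG logs of capacity decisions usable.
    return "AutoCreatedLeafQueueConfigSketch{queueName=" + queueName
        + ", guaranteedCapacity=" + guaranteedCapacity
        + ", maxCapacity=" + maxCapacity + "}";
  }
}
{code}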



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-26 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558960#comment-16558960
 ] 

Suma Shivaprasad commented on YARN-8559:


[~cheersyang] Thanks for the patch. Just a minor comment; otherwise the patch 
LGTM. Can you please replace the webservice endpoint "/schedulerconf" with the 
constant available in RMWebServices?



> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558940#comment-16558940
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 12m 
38s{color} | {color:red} Docker failed to build yetus/hadoop:20ca677. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933195/YARN-6966-branch-3.0.0.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21384/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores
>  in metrics also need to recover when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
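
A hypothetical sketch of the recovery step described above; the metrics 
interface here is a stand-in, not the exact NodeManagerMetrics API:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

class MetricsRecoverySketch {
  interface NmMetrics { // stand-in for NodeManagerMetrics
    void allocateContainer(Resource resource);
    void launchedContainer();
  }

  void recoverContainer(NmMetrics metrics, Resource recoveredResource) {
    // Without this, AllocatedContainers/AllocatedGB/AllocatedVCores start at 0
    // after restart and go negative when the recovered container finishes.
    metrics.allocateContainer(recoveredResource);
    metrics.launchedContainer();
  }
}
{code}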



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-07-26 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad reassigned YARN-8488:
--

Assignee: Suma Shivaprasad

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
>
> The existing YARN service has the following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add a "SUCCEEDED" state in order to support long-running 
> applications like TensorFlow.
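
A sketch of the proposal (not a committed change): the existing states with a 
terminal SUCCEEDED state added:
{code:java}
// Hypothetical sketch only; the real enum lives in the YARN service API.
public enum ServiceStateSketch {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE, SUCCEEDED
}
{code}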



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558936#comment-16558936
 ] 

Zian Chen commented on YARN-8509:
-

Updated the patch to fix the failed UTs.

> Fix UserLimit calculation for preemption to balance scenario after queue 
> satisfied  
> 
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
> total pending resource based on the user-limit percent and user-limit factor, 
> which caps each user's pending resource at the minimum of the user-limit 
> pending and the actual pending. This prevents the queue from taking more 
> pending resource to achieve queue balance after all queues are satisfied with 
> their ideal allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8509:

Attachment: YARN-8509.002.patch

> Fix UserLimit calculation for preemption to balance scenario after queue 
> satisfied  
> 
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
> total pending resource based on the user-limit percent and user-limit factor, 
> which caps each user's pending resource at the minimum of the user-limit 
> pending and the actual pending. This prevents the queue from taking more 
> pending resource to achieve queue balance after all queues are satisfied with 
> their ideal allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558931#comment-16558931
 ] 

Gour Saha commented on YARN-8429:
-

Mistakenly had a test commented out in patch 003. Undoing that in patch 004. 
Thanks [~billie.rinaldi] for catching that.

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create a launch json file. Replace "artifact" with "artifacts".
> 2) Launch the yarn service app with the CLI.
> The application launch fails with the error below:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified 
> incorrectly, the launch cmd should fail with a proper error. 
> Here, the error message regarding Dest_file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558930#comment-16558930
 ] 

genericqa commented on YARN-8522:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
11s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8522 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933265/YARN-8522.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4b93b11f816f 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d70d845 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21382/testReport/ |
| Max. process+thread count | 397 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21382/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Application 

[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Attachment: YARN-8429.004.patch

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create a launch json file. Replace "artifact" with "artifacts".
> 2) Launch the yarn service app with the CLI.
> The application launch fails with the error below:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified 
> incorrectly, the launch cmd should fail with a proper error. 
> Here, the error message regarding Dest_file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container

2018-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558907#comment-16558907
 ] 

Eric Yang commented on YARN-8587:
-

There is a backward-incompatibility concern with distributed shell, where we 
allow the user to specify multiple unix commands and output redirection of the 
log file. To fix this transient false positive, the logging mechanism's 
behavior will change: stderr and stdout will contain the command output, while 
stderr.txt and stdout.txt will contain more information, including the command 
launched and docker errors. Hence, this can only be fixed if we agree that the 
incompatible change is negligible.

> Delays are noticed to launch docker container
> -
>
> Key: YARN-8587
> URL: https://issues.apache.org/jira/browse/YARN-8587
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Priority: Major
>  Labels: Docker
>
> Launch a dshell application. Wait for the application to go into the RUNNING state.
> {code:java}
> yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
> "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find the container allocation. Run the docker inspect command for the docker 
> containers launched by the app.
> Sometimes, the container is allocated to the NM but the docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null 
> xxx "sudo su - -c \"docker ps  -a | grep 
> container_e02_1531189225093_0003_01_02\" root" failed after 0 retries 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558905#comment-16558905
 ] 

genericqa commented on YARN-8508:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  
1s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8508 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933259/YARN-8505.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4ab7d00c07b9 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be150a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21381/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21381/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> GPU  does not get released even though the container is killed

[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8509:

Description: 
In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
total pending resource based on the user-limit percent and user-limit factor, 
which caps each user's pending resource at the minimum of the user-limit 
pending and the actual pending. This prevents the queue from taking more 
pending resource to achieve queue balance after all queues are satisfied with 
their ideal allocation.
  
 We need to change the logic to let queue pending go beyond the user limit.

  was:
In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
total pending resource based on the user-limit percent and user-limit factor, 
which caps each user's pending resource at the minimum of the user-limit 
pending and the actual pending. This prevents the queue from taking more 
pending resource to achieve queue balance after all queues are satisfied with 
their ideal allocation.
 
We need to change the logic to let queue pending reach at most 
(Queue_max_capacity - Queue_used_capacity).


> Fix UserLimit calculation for preemption to balance scenario after queue 
> satisfied  
> 
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the 
> total pending resource based on the user-limit percent and user-limit factor, 
> which caps each user's pending resource at the minimum of the user-limit 
> pending and the actual pending. This prevents the queue from taking more 
> pending resource to achieve queue balance after all queues are satisfied with 
> their ideal allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558882#comment-16558882
 ] 

Zian Chen commented on YARN-8509:
-

Talked with [~sunilg]. It can go beyond maxCap - usedCap, because a user can ask 
for 1maps but the cluster can run a max of 1000. In this case, as soon as each 
map finishes, another pending one will get scheduled.
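
A small numeric illustration of that point (all numbers hypothetical, not taken 
from the discussion above):
{code:java}
// Hypothetical numbers only, to illustrate why pending may exceed maxCap - usedCap.
public class PendingBeyondHeadroomExample {
  public static void main(String[] args) {
    long queueMaxCapacity = 1000;  // containers the queue may run at once
    long queueUsedCapacity = 1000; // the queue is currently full
    long pendingTasks = 5000;      // tasks a user still has waiting

    long headroom = queueMaxCapacity - queueUsedCapacity; // 0
    // Pending demand can legitimately exceed the instantaneous headroom:
    // each time a running container finishes, one of the pending tasks is
    // scheduled into the freed slot, so all 5000 eventually run.
    System.out.println("headroom=" + headroom + ", pending=" + pendingTasks);
  }
}
{code}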

> Fix UserLimit calculation for preemption to balance scenario after queue 
> satisfied  
> 
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total 
> pending resource based on user-limit percent and user-limit factor which will 
> cap pending resource for each user to the minimum of user-limit pending and 
> actual pending. This will prevent queue from taking more pending resource to 
> achieve queue balance after all queue satisfied with its ideal allocation.
>  
> We need to change the logic to let queue pending can reach at most 
> (Queue_max_capacity - Queue_used_capacity).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8571) Validate service principal format prior to launching yarn service

2018-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558880#comment-16558880
 ] 

Eric Yang commented on YARN-8571:
-

Patch 002 added error handling for NPE.

> Validate service principal format prior to launching yarn service
> -
>
> Key: YARN-8571
> URL: https://issues.apache.org/jira/browse/YARN-8571
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security, yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8571.001.patch, YARN-8571.002.patch
>
>
> Hadoop client and server interaction is designed to validate the service 
> principal before RPC request is permitted.  In YARN service, the same 
> security model is enforced to prevent replay attack.   However, end user 
> might submit JSON that looks like this to YARN service REST API:
> {code}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "components" :
>   [
> {
>   "name": "sleeper",
>   "number_of_containers": 2,
>   "launch_command": "sleep 90",
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   }
> }
>   ],
>   "kerberos_principal" : {
> "principal_name" : "ambari...@example.com",
> "keytab" : "file:///etc/security/keytabs/smokeuser.headless.keytab"
>   }
> }
> {code}
> The kerberos principal is end user kerberos principal instead of service 
> principal.  This does not work properly because YARN service application 
> master requires to run with a service principal to communicate with YARN CLI 
> client via Hadoop RPC.  Without breaking Hadoop security design in this JIRA, 
> it might be in our best interest to validate principal_name during 
> submission, and report error message when someone tries to run YARN service 
> with user principal.
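
As an illustration of the kind of check being proposed, here is a rough sketch. 
It is not taken from YARN-8571.002.patch; the helper name and the assumption 
that a service principal must carry a host component (name/host@REALM) are for 
illustration only.
{code:java}
// Rough sketch of a principal-format validation; not the actual patch code.
public final class PrincipalFormatCheck {

  private PrincipalFormatCheck() {
  }

  /** Hypothetical helper: reject principals without a host component. */
  static void validateServicePrincipal(String principalName) {
    if (principalName == null || principalName.isEmpty()) {
      throw new IllegalArgumentException("kerberos_principal.principal_name is required");
    }
    // A service principal looks like name/host@REALM; a user principal has no "/" part.
    String namePart = principalName.split("@", 2)[0];
    if (!namePart.contains("/")) {
      throw new IllegalArgumentException(
          "Expected a service principal of the form name/host@REALM, got a user principal: "
              + principalName);
    }
  }

  public static void main(String[] args) {
    validateServicePrincipal("yarn-service/host1.example.com@EXAMPLE.COM"); // accepted
    try {
      validateServicePrincipal("ambari...@example.com"); // user principal from the JSON above
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}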



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8571) Validate service principal format prior to launching yarn service

2018-07-26 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8571:

Attachment: YARN-8571.002.patch

> Validate service principal format prior to launching yarn service
> -
>
> Key: YARN-8571
> URL: https://issues.apache.org/jira/browse/YARN-8571
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security, yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8571.001.patch, YARN-8571.002.patch
>
>
> Hadoop client and server interaction is designed to validate the service 
> principal before RPC request is permitted.  In YARN service, the same 
> security model is enforced to prevent replay attack.   However, end user 
> might submit JSON that looks like this to YARN service REST API:
> {code}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "components" :
>   [
> {
>   "name": "sleeper",
>   "number_of_containers": 2,
>   "launch_command": "sleep 90",
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   }
> }
>   ],
>   "kerberos_principal" : {
> "principal_name" : "ambari...@example.com",
> "keytab" : "file:///etc/security/keytabs/smokeuser.headless.keytab"
>   }
> }
> {code}
> The kerberos principal is end user kerberos principal instead of service 
> principal.  This does not work properly because YARN service application 
> master requires to run with a service principal to communicate with YARN CLI 
> client via Hadoop RPC.  Without breaking Hadoop security design in this JIRA, 
> it might be in our best interest to validate principal_name during 
> submission, and report error message when someone tries to run YARN service 
> with user principal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-26 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8522:

Attachment: YARN-8522.002.patch

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming app simultaneously. Here, sometimes one of the 
> application fails with below stack trace.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Streaming Command Failed!{code}
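
For context on what this error means, here is a simplified sketch of the kind of 
validation referenced in the stack trace above. It is not the actual RMAppManager 
code; the types and names are stand-ins.
{code:java}
import java.util.Arrays;
import java.util.List;

// Simplified illustration of "only one resource request with * is allowed".
public class AnyLocalityRequestCheck {

  // Stand-in for the scheduler's ANY ("*") resource name.
  static final String ANY = "*";

  // Stand-in for ResourceRequest; only the locality name matters here.
  static class ResourceRequestLite {
    final String resourceName;

    ResourceRequestLite(String resourceName) {
      this.resourceName = resourceName;
    }
  }

  static void validateAmResourceRequests(List<ResourceRequestLite> requests) {
    long anyCount = requests.stream()
        .filter(r -> ANY.equals(r.resourceName))
        .count();
    if (anyCount > 1) {
      throw new IllegalArgumentException(
          "Invalid resource request, only one resource request with * is allowed");
    }
  }

  public static void main(String[] args) {
    List<ResourceRequestLite> requests = Arrays.asList(
        new ResourceRequestLite(ANY),
        new ResourceRequestLite(ANY)); // a second "*" request triggers the error
    try {
      validateAmResourceRequests(requests);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}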



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-26 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558870#comment-16558870
 ] 

Zian Chen commented on YARN-8522:
-

Thanks for the suggestions, [~sunilg]. Updated patch 002.

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming app simultaneously. Here, sometimes one of the 
> application fails with below stack trace.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558847#comment-16558847
 ] 

genericqa commented on YARN-8508:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  
0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8508 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933248/YARN-8505.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 95544b2179d7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be150a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21380/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21380/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |

[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558838#comment-16558838
 ] 

Chandni Singh commented on YARN-8508:
-

[~shaneku...@gmail.com] [~eyang] could you please review patch 2?

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8505.001.patch, YARN-8505.002.patch
>
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 
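
To make the failure mode concrete, here is a toy model of the assign/release 
bookkeeping described above. It is not GpuResourceAllocator code; the class and 
method names are hypothetical and the GPU counts are made up.
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: if assigned GPUs are not released when a container is cleaned up,
// a later request sees too few available GPUs.
public class GpuReleaseSketch {

  private final Set<Integer> allGpus = new HashSet<>();
  private final Map<String, Set<Integer>> usedGpuByContainer = new HashMap<>();

  GpuReleaseSketch(int gpuCount) {
    for (int i = 0; i < gpuCount; i++) {
      allGpus.add(i);
    }
  }

  synchronized Set<Integer> assignGpus(String containerId, int requested) {
    Set<Integer> available = new HashSet<>(allGpus);
    for (Set<Integer> used : usedGpuByContainer.values()) {
      available.removeAll(used);
    }
    if (available.size() < requested) {
      throw new IllegalStateException("Failed to find enough GPUs, requested="
          + requested + ", available=" + available.size());
    }
    Set<Integer> assigned = new HashSet<>();
    for (Integer gpu : available) {
      if (assigned.size() == requested) {
        break;
      }
      assigned.add(gpu);
    }
    usedGpuByContainer.put(containerId, assigned);
    return assigned;
  }

  /** The release step that appears to be skipped for killed containers. */
  synchronized void releaseGpus(String containerId) {
    usedGpuByContainer.remove(containerId);
  }

  public static void main(String[] args) {
    GpuReleaseSketch allocator = new GpuReleaseSketch(2);
    allocator.assignGpus("container_1", 2);
    // Without this release after the kill, the next call fails just like the
    // "#RequestedGPUs=2, #availableGpus=1" error above (here 0 would be available).
    allocator.releaseGpus("container_1");
    System.out.println(allocator.assignGpus("container_2", 2));
  }
}
{code}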

[jira] [Updated] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8508:

Attachment: YARN-8505.002.patch

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8505.001.patch, YARN-8505.002.patch
>
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 

[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558835#comment-16558835
 ] 

Chandni Singh commented on YARN-8545:
-

[~billie.rinaldi] [~eyang] Do you have any comments on patch 1?

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}
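
A rough sketch of the fix direction described above, using simplified stand-in 
interfaces rather than the actual service AM and scheduler client classes:
{code:java}
// Sketch only: if building the container launch context fails before anything
// is sent to the NM, the AM itself must give the container back to the RM.
public class LaunchFailureHandlingSketch {

  interface LaunchContextBuilder {
    void buildContainerLaunchContext(String containerId) throws Exception;
  }

  interface SchedulerClient {
    void releaseAssignedContainer(String containerId);
  }

  static void launchContainer(String containerId,
      LaunchContextBuilder builder, SchedulerClient scheduler) {
    try {
      builder.buildContainerLaunchContext(containerId);
      // Normally the context would now be sent to the NM, which reports any
      // later failure back to the RM on its own.
    } catch (Exception e) {
      // The NM never saw this container, so it will never report the failure;
      // return the container explicitly instead of probing its readiness.
      System.err.println("Launch context failed for " + containerId + ": " + e.getMessage());
      scheduler.releaseAssignedContainer(containerId);
    }
  }

  public static void main(String[] args) {
    launchContainer("container_01",
        id -> { throw new java.io.FileNotFoundException("run-PRIMARY_WORKER.sh missing"); },
        id -> System.out.println("released " + id));
  }
}
{code}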



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558803#comment-16558803
 ] 

genericqa commented on YARN-6906:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
50s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}130m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6906 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933235/YARN-6906.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e89b3442cf1f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be150a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21378/testReport/ |
| Max. process+thread count | 1309 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21378/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Cluster Node API and Cluster Nodes API should 

[jira] [Updated] (YARN-8587) Delays are noticed to launch docker container

2018-07-26 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8587:
--
Labels: Docker  (was: )

> Delays are noticed to launch docker container
> -
>
> Key: YARN-8587
> URL: https://issues.apache.org/jira/browse/YARN-8587
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Priority: Major
>  Labels: Docker
>
> Launch dshell application. Wait for application to go in RUNNING state.
> {code:java}
> yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
> "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find out container allocation. Run docker inspect command for docker 
> containers launched by app.
> Sometimes, the container is allocated to NM but docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null 
> xxx "sudo su - -c \"docker ps  -a | grep 
> container_e02_1531189225093_0003_01_02\" root" failed after 0 retries 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8508:

Attachment: YARN-8505.001.patch

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8505.001.patch
>
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 

[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558735#comment-16558735
 ] 

genericqa commented on YARN-7863:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
11s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7863 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933244/YARN-7863-YARN-3409.003.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21379/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7863) Modify placement constraints to support node attributes

2018-07-26 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558728#comment-16558728
 ] 

Sunil Govindan edited comment on YARN-7863 at 7/26/18 6:35 PM:
---

Thanks [~Naganarasimha] and [~cheersyang] for comments

A quick summary based on an offline discussion:
{noformat}
DS Example:

bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar 
-shell_command Sleep
-shell_args 10
-num_containers 2
-master_memory 200
-container_memory 200
-placement_spec 
OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev
{noformat}
For DS, I think it's better we correlate with placement_spec for now, for easier 
review and API compatibility. A refactoring improvement could be done later to 
align this with the native service approach.
 The proposal for a later JIRA is
 {{-placement_spec (python=true OR java!=1.8) AND (os!=centos) AND (env NOTIN 
(prod,dev))}}

Now, to expand more on the approach below in the patch:
{noformat}
-placement_spec 
OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev{noformat}
 
 we could read this as:
 *place me on a node that has the node attribute python=true*
 OR
 *don't place me on a node that has the node attribute java=1.8*

AND
 *place me on a node whose os attribute is not centos*
 AND
 *don't place me on a node whose env attribute has the value prod or dev*

Also attaching a patch based on this, including comments from Naga. I'll add a 
test case in the next patch.
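
To restate those bullets in executable form, here is a toy predicate over a 
node's attribute map. This is not the DS placement_spec parser or the 
scheduler's constraint evaluation, and the attribute values are hypothetical; it 
only encodes the OR/AND reading of the example spec.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy rendering of the example spec's semantics as a plain predicate.
public class NodeAttributeSpecExample {

  static boolean matches(Map<String, String> node) {
    boolean pythonTrue = "true".equals(node.get("python"));         // IN,python=true
    boolean javaIs18 = "1.8".equals(node.get("java"));              // NOTIN,java=1.8
    boolean osIsCentos = "centos".equals(node.get("os"));           // IN,os!=centos
    String env = node.get("env");
    boolean envProdOrDev = "prod".equals(env) || "dev".equals(env); // NOTIN,env=prod,dev

    return (pythonTrue || !javaIs18)   // OR(IN,python=true : NOTIN,java=1.8)
        && !osIsCentos                 // IN,os!=centos
        && !envProdOrDev;              // NOTIN,env=prod,dev
  }

  public static void main(String[] args) {
    Map<String, String> node = new HashMap<>();
    node.put("python", "true");
    node.put("os", "ubuntu");
    node.put("env", "test");
    System.out.println(matches(node)); // true: python=true, os not centos, env not prod/dev
  }
}
{code}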


was (Author: sunilg):
Thanks [~Naganarasimha] and [~cheersyang] for comments

A quick summary based on an offline discussion:
{noformat}
DS Example:

bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar 
-shell_command Sleep
-shell_args 10
-num_containers 2
-master_memory 200
-container_memory 200
-placement_spec 
OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev
{noformat}
For DS, I think its better we correlate with placement_spec for now for better 
easiness in review and api compatibility. A refactor improvement could be done 
later to improvise this inline with native service approach.
 Proposal for jira for later time is
 {{-placement_spec (python=true OR java!=1.8) AND (os!=centos) AND (env NOTIN 
(prod,dev))}}

Now to expand more on below approach in patch, 
{noformat}
-placement_spec 
OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev{noformat}
 
 we could see this like
 *put me to a node where it has node-attribute python=true.*
 OR
 *dont put me in a node where it has node-attribute java=1.8*

AND
 *put me to a node where it has node-attribute os is not centos.*
 AND
 *dont put me in a node where it has attributes env with values prod or dev*

 

Also attaching a patch based on this including comments from Naga.

 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7863) Modify placement constraints to support node attributes

2018-07-26 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-7863:
-
Attachment: YARN-7863-YARN-3409.003.patch

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-07-26 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558728#comment-16558728
 ] 

Sunil Govindan commented on YARN-7863:
--

Thanks [~Naganarasimha] and [~cheersyang] for comments

A quick summary based on an offline discussion:
{noformat}
DS Example:

bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar 
-shell_command Sleep
-shell_args 10
-num_containers 2
-master_memory 200
-container_memory 200
-placement_spec 
OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev
{noformat}
For DS, I think its better we correlate with placement_spec for now for better 
easiness in review and api compatibility. A refactor improvement could be done 
later to improvise this inline with native service approach.
 Proposal for jira for later time is
 {{-placement_spec (python=true OR java!=1.8) AND (os!=centos) AND (env NOTIN 
(prod,dev))}}

Now to expand more on below approach in patch, 
{noformat}
-placement_spec 
OR(IN,python=true:NOTIN,java=1.8):IN,os!=centos:NOTIN,env=prod,dev{noformat}
 
 we could see this like
 *put me to a node where it has node-attribute python=true.*
 OR
 *dont put me in a node where it has node-attribute java=1.8*

AND
 *put me to a node where it has node-attribute os is not centos.*
 AND
 *dont put me in a node where it has attributes env with values prod or dev*

 

Also attaching a patch based on this including comments from Naga.

 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8587) Delays are noticed to launch docker container

2018-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558720#comment-16558720
 ] 

Eric Yang edited comment on YARN-8587 at 7/26/18 6:30 PM:
--

This bug is a result of docker run in detached mode reporting exit_code 0 even 
though the process inside the container fails to run.  For a brief period of 
time, the node manager will report that the container is in the RUNNING state, 
then fail the container later.  One possible solution is to change 
container-executor so that non-entry-point mode becomes more similar to 
entry-point mode: run docker run in the foreground, and have the parent process 
retry docker inspect to obtain the PID.  This removes the possible false-positive 
reporting of the RUNNING state.  The synthetic timeout approach may kill the 
container prematurely (or wait longer than necessary for a failing container) if 
the container takes more than 30 seconds (or the configured value) to start the 
first process in the container.  Do we want to make non-entry-point mode work 
like entry-point mode to prevent the false positive, or are we OK with the 
current state?
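
A sketch of the retry-docker-inspect idea, purely for illustration: the real 
container-executor does this in C rather than Java, and the retry count and 
sleep interval below are made-up values.
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Poll docker inspect for the container's main PID instead of trusting the
// exit code of a detached "docker run".
public class DockerPidProbe {

  static int inspectPid(String containerName) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(
        "docker", "inspect", "--format", "{{.State.Pid}}", containerName)
        .redirectErrorStream(true)
        .start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line = r.readLine();
      p.waitFor();
      if (p.exitValue() != 0 || line == null) {
        return -1; // container not created yet (or already removed)
      }
      return Integer.parseInt(line.trim()); // 0 means the process is not running
    }
  }

  static int waitForPid(String containerName, int maxAttempts, long sleepMs)
      throws IOException, InterruptedException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      int pid = inspectPid(containerName);
      if (pid > 0) {
        return pid; // only now report the container as RUNNING
      }
      Thread.sleep(sleepMs);
    }
    return -1; // treat as a launch failure rather than a false RUNNING report
  }

  public static void main(String[] args) throws Exception {
    System.out.println(waitForPid("container_e02_1531189225093_0003_01_02", 10, 1000));
  }
}
{code}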


was (Author: eyang):
This bug is result of docker run detach reports exit_code 0, but the process 
inside the container fail to run.  For a brief period of time, node manager 
will report back that container is in RUNNING state, then fail the container 
later.  One possible solution is to change container-executor for 
non-entry-point mode to become more similar to entry_point mode to run docker 
run in the foreground, and parent process have a set of retries for docker 
inspect to obtain PID.  This removes the possible false positive reporting of 
RUNNING state.  The synthetic timeout approach may kill container prematurely 
(or wait longer than necessary for failing container), if container takes more 
than 30 seconds (or configured values) to start the first process in the 
container.

> Delays are noticed to launch docker container
> -
>
> Key: YARN-8587
> URL: https://issues.apache.org/jira/browse/YARN-8587
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Priority: Major
>
> Launch dshell application. Wait for application to go in RUNNING state.
> {code:java}
> yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
> "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find out container allocation. Run docker inspect command for docker 
> containers launched by app.
> Sometimes, the container is allocated to NM but docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null 
> xxx "sudo su - -c \"docker ps  -a | grep 
> container_e02_1531189225093_0003_01_02\" root" failed after 0 retries 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8587) Delays are noticed to launch docker container

2018-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558720#comment-16558720
 ] 

Eric Yang commented on YARN-8587:
-

This bug is the result of docker run in detached mode reporting exit_code 0 even 
when the process inside the container fails to run.  For a brief period of time, 
the node manager reports that the container is in RUNNING state, then fails the 
container later.  One possible solution is to change container-executor in 
non-entry-point mode to behave more like entry-point mode: run docker run in the 
foreground, and have the parent process retry docker inspect to obtain the PID.  
This removes the possible false-positive reporting of the RUNNING state.  The 
synthetic timeout approach may kill the container prematurely (or wait longer 
than necessary for a failing container) if the container takes more than 30 
seconds (or the configured value) to start the first process in the container.

> Delays are noticed to launch docker container
> -
>
> Key: YARN-8587
> URL: https://issues.apache.org/jira/browse/YARN-8587
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Priority: Major
>
> Launch a dshell application and wait for it to reach the RUNNING state.
> {code:java}
> yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
> "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find the container allocation, then run the docker inspect command for the 
> docker containers launched by the app.
> Sometimes the container is allocated to the NM but the docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null 
> xxx "sudo su - -c \"docker ps  -a | grep 
> container_e02_1531189225093_0003_01_02\" root" failed after 0 retries 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8587) Delays are noticed to launch docker container

2018-07-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8587:


 Summary: Delays are noticed to launch docker container
 Key: YARN-8587
 URL: https://issues.apache.org/jira/browse/YARN-8587
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Yesha Vora


Launch a dshell application and wait for it to reach the RUNNING state.
{code:java}
yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
"sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
-shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
{code}
Find the container allocation, then run the docker inspect command for the docker 
containers launched by the app.

Sometimes the container is allocated to the NM but the docker PID is not up.
{code:java}
Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null xxx 
"sudo su - -c \"docker ps  -a | grep 
container_e02_1531189225093_0003_01_02\" root" failed after 0 retries 
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-07-26 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach reassigned YARN-5464:


Assignee: Antal Bálint Steinbach

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information

2018-07-26 Thread Manikandan R (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YARN-6906:
--

Assignee: Manikandan R

> Cluster Node API and Cluster Nodes API should report resource types 
> information
> ---
>
> Key: YARN-6906
> URL: https://issues.apache.org/jira/browse/YARN-6906
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-6906.001.patch
>
>
> These endpoints currently report:
> {noformat}
> 
> /default-rack
> RUNNING
> localhost:51877
> localhost
> localhost:8042
> 1501534150336
> 3.0.0-beta1-SNAPSHOT
> 
> 4
> 5120
> 3072
> 4
> 0
> 0
> 0
> 0
> 0
> 
> 0
> 0
> 0.0
> 0
> 0
> 0.0
> 
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information

2018-07-26 Thread Manikandan R (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6906:
---
Attachment: YARN-6906.001.patch

> Cluster Node API and Cluster Nodes API should report resource types 
> information
> ---
>
> Key: YARN-6906
> URL: https://issues.apache.org/jira/browse/YARN-6906
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Priority: Major
> Attachments: YARN-6906.001.patch
>
>
> These endpoints currently report:
> {noformat}
> 
> /default-rack
> RUNNING
> localhost:51877
> localhost
> localhost:8042
> 1501534150336
> 3.0.0-beta1-SNAPSHOT
> 
> 4
> 5120
> 3072
> 4
> 0
> 0
> 0
> 0
> 0
> 
> 0
> 0
> 0.0
> 0
> 0
> 0.0
> 
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6906) Cluster Node API and Cluster Nodes API should report resource types information

2018-07-26 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558632#comment-16558632
 ] 

Manikandan R commented on YARN-6906:


This request has been handled as part of YARN-7817, which introduced 
usedResource and availableResource nodes in NodeInfo; these contain 
ResourceInformations as well. The .001 patch adds unit test cases to verify 
this.

Another thing to note: even if the NM is not configured for a particular 
resource type that is configured on the RM side, that resource type still 
appears in the NodeInfo API response, but with a value of 0.

> Cluster Node API and Cluster Nodes API should report resource types 
> information
> ---
>
> Key: YARN-6906
> URL: https://issues.apache.org/jira/browse/YARN-6906
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Priority: Major
>
> These endpoints currently report:
> {noformat}
> 
> /default-rack
> RUNNING
> localhost:51877
> localhost
> localhost:8042
> 1501534150336
> 3.0.0-beta1-SNAPSHOT
> 
> 4
> 5120
> 3072
> 4
> 0
> 0
> 0
> 0
> 0
> 
> 0
> 0
> 0.0
> 0
> 0
> 0.0
> 
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558628#comment-16558628
 ] 

genericqa commented on YARN-8566:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 14s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8566 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933212/YARN-8566.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 92ede80f6b63 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a192295 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 

[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558620#comment-16558620
 ] 

genericqa commented on YARN-8242:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  
9s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8242 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933220/YARN-8242.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7e26fe890f0b 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a192295 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21376/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21376/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> YARN NM: OOM error while reading back the state store on 

[jira] [Comment Edited] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558609#comment-16558609
 ] 

Bibin A Chundatt edited comment on YARN-8584 at 7/26/18 5:03 PM:
-

+1 LGTM.

Will commit it by tomorrow, if no one objects.


was (Author: bibinchundatt):
+1 LGTM

Will commit it by tomorrow.

> Several typos in Log Aggregation related classes
> 
>
> Key: YARN-8584
> URL: https://issues.apache.org/jira/browse/YARN-8584
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8584.001.patch
>
>
> There are typos in comments, log messages, method names, field names, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558609#comment-16558609
 ] 

Bibin A Chundatt commented on YARN-8584:


+1 LGTM

Will commit it by tomorrow.

> Several typos in Log Aggregation related classes
> 
>
> Key: YARN-8584
> URL: https://issues.apache.org/jira/browse/YARN-8584
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8584.001.patch
>
>
> There are typos in comments, log messages, method names, field names, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558574#comment-16558574
 ] 

genericqa commented on YARN-8517:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8517 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933223/YARN-8517.005.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 8662c046cb19 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a192295 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 410 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21377/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch, YARN-8517.004.patch, YARN-8517.005.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558571#comment-16558571
 ] 

Chandni Singh commented on YARN-8508:
-

This happens with a container that gets cleaned up before its pid file is 
created. To solve it, we need to release the resources at the end of 
\{{LinuxContainerExecutor.reapContainer()}} just like we do in 
\{{LinuxContainerExecutor.launchContainer()}}, 
\{{LinuxContainerExecutor.reLaunchContainer()}}, and 
\{{LinuxContainerExecutor.reacquireContainer}}.

Please see my explanation below:
Refer \{{container_e21_1532545600682_0001_01_02}} in 
yarn8505.nodemanager.log

- 002 is launched but its pid file is not created
{code}
2018-07-25 19:08:54,409 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file 
/.../application_1532545600682_0001/container_e21_1532545600682_0001_01_02/container_e21_1532545600682_0001_01_02.pid
2018-07-25 19:08:54,409 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(103)) - Got pid null from path 
/.../application_1532545600682_0001/container_e21_1532545600682_0001_01_02/container_e21_1532545600682_0001_01_02.pid
{code}

- Since the application is killed, 002 is killed by the ResourceManager
{code}
2018-07-25 19:08:54,643 DEBUG container.ContainerImpl 
(ContainerImpl.java:handle(2080)) - Processing 
container_e21_1532545600682_0001_01_02 of type CONTAINER_KILLED_ON_REQUEST
{code}

- The above triggers \{{ContainerLaunch.cleanupContainer()}} for 002. This 
happens before the pid file is created
{code}
2018-07-25 19:08:54,409 WARN launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
file created container_e21_1532545600682_0001_01_02
{code}

- \{{cleanupContainer}} invokes \{{reapDockerContainerNoPid(user)}}
{code}
2018-07-25 19:08:54,410 INFO launcher.ContainerLaunch 
(ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
but docker container request detected. Attempting to reap container 
container_e21_1532545600682_0001_01_02
{code}

- \{{reapDockerContainerNoPid(user)}} calls \{{exec.reapContainer(...)}}
{code}
2018-07-25 19:08:54,412 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
inspect docker-command=inspect format=\{{.State.Status}} 
name=container_e21_1532545600682_0001_01_02
2018-07-25 19:08:54,412 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: [/.../hadoop-yarn/bin/container-executor, 
--inspect-docker-container, --format=\{{.State.Status}}, 
container_e21_1532545600682_0001_01_02]
2018-07-25 19:08:54,530 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:getContainerStatus(160)) - Container Status: 
nonexistent ContainerId: container_e21_1532545600682_0001_01_02
2018-07-25 19:08:54,530 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:reapDockerContainerNoPid(948)) - Sent signal to docker 
container container_e21_1532545600682_0001_01_02 as user hrt_qa, 
result=success
{code}

- The problem is that \{{reapContainer}} in \{{LinuxContainerExecutor}} doesn't 
release the resources assigned to the container. The code snippet below, which 
performs this cleanup after the container completes, is not executed at this 
point.
{code}
resourcesHandler.postExecute(containerId);

try {
  if (resourceHandlerChain != null) {
    LOG.info("{} POST Complete", containerId);
    resourceHandlerChain.postComplete(containerId);
  }
} catch (ResourceHandlerException e) {
  LOG.warn("ResourceHandlerChain.postComplete failed for " +
      "containerId: " + containerId + ". Exception: " + e);
}
{code}
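
For illustration, a hedged sketch of what invoking the same cleanup at the end of 
\{{LinuxContainerExecutor.reapContainer()}} might look like. The method signature 
and field names are assumed from the snippet above; this is not the actual patch.
{code:java}
// Sketch only: assumed signature and fields, mirroring the cleanup block quoted
// above, so that resources are released even when the pid file was never created.
@Override
public boolean reapContainer(ContainerReapContext ctx) throws IOException {
  ContainerId containerId = ctx.getContainer().getContainerId();
  try {
    // ... existing docker/container reaping logic ...
    return true;
  } finally {
    resourcesHandler.postExecute(containerId);
    try {
      if (resourceHandlerChain != null) {
        LOG.info("{} POST Complete", containerId);
        resourceHandlerChain.postComplete(containerId);
      }
    } catch (ResourceHandlerException e) {
      LOG.warn("ResourceHandlerChain.postComplete failed for " +
          "containerId: " + containerId + ". Exception: " + e);
    }
  }
}
{code}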

- The container launch fails after 4 minutes, and only then are the resources 
released.
{code}
2018-07-25 19:12:09,999 WARN nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
container_e21_1532545600682_0001_01_02 is : 27
2018-07-25 19:12:10,000 WARN nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
container-launch with container ID: container_e21_1532545600682_0001_01_02 
and exit code: 27
2018-07-25 19:12:10,000 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e21_1532545600682_0001_01_02
2018-07-25 19:12:10,003 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Docker inspect command: 
/usr/bin/docker inspect --format \{{.State.Pid}} 
container_e21_1532545600682_0001_01_02
2018-07-25 19:12:10,003 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Failed to write pid to file 
/cgroup/cpu/.../container_e21_1532545600682_0001_01_02/tasks - No such 

[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-26 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558458#comment-16558458
 ] 

Antal Bálint Steinbach commented on YARN-8517:
--

Thanks [~rkanter]. Descriptions added.

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch, YARN-8517.004.patch, YARN-8517.005.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-26 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8517:
-
Attachment: YARN-8517.005.patch

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch, YARN-8517.004.patch, YARN-8517.005.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-07-26 Thread Lurdh Pradeep Reddy Ambati (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lurdh Pradeep Reddy Ambati updated YARN-8242:
-
Attachment: YARN-8242.004.patch

> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.6.0, 2.9.0, 2.6.5, 2.8.3, 3.1.0, 2.7.6, 3.0.2
>Reporter: Kanwaljeet Sachdev
>Priority: Critical
> Attachments: YARN-8242.001.patch, YARN-8242.002.patch, 
> YARN-8242.003.patch, YARN-8242.004.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" attached to them, the NM can run OOM 
> and never get to the point where it can start processing the recovery.
> Since it never starts the recovery, there is no way for the NM to ever get 
> past this point. It requires a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError. (OutOfMemoryError.java:48) at 
> com.google.protobuf.ByteString.copyFrom (ByteString.java:192) at 
> com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto. 
> (YarnProtos.java:47069) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto. 
> (YarnProtos.java:47014) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom
>  (YarnProtos.java:47102) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom
>  (YarnProtos.java:47097) at com.google.protobuf.CodedInputStream.readMessage 
> (CodedInputStream.java:309) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto. 
> (YarnProtos.java:41016) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto. 
> (YarnProtos.java:40942) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom
>  (YarnProtos.java:41080) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom
>  (YarnProtos.java:41075) at com.google.protobuf.CodedInputStream.readMessage 
> (CodedInputStream.java:309) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.
>  (YarnServiceProtos.java:24517) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.
>  (YarnServiceProtos.java:24464) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom
>  (YarnServiceProtos.java:24568) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom
>  (YarnServiceProtos.java:24563) at 
> com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141) 
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom
>  (YarnServiceProtos.java:24739) at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState
>  (NMLeveldbStateStoreService.java:217) at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState
>  (NMLeveldbStateStoreService.java:170) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover
>  (ContainerManagerImpl.java:253) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit
>  (ContainerManagerImpl.java:237) at 
> org.apache.hadoop.service.AbstractService.init (AbstractService.java:163) at 
> org.apache.hadoop.service.CompositeService.serviceInit 
> (CompositeService.java:107) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit 
> (NodeManager.java:255) at org.apache.hadoop.service.AbstractService.init 
> (AbstractService.java:163) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager 
> (NodeManager.java:474) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main 
> (NodeManager.java:521){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558391#comment-16558391
 ] 

genericqa commented on YARN-8584:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 12s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
10s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8584 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933202/YARN-8584.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c8a65fe2f6e5 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9089790 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558371#comment-16558371
 ] 

Szilard Nemeth commented on YARN-8566:
--

Uploaded new patch that fixes the UT failures.

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch, 
> YARN-8566.006.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: YARN-8566.006.patch

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch, 
> YARN-8566.006.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: (was: YARN-8566.006.patch)

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: YARN-8566.006.patch

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558307#comment-16558307
 ] 

genericqa commented on YARN-8566:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 25s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8566 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933189/YARN-8566.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c8e0325e0694 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9089790 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Created] (YARN-8586) Extract log aggregation related fields and methods from RMAppImpl

2018-07-26 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8586:


 Summary: Extract log aggregation related fields and methods from 
RMAppImpl
 Key: YARN-8586
 URL: https://issues.apache.org/jira/browse/YARN-8586
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Given that RMAppImpl is already above 2000 lines and very complex, a simple and 
straightforward first step would be to extract all log aggregation related 
fields and methods into a new class.
The clients of RMAppImpl would keep calling the same methods, and RMAppImpl 
would delegate all those calls to the newly introduced class.
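
For illustration, a minimal delegation sketch of the idea, using placeholder 
class, enum, and method names rather than the actual ones:
{code:java}
// Hypothetical sketch: every name below is illustrative, not the actual
// YARN-8586 refactoring.
enum LogAggregationState { NOT_STARTED, RUNNING, SUCCEEDED, FAILED }

// The extracted class owns the log-aggregation-related fields and logic.
class RMAppLogAggregation {
  private volatile LogAggregationState state = LogAggregationState.NOT_STARTED;

  LogAggregationState getState() {
    return state;
  }

  void setState(LogAggregationState newState) {
    state = newState;
  }
}

// RMAppImpl keeps its existing public methods and simply delegates.
class RMAppImplSketch {
  private final RMAppLogAggregation logAggregation = new RMAppLogAggregation();

  public LogAggregationState getLogAggregationState() {
    return logAggregation.getState();
  }

  public void updateLogAggregationState(LogAggregationState newState) {
    logAggregation.setState(newState);
  }
}
{code}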



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8585) Add test class for DefaultAMSProcessor

2018-07-26 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8585:


 Summary: Add test class for DefaultAMSProcessor
 Key: YARN-8585
 URL: https://issues.apache.org/jira/browse/YARN-8585
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Since this class currently has no test coverage at all, adding a dedicated test 
class for it would be a good idea.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8584:
-
Attachment: YARN-8584.001.patch

> Several typos in Log Aggregation related classes
> 
>
> Key: YARN-8584
> URL: https://issues.apache.org/jira/browse/YARN-8584
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8584.001.patch
>
>
> There are typos in comments, log messages, method names, field names, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8584:


 Summary: Several typos in Log Aggregation related classes
 Key: YARN-8584
 URL: https://issues.apache.org/jira/browse/YARN-8584
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


There are typos in comments, log messages, method names, field names, etc.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558250#comment-16558250
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
10s{color} | {color:red} Docker failed to build yetus/hadoop:20ca677. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933195/YARN-6966-branch-3.0.0.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21373/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores 
> and AvailableVCores also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and 
> YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
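
A hedged sketch of the fix direction described above (restoring the counters from ContainerManagerImpl#recoverContainer), assuming NodeManagerMetrics exposes the usual allocateContainer/launchedContainer updaters; the names and wiring are illustrative, not the actual patch:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics;

// Sketch only: re-register the resources of every recovered container so that
// AllocatedContainers/AllocatedGB/AllocatedVCores (and the Available* gauges)
// do not go negative once those containers finish after an NM restart.
class ContainerRecoveryMetricsSketch {
  private final NodeManagerMetrics metrics;

  ContainerRecoveryMetricsSketch(NodeManagerMetrics metrics) {
    this.metrics = metrics;
  }

  // Would be invoked from ContainerManagerImpl#recoverContainer for each recovered container.
  void recoverContainerMetrics(Resource containerResource, boolean wasLaunched) {
    metrics.allocateContainer(containerResource);  // assumed metrics updater
    if (wasLaunched) {
      metrics.launchedContainer();                 // keep ContainersLaunched consistent
    }
  }
}
{code}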



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558248#comment-16558248
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen]!
I have uploaded the patch for branch-3.0.0; I hope it is named correctly.
Is there anything I should do with this JIRA at this point?
Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores 
> and AvailableVCores also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and 
> YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-6966:
-
Attachment: YARN-6966-branch-3.0.0.001.patch

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores 
> and AvailableVCores also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and 
> YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: YARN-8566.005.patch

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch, YARN-8566.005.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-26 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558163#comment-16558163
 ] 

Szilard Nemeth commented on YARN-8566:
--

Hi [~rkanter]!
Thanks for the quick review, see my new patch with the fixes.
1. Fixed
2. I would leave this as it is: the exception is passed to LOG.warn, so the 
message will be printed anyway. Do you agree with this?
3. Good point. I reused the exception message in 
{{DefaultAMSProcessor.handleInvalidResourceException}}, but I would like to 
keep {{InvalidResourceType}} for deciding whether or not to update the 
diagnostics message.
I only want to update the message if the {{InvalidResourceException}} was 
created because the resource was less than zero or greater than the maximum 
allocation. Since this exception is created in other parts of the code for other 
reasons, I would not touch the diagnostic message in those cases.
About {{SchedulerUtils.throwInvalidResourceException}}: I wanted to keep the 
details of how the {{InvalidResourceException}} is created there instead of 
providing the message from the callers, which is why the exception message is 
formatted in this method.
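
To make the decision logic above concrete, a rough illustrative sketch (the enum constants, class names and the diagnostics callback are assumptions made for this example, not the actual patch code):
{code:java}
// Illustrative only: append to the application's diagnostics solely when the
// invalid-resource condition is "less than zero" or "greater than the maximum
// allocation"; any other reason for the exception leaves the diagnostics alone.
enum InvalidResourceTypeSketch {
  LESS_THAN_ZERO,
  GREATER_THAN_MAX_ALLOCATION,
  OTHER
}

class UnschedulableDiagnosticsSketch {

  // Stand-in for however the RM actually updates an application's diagnostics string.
  interface DiagnosticsUpdater {
    void appendDiagnostics(String applicationId, String message);
  }

  private final DiagnosticsUpdater updater;

  UnschedulableDiagnosticsSketch(DiagnosticsUpdater updater) {
    this.updater = updater;
  }

  void handleInvalidResource(String applicationId, InvalidResourceTypeSketch type,
      String exceptionMessage) {
    if (type == InvalidResourceTypeSketch.LESS_THAN_ZERO
        || type == InvalidResourceTypeSketch.GREATER_THAN_MAX_ALLOCATION) {
      updater.appendDiagnostics(applicationId, exceptionMessage);
    }
  }
}
{code}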


> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8252) Fix ServiceMaster main not found

2018-07-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved YARN-8252.

Resolution: Not A Problem

The problem is that the transitive dependencies are missing; those are put in 
place by "fastlaunch". A better way of reporting the error would still be 
welcome...

> Fix ServiceMaster main not found
> 
>
> Key: YARN-8252
> URL: https://issues.apache.org/jira/browse/YARN-8252
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I was looking into using YARN services; however, it seems that for some reason it 
> is not possible to run the {{ServiceMaster}} class from the jar... I might be 
> missing something fundamental... so I've put together a shell script to make it 
> easy for anyone to check. I would be happy with any exception beyond "main not 
> found".
> [ServiceMaster.main 
> method|https://github.com/apache/hadoop/blob/67f239c42f676237290d18ddbbc9aec369267692/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java#L305]
> {code:java}
> #!/bin/bash
> set -e
> wget -O core.jar  -nv 
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-services-core/3.1.0/hadoop-yarn-services-core-3.1.0.jar
> unzip -qn core.jar
> cat > org/apache/hadoop/yarn/service/ServiceMaster2.java << EOF
> package org.apache.hadoop.yarn.service;
> public class ServiceMaster2 {
>   public static void main(String[] args) throws Exception {
> System.out.println("asd!");
>   }
> }
> EOF
> javac org/apache/hadoop/yarn/service/ServiceMaster2.java
> jar -cf a1.jar org
> find org -name ServiceMaster*
> # this will print "asd!"
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster2
> #the following invocations result in:
> # Error: Could not find or load main class 
> org.apache.hadoop.yarn.service.ServiceMaster
> #
> set +e
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster
> java -cp core.jar org.apache.hadoop.yarn.service.ServiceMaster
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8252) Fix ServiceMaster main not found

2018-07-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556948#comment-16556948
 ] 

Zoltan Haindrich commented on YARN-8252:


[~jmarhuen]: I've found out that this is "normal" but very misleading at first.
You have to run {{yarn app -enableFastLaunch}} to enable it prior to 
launching the service. I think it would be better to either do this automatically 
on the first launch, or give a better explanation of what is going wrong.

> Fix ServiceMaster main not found
> 
>
> Key: YARN-8252
> URL: https://issues.apache.org/jira/browse/YARN-8252
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I was looking into using YARN services; however, it seems that for some reason it 
> is not possible to run the {{ServiceMaster}} class from the jar... I might be 
> missing something fundamental... so I've put together a shell script to make it 
> easy for anyone to check. I would be happy with any exception beyond "main not 
> found".
> [ServiceMaster.main 
> method|https://github.com/apache/hadoop/blob/67f239c42f676237290d18ddbbc9aec369267692/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java#L305]
> {code:java}
> #!/bin/bash
> set -e
> wget -O core.jar  -nv 
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-services-core/3.1.0/hadoop-yarn-services-core-3.1.0.jar
> unzip -qn core.jar
> cat > org/apache/hadoop/yarn/service/ServiceMaster2.java << EOF
> package org.apache.hadoop.yarn.service;
> public class ServiceMaster2 {
>   public static void main(String[] args) throws Exception {
> System.out.println("asd!");
>   }
> }
> EOF
> javac org/apache/hadoop/yarn/service/ServiceMaster2.java
> jar -cf a1.jar org
> find org -name ServiceMaster*
> # this will print "asd!"
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster2
> #the following invocations result in:
> # Error: Could not find or load main class 
> org.apache.hadoop.yarn.service.ServiceMaster
> #
> set +e
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster
> java -cp core.jar org.apache.hadoop.yarn.service.ServiceMaster
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org