[jira] [Commented] (YARN-9155) Can't re-run a submarine job, if the previous job with the same service name has finished

2019-01-07 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735602#comment-16735602
 ] 

Zac Zhou commented on YARN-9155:


[~leftnoteasy], thanks a lot for your comments. That makes sense to me. I'll 
work on it.

> Can't re-run a submarine job, if the previous job with the same service name 
> has finished
> -
>
> Key: YARN-9155
> URL: https://issues.apache.org/jira/browse/YARN-9155
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zac Zhou
> Assignee: Zac Zhou
> Priority: Major
>
> Yarn native service doesn't clean up its HDFS service path when it has 
> finished.
> So if we don't execute the "yarn app -destroy " command before the next run 
> of a submarine job, we get the following exception:
> 2018-12-24 11:38:02,493 ERROR 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem: Dir 
> /user/hadoop//services/distributed-tf-gpu-ml4/${service_name}.json 
> exists: hdfs://mldev/user/hadoop/**
> /services/distributed-tf-gpu-ml4/${service_name}.json 8472
> 2018-12-24 11:38:02,494 ERROR 
> org.apache.hadoop.yarn.service.webapp.ApiServer: Failed to create service 
> ${service_name}: {}
> java.lang.reflect.UndeclaredThrowableException
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748)
>  at org.apache.hadoop.yarn.service.webapp.ApiServer.createService(ApiServer.java:131)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>  at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>  at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>  at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>  at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>  at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>  at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>  at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>  at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>  at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>  at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>  at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>  at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>  at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
>  at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89)
>  at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
>  at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
>  at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179)
>  at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
>  at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
>  at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
>  at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
>  at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
>  at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
>  at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
>  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> 

[jira] [Commented] (YARN-9155) Can't re-run a submarine job, if the previous job with the same service name has finished

2019-01-06 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735244#comment-16735244
 ] 

Wangda Tan commented on YARN-9155:
--

[~yuan_zac], can we just add a Submarine CLI option to remove the old job 
folder if it exists? By default we can turn it off, and print a message to the 
Submarine CLI output to tell the user about the option if the job dir exists.
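
To make the proposal concrete, here is a minimal sketch of the check, not the actual Submarine code. It uses `java.nio.file` on a local path as a stand-in for HDFS so that it runs on its own; a real change would presumably go through the Hadoop `FileSystem` API instead. The class `StaleJobDirCheck`, the `Result` enum and the `removeIfExists` flag are all hypothetical names, and the option name in the hint message is a placeholder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Submarine or YARN.
final class StaleJobDirCheck {
  /** Outcome of the pre-submit check. */
  enum Result { NO_OLD_DIR, REMOVED, KEPT_WITH_HINT }

  /**
   * @param jobDir         directory left behind by an earlier run
   *                       (a local path standing in for the HDFS service path)
   * @param removeIfExists value of the hypothetical opt-in CLI option; off by default
   */
  static Result prepare(Path jobDir, boolean removeIfExists) throws IOException {
    if (!Files.exists(jobDir)) {
      return Result.NO_OLD_DIR;
    }
    if (!removeIfExists) {
      // Default behaviour: leave the directory alone and point the user at the option.
      System.out.println("Job dir " + jobDir + " already exists; "
          + "re-run with the (hypothetical) remove-old-job-dir option to delete it first.");
      return Result.KEPT_WITH_HINT;
    }
    // Collect the tree, then delete children before parents (reverse path order).
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(jobDir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
    return Result.REMOVED;
  }
}
```

Keeping the option off by default means an existing job directory is never deleted unless the user explicitly asks for it.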


[jira] [Commented] (YARN-9155) Can't re-run a submarine job, if the previous job with the same service name has finished

2018-12-24 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728293#comment-16728293
 ] 

Zac Zhou commented on YARN-9155:


Or should we add an after-service listener interface to yarn native service, 
so that users can specify what they want to do after a yarn service has 
finished? If we go this way, YARN-8725 can be resolved easily as well.
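
As a rough illustration of the listener idea, the sketch below shows one possible shape for such a callback. Nothing here exists in YARN native service today: `ServiceFinishedListener`, `ServiceFinishedNotifier` and their methods are hypothetical names, and where the service would actually fire the callback is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback; YARN native service does not define this interface.
interface ServiceFinishedListener {
  void onServiceFinished(String serviceName);
}

// Hypothetical registry standing in for the place where the service
// would notify listeners once it has finished.
final class ServiceFinishedNotifier {
  private final List<ServiceFinishedListener> listeners = new ArrayList<>();

  void register(ServiceFinishedListener listener) {
    listeners.add(listener);
  }

  /**
   * Calls every listener in registration order. A listener that throws does
   * not prevent the remaining listeners from running.
   *
   * @return the number of listeners that threw
   */
  int notifyFinished(String serviceName) {
    int failures = 0;
    for (ServiceFinishedListener listener : listeners) {
      try {
        listener.onServiceFinished(serviceName);
      } catch (RuntimeException e) {
        failures++;
        System.err.println("Listener failed for " + serviceName + ": " + e);
      }
    }
    return failures;
  }
}
```

A cleanup of the HDFS service path could then be registered as one such listener, for example `notifier.register(name -> cleanUp(name))`, where `cleanUp` is again a placeholder.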
