[jira] [Created] (YARN-6655) Validate at YARN native services application submission time that each hostname label is at most 63 characters
Karam Singh created YARN-6655:
---------------------------------

Summary: Validate at YARN native services application submission time that each hostname label is at most 63 characters
Key: YARN-6655
URL: https://issues.apache.org/jira/browse/YARN-6655
Project: Hadoop YARN
Issue Type: Sub-task
Components: yarn-native-services
Affects Versions: yarn-native-services
Reporter: Karam Singh
Fix For: yarn-native-services

According to RFC 1035, the length of an FQDN is limited to 255 characters, and each label (a node delimited by a dot in the hostname) is limited to 63 characters. We should therefore validate on the YARN native services application submission side that each hostname label is at most 63 characters.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
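The RFC 1035 limits described above can be sketched as a submission-time check. This is illustrative only; the class and method names below are assumptions, not the actual YARN native services validation code:

```java
// Minimal sketch of an RFC 1035 hostname check: the full name may be at most
// 255 characters and each dot-delimited label at most 63 characters.
class HostnameValidator {
    static final int MAX_LABEL = 63;
    static final int MAX_FQDN = 255;

    static boolean isValid(String hostname) {
        if (hostname == null || hostname.isEmpty() || hostname.length() > MAX_FQDN) {
            return false;
        }
        // split with limit -1 keeps trailing empty labels, so "a..b" and "a." fail
        for (String label : hostname.split("\\.", -1)) {
            if (label.isEmpty() || label.length() > MAX_LABEL) {
                return false;
            }
        }
        return true;
    }
}
```

A name such as `ctr-0.app.example.com` passes, while any label longer than 63 characters (or an empty label) is rejected.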
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.004.patch

Fixing whitespace issues

> Add null check in RegistryDNS service while parsing registry records
> --------------------------------------------------------------------
>
> Key: YARN-6544
> URL: https://issues.apache.org/jira/browse/YARN-6544
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn-native-services
> Affects Versions: yarn-native-services
> Reporter: Karam Singh
> Assignee: Karam Singh
> Fix For: yarn-native-services
>
> Attachments: YARN-6544-yarn-native-services.001.patch, YARN-6544-yarn-native-services.002.patch, YARN-6544-yarn-native-services.002.patch, YARN-6544-yarn-native-services.003.patch, YARN-6544-yarn-native-services.004.patch
>
> Add a null check in the RegistryDNS service while parsing registry records for the yarn persistence attribute.
> As of now it assumes that a yarn registry record always contains the yarn persistence attribute, which is not the case.
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.003.patch
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.002.patch
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.002.patch
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: (was: YARN-6544-yarn-native-services.002.patch)
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.002.patch
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: (was: YARN-6544-yarn-native-services.002.patch)
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.002.patch

Second patch: adds an else branch for the null-check if statement, and also tries to address the checkstyle warnings.
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Attachment: YARN-6544-yarn-native-services.001.patch

Initial patch to add a null check in the DNS service while parsing registry records for the yarn persistence attribute.
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Fix Version/s: (was: YARN-4757)
               yarn-native-services
[jira] [Updated] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
[ https://issues.apache.org/jira/browse/YARN-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-6544:
------------------------------
Affects Version/s: (was: YARN-4757)
                   yarn-native-services
[jira] [Created] (YARN-6544) Add null check in RegistryDNS service while parsing registry records
Karam Singh created YARN-6544:
------------------------------

Summary: Add null check in RegistryDNS service while parsing registry records
Key: YARN-6544
URL: https://issues.apache.org/jira/browse/YARN-6544
Project: Hadoop YARN
Issue Type: Sub-task
Components: yarn-native-services
Affects Versions: YARN-4757
Reporter: Karam Singh
Fix For: YARN-4757

Add a null check in the RegistryDNS service while parsing registry records for the yarn persistence attribute. As of now it assumes that a yarn registry record always contains the yarn persistence attribute, which is not the case.
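The guard this issue asks for can be sketched as follows. This is a minimal illustration: the attribute key, the default value, and the parser class are assumptions for the sketch, not the actual registry record schema or the RegistryDNS code:

```java
import java.util.Map;

// Hypothetical stand-in for reading a registry record's persistence attribute.
class RegistryRecordParser {
    static final String PERSISTENCE_KEY = "yarn:persistence";  // assumed key name
    static final String DEFAULT_PERSISTENCE = "permanent";     // assumed default

    // Without the guard, code of the form record.get(PERSISTENCE_KEY).equals(...)
    // throws NullPointerException for records written without the attribute.
    // Checking for null first keeps DNS record parsing alive.
    static String persistenceOf(Map<String, String> record) {
        String p = record.get(PERSISTENCE_KEY);
        return (p != null) ? p : DEFAULT_PERSISTENCE;
    }
}
```

A record carrying the attribute keeps its value; a record missing it falls back to the default instead of crashing the parse.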
[jira] [Updated] (YARN-5439) ATS out log file keeps on getting filled with WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected" exceptions
[ https://issues.apache.org/jira/browse/YARN-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karam Singh updated YARN-5439:
------------------------------
Description:
While running various different types of jobs using Tez, we found that the ATS out log file keeps getting filled with the following type of exceptions:

com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator attachTypes
INFO: Couldn't find JAX-B element for class javax.ws.rs.core.Response
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 resolve
SEVERE: null
java.lang.IllegalAccessException: Class com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected"
    at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
    at java.lang.Class.newInstance(Class.java:436)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
    at com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
    at com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
    at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
    at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
    at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
    at com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:614)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
[jira] [Created] (YARN-5439) ATS out log file keeps on getting filled with WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected" exceptions
Karam Singh created YARN-5439:
------------------------------

Summary: ATS out log file keeps on getting filled with WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected" exceptions
Key: YARN-5439
URL: https://issues.apache.org/jira/browse/YARN-5439
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.8.0, 2.7.3
Reporter: Karam Singh

While running various different types of jobs using Tez, we found that the ATS out log file keeps getting filled with the following type of exceptions:

Jun 13, 2016 1:43:07 PM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator attachTypes
INFO: Couldn't find JAX-B element for class javax.ws.rs.core.Response
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 resolve
SEVERE: null
java.lang.IllegalAccessException: Class com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected"
    at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
    at java.lang.Class.newInstance(Class.java:436)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
    at com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
    at com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
    at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
    at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
    at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
    at com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
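Errors of this shape come from Jersey 1.x attempting to generate a WADL description of the REST resources on OPTIONS requests. One commonly used mitigation, sketched here as an assumption rather than the fix adopted for this issue, is to disable WADL generation for the affected web app via the Jersey 1.x feature flag:

```xml
<!-- web.xml fragment (illustrative): disable Jersey 1.x WADL generation -->
<init-param>
  <param-name>com.sun.jersey.config.feature.DisableWADL</param-name>
  <param-value>true</param-value>
</init-param>
```

With no WADL to build, the reflective instantiation of javax.ws.rs.core.Response that triggers the IllegalAccessException is never attempted.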
[jira] [Created] (YARN-5438) TimelineClientImpl leaking FileSystem instances, causing long-running services like the HiveServer2 daemon to go OOM
Karam Singh created YARN-5438:
------------------------------

Summary: TimelineClientImpl leaking FileSystem instances, causing long-running services like the HiveServer2 daemon to go OOM
Key: YARN-5438
URL: https://issues.apache.org/jira/browse/YARN-5438
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.8.0, 2.7.3
Reporter: Karam Singh

In org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl, FileSystem.newInstance is invoked and the instance is never closed. Over time, FileSystem instances accumulate in long-running clients (like HiveServer2), eventually causing them to go OOM.
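The leak mechanism described above, a fresh instance created per call and never closed, can be modeled with a small stand-in resource. The names below are illustrative; this is not the Hadoop FileSystem API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in modeling FileSystem.newInstance(): each construction "opens" an
// instance that stays alive until close() is called.
class TrackedResource implements AutoCloseable {
    static final AtomicInteger openCount = new AtomicInteger();
    TrackedResource() { openCount.incrementAndGet(); }
    @Override public void close() { openCount.decrementAndGet(); }
}

// Leak pattern from the issue: a new instance per call, never closed.
class LeakyClient {
    void doWork() {
        TrackedResource fs = new TrackedResource();
        // ... uses fs but never closes it, so instances accumulate over time
    }
}

// Fix pattern: hold one instance for the client's lifetime, close on shutdown.
class FixedClient implements AutoCloseable {
    private final TrackedResource fs = new TrackedResource();
    void doWork() { /* ... use fs ... */ }
    @Override public void close() { fs.close(); }
}
```

Each LeakyClient.doWork() call leaves another open instance behind, which is exactly the growth a long-running daemon eventually turns into an OOM.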
[jira] [Created] (YARN-5432) Lock already held by another process during LevelDB cache store creation for a DAG
Karam Singh created YARN-5432:
------------------------------

Summary: Lock already held by another process during LevelDB cache store creation for a DAG
Key: YARN-5432
URL: https://issues.apache.org/jira/browse/YARN-5432
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Affects Versions: 2.8.0, 2.7.3
Reporter: Karam Singh

While running ATS stress tests with 15 concurrent ATS readers (Python threads issuing ws/v1/time/TEZ_DAG_ID, ws/v1/time/TEZ_VERTEX_ID?primaryFilter=TEZ_DAG_ID: etc. calls), we get the following type of exception very frequently in the ATS logs. Note: the summary store for ATSv1.5 is RLD, but ATS also creates a LevelDB cache per DAG/application when vertex/task/task-attempt information is queried from ATS.

2016-07-23 00:01:56,089 [1517798697@qtp-1198158701-850] INFO org.apache.hadoop.service.AbstractService: Service LeveldbCache.timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832 failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: already held by process
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: already held by process
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.serviceInit(LevelDBCacheTimelineStore.java:108)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:113)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1021)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:936)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:989)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1041)
    at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
    at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
    at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117)
    at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
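The failure above is two readers racing to open the same on-disk cache, whose LevelDB LOCK file is exclusive. One general way to avoid that race is to funnel store creation through a per-id map, sketched below with illustrative names rather than the actual ATS classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a LevelDB-backed cache store; opening the real store takes an
// exclusive file lock, so it must happen at most once per cache id.
class CacheStore {
    final String cacheId;
    CacheStore(String cacheId) { this.cacheId = cacheId; }  // would open the .ldb here
}

class CacheStoreRegistry {
    private final Map<String, CacheStore> stores = new ConcurrentHashMap<>();

    // computeIfAbsent runs the factory atomically per key, so concurrent
    // readers asking for the same id share one store instead of racing to
    // open (and lock) the same on-disk database.
    CacheStore getOrOpen(String cacheId) {
        return stores.computeIfAbsent(cacheId, CacheStore::new);
    }
}
```

Two requests for the same cache id then get the same store instance, while different ids still get independent stores.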
[jira] [Commented] (YARN-5314) ConcurrentModificationException in ATS
[ https://issues.apache.org/jira/browse/YARN-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363928#comment-15363928 ]

Karam Singh commented on YARN-5314:
-----------------------------------

Also observed exceptions:

2016-06-28 20:27:40,830 [735346597@qtp-1356054329-1032] INFO org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore: Using leveldb path /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1467035018499_4949_dag_1467035018499_4949_1-timeline-cache.ldb
2016-06-28 20:27:40,831 [982564321@qtp-1356054329-921] WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1467035018499_4998_application_1467035018499_4998-timeline-cache.ldb/LOCK: already held by process
    at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:137)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
    at org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:614)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:573)
[jira] [Commented] (YARN-5314) ConcurrentModificationException in ATS
[ https://issues.apache.org/jira/browse/YARN-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363927#comment-15363927 ] Karam Singh commented on YARN-5314: --- When the ATS WS response was an Internal Server Error (e.g. TEZ_VERTEX_ID/ error: Internal server error), the ATS logs also showed a ConcurrentModificationException, e.g. 2016-06-24 18:37:21,281 [679114143@qtp-2027837674-97] WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR javax.ws.rs.WebApplicationException: java.util.ConcurrentModificationException at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntity(TimelineWebServices.java:166) at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) > ConcurrentModificationException in ATS > -- > > Key: YARN-5314 > URL: https://issues.apache.org/jira/browse/YARN-5314 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.8.0, 2.9.0 >Reporter: Karam Singh > > ConcurrentModificationException seen in ATS logs while getting entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5314) ConcurrentModificationException in ATS
Karam Singh created YARN-5314: - Summary: ConcurrentModificationException in ATS Key: YARN-5314 URL: https://issues.apache.org/jira/browse/YARN-5314 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.8.0, 2.9.0 Reporter: Karam Singh ConcurrentModificationException seen in ATS logs while getting entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5314) ConcurrentModificationException in ATS
[ https://issues.apache.org/jira/browse/YARN-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363926#comment-15363926 ] Karam Singh commented on YARN-5314: --- Types of exceptions seen: 2016-06-24 18:46:46,790 [228327414@qtp-2027837674-94] ERROR org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices: Error getting entity java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:136) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:938) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:853) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:889) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntity(EntityGroupFSTimelineStore.java:980) at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntity(TimelineDataManager.java:213) at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntity(TimelineDataManager.java:201) at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntity(TimelineWebServices.java:157) at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269) at org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197) at org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at
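The trace above shows the ConcurrentModificationException originating in EntityCacheItem.refreshCache while iterating an ArrayList. The failure mode can be reproduced with a minimal sketch (hypothetical names, not the actual ATS code): mutating an ArrayList while a for-each loop iterates it makes the fail-fast iterator throw, whereas iterating a snapshot copy (or guarding both paths with the same lock) avoids it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Mutating the list mid-iteration, as a scanner thread might do while
    // refreshCache() walks the same list, trips the fail-fast iterator.
    static boolean mutateWhileIterating() {
        List<String> appLogs = new ArrayList<>(Arrays.asList("log1", "log2", "log3"));
        try {
            for (String log : appLogs) {
                appLogs.add("log-new"); // concurrent mutation
            }
        } catch (ConcurrentModificationException e) {
            return true; // the exception seen in the ATS logs
        }
        return false;
    }

    // One conventional fix: iterate a snapshot so the live list can change.
    static boolean safeCopyIteration() {
        List<String> appLogs = new ArrayList<>(Arrays.asList("log1", "log2", "log3"));
        for (String log : new ArrayList<>(appLogs)) { // iterate a copy
            appLogs.add("log-new");
        }
        return appLogs.size() == 6;
    }

    public static void main(String[] args) {
        System.out.println("CME raised: " + mutateWhileIterating());
        System.out.println("snapshot iteration safe: " + safeCopyIteration());
    }
}
```

Synchronizing the reader and writer on the list's lock is the other usual remedy; which one fits depends on how hot the iteration path is.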
[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351351#comment-15351351 ] Karam Singh commented on YARN-5296: --- From offline discussion with [~djp]: the root cause is in YARN-4811, where we launch tasks (scheduleAtFixedRate) in MutableQuantiles but never get a chance to terminate these tasks. > NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl > --- > > Key: YARN-5296 > URL: https://issues.apache.org/jira/browse/YARN-5296 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0, 2.9.0 >Reporter: Karam Singh > > Ran tests in the following manner: > 1. Ran a GridMix trace of 768 apps sequentially around 17 times to execute about 12.9K > apps. > 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported around > 96% of the heap being used by ContainerMetrics. > 3. Ran 7 more GridMix runs to have around 18.2K apps run in total. Again checked the > NM heap using Memory Analyser: again 96% of the heap was used by > ContainerMetrics. > 4. Started one more GridMix run; while the run was going on, NMs started going down > with OOM at around 18.7K+ running apps. On analysing the NM heap using Memory Analyser, > the OOM was caused by ContainerMetrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
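The leak mechanism described in the comment above can be sketched in miniature (hypothetical class, not the actual MutableQuantiles code): a task registered with scheduleAtFixedRate keeps a strong reference to its enclosing object inside the executor, so the metrics object for a finished container can never be collected until someone cancels the task. The fix pattern is to retain the ScheduledFuture and cancel it when the container's lifecycle ends.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class MetricsLifecycleDemo {
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newScheduledThreadPool(1);

    private final ScheduledFuture<?> snapshotTask;

    MetricsLifecycleDemo() {
        // Analogous to MutableQuantiles scheduling a periodic snapshot:
        // the executor holds a reference to this task (and through it, to
        // this object) until the task is cancelled, so "finished" metrics
        // objects pile up on the heap if nothing ever cancels them.
        snapshotTask = SCHEDULER.scheduleAtFixedRate(
            () -> { /* take a quantile snapshot */ }, 1, 1, TimeUnit.SECONDS);
    }

    // The fix pattern: cancel the periodic task when the container finishes,
    // releasing the executor's reference and making the object collectable.
    void stop() {
        snapshotTask.cancel(false);
    }

    boolean isStopped() {
        return snapshotTask.isCancelled();
    }

    static void shutdownScheduler() {
        SCHEDULER.shutdownNow();
    }

    public static void main(String[] args) {
        MetricsLifecycleDemo m = new MetricsLifecycleDemo();
        m.stop();
        System.out.println("task cancelled: " + m.isStopped());
        shutdownScheduler();
    }
}
```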
[jira] [Created] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl
Karam Singh created YARN-5296: - Summary: NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl Key: YARN-5296 URL: https://issues.apache.org/jira/browse/YARN-5296 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0, 2.9.0 Reporter: Karam Singh Ran tests in the following manner: 1. Ran a GridMix trace of 768 apps sequentially around 17 times to execute about 12.9K apps. 2. After 4-5 hrs, checked the NM heap using Memory Analyser. It reported around 96% of the heap being used by ContainerMetrics. 3. Ran 7 more GridMix runs to have around 18.2K apps run in total. Again checked the NM heap using Memory Analyser: again 96% of the heap was used by ContainerMetrics. 4. Started one more GridMix run; while the run was going on, NMs started going down with OOM at around 18.7K+ running apps. On analysing the NM heap using Memory Analyser, the OOM was caused by ContainerMetrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-5195) RM crashed with NPE while handling APP_ATTEMPT_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-5195: -- Comment: was deleted (was: cc [~gp.leftnoteasy]) > RM crashed with NPE while handling APP_ATTEMPT_REMOVED event > > > Key: YARN-5195 > URL: https://issues.apache.org/jira/browse/YARN-5195 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Karam Singh >Priority: Critical > > While running gridmix experiments one time came across incident where RM went > down with following exception > {noformat} > 2016-05-28 15:45:24,459 [ResourceManager Event Processor] FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1282) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:860) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:704) > at java.lang.Thread.run(Thread.java:745) > 2016-05-28 15:45:24,460 [ApplicationMasterLauncher #49] INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning > master appattempt_1464449118385_0006_01 > 2016-05-28 
15:45:24,460 [ResourceManager Event Processor] INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5195) RM crashed with NPE while handling APP_ATTEMPT_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312725#comment-15312725 ] Karam Singh commented on YARN-5195: --- cc [~gp.leftnoteasy] > RM crashed with NPE while handling APP_ATTEMPT_REMOVED event > > > Key: YARN-5195 > URL: https://issues.apache.org/jira/browse/YARN-5195 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Karam Singh >Priority: Critical > > While running gridmix experiments one time came across incident where RM went > down with following exception > {noformat} > 2016-05-28 15:45:24,459 [ResourceManager Event Processor] FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1282) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:860) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:704) > at java.lang.Thread.run(Thread.java:745) > 2016-05-28 15:45:24,460 [ApplicationMasterLauncher #49] INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning > master appattempt_1464449118385_0006_01 > 2016-05-28 
15:45:24,460 [ResourceManager Event Processor] INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5195) RM crashed with NPE while handling APP_ATTEMPT_REMOVED event
Karam Singh created YARN-5195: - Summary: RM crashed with NPE while handling APP_ATTEMPT_REMOVED event Key: YARN-5195 URL: https://issues.apache.org/jira/browse/YARN-5195 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karam Singh Priority: Critical While running GridMix experiments, we once came across an incident where the RM went down with the following exception: {noformat} 2016-05-28 15:45:24,459 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1282) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:860) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1319) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:704) at java.lang.Thread.run(Thread.java:745) 2016-05-28 15:45:24,460 [ApplicationMasterLauncher #49] INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1464449118385_0006_01 2016-05-28 15:45:24,460 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. 
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
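The NPE above surfaces in LeafQueue.completedContainer when the dispatcher processes an event for state that has already been torn down, and because the event-processor thread dies, the whole RM exits. A minimal sketch of the defensive pattern (hypothetical class and names, not the actual LeafQueue code): look the attempt up, and if it is already gone, log and ignore the event instead of dereferencing null.

```java
import java.util.HashMap;
import java.util.Map;

public class CompletedContainerDemo {
    // Tracks live container counts per app attempt, standing in for the
    // scheduler's internal bookkeeping in this sketch.
    private final Map<String, Integer> liveContainers = new HashMap<>();

    void addAttempt(String attemptId, int containers) {
        liveContainers.put(attemptId, containers);
    }

    // An APP_ATTEMPT_REMOVED event can race with container completion, so
    // the lookup may return null; returning early keeps the event loop
    // alive instead of letting an NPE kill the dispatcher thread.
    String completedContainer(String attemptId) {
        Integer count = liveContainers.get(attemptId);
        if (count == null) {
            return "ignored completed container for unknown " + attemptId;
        }
        liveContainers.put(attemptId, count - 1);
        return attemptId + " containers left: " + (count - 1);
    }

    public static void main(String[] args) {
        CompletedContainerDemo q = new CompletedContainerDemo();
        q.addAttempt("appattempt_1", 3);
        System.out.println(q.completedContainer("appattempt_1"));
        System.out.println(q.completedContainer("appattempt_2")); // already removed
    }
}
```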
[jira] [Created] (YARN-5036) RM crashed with NPE while handling CONTAINER_EXPIRED event
Karam Singh created YARN-5036: - Summary: RM crashed with NPE while handling CONTAINER_EXPIRED event Key: YARN-5036 URL: https://issues.apache.org/jira/browse/YARN-5036 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.9.0 Reporter: Karam Singh I was running some TPC-DS queries against a branch-2 build, and after some hours the RM crashed with the following exception: 2016-04-30 08:40:34,332 [Ping Checker] INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
[jira] [Created] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to situation where it appears that applications in queue are getting starved or
Karam Singh created YARN-4606: - Summary: Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to situation where it appears that applications in queue are getting starved or stuck Key: YARN-4606 URL: https://issues.apache.org/jira/browse/YARN-4606 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, capacityscheduler Affects Versions: 2.7.1, 2.8.0 Reporter: Karam Singh Encountered while studying the behaviour of fairness with UserLimitPercent and UserLimitFactor during the following test: Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, UserLimitFactor=32, FairOrderingPolicy only. Encountered an application starvation situation where 33 applications (190 apps completed out of 761 apps; the queue can run 345 containers) were running with a total of 45 containers running; the 12 extra containers all belonged to only one app (which had around 18000 tasks), while all other apps had only their AM running and were given no other containers. After that app finished, there were 32 AMs that kept running without any containers for tasks being launched. GridMix was run with the following settings: gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver With a Users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to situation where it appears that applications in queue are getting starved
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106507#comment-15106507 ] Karam Singh commented on YARN-4606: --- From offline discussion with [~wangda]: After looking at the log & code, I think I understand what happened. The root cause is: we shouldn't activate an application when it's in pending state. This is not a new issue; at least branch-2.6 contains it. It leads to #active-users in a queue being increased, but the newly added active user cannot get resources (because its application is in pending state), while old users hit the user-limit (the newly added user lowers the user-limits). > Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor > in queue leads to situation where it appears that applications in queue are > getting starved or stuck > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh > > Encountered while studying the behaviour of fairness with UserLimitPercent and > UserLimitFactor during the following test: > Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, > UserLimitFactor=32, FairOrderingPolicy only. Encountered an application > starvation situation where 33 applications (190 apps completed out of 761 apps; > the queue can run 345 containers) were running with a total of 45 containers running; > the 12 extra containers all belonged to only one app (which had around 18000 tasks), > while all other apps had only their AM running and were given no other containers. 
After that app finished, there were 32 AMs that kept running without > any containers for tasks being launched. > GridMix was run with the following settings: > gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, > gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, > gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, > mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, > gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, > gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver > With a Users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
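The root-cause comment above (a pending application's user being counted as active lowers everyone's user-limit) can be illustrated with an idealized model. This is deliberately NOT the real CapacityScheduler user-limit formula, which also factors in minimum-user-limit-percent, user-limit-factor, and current demand; it only shows the direction of the effect: each phantom "active" user shrinks the share available to users who are actually running work.

```java
public class UserLimitDemo {
    // Idealized model: each active user gets an equal share of the queue's
    // container capacity. The real scheduler formula is more involved, but
    // the denominator (#active-users) plays the same role.
    static int perUserShare(int queueContainers, int activeUsers) {
        return queueContainers / activeUsers;
    }

    public static void main(String[] args) {
        int capacity = 345; // containers the queue can run, per the report above

        // 4 users actually running applications:
        System.out.println(perUserShare(capacity, 4)); // 86 containers each

        // A pending app's user is wrongly counted as active:
        System.out.println(perUserShare(capacity, 5)); // 69 containers each
        // Running users are now capped lower, while the phantom user's
        // share goes unused -- the queue looks starved/stuck.
    }
}
```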
[jira] [Created] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only
Karam Singh created YARN-4565: - Summary: When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only Key: YARN-4565 URL: https://issues.apache.org/jira/browse/YARN-4565 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, capacityscheduler Affects Versions: 2.7.1, 2.8.0 Reporter: Karam Singh When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, it sometimes leads to a situation where all queue resources are consumed by AMs only, so from the user's perspective it appears that all applications in the queue are stuck; the whole queue capacity is consumed by AMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only
[ https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089325#comment-15089325 ] Karam Singh commented on YARN-4565: --- Came across this issue while experimenting with fairness in a queue with CapacityScheduler. Encountered a situation, while running GridMix V3 with FairOrderingPolicy with SizeBasedWeight enabled on a queue in CapacityScheduler, where all queue resources were consumed by AMs. Following are the settings: Cluster total memory capacity 864GB, Global AMResourcePercent=0.1 Global MaxApplications=1, minAllocationMb=2048, AM memory=2048, mapMemory=reduceMemory=2048 Queue settings: Capacity=10 MaxCapacity=80 UserLimitFactor=8, UserLimitPercent=100, FairOrderingPolicy with SizeBasedWeight=True According to this, at most 35 AMs can run simultaneously and a total of 345 containers can run in the queue, which was verified while running GridMix V3 (which submits 760 applications) with FairOrderingPolicy only (without SizeBasedWeight). When the same test was run with FairOrderingPolicy with SizeBasedWeight=true, 345 AMs (applications) were running, and since all queue resources were used by AMs no more containers could run, causing all applications to get stuck. Looks like sizeBasedWeight somehow changes/overrides amResourcePercent. 
> When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, > Sometimes lead to situation where all queue resources consumed by AMs only > > > Key: YARN-4565 > URL: https://issues.apache.org/jira/browse/YARN-4565 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh > > When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, > it sometimes leads to a situation where all queue resources are consumed by AMs only, > so from the user's perspective it appears that all applications in the queue are stuck; > the whole queue capacity is consumed by AMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
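The capacity arithmetic quoted in the comment above (~35 AMs, 345 containers) can be checked with a back-of-the-envelope sketch, assuming the simplest reading of the settings: queue maximum = 80% of the 864GB cluster, 2048MB per container and per AM, and a 10% AM-resource limit. Exact rounding inside the scheduler differs slightly (integer truncation here gives 34, versus the ~35 reported).

```java
public class AmLimitDemo {
    // Containers the queue can run at its maximum capacity.
    static int maxContainers(double queueMB, int containerMB) {
        return (int) (queueMB / containerMB);
    }

    // Concurrent AMs allowed under the AM-resource-percent limit.
    static int maxAms(double queueMB, double amResourcePercent, int containerMB) {
        return (int) (queueMB * amResourcePercent / containerMB);
    }

    public static void main(String[] args) {
        double clusterMB = 864 * 1024;        // 864 GB cluster capacity
        double queueMaxMB = clusterMB * 0.80; // MaxCapacity=80
        int containerMB = 2048;               // min allocation = AM size

        System.out.println(maxContainers(queueMaxMB, containerMB));  // 345
        System.out.println(maxAms(queueMaxMB, 0.10, containerMB));   // 34 (~35 reported)
    }
}
```

The bug report's key observation is that with SizeBasedWeight=true, the AM count reached 345 (the full container capacity) rather than being capped near 35.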
[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only
[ https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089407#comment-15089407 ] Karam Singh commented on YARN-4565: --- GridMix V3 trace information with GridMix run settings for the test. The trace runs 760 jobs with settings: gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001 gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000 gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver With a Users file containing 4 users for RoundRobinUserResolver Debugging with [~wangda] > When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, > Sometimes lead to situation where all queue resources consumed by AMs only > > > Key: YARN-4565 > URL: https://issues.apache.org/jira/browse/YARN-4565 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh > > When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, > it sometimes leads to a situation where all queue resources are consumed by AMs only, > so from the user's perspective it appears that all applications in the queue are stuck; > the whole queue capacity is consumed by AMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only
[ https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089424#comment-15089424 ] Karam Singh commented on YARN-4565: --- From offline chat with [~wangda]: I think I found what happened. When an application is added to the scheduler, CapacityScheduler#allocate will be called: if (updateDemandForQueue != null) { updateDemandForQueue.getOrderingPolicy().demandUpdated(application); } And in FairOrderingPolicy#demandUpdated, when sizeBasedWeight is enabled: if (sizeBasedWeight) { entityRequiresReordering(schedulableEntity); } It will reorder the schedulableEntity, essentially inserting the entity into the TreeSet. This can happen before the application (schedulableEntity) is in the active application list, so we can get the application's containers allocated before the application is activated. > When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, > Sometimes lead to situation where all queue resources consumed by AMs only > > > Key: YARN-4565 > URL: https://issues.apache.org/jira/browse/YARN-4565 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh > > When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, > it sometimes leads to a situation where all queue resources are consumed by AMs only, > so from the user's perspective it appears that all applications in the queue are stuck; > the whole queue capacity is consumed by AMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
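The race described in the comment above can be reduced to a toy model (hypothetical names, not the actual FairOrderingPolicy code): demandUpdated() inserts an entity into the ordering TreeSet regardless of whether the application was ever activated, and an allocation loop that simply walks the set will then schedule a still-pending application.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class OrderingPolicyDemo {
    // Toy model of the bug: the scheduling order is a TreeSet, and nothing
    // in the allocation path cross-checks it against the active-app list.
    static String firstAppScheduled(boolean reorderPendingApp) {
        TreeSet<String> schedulingOrder = new TreeSet<>();
        Set<String> activeApps = new HashSet<>();

        schedulingOrder.add("app-1");
        activeApps.add("app-1");          // properly activated application

        if (reorderPendingApp) {
            // demandUpdated() analogue: a pending app slips into the
            // ordering set before it is ever activated.
            schedulingOrder.add("app-0");
        }
        // The allocation loop takes whatever sorts first; it never asks
        // whether the entity is in activeApps.
        return schedulingOrder.first();
    }

    public static void main(String[] args) {
        System.out.println(firstAppScheduled(false)); // app-1
        System.out.println(firstAppScheduled(true));  // app-0 (still pending!)
    }
}
```

A fix along the lines the comment suggests would defer the TreeSet insertion (or filter during allocation) until the application is actually activated.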
[jira] [Created] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE
Karam Singh created YARN-2602: - Summary: Generic History Service of TimelineServer sometimes not able to handle NPE Key: YARN-2602 URL: https://issues.apache.org/jira/browse/YARN-2602 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: ATS is running with AHS/GHS enabled to use TimelineStore. Running for 4-5 days, with many random example jobs running Reporter: Karam Singh ATS is running with AHS/GHS enabled to use TimelineStore. Running for 4-5 days, with many random example jobs running. When I ran the WS API for AHS/GHS: {code} curl --negotiate -u : 'http://TIMELINE_SERFVER_WEPBAPP_ADDR/v1/applicationhistory/apps/application_1411579118376_0001' {code} it ran successfully. However {code} curl --negotiate -u : 'http://TIMELINE_SERFVER_WEPBAPP_ADDR/ws/v1/applicationhistory/apps' {exception:WebApplicationException,message:java.lang.NullPointerException,javaClassName:javax.ws.rs.WebApplicationException} {code} failed with Internal Server Error 500. After looking at the TimelineServer logs, found that there was an NPE: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE
[ https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147453#comment-14147453 ] Karam Singh commented on YARN-2602: --- Following is the stack trace from the TimelineServer:
{code}
2014-09-24 22:53:34,634 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.webapp.WebServices.getApps(WebServices.java:154)
at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApps(AHSWebServices.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at
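The trace above shows a raw NullPointerException escaping WebServices.getApps and surfacing to the client as INTERNAL_SERVER_ERROR. A minimal sketch of the defensive pattern such a handler needs (illustrative Python, not Hadoop's actual code; the dict-shaped store is an assumption for the example):

```python
# Illustrative sketch: a web-service handler that tolerates a missing or
# sparse history store instead of dereferencing null and surfacing an NPE.

def get_apps(history_store):
    """Return a list of app reports; never raise on a missing/empty store."""
    if history_store is None:
        return []                      # store not initialised yet
    apps = history_store.get("apps")   # field may be absent in a sparse record
    return list(apps) if apps is not None else []

assert get_apps(None) == []
assert get_apps({}) == []
assert get_apps({"apps": ["application_1_0001"]}) == ["application_1_0001"]
```

The point is that an absent record is an expected condition worth an empty response, while only genuine errors should reach the generic exception handler.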
[jira] [Created] (YARN-2594) ResourceManager sometimes becomes unresponsive
Karam Singh created YARN-2594: - Summary: ResourceManager sometimes becomes unresponsive Key: YARN-2594 URL: https://issues.apache.org/jira/browse/YARN-2594 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karam Singh ResourceManager sometimes becomes unresponsive. There was no exception in the ResourceManager log; it contains only the following type of messages:
{code}
2014-09-19 19:13:45,241 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
2014-09-19 19:30:26,312 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
2014-09-19 19:47:07,351 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
2014-09-19 20:03:48,460 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
2014-09-19 20:20:29,542 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
2014-09-19 20:37:10,635 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
2014-09-19 20:53:51,722 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
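The log pattern above is YARN's AsyncDispatcher reporting its event-queue size every 1000 events. A toy model (a simplified stand-in, not the real dispatcher) shows how a stalled handler turns an unbounded queue into steady growth and an apparently hung RM:

```python
from collections import deque

# Toy model of an async dispatcher: producers enqueue events, a single
# handler thread is supposed to drain them. If the handler stalls, the
# unbounded queue only grows, and the dispatcher logs its size every
# 1000 events -- exactly the pattern in the RM log above.

class ToyDispatcher:
    def __init__(self):
        self.queue = deque()
        self.logged = []

    def handle(self, event):
        self.queue.append(event)
        if len(self.queue) % 1000 == 0:
            self.logged.append(f"Size of event-queue is {len(self.queue)}")

d = ToyDispatcher()
for i in range(3000):          # handler never drains: queue grows unbounded
    d.handle(("NODE_UPDATE", i))
assert len(d.queue) == 3000
assert d.logged[-1] == "Size of event-queue is 3000"
```

In the real RM the queue size climbing by thousands between log lines is the symptom that the single dispatcher thread is blocked, which is what makes the RM unresponsive.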
[jira] [Created] (YARN-2565) ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
Karam Singh created YARN-2565: - Summary: ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server) enabled and yarn.resourcemanager.system-metrics-publisher.enabled=true, so that the RM can send application history to the Timeline Store Reporter: Karam Singh Observed that the RM fails to start in secure mode when GenericHistoryService is enabled and the ResourceManager is set to use the Timeline Store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2565) ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137204#comment-14137204 ] Karam Singh commented on YARN-2565: --- Observed that the RM fails to start in secure mode when GenericHistoryService is enabled and the ResourceManager is set to use the Timeline Store:
{code}
yarn.resourcemanager.keytab=RM_HOST
yarn.resourcemanager.principal=RM_PRINCIPAL
yarn.timeline-service.enabled=true
yarn.timeline-service.hostname=ATS_HOST
yarn.timeline-service.address=ATS_HOST:10200
yarn.timeline-service.webapp.address=ATS_HOST:8188
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.ttl-enable=true
yarn.timeline-service.ttl-ms=60480
yarn.timeline-service.leveldb-timeline-store.path=/tm/timeline
yarn.timeline-service.keytab=ATS_KEYTAB
yarn.timeline-service.principal=ATS_PRINCIPAL
yarn.timeline-service.webapp.spnego-principal=ATS_SPNEGO_PRINCIPAL
yarn.timeline-service.webapp.spnego-keytab-file=ATS_SPNEGO_KEYTAB
yarn.timeline-service.http-authentication.type=kerberos
yarn.timeline-service.http-authentication.kerberos.principal=ATS_SPNEGO_PRINCIPAL
yarn.timeline-service.http-authentication.kerberos.keytab=ATS_SPNEGO_KEYTAB
yarn.timeline-service.generic-application-history.enabled=true
yarn.timeline-service.generic-application-history.store-class=''
yarn.resourcemanager.system-metrics-publisher.enabled=true
yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size=10
{code}
Stop the ResourceManager and TimelineServer. Start the TimelineServer. After the ATS restarts successfully, start the ResourceManager.
RM fails to start with the following exception:
{code}
2014-09-15 10:58:57,735 WARN ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2014-09-15 10:58:57,740 ERROR applicationhistoryservice.FileSystemApplicationHistoryStore (FileSystemApplicationHistoryStore.java:serviceInit(132)) - Error when initializing FileSystemHistoryStorage
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: RM_HOST; destination host is: NN_HOST:8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1423)
at org.apache.hadoop.ipc.Client.call(Client.java:1372)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:219)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:748)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1918)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1101)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1101)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1413)
at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:126)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.serviceInit(RMApplicationHistoryWriter.java:99)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490) at
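The trace suggests FileSystemApplicationHistoryStore touches HDFS during serviceInit() before the RM process has Kerberos credentials, which is why a manual kinit as yarn masks the failure. A toy sketch of that init-ordering problem (illustrative Python, not Hadoop's real service classes; names are made up for the example):

```python
# Toy sketch of service-init ordering: if a store that reads HDFS is
# initialised before the process has logged in (keytab login or manual
# kinit), the first RPC fails with the GSS "no valid credentials" error.

class SecurityContext:
    def __init__(self):
        self.logged_in = False         # no TGT yet

class FsHistoryStore:
    def service_init(self, ctx):
        if not ctx.logged_in:          # maps to GSSException in real life
            raise IOError("GSS initiate failed: no valid credentials")
        return "initialised"

ctx = SecurityContext()
store = FsHistoryStore()
try:
    store.service_init(ctx)            # mirrors RM startup without kinit
    failed = False
except IOError:
    failed = True
assert failed

ctx.logged_in = True                   # log in from keytab first, then init
assert store.service_init(ctx) == "initialised"
```

The fix direction implied by the report is ordering: the RM's secure login must complete before any service in the composite touches HDFS.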
[jira] [Created] (YARN-2559) ResourceManager sometimes becomes unresponsive due to an NPE in SystemMetricsPublisher
Karam Singh created YARN-2559: - Summary: ResourceManager sometimes becomes unresponsive due to an NPE in SystemMetricsPublisher Key: YARN-2559 URL: https://issues.apache.org/jira/browse/YARN-2559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Generic History Service is enabled in TimelineServer with yarn.resourcemanager.system-metrics-publisher.enabled=true, so that the ResourceManager uses the Timeline Store for recording application history information Reporter: Karam Singh ResourceManager sometimes becomes unresponsive due to an NPE in SystemMetricsPublisher -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
Karam Singh created YARN-2449: - Summary: Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2449: -- Priority: Critical (was: Major) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Priority: Critical Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2449: -- Description: Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

was: Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Priority: Critical Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108873#comment-14108873 ] Karam Singh commented on YARN-2449: --- Similarly, if you run hadoop applications without setting hadoop.http.filter.initializers while the timelineserver is enabled, e.g.:
{code}
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0.2.2.0.0-532.jar pi 10 10
{code}
application submission fails with the following type of exception:
{code}
org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "About" (Class org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse), not marked as ignorable at [Source: N/A; line: -1, column: -1] (through reference chain: org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse["About"])
{code}
Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Priority: Critical Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the ATS, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2449: -- Description: Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

was: Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397: after YARN-2397, when no hadoop.http.filter.initializers is set and we try to fetch a delegation token from the timelineserver, it returns an invalid token. Tried to fetch a timeline delegation token by using curl commands:
{code}
1. curl -i -k -s -b '/grid/0/hadoopqe/y6/YarnWSAPISubmitAppKillApp/timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa'
Or
2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user'
{code}
The response for both queries is:
{code}
{"About":"Timeline API"}
{code}
Whereas before YARN-2397, or if you set hadoop.http.filter.initializers = TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returns a DT and the second used to fail.

Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Assignee: Varun Vasudev Priority: Critical Timelineserver returns an invalid delegation token in a secure Kerberos-enabled cluster when hadoop.http.filter.initializers is not set. This looks like a regression from YARN-2397. -- This message was sent by Atlassian JIRA (v6.2#6252)
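The {"About":"Timeline API"} body quoted in this issue means the token request fell through to the generic "about" resource instead of being handled by the authentication filter. A client-side sketch that detects that bad response before trying to use it as a token (illustrative Python; the renewer/urlString fields in the valid example are assumptions for illustration, not the documented response schema):

```python
import json

# Sketch: distinguish a real delegation-token response from the fallback
# {"About": "Timeline API"} body that the ATS returns when the auth filter
# is not wired in (the failure mode described in YARN-2449).

def looks_like_token(body: str) -> bool:
    payload = json.loads(body)
    # The fallback body carries only the "About" banner; a token response
    # would carry token fields instead.
    return bool(payload) and "About" not in payload

assert not looks_like_token('{"About": "Timeline API"}')
assert looks_like_token('{"renewer": "hrt_qa", "urlString": "token-bytes"}')
```

Failing fast on the banner body would have turned the downstream Jackson "Unrecognized field" error into a clear "no delegation token returned" message.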
[jira] [Updated] (YARN-2426) ResourceManager is not able to renew WebHDFS token when application is submitted by Yarn WebService
[ https://issues.apache.org/jira/browse/YARN-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2426: -- Description: Encountered this issue while using the new YARN RM WS for application submission, on a single-node cluster, while submitting a Distributed Shell application using the RM WS (webservice). For this we need to pass the custom script and AppMaster jar along with the webhdfs token. The application was failing because the ResourceManager was failing to renew the token for the user (appOwner), so the RM was rejecting the application with the following exception trace in the RM log:
{code}
2014-08-19 03:12:54,733 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(661)) - Unable to add the application to the delegation token renewer.
java.io.IOException: Failed to renew token: Kind: WEBHDFS delegation, Service: NNHOST:FSPORT, Ident: (WEBHDFS delegation token for hrt_qa)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:394)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$5(DelegationTokenRenewer.java:357)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unexpected HTTP response: code=-1 != 200, op=RENEWDELEGATIONTOKEN, message=null
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:331)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:90)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:598)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:448)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:477)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:473)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.renewDelegationToken(WebHdfsFileSystem.java:1318)
at org.apache.hadoop.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:73)
at org.apache.hadoop.security.token.Token.renew(Token.java:377)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:477)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:473)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:392)
... 6 more
Caused by: java.io.IOException: The error stream is null.
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:304)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:329)
... 24 more
2014-08-19 03:12:54,735 DEBUG event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppRejectedEvent.EventType: APP_REJECTED
{code}
From the exception trace it is clear that the RM is trying to contact the NameNode on the FS port instead of the HTTP port and failing to renew the token. It looks like the WebHDFS token carries the NameNode's IP and FS port in the delegation token instead of the HTTP port, causing the RM to contact WebHDFS on the FS port and fail to renew the token.

was: Encountered this issue while using the new YARN RM WS for application submission, on a single-node cluster, while submitting a Distributed Shell application using the RM WS (webservice). For this we need to pass the custom script and AppMaster jar along with the webhdfs token to the NodeManager for localization. The Distributed Shell application was failing as
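The analysis above says the token's service field records the NameNode's RPC (FS) port, so a renewer that derives its target from the token contacts the wrong port. A minimal sketch of that mismatch (illustrative Python; host names, ports, and the URL shape are made up for the example):

```python
# Sketch of the port-mismatch diagnosis: the delegation token's `service`
# field stores NNHOST:<FS/RPC port>, so a renewer that builds its HTTP
# target from the token ends up speaking HTTP to the RPC port.

FS_PORT, HTTP_PORT = 8020, 50070       # assumed example ports

def renewal_url(token_service: str) -> str:
    """Naively derive the renewal endpoint from the token's service field."""
    host, port = token_service.rsplit(":", 1)
    return f"http://{host}:{port}/webhdfs/v1/?op=RENEWDELEGATIONTOKEN"

url = renewal_url(f"NNHOST:{FS_PORT}")  # service stored with the FS port
assert f":{FS_PORT}/" in url            # renewal hits the RPC port...
assert f":{HTTP_PORT}" not in url       # ...not the WebHDFS HTTP port
```

That is consistent with the "Unexpected HTTP response: code=-1" in the trace: an HTTP request sent to an RPC port yields no parseable HTTP response at all.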
[jira] [Created] (YARN-2425) When application is submitted via Yarn RM WS, log aggregation does not happen
Karam Singh created YARN-2425: - Summary: When an application is submitted via the YARN RM web service, log aggregation does not happen Key: YARN-2425 URL: https://issues.apache.org/jira/browse/YARN-2425 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.0, 2.6.0 Environment: Secure (Kerberos-enabled) Hadoop cluster, with SPNEGO enabled for the YARN RM Reporter: Karam Singh When submitting an application to the YARN RM using the web service, we need to pass credentials/tokens in the JSON/XML object sent to the web service. However, the HDFS NameNode does not provide any delegation token over a web service (base64 encoded) the way WebHDFS and the timeline server do. (The HDFS fetchdt command fetches a Java Writable object and writes it to a target file, so we cannot forward it via the application submission WS object.) It looks like there is no way to pass the HDFS token to the NodeManager. While starting the application, the container launch also tries to create the application log aggregation directory, and fails with the following type of exception:
{code}
java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: hostname/ip; destination host is: NameNodeHost:FSPort;
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
	at org.apache.hadoop.ipc.Client.call(Client.java:1415)
	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:725)
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1781)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1069)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1065)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1065)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:240)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:64)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:253)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:344)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:310)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:421)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:64)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:679)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at
{code}
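The core obstacle described above is that an HDFS delegation token is a binary Java Writable, while the RM web service submission object is JSON/XML text. The sketch below is a hypothetical illustration (the class and method names are invented, not part of any Hadoop API) of the kind of base64 encoding the raw token bytes would need before they could travel inside a text submission payload:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: raw delegation-token bytes would need a text-safe
// encoding such as base64 before they could be placed in a JSON/XML field of
// the application-submission object. Invented for illustration only.
public class TokenPayloadSketch {
    // Encode raw token bytes into a base64 string suitable for a JSON field.
    public static String encodeForJson(byte[] tokenBytes) {
        return Base64.getEncoder().encodeToString(tokenBytes);
    }

    // Decode the JSON field back into raw bytes on the receiving side.
    public static byte[] decodeFromJson(String field) {
        return Base64.getDecoder().decode(field);
    }

    public static void main(String[] args) {
        byte[] fake = "fake-token-bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(encodeForJson(fake));
    }
}
```

The JIRA's point is precisely that the NameNode offers no such base64-encoded token over a web service, unlike WebHDFS and the timeline server.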
[jira] [Created] (YARN-2426) NodeManager is not able to use the WebHDFS token properly to talk to WebHDFS while localizing
Karam Singh created YARN-2426: - Summary: NodeManager is not able to use the WebHDFS token properly to talk to WebHDFS while localizing Key: YARN-2426 URL: https://issues.apache.org/jira/browse/YARN-2426 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, webapp Affects Versions: 2.6.0 Environment: Hadoop Kerberos (secure) cluster with LinuxContainerExecutor enabled, and SPNEGO on for the new YARN RM web services for application submission. When running kinit we use -c (to specify the cache path) and then export KRB5CCNAME to the path provided with the -c option, so there is no Kerberos ticket in the default KRB5 cache path, which is /tmp. Reporter: Karam Singh Encountered this issue while using the new YARN RM WS for application submission, on a single-node cluster, while submitting a Distributed Shell application using the RM WS (web service). For this we need to pass the custom script and the AppMaster jar, along with the WebHDFS token, to the NodeManager for localization. The Distributed Shell application was failing because the node was failing to localize the AppMaster jar.
Following is the NM log while localizing the AppMaster jar:
{code}
2014-08-18 01:53:52,434 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(114)) - Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-08-18 01:53:52,757 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { webhdfs://NAMENODEHOST:NAMENODEHTTPPORT/user/JAR_PATH, 1408352019488, FILE, null }, Authentication required
2014-08-18 01:53:52,758 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource webhdfs://NAMENODEHOST:NAMENODEHTTPPORT/user/JAR_PATH(-NM_LOCAL_DIR/usercache/APP_USER/appcache/application_1408351986532_0001/filecache/10/DshellAppMaster.jar) transitioned from DOWNLOADING to FAILED
2014-08-18 01:53:52,758 INFO container.Container (ContainerImpl.java:handle(999)) - Container container_1408351986532_0001_01_01 transitioned from LOCALIZING to LOCALIZATION_FAILED
{code}
This is similar to what we get when we try to access WebHDFS in a secure (Kerberos) cluster without doing kinit. Whereas if we do curl -i -k -s 'http://NAMENODEHOST:NAMENODEHTTPPORT/webhdfs/v1/user/JAR_PATH?op=LISTSTATUS&delegation=<same WebHDFS token used in the app submission structure>' it works properly.
I also tried using http://NAMENODEHOST:NAMENODEHTTPPORT/webhdfs/v1/user/hadoopqa/JAR_PATH in the app submission object instead of the webhdfs:// URI format. Then the NodeManager fails to localize, as there is no FileSystem for the http scheme:
{code}
2014-08-18 02:03:31,343 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(114)) - Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-08-18 02:03:31,583 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { http://NAMENODEHOST:NAMENODEHTTPPORT/webhdfs/v1/user/JAR_PATH 1408352576841, FILE, null }, No FileSystem for scheme: http
2014-08-18 02:03:31,583 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://NAMENODEHOST:NAMENODEHTTPPORT/webhdfs/v1/user/JAR_PATH(-NM_LOCAL_DIR/usercache/APP_USER/appcache/application_1408352544163_0002/filecache/11/DshellAppMaster.jar) transitioned from DOWNLOADING to FAILED
{code}
Now do kinit without providing the -c option for the KRB5 cache path, so the ticket goes to the default KRB5 cache in /tmp. Again submit the same application object to the YARN WS, with webhdfs:// URI format paths and the WebHDFS token. This time the NM is able to download the jar and the custom shell script, and the application runs fine.
It looks like the following is happening: WebHDFS looks for a Kerberos ticket on the NM while localizing.
1. In the first case there was no Kerberos ticket in the default cache, so the application failed while localizing the AppMaster jar.
2. In the second case kinit had already been done and the ticket was present in /tmp (the default KRB5 cache), so the AppMaster got localized successfully.
-- This message was sent by Atlassian JIRA (v6.2#6252)
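The difference between the two runs comes down to where the Kerberos ticket cache is looked up. The sketch below is an illustrative simplification (not the real JGSS/krb5 resolution logic, which is more involved): a process that does not see KRB5CCNAME in its own environment falls back to the conventional default cache under /tmp, where the first run's ticket never existed.

```java
// Illustrative sketch of ticket-cache resolution, assuming the conventional
// MIT krb5 behavior: honor KRB5CCNAME if set, otherwise fall back to the
// default /tmp/krb5cc_<uid>. This is a simplification, not the actual
// JGSS/krb5 implementation used by the NodeManager.
public class Krb5CacheSketch {
    // Resolve the ticket-cache path from the KRB5CCNAME environment value
    // (which may carry a FILE: prefix) and the numeric uid.
    public static String resolveCachePath(String krb5ccname, String uid) {
        if (krb5ccname != null && !krb5ccname.isEmpty()) {
            return krb5ccname.startsWith("FILE:")
                    ? krb5ccname.substring("FILE:".length())
                    : krb5ccname;
        }
        return "/tmp/krb5cc_" + uid;
    }

    public static void main(String[] args) {
        // Without KRB5CCNAME visible, the lookup lands in the default cache.
        System.out.println(resolveCachePath(null, "1000"));
    }
}
```

In the failing case the ticket was written to a custom cache via kinit -c, but the process doing the localization resolved the default /tmp path and found no ticket; in the passing case the ticket sat in the default cache and was found.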
[jira] [Created] (YARN-2322) Provide CLI to refresh Admin Acls for Timeline server
Karam Singh created YARN-2322: - Summary: Provide CLI to refresh Admin Acls for Timeline server Key: YARN-2322 URL: https://issues.apache.org/jira/browse/YARN-2322 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Karam Singh Provide a CLI to refresh the admin ACLs for the timeline server. Currently rmadmin -refreshAdminAcls provides a facility to refresh the admin ACLs for the ResourceManager, but if we want to modify the admin ACLs of the timeline server, we need to restart it.
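A minimal sketch of the server-side hook such a refresh command would invoke: re-read the ACL setting and swap it atomically, so no restart is needed. The class and method names here are invented for illustration; this is not the actual timeline server code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a restart-free ACL refresh: the proposed CLI command
// would trigger this with the freshly re-read admin ACL value. Names are
// illustrative only.
public class TimelineAclRefresherSketch {
    private final AtomicReference<String> adminAcl;

    public TimelineAclRefresherSketch(String initialAcl) {
        this.adminAcl = new AtomicReference<>(initialAcl);
    }

    // Swap in the reloaded ACL string atomically; readers always see either
    // the old or the new value, never a partial update.
    public void refreshAdminAcls(String reloadedAcl) {
        adminAcl.set(reloadedAcl);
    }

    public String currentAcl() {
        return adminAcl.get();
    }

    public static void main(String[] args) {
        TimelineAclRefresherSketch r = new TimelineAclRefresherSketch("yarn");
        r.refreshAdminAcls("yarn,admin");
        System.out.println(r.currentAcl());
    }
}
```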
[jira] [Created] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
Karam Singh created YARN-2165: - Summary: Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh The timeline server should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if we set yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400, the timeline server starts successfully without complaining:
{code}
2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30
{code}
At startup the timeline server should check that yarn.timeline-service.ttl-ms is greater than zero; otherwise, especially for a negative value, the timestamp used to discard old entities will be set to a future value, which may lead to inconsistent behavior:
{code}
public void run() {
  while (true) {
    long timestamp = System.currentTimeMillis() - ttl;
    try {
      discardOldEntities(timestamp);
      Thread.sleep(ttlInterval);
{code}
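The fail-fast check being requested can be sketched as follows, assuming the TTL has already been read from configuration. The class and method names are illustrative, not the actual LeveldbTimelineStore code.

```java
// Minimal sketch of the requested validation: reject zero or negative TTLs at
// service init instead of starting the deletion thread with a discard
// timestamp in the future. Illustrative names only.
public class TtlValidationSketch {
    // Returns the TTL unchanged when valid; throws for zero/negative values.
    public static long validateTtlMs(long ttlMs) {
        if (ttlMs <= 0) {
            throw new IllegalArgumentException(
                "yarn.timeline-service.ttl-ms must be greater than zero, was " + ttlMs);
        }
        return ttlMs;
    }

    public static void main(String[] args) {
        // A positive TTL passes through; e.g. 7 days in milliseconds.
        System.out.println(validateTtlMs(604800000L));
    }
}
```

With a check like this, a configuration of 0 or -86400 would fail service startup immediately instead of silently computing `System.currentTimeMillis() - ttl` as a future timestamp.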
[jira] [Created] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store
Karam Singh created YARN-2166: - Summary: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh The timeline server should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store. Otherwise, if we start the timeline server with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000, it starts, but the Thread.sleep call in EntityDeletionThread.run keeps throwing an uncaught exception because of the negative value:
{code}
2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception.
java.lang.IllegalArgumentException: timeout value is negative
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257)
{code}
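The failure mode is easy to demonstrate in isolation: Thread.sleep rejects any negative timeout with IllegalArgumentException, so a negative ttl-interval-ms makes the deletion thread throw on every iteration. A small self-contained check (not the timeline server code itself):

```java
// Demonstrates the symptom described above: Thread.sleep throws
// IllegalArgumentException for any negative timeout, which is exactly what a
// negative ttl-interval-ms feeds into the entity-deletion loop.
public class NegativeSleepSketch {
    // Returns true if Thread.sleep rejects the given timeout.
    public static boolean sleepRejects(long millis) {
        try {
            Thread.sleep(millis);
            return false;
        } catch (IllegalArgumentException expectedForNegative) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sleepRejects(-5000L)); // the misconfigured interval
    }
}
```

Validating the interval at startup (analogous to the ttl-ms check in YARN-2165) would turn this repeating runtime error into a single clear configuration failure.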
[jira] [Updated] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store
[ https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2166: -- Description: The timeline server should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store. Otherwise, if we start the timeline server with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000, it starts, but the Thread.sleep call in EntityDeletionThread.run keeps throwing an UncaughtException for the negative value:
{code}
2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception.
java.lang.IllegalArgumentException: timeout value is negative
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257)
{code}
was: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UnCaughtExcpetion -ive value
{code}
2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception.
java.lang.IllegalArgumentException: timeout value is negative
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257)
{code}
Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store - Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh The timeline server should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used as the timeline store. Otherwise, if we start it with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000, it starts, but the Thread.sleep call in EntityDeletionThread.run keeps throwing an UncaughtException for the negative value:
{code}
2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception.
java.lang.IllegalArgumentException: timeout value is negative
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257)
{code}
[jira] [Created] (YARN-2135) Distributed shell should support passing generic parameters from the CLI
Karam Singh created YARN-2135: - Summary: Distributed shell should support passing generic parameters from the CLI Key: YARN-2135 URL: https://issues.apache.org/jira/browse/YARN-2135 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Karam Singh The distributed shell should support passing generic parameters from the CLI. Currently we cannot pass generic options using -D to the distributed shell client.
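The requested -D handling can be sketched as below, in the spirit of Hadoop's GenericOptionsParser (which normally provides this for Tool-based clients). This is illustrative code, not the actual distributed-shell client.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of -D key=value parsing on the client command line, so the
// collected pairs could then be applied to the job Configuration. Illustrative
// only; the real implementation would likely reuse GenericOptionsParser.
public class GenericOptionSketch {
    // Collect -D key=value pairs from the argument list; other arguments are
    // left for the application's own option parsing.
    public static Map<String, String> parseDashD(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                props.put(kv[0], kv.length == 2 ? kv[1] : "");
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props =
            parseDashD(new String[] {"-D", "mapreduce.job.queuename=default"});
        System.out.println(props);
    }
}
```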