[jira] [Created] (YARN-9556) Most containers will go through the process of ending twice when they are finished

2019-05-14 Thread liyakun (JIRA)
liyakun created YARN-9556:
-

 Summary: Most containers will go through the process of ending 
twice when they are finished
 Key: YARN-9556
 URL: https://issues.apache.org/jira/browse/YARN-9556
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: liyakun
Assignee: liyakun


When the AM reports in its heartbeat that a container is finished, the 
container id is added to containersToBeRemovedFromNM in RMNodeImpl.

Next, when the NM's heartbeat arrives, RMNodeImpl first executes its 
setAndUpdateNodeHeartbeatResponse() method, which runs:

{code:java}
this.completedContainers.removeAll(this.containersToBeRemovedFromNM);
...
this.containersToBeRemovedFromNM.clear();
{code}

At this point the container id has been removed from completedContainers, but 
the heartbeat request still carries that container's status. During the 
subsequent processing, the handleContainerStatus() method in RMNodeImpl is 
entered, and the container id that was just removed is added to 
completedContainers and newlyCompletedContainers again.

Next, the container id reaches the updateCompletedContainers() method of 
AbstractYarnScheduler, where it is added to the untrackedContainerIdList and 
then goes through the finish process again via an 
RMNodeFinishedContainersPulledByAMEvent.
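
To make the sequence concrete, here is a toy, self-contained sketch (not 
RMNodeImpl code; it just reuses the field names above on plain sets) of how 
the same heartbeat first drops the acknowledged container id and then re-adds 
it:

{code:java}
import java.util.HashSet;
import java.util.Set;

public class DoubleFinishSketch {
  public static void main(String[] args) {
    Set<String> completedContainers = new HashSet<>();
    Set<String> containersToBeRemovedFromNM = new HashSet<>();

    String containerId = "container_e01_0001_01_000002";
    completedContainers.add(containerId);          // finished earlier
    containersToBeRemovedFromNM.add(containerId);  // AM acknowledged it in its heartbeat

    // setAndUpdateNodeHeartbeatResponse(): drop the acknowledged containers
    completedContainers.removeAll(containersToBeRemovedFromNM);
    containersToBeRemovedFromNM.clear();

    // handleContainerStatus(): the same NM heartbeat still reports the container
    // as COMPLETE, so it is treated as newly completed again
    boolean reportedCompleteInHeartbeat = true;
    if (reportedCompleteInHeartbeat && !completedContainers.contains(containerId)) {
      completedContainers.add(containerId);
      System.out.println(containerId + " enters the finish path a second time");
    }
  }
}
{code}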

 






[jira] [Updated] (YARN-9554) TimelineEntity DAO has java.util.Set interface which JAXB can't handle

2019-05-14 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9554:

Component/s: timelineservice

> TimelineEntity DAO has java.util.Set interface which JAXB can't handle
> --
>
> Key: YARN-9554
> URL: https://issues.apache.org/jira/browse/YARN-9554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> The TimelineEntity DAO has a java.util.Set interface, which JAXB can't handle. 
> This breaks the fix from YARN-7266.
> {code}
> Caused by: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 
> 1 counts of IllegalAnnotationExceptions
> java.util.Set is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Set
>   at public java.util.HashMap 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:277)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:124)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123)
>   at 
> com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147)
> {code}






[jira] [Updated] (YARN-9553) TimelineWebService Rest Api {entityType}/events throws NPE

2019-05-14 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9553:

Description: 
TimelineWebService Rest Api {entityType}/events throws NPE

{code}
http://172.25.35.146:8188/ws/v1/timeline/YARN_APPLICATION/events

{"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}


2019-05-14 11:47:38,376 WARN  webapp.GenericExceptionHandler 
(GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEvents(TimelineWebServices.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:617)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:576)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1400)
at 

[jira] [Updated] (YARN-9553) TimelineWebService Rest Api {entityType}/events throws NPE

2019-05-14 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9553:

Component/s: timelineservice

> TimelineWebService Rest Api {entityType}/events throws NPE
> --
>
> Key: YARN-9553
> URL: https://issues.apache.org/jira/browse/YARN-9553
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> TimelineWebService Rest Api {entityType}/events throws NPE
> {code}
> http://172.25.35.146:8188/ws/v1/timeline/YARN_APPLICATION/events
> {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}
> 2019-05-14 11:47:38,376 WARN  webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEvents(TimelineWebServices.java:206)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
>   at 
> org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
>   at 
> org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> 

[jira] [Updated] (YARN-9553) TimelineWebService Rest Api {entityType}/events throws NPE

2019-05-14 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9553:

Affects Version/s: 3.2.0

> TimelineWebService Rest Api {entityType}/events throws NPE
> --
>
> Key: YARN-9553
> URL: https://issues.apache.org/jira/browse/YARN-9553
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> TimelineWebService Rest Api {entityType}/events throws NPE
> {code}
> http://172.25.35.146:8188/ws/v1/timeline/YARN_APPLICATION/events
> {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}
> 2019-05-14 11:47:38,376 WARN  webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEvents(TimelineWebServices.java:206)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
>   at 
> org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
>   at 
> org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> 

[jira] [Updated] (YARN-9550) Suspect wrong way to calculate container utilized vcore.

2019-05-14 Thread Sihai Ke (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sihai Ke updated YARN-9550:
---
Summary: Suspect wrong way to calculate container utilized vcore.  (was: 
Suspect wrong way to calculater container utilized vcore.)

> Suspect wrong way to calculate container utilized vcore.
> 
>
> Key: YARN-9550
> URL: https://issues.apache.org/jira/browse/YARN-9550
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.1
>Reporter: Sihai Ke
>Priority: Minor
>
> In Hadoop 2.9.1, class *ContainersMonitorImpl* line 664, I suspect the 
> milliVcoresUsed calculation is wrong; below is the code.
> {code:java}
> ResourceCalculatorProcessTree pTree = ptInfo.getProcessTree();
> pTree.updateProcessTree();// update process-tree
> if (!pTree.isValidData()) {
>   // If we cannot get the data for one container, we ignore it all
>   LOG.error("Cannot get the data for " + pId);
>   trackedContainersUtilization = null;
>   continue;
> }
> long currentVmemUsage = pTree.getVirtualMemorySize();
> long currentPmemUsage = pTree.getRssMemorySize();
> // if machine has 6 cores and 3 are used,
> // cpuUsagePercentPerCore should be 300% and
> // cpuUsageTotalCoresPercentage should be 50%
> float cpuUsagePercentPerCore = pTree.getCpuUsagePercent();
> if (cpuUsagePercentPerCore < 0) {
>   // CPU usage is not available likely because the container just
>   // started. Let us skip this turn and consider this container
>   // in the next iteration.
>   LOG.info("Skipping monitoring container " + containerId
>   + " since CPU usage is not yet available.");
>   continue;
> }
> float cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore /
> resourceCalculatorPlugin.getNumProcessors();
> // Multiply by 1000 to avoid losing data when converting to int
> int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000
> * maxVCoresAllottedForContainers /nodeCpuPercentageForYARN);
> // milliPcoresUsed = (int) (cpuUsagePercentPerCore * 1000 / 100;
> // As cpuUsagePercentagePerCore use 100 to represent 1 single core.
> int milliPcoresUsed = (int) (cpuUsagePercentPerCore * 10);
> // as processes begin with an age 1, we want to see if there
> // are processes more than 1 iteration old.
> vcoresUsageByAllContainers += milliVcoresUsed;
> pcoresByAllContainers += milliPcoresUsed;
> {code}
>  
> I think
>  
> {code:java}
> int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000 * 
> maxVCoresAllottedForContainers /nodeCpuPercentageForYARN);{code}
>  
> should be 
>  
> {code:java}
> int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000 * 
> maxVCoresAllottedForContainers);
> {code}
>  
>  
> I think it does not need to divide by nodeCpuPercentageForYARN. [~kasha], it 
> looks like you added this feature, could you help take a look? Or could you 
> educate me if I am wrong?
>  
>  
>  
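
To see what the two formulas produce on concrete numbers, here is a small 
standalone sketch with purely hypothetical inputs (8 physical cores, 8 vcores 
allotted to containers, nodeCpuPercentageForYARN = 100, and a container fully 
using 2 cores); the values are illustrative only and are not taken from the 
JIRA:

{code:java}
public class MilliVcoresExample {
  public static void main(String[] args) {
    // Hypothetical inputs, chosen only to compare the two formulas above.
    float cpuUsagePercentPerCore = 200f;          // 100 represents one full core
    int numProcessors = 8;
    long maxVCoresAllottedForContainers = 8;
    int nodeCpuPercentageForYARN = 100;

    float cpuUsageTotalCoresPercentage =
        cpuUsagePercentPerCore / numProcessors;   // 25 (% of the whole machine)

    int currentFormula = (int) (cpuUsageTotalCoresPercentage * 1000
        * maxVCoresAllottedForContainers / nodeCpuPercentageForYARN);
    int proposedFormula = (int) (cpuUsageTotalCoresPercentage * 1000
        * maxVCoresAllottedForContainers);

    System.out.println("current formula : " + currentFormula + " milli-vcores");  // 2000
    System.out.println("proposed formula: " + proposedFormula + " milli-vcores"); // 200000
  }
}
{code}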






[jira] [Created] (YARN-9555) Yarn Docs : single cluster yarn setup - Step 1 configure parameters - multiple roots

2019-05-14 Thread Vishva (JIRA)
Vishva created YARN-9555:


 Summary: Yarn Docs : single cluster yarn setup - Step 1 configure 
parameters - multiple roots
 Key: YARN-9555
 URL: https://issues.apache.org/jira/browse/YARN-9555
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.2
Reporter: Vishva


Step 1 for 
[https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]
 
Configure parameters as follows:

{{etc/hadoop/mapred-site.xml}}:

{code:xml}
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

<configuration>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
{code}
but setting this will throw an error when running yarn:

{code}
2019-05-14 16:32:05,815 ERROR org.apache.hadoop.conf.Configuration: error parsing conf mapred-site.xml
com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
{code}

This should be modified to

{code:xml}
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
{code}






[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs

2019-05-14 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839830#comment-16839830
 ] 

Weiwei Yang commented on YARN-9538:
---

Hi [~Tao Yang]

Thanks for working on the documentation. I just did some proofreading; we 
should start by simplifying the content. I have shared a Google doc, which may 
be easier to work in: 
[https://docs.google.com/document/d/1NIIDCWOLUqlhrclzr91YPYOrvmLfjlKbyCWrnCMci0g].
 Once the doc is finalized there, you can convert it to md.

Thanks.

> Document scheduler/app activities and REST APIs
> ---
>
> Key: YARN-9538
> URL: https://issues.apache.org/jira/browse/YARN-9538
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9538.001.patch
>
>
> Add documentation for scheduler/app activities in CapacityScheduler.md and 
> ResourceManagerRest.md.






[jira] [Commented] (YARN-9497) Support grouping by diagnostics for query results of scheduler and app activities

2019-05-14 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839812#comment-16839812
 ] 

Weiwei Yang commented on YARN-9497:
---

Thanks [~Tao Yang] for the updates, it looks good now. I have one minor comment 
about the query parameter: right now it is {{groupingType}}, can we change it 
to {{groupBy}}? And regarding the type, I think we can rename 
{{STATE_AND_DIAGNOSTIC}} to {{diagnostic}}, because I don't expect different 
activity states to have the same diagnostic messages; please correct me if I 
am wrong.

Another thing: can we make sure we use "diagnostic" instead of all capital 
letters in the query parameter? That's the convention used in the 
ResourceManager REST API.

Thanks
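
For illustration only, a query using the proposed names might look like the 
following; the endpoint path and port are assumptions based on the RM 
activities REST API, and the parameter does not exist under this name yet:

{code}
http://<rm-address>:8088/ws/v1/cluster/scheduler/activities?groupBy=diagnostic
{code}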

> Support grouping by diagnostics for query results of scheduler and app 
> activities
> -
>
> Key: YARN-9497
> URL: https://issues.apache.org/jira/browse/YARN-9497
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9497.001.patch, YARN-9497.002.patch
>
>
> [Design Doc 
> #4.3|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.6fbpge17dmmr]






[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-14 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839764#comment-16839764
 ] 

Prabhu Joseph commented on YARN-8625:
-

[~eepayne], can you check whether the Hadoop on your test cluster has the fix 
for YARN-7266? I have seen applicationhistory fail to return apps without it, 
and will fix that in YARN-9554. Apologies, I had tested on hadoop-3.2.0, where 
I did not face any issue.

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> yarn-site.xml
>
>
> Aggregate Resource Allocation shown on RM UI for finished job is very useful 
> metric to understand how much resource a job has consumed. But this does not 
> get stored in ATS.






[jira] [Created] (YARN-9554) TimelineEntity DAO has java.util.Set interface which JAXB can't handle

2019-05-14 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9554:
---

 Summary: TimelineEntity DAO has java.util.Set interface which JAXB 
can't handle
 Key: YARN-9554
 URL: https://issues.apache.org/jira/browse/YARN-9554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


The TimelineEntity DAO has a java.util.Set interface, which JAXB can't handle. 
This breaks the fix from YARN-7266.

{code}
Caused by: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 1 
counts of IllegalAnnotationExceptions
java.util.Set is an interface, and JAXB can't handle interfaces.
this problem is related to the following location:
at java.util.Set
at public java.util.HashMap 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities

at 
com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91)
at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445)
at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:277)
at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:124)
at 
com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123)
at 
com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147)
{code}
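
For context, this class of JAXB failure can be reproduced outside YARN with a 
minimal model whose map values are typed as the java.util.Set interface; the 
class and field names below are invented for illustration and are not the 
TimelineEntity code:

{code:java}
import java.util.HashMap;
import java.util.Set;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
class EntityWithSetValues {
  // The map value type is the java.util.Set interface, so JAXB cannot pick a
  // concrete class to instantiate, mirroring getPrimaryFiltersJAXB() above.
  public HashMap<String, Set<String>> primaryFilters = new HashMap<>();
}

public class JaxbSetRepro {
  public static void main(String[] args) throws Exception {
    // Expected to fail with an IllegalAnnotationsException:
    // "java.util.Set is an interface, and JAXB can't handle interfaces."
    JAXBContext.newInstance(EntityWithSetValues.class);
  }
}
{code}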






[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839741#comment-16839741
 ] 

Hadoop QA commented on YARN-2194:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.7 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
30s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
49s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} branch-2.7 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 19s{color} | {color:orange} root: The patch generated 3 new + 336 unchanged 
- 0 fixed = 339 total (was 336) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 65 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 
22s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  6m 
45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:06eafee |
| JIRA Issue | YARN-2194 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968699/YARN-2194-branch-2.7.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 1563321a9fbd 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.7 / cec0041 |
| maven | version: Apache Maven 3.0.5 |
| Default Java | 1.7.0_201 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24091/artifact/out/diff-checkstyle-root.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/24091/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24091/testReport/ |
| Max. process+thread count | 328 

[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839719#comment-16839719
 ] 

Jim Brennan commented on YARN-2194:
---

As I mentioned in [YARN-9518], I am also +1 on this patch for branch-2.7 
(non-binding)

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch, 
> YARN-2194-branch-2.7.001.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839715#comment-16839715
 ] 

Haibo Chen commented on YARN-2194:
--

+1 on the patch pending Jenkins.

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch, 
> YARN-2194-branch-2.7.001.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839709#comment-16839709
 ] 

Hadoop QA commented on YARN-9518:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.7 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
37s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
52s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} branch-2.7 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 21s{color} | {color:orange} root: The patch generated 3 new + 336 unchanged 
- 0 fixed = 339 total (was 336) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 65 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 
55s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 34s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-yarn-server-nodemanager:1 |
| Failed junit tests | hadoop.yarn.server.nodemanager.TestNodeManagerResync |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
|   | hadoop.yarn.server.nodemanager.webapp.TestNMWebServer |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:06eafee |
| JIRA Issue | YARN-9518 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968693/YARN-9518-branch-2.7.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux a7fe3c4bd0a5 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.7 / cec0041 |
| maven | version: Apache Maven 3.0.5 |
| 

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-14 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839699#comment-16839699
 ] 

Jonathan Hung commented on YARN-9518:
-

Yes, makes sense [~Jim_Brennan]. Attached a patch to YARN-2194. Closing this 
ticket.

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.002.patch, 
> YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The os version is centos7. 
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the NodeManager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional NodeManager logs at the end of this description.
> The key log line is " Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory ".
> After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in CentOS 7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. Therefore, the cgroup path is truncated by 
> container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path " 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
>  ", and it reports the error in the log: " Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory ".
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is still a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as 
> CentOS 6 as well as CentOS 7, and applies generally to other merged cgroup 
> subsystem paths, such as the network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>    
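
As a standalone illustration of the idea in the quoted report (this is not the 
attached patch, just a sketch under the assumption that the per-controller 
symlink exists next to the merged mount point): when the mount point found in 
/proc/mounts has a comma-joined name, prefer the per-controller symlink so the 
path handed to container-executor contains no comma.

{code:java}
import java.io.File;

public class CgroupPathResolver {
  // Map a merged mount point like /sys/fs/cgroup/cpu,cpuacct back to the
  // per-controller symlink /sys/fs/cgroup/cpu when that symlink exists.
  static String controllerPath(String mountPoint, String controller) {
    File mount = new File(mountPoint);
    if (mount.getName().contains(",")) {
      File symlink = new File(mount.getParentFile(), controller);
      if (symlink.exists()) {
        return symlink.getAbsolutePath();
      }
    }
    return mountPoint;
  }

  public static void main(String[] args) {
    // On a CentOS 7 style layout this prints /sys/fs/cgroup/cpu.
    System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
  }
}
{code}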

[jira] [Updated] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-2194:

Attachment: YARN-2194-branch-2.7.001.patch

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch, 
> YARN-2194-branch-2.7.001.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Commented] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839698#comment-16839698
 ] 

Jonathan Hung commented on YARN-2194:
-

Attaching a branch-2.7 patch. (Ref: YARN-9518)

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch, 
> YARN-2194-branch-2.7.001.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Updated] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-2194:

Attachment: YARN-9518-branch-2.7.002.patch

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Updated] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-2194:

Attachment: (was: YARN-9518-branch-2.7.002.patch)

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Reopened] (YARN-2194) Cgroups cease to work in RHEL7

2019-05-14 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reopened YARN-2194:
-

> Cgroups cease to work in RHEL7
> --
>
> Key: YARN-2194
> URL: https://issues.apache.org/jira/browse/YARN-2194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch, 
> YARN-2194-4.patch, YARN-2194-5.patch, YARN-2194-6.patch, YARN-2194-7.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.






[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-14 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839693#comment-16839693
 ] 

Jim Brennan commented on YARN-9518:
---

Thanks [~jhung]!  The patch looks good to me.  I also built it and ran the unit 
tests.  I'm +1 (non-binding) on this patch.

One question - should we re-open [YARN-2194] and put the patch on that Jira 
instead of this one, and then close this one as a DUP?


> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.002.patch, 
> YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The os version is centos7. 
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the NodeManager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional NodeManager logs at the end of this description.
> The key log line is " Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory ".
> After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in CentOS 7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. Therefore, the cgroup path is truncated by 
> container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path " 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
>  ", and it reports the error in the log: " Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory ".
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the NodeManager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is still a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as 
> CentOS 6 as well as CentOS 7, and applies generally to other merged cgroup 
> subsystem paths, such as the network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> 

[jira] [Commented] (YARN-9519) TFile log aggregation file format is not working for yarn.log-aggregation.TFile.remote-app-log-dir config

2019-05-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839675#comment-16839675
 ] 

Hudson commented on YARN-9519:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16548 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16548/])
YARN-9519. TFile log aggregation file format is not working for (sunilg: rev 
7d831eca645f93d064975ebae35a7cbea3bbad31)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/LogAggregationIndexedFileController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/TestLogAggregationFileControllerFactory.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/tfile/LogAggregationTFileController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java


> TFile log aggregation file format is not working for 
> yarn.log-aggregation.TFile.remote-app-log-dir config
> -
>
> Key: YARN-9519
> URL: https://issues.apache.org/jira/browse/YARN-9519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9519.001.patch, YARN-9519.002.patch, 
> YARN-9519.003.patch, YARN-9519.004.patch, YARN-9519.005.patch
>
>
> The TFile log aggregation file format is not sensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config.
> In {{LogAggregationTFileController$initInternal}}:
> {code:java}
> this.remoteRootLogDir = new Path(
> conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
> YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
> {code}
> So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir 
> config, while other file formats, such as IFile, consult the format-specific 
> config first, so that config has higher priority.
> From {{LogAggregationIndexedFileController$initInternal}}:
> {code:java}
> String remoteDirStr = String.format(
> YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
> this.fileControllerName);
> String remoteDir = conf.get(remoteDirStr);
> if (remoteDir == null || remoteDir.isEmpty()) {
>   remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
>   YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
> }
> {code}
> (Where these configs are: )
> {code:java}
> public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT
>   = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir";
> public static final String NM_REMOTE_APP_LOG_DIR = 
> NM_PREFIX + "remote-app-log-dir";
> {code}
> I suggest that TFile should try to obtain the remote dir config from 
> yarn.log-aggregation.TFile.remote-app-log-dir first, and only fall back to the 
> yarn.nodemanager.remote-app-log-dir config if that is not specified.
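
As a minimal sketch of the suggested fallback for TFile, mirroring the IFile logic 
quoted above (illustration only, not necessarily the change that was committed):

{code:java}
// Look up the TFile-specific remote dir first, then fall back to the
// NodeManager-wide config, exactly as the IFile controller already does.
String remoteDirStr = String.format(
    YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, "TFile");
String remoteDir = conf.get(remoteDirStr);
if (remoteDir == null || remoteDir.isEmpty()) {
  remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
      YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
}
this.remoteRootLogDir = new Path(remoteDir);
{code}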



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9519) TFile log aggregation file format is not working for yarn.log-aggregation.TFile.remote-app-log-dir config

2019-05-14 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-9519:
-
Summary: TFile log aggregation file format is not working for 
yarn.log-aggregation.TFile.remote-app-log-dir config  (was: TFile log 
aggregation file format is insensitive to the 
yarn.log-aggregation.TFile.remote-app-log-dir config)

> TFile log aggregation file format is not working for 
> yarn.log-aggregation.TFile.remote-app-log-dir config
> -
>
> Key: YARN-9519
> URL: https://issues.apache.org/jira/browse/YARN-9519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9519.001.patch, YARN-9519.002.patch, 
> YARN-9519.003.patch, YARN-9519.004.patch, YARN-9519.005.patch
>
>
> The TFile log aggregation file format is not sensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config.
> In {{LogAggregationTFileController$initInternal}}:
> {code:java}
> this.remoteRootLogDir = new Path(
> conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
> YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
> {code}
> So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir 
> config, while other file formats, such as IFile, consult the format-specific 
> config first, so that config has higher priority.
> From {{LogAggregationIndexedFileController$initInternal}}:
> {code:java}
> String remoteDirStr = String.format(
> YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
> this.fileControllerName);
> String remoteDir = conf.get(remoteDirStr);
> if (remoteDir == null || remoteDir.isEmpty()) {
>   remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
>   YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
> }
> {code}
> (Where these configs are: )
> {code:java}
> public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT
>   = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir";
> public static final String NM_REMOTE_APP_LOG_DIR = 
> NM_PREFIX + "remote-app-log-dir";
> {code}
> I suggest that TFile should try to obtain the remote dir config from 
> yarn.log-aggregation.TFile.remote-app-log-dir first, and only fall back to the 
> yarn.nodemanager.remote-app-log-dir config if that is not specified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-14 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839643#comment-16839643
 ] 

Jonathan Hung commented on YARN-9518:
-

Yes, I agree with [~Jim_Brennan]. Actually, we had this patch backported 
internally; I uploaded it here for convenience, hope you don't mind [~shurong.mai]. 
[~shurong.mai], can you check that this patch (YARN-9518-branch-2.7.002.patch) 
works for you? Also, if you two could review it, that would be great.
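
For reviewers, a rough sketch of the path-selection idea from the description 
quoted below (the class and method names are made up for illustration; this is 
not the attached patch): prefer the per-controller symlink over the combined 
mount point so the value handed to container-executor contains no comma.

{code:java}
import java.io.File;

public final class CgroupPathChooser {
  // Given a controller name and the mount point read from /proc/mounts,
  // return a comma-free path for that controller if one exists.
  static String chooseControllerPath(String controller, String mountPoint) {
    File mount = new File(mountPoint);
    if (!mount.getName().contains(",")) {
      return mountPoint;                        // centos6-style layout, nothing to do
    }
    File symlink = new File(mount.getParentFile(), controller);
    if (symlink.exists()) {
      return symlink.getPath();                 // "/sys/fs/cgroup/cpu" on centos7
    }
    return mountPoint;                          // fall back to the original mount point
  }

  public static void main(String[] args) {
    System.out.println(chooseControllerPath("cpu", "/sys/fs/cgroup/cpu,cpuacct"));
  }
}
{code}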

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.002.patch, 
> YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The os version is centos7. 
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the nodemanager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and 
> "cpuacct" subsystems are laid out as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but container-executor uses the comma as 
> the separator between multiple resources. The cgroup path is therefore 
> truncated to "/sys/fs/cgroup/cpu" instead of the correct cgroup path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks",
> and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
> directory" is reported in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path 
> rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource 
> description argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as 
> centos6 as well as centos7, and applies generally to cgroup subsystem paths, 
> for example the network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> 

[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-14 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9518:

Attachment: YARN-9518-branch-2.7.002.patch

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.002.patch, 
> YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The os version is centos7. 
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the nodemanager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and 
> "cpuacct" subsystems are laid out as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but container-executor uses the comma as 
> the separator between multiple resources. The cgroup path is therefore 
> truncated to "/sys/fs/cgroup/cpu" instead of the correct cgroup path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks",
> and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
> directory" is reported in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path 
> rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource 
> description argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as 
> centos6 as well as centos7, and applies generally to cgroup subsystem paths, 
> for example the network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> 

[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-05-14 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839570#comment-16839570
 ] 

Jim Brennan commented on YARN-9518:
---

[~shurong.mai], ideally we would use the same solution in 2.7 that we used in 
2.8 (YARN-2194).   It doesn't look like that patch applies to 2.7, so more work 
would need to be done to port that approach back to 2.7.  Can you look into 
whether that is possible?

 

 

 

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, 
> YARN-9518-trunk.001.patch, YARN-9518.patch
>
>
> The os version is centos7. 
> {code:java}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I set the cgroup configuration variables for YARN, the nodemanager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and 
> "cpuacct" subsystems are laid out as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, they are as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have merge as "cpu,cpuacct".  "cpu"  and  "cpuacct"  are 
> symbol links. 
> As I look at source code, nodemamager get the cgroup subsystem info by 
> reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also 
> "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description arguments of container-executor is such as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but container-executor uses the comma as 
> the separator between multiple resources. The cgroup path is therefore 
> truncated to "/sys/fs/cgroup/cpu" instead of the correct cgroup path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks",
> and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
> directory" is reported in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path 
> rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource 
> description argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as 
> centos6 as well as centos7, and applies generally to cgroup subsystem paths, 
> for example the network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> 

[jira] [Commented] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-05-14 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839525#comment-16839525
 ] 

Prabhu Joseph commented on YARN-9545:
-

[~zsiegl] Yes, a separate /health endpoint looks fine to me, since we check for 
reader connection failures.

> Create healthcheck REST endpoint for ATSv2
> --
>
> Key: YARN-9545
> URL: https://issues.apache.org/jira/browse/YARN-9545
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
>
> RM UI2 and CM needs a health check url for ATSv2 service.
> Create a /health rest endpoint
>  * must respond with 200 \{health: ok} if all ok
>  * must respond with non 200 if any problem occurs
>  * could check reader/writer connection
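
A minimal JAX-RS sketch of such an endpoint (the class, field and probe names 
are hypothetical and not the actual ATSv2 code; it only illustrates the 200 / 
non-200 contract listed above):

{code:java}
import java.util.function.BooleanSupplier;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/health")
public class HealthCheckResource {
  // Hypothetical probe; in the real service this would check the
  // timeline reader (and possibly writer) backend connection.
  private final BooleanSupplier readerIsHealthy;

  public HealthCheckResource(BooleanSupplier readerIsHealthy) {
    this.readerIsHealthy = readerIsHealthy;
  }

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response health() {
    if (readerIsHealthy.getAsBoolean()) {
      return Response.ok("{\"health\":\"ok\"}").build();          // 200 when all ok
    }
    return Response.status(Response.Status.SERVICE_UNAVAILABLE)   // non-200 on any problem
        .entity("{\"health\":\"error\"}").build();
  }
}
{code}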



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839464#comment-16839464
 ] 

Hadoop QA commented on YARN-9552:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m  1s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}130m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968676/YARN-9552-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dadf97efb955 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6bcc1dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24089/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24089/testReport/ |
| Max. process+thread count | 927 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-05-14 Thread Zoltan Siegl (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839456#comment-16839456
 ] 

Zoltan Siegl commented on YARN-9545:


Thanks for the thoughts.

Pros: no new endpoint is necessary. Cons: that is an "about" endpoint, so it 
returns various data that is irrelevant for health-check purposes; also, if we 
mean to return a non-200 status code for /timeline in case of a reader 
connection failure, that would be a breaking API change. Returning a 200 OK 
with an ERROR JSON body seems like a bad idea to me. For these reasons I'd 
rather stick with /health.

[~Prabhu Joseph], [~sunilg] any thoughts on this?

> Create healthcheck REST endpoint for ATSv2
> --
>
> Key: YARN-9545
> URL: https://issues.apache.org/jira/browse/YARN-9545
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
>
> RM UI2 and CM needs a health check url for ATSv2 service.
> Create a /health rest endpoint
>  * must respond with 200 \{health: ok} if all ok
>  * must respond with non 200 if any problem occurs
>  * could check reader/writer connection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-05-14 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839432#comment-16839432
 ] 

Szilard Nemeth commented on YARN-9552:
--

Hi [~pbacsko]!
Please add a javadoc to the newly added test method, as at first look it's not 
obvious what the test case is actually testing.
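
For context, the failure described below boils down to calling first() on a 
ConcurrentSkipListSet that is still empty because another thread has not yet 
populated the scheduler keys. A tiny standalone sketch of that failure mode 
(not RM code, just an illustration):

{code:java}
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptySetFirstDemo {
  public static void main(String[] args) {
    ConcurrentSkipListSet<Integer> schedulerKeys = new ConcurrentSkipListSet<>();
    try {
      // A NODE_UPDATE arriving before the keys are populated hits this path.
      schedulerKeys.first();
    } catch (NoSuchElementException e) {
      System.out.println("NoSuchElementException, as in the stack trace below");
    }
    // An isEmpty() check avoids the exception, though the two calls are still
    // not atomic, so the real fix has to handle the race explicitly.
    Integer first = schedulerKeys.isEmpty() ? null : schedulerKeys.first();
    System.out.println("first = " + first);
  }
}
{code}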

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9552-001.patch, YARN-9552-002.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
> EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root 
> cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. 
> This contains an {{AppSchedulingInfo}} which contains a set of 
> {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
> bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] 
> recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for 
> app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
> application application_1557237478804_0001 from user: bacskop, in queue: 
> root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
> resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
> attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
> (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
> State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(207)) - *** In the constructor of 
> SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(230)) - *** Contents of 
> appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added 
> Application Attempt 

[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-05-14 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9552:
---
Attachment: YARN-9552-002.patch

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9552-001.patch, YARN-9552-002.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
> EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root 
> cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. 
> This contains an {{AppSchedulingInfo}} which contains a set of 
> {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
> bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] 
> recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for 
> app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
> application application_1557237478804_0001 from user: bacskop, in queue: 
> root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
> resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
> attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
> (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
> State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(207)) - *** In the constructor of 
> SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(230)) - *** Contents of 
> appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added 
> Application Attempt appattempt_1557237478804_0001_01 to scheduler from 
> user: bacskop
> 2019-05-07 15:58:02,756 INFO  [RM Event dispatcher] 
> scheduler.AppSchedulingInfo 
> 

[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839356#comment-16839356
 ] 

Hadoop QA commented on YARN-9552:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  3s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 
57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}132m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968664/YARN-9552-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a10208900547 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6bcc1dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24088/testReport/ |
| Max. process+thread count | 884 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24088/console |
| Powered by | Apache Yetus 0.8.0  

[jira] [Created] (YARN-9553) TimelineWebService Rest Api {entityType}/events throws NPE

2019-05-14 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9553:
---

 Summary: TimelineWebService Rest Api {entityType}/events throws NPE
 Key: YARN-9553
 URL: https://issues.apache.org/jira/browse/YARN-9553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TimelineWebService Rest Api {entityType}/events throws NPE

{code}
http://172.25.35.146:8188/ws/v1/timeline/YARN_APPLICATION/events

{"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"}


2019-05-14 11:47:38,376 WARN  webapp.GenericExceptionHandler 
(GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEvents(TimelineWebServices.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:617)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:576)
at 

[jira] [Commented] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow

2019-05-14 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839300#comment-16839300
 ] 

Bibin A Chundatt commented on YARN-9508:


Thank you [~BilwaST] for the updated patch.

+1 for the latest patch. I will commit by EOD tomorrow.

> YarnConfiguration areNodeLabel enabled is costly in allocation flow
> ---
>
> Key: YARN-9508
> URL: https://issues.apache.org/jira/browse/YARN-9508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9508-001.patch, YARN-9508-002.patch, 
> YARN-9508-003.patch
>
>
> Locking can be avoided for every allocate request, improving performance.
> {noformat}
> "pool-6-thread-300" #624 prio=5 os_prio=0 tid=0x7f2f91152800 nid=0x8ec5 
> waiting for monitor entry [0x7f1ec6a8d000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
>  at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1674)
>  at 
> org.apache.hadoop.yarn.conf.YarnConfiguration.areNodeLabelsEnabled(YarnConfiguration.java:3646)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:274)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:242)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:427)
>  - locked <0x7f24dd3f9e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>  at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
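
Illustrative sketch only (not the actual YARN-9508 patch): the idea is to read 
the node-labels flag once and cache it, so the per-allocate path no longer 
re-enters the synchronized Configuration.getProps() shown in the stack trace 
above. The helper class name is hypothetical.

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class NodeLabelsEnabledCache {
  private final boolean nodeLabelsEnabled;

  public NodeLabelsEnabledCache(Configuration conf) {
    // Touch the (synchronized) Configuration once, at construction time.
    this.nodeLabelsEnabled = conf.getBoolean("yarn.node-labels.enabled", false);
  }

  // Subsequent allocate-path checks read the cached field, with no locking.
  public boolean areNodeLabelsEnabled() {
    return nodeLabelsEnabled;
  }
}
{code}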



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-05-14 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9552:
---
Attachment: YARN-9552-001.patch

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9552-001.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
> EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root 
> cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. 
> This contains an {{AppSchedulingInfo}} which contains a set of 
> {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
> bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] 
> recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for 
> app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
> application application_1557237478804_0001 from user: bacskop, in queue: 
> root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
> resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
> attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
> (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
> State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(207)) - *** In the constructor of 
> SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(230)) - *** Contents of 
> appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added 
> Application Attempt appattempt_1557237478804_0001_01 to scheduler from 
> user: bacskop
> 2019-05-07 15:58:02,756 INFO  [RM Event dispatcher] 
> scheduler.AppSchedulingInfo 
> 

[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-05-14 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9552:
---
Summary: FairScheduler: NODE_UPDATE can cause NoSuchElementException  (was: 
FairScheduler: NODE_UPDATE can cause a NoSuchElementException)

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
> EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root 
> cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. 
> This contains an {{AppSchedulingInfo}} which contains a set of 
> {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
> bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] 
> recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for 
> app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
> application application_1557237478804_0001 from user: bacskop, in queue: 
> root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
> resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
> attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
> (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
> State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:<init>(207)) - *** In the constructor of 
> SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(230)) - *** Contents of 
> appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added 
> Application Attempt appattempt_1557237478804_0001_01 to scheduler from 
> user: bacskop
> 2019-05-07 15:58:02,756 INFO  [RM Event dispatcher] 

[jira] [Created] (YARN-9552) FairScheduler: NODE_UPDATE can cause a NoSuchElementException

2019-05-14 Thread Peter Bacsko (JIRA)
Peter Bacsko created YARN-9552:
--

 Summary: FairScheduler: NODE_UPDATE can cause a 
NoSuchElementException
 Key: YARN-9552
 URL: https://issues.apache.org/jira/browse/YARN-9552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


We observed a race condition inside YARN with the following stack trace:

{noformat}
18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
Dispatcher
java.util.NoSuchElementException
at 
java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at 
java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
{noformat}
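
The top frames show {{ConcurrentSkipListSet.first()}} delegating to 
{{ConcurrentSkipListMap.firstKey()}}, which throws {{NoSuchElementException}} 
when the backing map is empty. A minimal, self-contained sketch of that failure 
mode, using only JDK classes (no YARN code; the class name below is invented 
for illustration):
{code:java}
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptySkipListSetFirst {
  public static void main(String[] args) {
    // Same situation as an AppSchedulingInfo whose SchedulerRequestKey set
    // has not been populated yet: the set exists but is still empty.
    ConcurrentSkipListSet<String> schedulerKeys = new ConcurrentSkipListSet<>();
    try {
      schedulerKeys.first(); // delegates to ConcurrentSkipListMap.firstKey()
    } catch (NoSuchElementException e) {
      System.out.println("first() on an empty set throws: " + e);
    }
  }
}
{code}
A scheduler thread that reaches {{getNextPendingAsk()}} before the attempt's 
key set has been populated hits exactly this call, which matches the timeline 
in the log excerpt below.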

This is basically the same as the one described in YARN-7382, but the root 
cause is different.

When we create an application attempt, we create an {{FSAppAttempt}} object. 
This contains an {{AppSchedulingInfo}} which contains a set of 
{{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
bit later on a separate thread during a state transition:
{noformat}
2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] recovery.RMStateStore 
(RMStateStore.java:transition(239)) - Storing info for app: 
application_1557237478804_0001
2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
(RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from 
NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
application application_1557237478804_0001 from user: bacskop, in queue: 
root.bacskop, currently num of applications: 1
2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
(RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from 
SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
resourcemanager.ApplicationMasterService 
(ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
attempt : appattempt_1557237478804_0001_01
2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
(RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
State change from NEW to SUBMITTED on event = START
2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:<init>(207)) - *** In the constructor of 
SchedulerApplicationAttempt
2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:(230)) - *** Contents of 
appSchedulingInfo: []
2019-05-07 15:58:02,752 INFO  [SchedulerEventDispatcher:Event Processor] 
fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added 
Application Attempt appattempt_1557237478804_0001_01 to scheduler from 
user: bacskop
2019-05-07 15:58:02,756 INFO  [RM Event dispatcher] scheduler.AppSchedulingInfo 
(AppSchedulingInfo.java:updatePendingResources(257)) - *** Adding scheduler 
key: SchedulerRequestKey{priority=0, allocationRequestId=-1, 
containerToUpdate=null}  for attempt: appattempt_1557237478804_0001_01
2019-05-07 15:58:02,759 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
(RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-05-07 15:58:02,892 

[jira] [Commented] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839207#comment-16839207
 ] 

Hadoop QA commented on YARN-9508:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m 
48s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9508 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968637/YARN-9508-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 060a981ae285 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 02c9efc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24086/testReport/ |
| Max. process+thread count | 918 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24086/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> YarnConfiguration areNodeLabel enabled is costly 

[jira] [Commented] (YARN-9547) ContainerStatusPBImpl default execution type is not returned

2019-05-14 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839188#comment-16839188
 ] 

Bibin A Chundatt commented on YARN-9547:


Thank you [~BilwaST] for the patch.

Fix looks good to me.

Will wait for a day before committing.


> ContainerStatusPBImpl default execution type is not returned
> 
>
> Key: YARN-9547
> URL: https://issues.apache.org/jira/browse/YARN-9547
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9547-001.patch
>
>
> {code}
>   @Override
>   public synchronized ExecutionType getExecutionType() {
> ContainerStatusProtoOrBuilder p = viaProto ? proto : builder;
> if (!p.hasExecutionType()) {
>   return null;
> }
> return convertFromProtoFormat(p.getExecutionType());
>   }
> {code}
> ContainerStatusPBImpl executionType should return default as 
> ExecutionType.GUARANTEED.
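
A sketch of the shape such a fix would likely take (illustrative only; not 
necessarily identical to the attached YARN-9547-001.patch): fall back to 
{{ExecutionType.GUARANTEED}} instead of returning null when the proto field is 
unset.
{code:java}
@Override
public synchronized ExecutionType getExecutionType() {
  ContainerStatusProtoOrBuilder p = viaProto ? proto : builder;
  if (!p.hasExecutionType()) {
    // Default to GUARANTEED rather than null when the field was never set.
    return ExecutionType.GUARANTEED;
  }
  return convertFromProtoFormat(p.getExecutionType());
}
{code}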



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9542) LogsCLI guessAppOwner ignores custom file format suffix

2019-05-14 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839187#comment-16839187
 ] 

Prabhu Joseph commented on YARN-9542:
-

[~giovanni.fumarola] Can you review this jira when you get time? This fixes the 
LogsCLI guessAppOwner logic to consider the custom suffix of the configured log 
aggregation file format.

The failing test case TestTimelineClientV2Impl is not related; I have reported 
YARN-9551 to fix it. Thanks.

> LogsCLI guessAppOwner ignores custom file format suffix
> ---
>
> Key: YARN-9542
> URL: https://issues.apache.org/jira/browse/YARN-9542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9542-001.patch
>
>
> LogsCLI guessAppOwner ignores the custom file format suffix 
> (yarn.log-aggregation.%s.remote-app-log-dir-suffix) and the default 
> IndexedFileController suffix 
> ({yarn.nodemanager.remote-app-log-dir-suffix}-ifile, i.e. logs-ifile). It 
> considers only yarn.nodemanager.remote-app-log-dir-suffix or the default, logs.
> *Repro:*
> {code}
> yarn-site.xml
> yarn.log-aggregation.file-formats ifile
> yarn.log-aggregation.file-controller.ifile.class 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController
> yarn.log-aggregation.ifile.remote-app-log-dir app-logs
> yarn.resourcemanager.connect.max-wait.ms 1000
> core-site.xml:
> ipc.client.connect.max.retries 3
> ipc.client.connect.retry.interval 10
> Run a Job with above configs and Stop the RM.
> [ambari-qa@yarn-ats-1 ~]$ yarn logs -applicationId 
> application_1557482389195_0001
> 2019-05-10 10:03:58,215 INFO client.RMProxy: Connecting to ResourceManager at 
> yarn-ats-1/172.26.81.91:8050
> Unable to get ApplicationState. Attempting to fetch logs directly from the 
> filesystem.
> Can not find the appOwner. Please specify the correct appOwner
> Could not locate application logs for application_1557482389195_0001
> [ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls 
> /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001
> Found 1 items
> -rw-r-   3 ambari-qa supergroup  18058 2019-05-10 10:01 
> /app-logs/ambari-qa/logs-ifile/application_1557482389195_0001/yarn-ats-1_45454
> {code}
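
A rough sketch of the suffix resolution guessAppOwner needs, based on the 
property names quoted in the description (a fragment assuming a Hadoop 
{{Configuration}} object named {{conf}}; illustrative only, not the attached 
patch):
{code:java}
// The per-format suffix wins; otherwise fall back to the IndexedFileController
// default of "<remote-app-log-dir-suffix>-<format>", e.g. "logs-ifile".
String format = "ifile";   // one entry from yarn.log-aggregation.file-formats
String perFormatSuffix = conf.get(
    String.format("yarn.log-aggregation.%s.remote-app-log-dir-suffix", format));
String baseSuffix = conf.get(
    "yarn.nodemanager.remote-app-log-dir-suffix", "logs");
String suffix = (perFormatSuffix != null)
    ? perFormatSuffix
    : baseSuffix + "-" + format;
// guessAppOwner should then search <remote-app-log-dir>/<user>/<suffix>/<appId>
// (e.g. /app-logs/ambari-qa/logs-ifile/application_...) for every configured
// format, not only the base suffix.
{code}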



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9551) TestTimelineClientV2Impl.testSyncCall fails intermittent

2019-05-14 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9551:
---

 Summary: TestTimelineClientV2Impl.testSyncCall fails intermittent
 Key: YARN-9551
 URL: https://issues.apache.org/jira/browse/YARN-9551
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2, test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestTimelineClientV2Impl.testSyncCall fails intermittently
{code:java}
Failed
org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall

Failing for the past 1 build (Since #24083 )
Took 1.5 sec.
Error Message
TimelineEntities not published as desired expected:<3> but was:<4>
Stacktrace
java.lang.AssertionError: TimelineEntities not published as desired 
expected:<3> but was:<4>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall(TestTimelineClientV2Impl.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Standard Output
2019-05-13 15:33:46,596 WARN  [main] util.NativeCodeLoader 
(NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable
2019-05-13 15:33:47,763 INFO  [main] impl.TestTimelineClientV2Impl 
(TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published 
@ index 0 : 1,
2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl 
(TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published 
@ index 1 : 2,
2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl 
(TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published 
@ index 2 : 3,
2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl 
(TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published 
@ index 3 : 4,
2019-05-13 15:33:47,765 INFO  [main] impl.TimelineV2ClientImpl 
(TimelineV2ClientImpl.java:stop(563)) - Stopping TimelineClient.
2019-05-13 15:33:47,765 INFO  [pool-1-thread-1] impl.TimelineV2ClientImpl 
(TimelineV2ClientImpl.java:run(429)) - Timeline dispatcher thread was 
interrupted 
2019-05-13 15:33:47,766 INFO  [main] impl.TimelineV2ClientImpl 
(TimelineV2ClientImpl.java:stop(563)) - Stopping TimelineClient.
2019-05-13 15:33:47,766 INFO  [pool-2-thread-1] impl.TimelineV2ClientImpl 

[jira] [Updated] (YARN-9551) TestTimelineClientV2Impl.testSyncCall fails intermittent

2019-05-14 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9551:

Priority: Minor  (was: Major)

> TestTimelineClientV2Impl.testSyncCall fails intermittent
> 
>
> Key: YARN-9551
> URL: https://issues.apache.org/jira/browse/YARN-9551
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TestTimelineClientV2Impl.testSyncCall fails intermittently
> {code:java}
> Failed
> org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall
> Failing for the past 1 build (Since #24083 )
> Took 1.5 sec.
> Error Message
> TimelineEntities not published as desired expected:<3> but was:<4>
> Stacktrace
> java.lang.AssertionError: TimelineEntities not published as desired 
> expected:<3> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall(TestTimelineClientV2Impl.java:251)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Standard Output
> 2019-05-13 15:33:46,596 WARN  [main] util.NativeCodeLoader 
> (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 2019-05-13 15:33:47,763 INFO  [main] impl.TestTimelineClientV2Impl 
> (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities 
> Published @ index 0 : 1,
> 2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl 
> (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities 
> Published @ index 1 : 2,
> 2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl 
> (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities 
> Published @ index 2 : 3,
> 2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl 
> (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities 
> Published @ index 3 : 4,
> 2019-05-13 15:33:47,765 INFO  [main] impl.TimelineV2ClientImpl 
> (TimelineV2ClientImpl.java:stop(563)) - Stopping TimelineClient.
> 2019-05-13 15:33:47,765 INFO  

[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839182#comment-16839182
 ] 

Hadoop QA commented on YARN-8625:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-8625 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8625 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24087/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> yarn-site.xml
>
>
> Aggregate Resource Allocation, shown on the RM UI for a finished job, is a 
> very useful metric for understanding how many resources a job has consumed, 
> but it does not get stored in ATS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-14 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-8625:

Attachment: ApplicationHistoryServer_UI.png
ApplicationHistoryServer_Rest_Api.png

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> yarn-site.xml
>
>
> Aggregate Resource Allocation, shown on the RM UI for a finished job, is a 
> very useful metric for understanding how many resources a job has consumed, 
> but it does not get stored in ATS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-14 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839177#comment-16839177
 ] 

Prabhu Joseph commented on YARN-8625:
-

[~eepayne] I suspect the ResourceManager address and port are being used. Can you 
check with the ApplicationHistoryServer (TimelineServer) address and port 8188.

REST Api - http://:8188/ws/v1/applicationhistory/apps

UI - http://:8188/applicationhistory


Some troubleshooting steps and attached screenshots for reference:
{code}
1. Check if TimelineServer 1.5 is Running and using port 8188

[yarn-ats@yarn-ats-1 hdfs]$ yarn-daemon.sh start timelineserver

[yarn-ats@yarn-ats-1 hdfs]$ ps -ef | grep ApplicationHistoryServer
yarn-ats 23243 1 24 07:27 pts/000:00:13 
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/jre//bin/java 
-Dproc_timelineserver -Djava.net.preferIPv4Stack=true 
-Dyarn.log.dir=/HADOOP/hadoop-3.2.0/logs 
-Dyarn.log.file=hadoop-yarn-ats-timelineserver-yarn-ats-1.log 
-Dyarn.home.dir=/HADOOP/hadoop-3.2.0 -Dyarn.root.logger=INFO,console 
-Djava.library.path=/HADOOP/hadoop-3.2.0/lib/native 
-Dhadoop.log.dir=/HADOOP/hadoop-3.2.0/logs 
-Dhadoop.log.file=hadoop-yarn-ats-timelineserver-yarn-ats-1.log 
-Dhadoop.home.dir=/HADOOP/hadoop-3.2.0 -Dhadoop.id.str=yarn-ats 
-Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml 
-Dhadoop.security.logger=INFO,NullAppender 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer

[yarn-ats@yarn-ats-1 hdfs]$ netstat -plant | grep 8188 
tcp        0      0 0.0.0.0:8188            0.0.0.0:*               LISTEN      23243/java

2. Run a sample MR job
 
3. Test http request with curl using ApplicationHistoryServer and port 8188

[ambari-qa@yarn-ats-3 ~]$ curl 
http://yarn-ats-1:8188/ws/v1/applicationhistory/apps | jq .
{
  "app": [
{
  "appId": "application_1557818827301_0001",
  "currentAppAttemptId": "appattempt_1557818827301_0001_01",
  .
  "aggregateResourceAllocation": "67374 MB-seconds, 34 vcore-seconds",
  "aggregatePreemptedResourceAllocation": "0 MB-seconds, 0 vcore-seconds"
}
{code}

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, yarn-site.xml
>
>
> Aggregate Resource Allocation, shown on the RM UI for a finished job, is a 
> very useful metric for understanding how many resources a job has consumed, 
> but it does not get stored in ATS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9547) ContainerStatusPBImpl default execution type is not returned

2019-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839175#comment-16839175
 ] 

Hadoop QA commented on YARN-9547:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
59s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
57s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9547 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968638/YARN-9547-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9687ef0d88c8 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 02c9efc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24085/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 1) |
| modules 

[jira] [Updated] (YARN-9547) ContainerStatusPBImpl default execution type is not returned

2019-05-14 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9547:

Attachment: YARN-9547-001.patch

> ContainerStatusPBImpl default execution type is not returned
> 
>
> Key: YARN-9547
> URL: https://issues.apache.org/jira/browse/YARN-9547
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9547-001.patch
>
>
> {code}
>   @Override
>   public synchronized ExecutionType getExecutionType() {
> ContainerStatusProtoOrBuilder p = viaProto ? proto : builder;
> if (!p.hasExecutionType()) {
>   return null;
> }
> return convertFromProtoFormat(p.getExecutionType());
>   }
> {code}
> ContainerStatusPBImpl executionType should return default as 
> ExecutionType.GUARANTEED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow

2019-05-14 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839135#comment-16839135
 ] 

Bilwa S T commented on YARN-9508:
-

{quote}Hi [~bibinchundatt]
We cannot skip queueName because we need it when queueInfo is null. I have 
removed the scheduler parameter.
{quote}

> YarnConfiguration areNodeLabel enabled is costly in allocation flow
> ---
>
> Key: YARN-9508
> URL: https://issues.apache.org/jira/browse/YARN-9508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9508-001.patch, YARN-9508-002.patch, 
> YARN-9508-003.patch
>
>
> Locking can be avoided for every allocate request, improving performance:
> {noformat}
> "pool-6-thread-300" #624 prio=5 os_prio=0 tid=0x7f2f91152800 nid=0x8ec5 
> waiting for monitor entry [0x7f1ec6a8d000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
>  at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1674)
>  at 
> org.apache.hadoop.yarn.conf.YarnConfiguration.areNodeLabelsEnabled(YarnConfiguration.java:3646)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:274)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:242)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:427)
>  - locked <0x7f24dd3f9e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>  at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
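
The blocked frames above all go through {{Configuration.getProps()}}, which 
synchronizes on the {{YarnConfiguration}} object, so every allocate call 
contends on the same monitor just to read one boolean. A hypothetical sketch of 
the caching idea (class and field names invented; this is not the attached 
patch):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper: read yarn.node-labels.enabled once at startup and serve
// it from a plain field, so the allocate() hot path never touches the
// synchronized Configuration.getProps().
public final class NodeLabelsFlagCache {
  private final boolean nodeLabelsEnabled;

  public NodeLabelsFlagCache(Configuration conf) {
    this.nodeLabelsEnabled =
        conf.getBoolean("yarn.node-labels.enabled", false);
  }

  public boolean areNodeLabelsEnabled() {
    return nodeLabelsEnabled;
  }
}
{code}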



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9508) YarnConfiguration areNodeLabel enabled is costly in allocation flow

2019-05-14 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9508:

Attachment: YARN-9508-003.patch

> YarnConfiguration areNodeLabel enabled is costly in allocation flow
> ---
>
> Key: YARN-9508
> URL: https://issues.apache.org/jira/browse/YARN-9508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9508-001.patch, YARN-9508-002.patch, 
> YARN-9508-003.patch
>
>
> Locking can be avoided for every allocate request, improving performance:
> {noformat}
> "pool-6-thread-300" #624 prio=5 os_prio=0 tid=0x7f2f91152800 nid=0x8ec5 
> waiting for monitor entry [0x7f1ec6a8d000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
>  at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1674)
>  at 
> org.apache.hadoop.yarn.conf.YarnConfiguration.areNodeLabelsEnabled(YarnConfiguration.java:3646)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:274)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:242)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:427)
>  - locked <0x7f24dd3f9e40> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>  at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org