[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682496#comment-14682496 ] Xuan Gong commented on YARN-3999:

+1, LGTM. Will commit later if there are no other comments.

RM hangs on draining events
Key: YARN-3999
URL: https://issues.apache.org/jira/browse/YARN-3999
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch

If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
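To make fix (1) concrete, here is a minimal, self-contained sketch of a dispatcher whose stop path drains with a deadline instead of waiting indefinitely on a slow downstream such as ATS or ZK. This is a sketch under assumed names (BoundedDrainDispatcher and everything in it are illustrative), not the committed YARN patch.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: a dispatcher whose stop() gives up draining after a deadline. */
public class BoundedDrainDispatcher {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;
  private final Thread worker = new Thread(() -> {
    try {
      while (!stopped) {
        queue.take().run(); // a slow handler (e.g. flushing to ATS/ZK) stalls here
      }
    } catch (InterruptedException e) {
      // interrupted by stop(): exit, abandoning any remaining events
    }
  }, "event-dispatcher");

  public void start() { worker.start(); }

  public void dispatch(Runnable event) { queue.add(event); }

  /** Drain queued events, but never block shutdown longer than timeoutMs. */
  public void stop(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(50); // poll: either the queue drains or the deadline passes
    }
    stopped = true;     // past the deadline, undrained events are dropped
    worker.interrupt(); // unblock take()
    worker.join();
  }
}
{code}
Without such a deadline, a stop path that waits for an empty queue can block far past the 10-minute application-expiry window whenever the downstream sink is slow, which is exactly the hang described above.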
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14687395#comment-14687395 ] Anubhav Dhoot commented on YARN-4046:

[~cnauroth], I appreciate your review.

Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
Key: YARN-4046
URL: https://issues.apache.org/jira/browse/YARN-4046
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
Attachments: YARN-4096.001.patch

On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045:

Attachment: YARN-3045-YARN-2928.009.patch

Hi [~djp], attaching a new patch resolving your comments. I have also modified one approach: for cases where we need to publish timeline entities directly (not through wrapped application or container events), such as ContainerMetrics, I have added a new NMTimelineEvent that accepts the TimelineEntity and ApplicationId. This avoids creating new event classes; it suffices to expose a method in NMTimelinePublisher. I have also fixed the test case failures, but the javac warnings do not seem related to my modifications, and findbugs did not report any issues in its report; I will check again in the next Jenkins run.

[Event producers] Implement NM writing container lifecycle events to ATS
Key: YARN-3045
URL: https://issues.apache.org/jira/browse/YARN-3045
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch

Per the design in YARN-2928, implement the NM writing container lifecycle events and container system metrics to ATS.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative availableMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14687389#comment-14687389 ] Wangda Tan commented on YARN-4045:

[~tgraves]/[~shahrs87], I think this case can happen when container reservation interacts with a node disconnect. One example:
{code}
A cluster has 6 nodes, each with 20G of resource. N1-N4 are fully used; N5 and N6 each use 10G.
An app asks for a 15G container; assume it is reserved at N5, so total used resource = 20G * 4 + 10G * 2 + 15G (just reserved) = 115G.
Then N6 disconnects: cluster resource becomes 100G, while used resource = 105G.
{code}
I've just checked: YARN-3361 doesn't contain a related fix, and we currently don't have a fix for the above corner case.

Another problem is caused by DRC. Since 2.7.1 we set availableResource = max(availableResource, Resources.none()):
{code}
childQueue.getMetrics().setAvailableResourcesToQueue(
    Resources.max(calculator, clusterResource, available, Resources.none()));
{code}
But if you're using DRC, a resource with availableMB < 0 and availableVCores > 0 could still compare as > Resources.none(), so the negative memory slips through. We may need to fix this case as well. Thoughts?

Negative availableMB is being reported for root queue.
Key: YARN-4045
URL: https://issues.apache.org/jira/browse/YARN-4045
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Rushabh S Shah

We recently deployed 2.7 on one of our clusters. We are seeing a negative availableMB reported for queue=root. This is from the jmx output:
{noformat}
<clusterMetrics> ... <availableMB>-163328</availableMB> ... </clusterMetrics>
{noformat}
The following is the RM log:
{noformat}
2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,757
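To illustrate the DRC gap described above: Resources.max under DominantResourceCalculator compares dominant shares, so a mixed-sign value can still win against Resources.none() while carrying negative memory. One possible remedy, sketched below with an illustrative helper name (clampToNonNegative is not an actual YARN API), is to clamp each dimension independently.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative helper, not the committed fix: clamp each dimension separately so
// a value like <memory:-163328, vCores:12> can never be reported as available.
public final class ResourceClamp {
  private ResourceClamp() {}

  public static Resource clampToNonNegative(Resource r) {
    return Resource.newInstance(
        Math.max(0, r.getMemory()),        // never report negative availableMB
        Math.max(0, r.getVirtualCores())); // never report negative vCores
  }
}
{code}
A componentwise clamp sidesteps the calculator entirely, so the choice of DefaultResourceCalculator vs. DominantResourceCalculator no longer affects whether negative metrics can leak out.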
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046:

Attachment: YARN-4096.001.patch

Attaching a patch that prefixes {{--}} when using a negative pid for kill.

Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
Key: YARN-4046
URL: https://issues.apache.org/jira/browse/YARN-4046
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
Attachments: YARN-4096.001.patch

On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
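The idea behind that small patch, as a hedged sketch: container recovery checks liveness by sending signal 0 to the process group, i.e. to a negative pid, and without a preceding {{--}} some kill implementations parse that negative pid as an option flag. The class and method below are illustrative, not the NodeManager's actual code.
{code}
import java.io.IOException;

// Illustrative sketch of the fix's idea, not the actual NM patch.
public class ProcessGroupAlive {
  /** Returns true if the process group led by pid still exists (signal 0). */
  public static boolean isGroupAlive(long pid)
      throws IOException, InterruptedException {
    // Without "--", a command like "kill -0 -4567" can be rejected on some
    // distros (e.g. Debian), because "-4567" looks like an option; "--" ends
    // option parsing so the negative pid is read as a process-group target.
    ProcessBuilder pb = new ProcessBuilder("bash", "-c", "kill -0 -- -" + pid);
    return pb.start().waitFor() == 0; // exit 0 => group exists and is signallable
  }
}
{code}
When the liveness check spuriously fails on restart, recovery concludes the AM's container is gone and marks it LOST (exit code 154), which is the failure mode reported above.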
[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14687398#comment-14687398 ] Jian He commented on YARN-4026:

- Why is assignment.setFulfilledReservation(true); called in the Reserved state?
{code}
if (result.getAllocationState() == AllocationState.RESERVED) {
  // This is a reserved container
  LOG.info("Reserved container " + " application=" + application.getApplicationId()
      + " resource=" + allocatedResource + " queue=" + this.toString()
      + " cluster=" + clusterResource);
  assignment.getAssignmentInformation().addReservationDetails(
      updatedContainer.getId(), application.getCSLeafQueue().getQueuePath());
  assignment.getAssignmentInformation().incrReservations();
  Resources.addTo(assignment.getAssignmentInformation().getReserved(),
      allocatedResource);
  assignment.setFulfilledReservation(true);
} else {
{code}
- I think this can always return ContainerAllocation.LOCALITY_SKIPPED, since the semantics of this method are to try to allocate a container for a certain locality.
{code}
return type == NodeType.OFF_SWITCH ? ContainerAllocation.APP_SKIPPED :
    ContainerAllocation.LOCALITY_SKIPPED;
{code}
The caller here can choose to return APP_SKIPPED if it sees LOCALITY_SKIPPED:
{code}
assigned = assignOffSwitchContainers(clusterResource, offSwitchResourceRequest,
    node, priority, reservedContainer, schedulingMode, currentResoureLimits);
assigned.requestNodeType = requestType;
return assigned;
}
{code}

FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
Key: YARN-4026
URL: https://issues.apache.org/jira/browse/YARN-4026
Project: Hadoop YARN
Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-4026.1.patch

After YARN-3983, we have an extensible ContainerAllocator which FiCaSchedulerApp can use to decide how to allocate resources. While working on YARN-1651 (allocate resources to increase containers), I found one part of the existing logic not flexible enough:
- ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation (for example, priority as weight, or whether to skip a priority), it is better to let ContainerAllocator choose how to order pending resource requests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
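To make the suggested restructuring concrete, here is a sketch of how the caller could widen the locality skip itself; it continues the reviewed fragment above and is illustrative only, not the final patch.
{code}
// Illustrative continuation of the caller above: the per-locality method always
// returns LOCALITY_SKIPPED, and only here do we widen it to APP_SKIPPED.
assigned = assignOffSwitchContainers(clusterResource, offSwitchResourceRequest,
    node, priority, reservedContainer, schedulingMode, currentResoureLimits);
if (requestType == NodeType.OFF_SWITCH
    && assigned.getAllocationState() == AllocationState.LOCALITY_SKIPPED) {
  // Off-switch was the last locality to try, so skip the whole app on this node.
  assigned = ContainerAllocation.APP_SKIPPED;
}
assigned.requestNodeType = requestType;
return assigned;
{code}
This keeps the per-locality method's contract uniform (it only ever reports on its own locality) and concentrates the app-level decision in one place.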
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046:

Description:
On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error below. The attempts are not retried.
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

was:
On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
Key: YARN-4046
URL: https://issues.apache.org/jira/browse/YARN-4046
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
Attachments: YARN-4096.001.patch

On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error below. The attempts are not retried.
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692525#comment-14692525 ] Sangjin Lee commented on YARN-2657:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

MiniYARNCluster to (optionally) add MicroZookeeper service
Key: YARN-2657
URL: https://issues.apache.org/jira/browse/YARN-2657
Project: Hadoop YARN
Issue Type: Sub-task
Components: test
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-2567-001.patch, YARN-2657-002.patch

This is needed for testing things like YARN-2646: add an option for the {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. This is just another YARN service to create and track through the lifecycle. The {{MicroZookeeperService}} publishes its binding information for direct takeup by the registry services...this can address in-VM race conditions. The default setting for this service is off.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692526#comment-14692526 ] Sangjin Lee commented on YARN-2599:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

Standby RM should also expose some jmx and metrics
Key: YARN-2599
URL: https://issues.apache.org/jira/browse/YARN-2599
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith Sharma K S

YARN-1898 redirects jmx and metrics to the Active. As discussed there, we need to separate out the metrics displayed so the Standby RM can also be monitored.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692523#comment-14692523 ] Sangjin Lee commented on YARN-2746:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

YARNDelegationTokenID misses serializing version from the common abstract ID
Key: YARN-2746
URL: https://issues.apache.org/jira/browse/YARN-2746
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He

I found this during review of YARN-2743.
bq. AbstractDTId had a version, we dropped that in the protobuf serialization. We should just write it during the serialization and read it back?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
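A minimal sketch of the quoted suggestion, assuming a Writable-style identifier that wraps a protobuf payload; the class, fields, and VERSION constant here are illustrative, not the actual YARNDelegationTokenIdentifier code.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative only: write the version byte before the protobuf payload and
// read it back first, so the common abstract ID's version is not lost.
public class VersionedTokenId {
  private static final byte VERSION = 0;
  private byte[] protoBytes; // the existing protobuf serialization

  public void write(DataOutput out) throws IOException {
    out.writeByte(VERSION);           // version first, as the abstract ID did
    out.writeInt(protoBytes.length);
    out.write(protoBytes);            // then the protobuf payload
  }

  public void readFields(DataInput in) throws IOException {
    byte version = in.readByte();     // read it back before the payload
    if (version != VERSION) {
      throw new IOException("Unknown token version " + version);
    }
    protoBytes = new byte[in.readInt()];
    in.readFully(protoBytes);
  }
}
{code}
Keeping the version byte on the wire preserves a forward-compatibility hook: a newer reader can branch on the version instead of failing to parse an older token.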
[jira] [Commented] (YARN-3478) FairScheduler page not performed because different enum of YarnApplicationState and RMAppState
[ https://issues.apache.org/jira/browse/YARN-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692520#comment-14692520 ] Sangjin Lee commented on YARN-3478:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

FairScheduler page not performed because different enum of YarnApplicationState and RMAppState
Key: YARN-3478
URL: https://issues.apache.org/jira/browse/YARN-3478
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Xu Chen
Attachments: YARN-3478.1.patch, YARN-3478.2.patch, YARN-3478.3.patch, screenshot-1.png

Got this exception from the log:
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
	at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
	at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
	at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
	at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
	at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.lib.DynamicUserWebFilter$DynamicUserFilter.doFilter(DynamicUserWebFilter.java:59)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692524#comment-14692524 ] Hadoop QA commented on YARN-3045:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 17s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. |
| {color:red}-1{color} | javac | 7m 55s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 46s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 6s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 55m 53s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749943/YARN-3045-YARN-2928.009.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / 07433c2 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/diffJavacWarnings.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8826/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8826/console |

This message was automatically generated.
[Event producers] Implement NM writing container lifecycle events to ATS
Key: YARN-3045
URL: https://issues.apache.org/jira/browse/YARN-3045
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch

Per the design in YARN-2928, implement the NM writing container lifecycle events and container system metrics to ATS.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692528#comment-14692528 ] Sangjin Lee commented on YARN-2457:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

FairScheduler: Handle preemption to help starved parent queues
Key: YARN-2457
URL: https://issues.apache.org/jira/browse/YARN-2457
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

YARN-2395/YARN-2394 add a preemption timeout and threshold per queue, but don't check for parent queue starvation. We need to check that.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692521#comment-14692521 ] Sangjin Lee commented on YARN-2859:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
Key: YARN-2859
URL: https://issues.apache.org/jira/browse/YARN-2859
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Hitesh Shah
Assignee: Zhijie Shen
Priority: Critical
Labels: 2.6.1-candidate

In the mini cluster, a random port should be used. Also, the config is not updated to the host that the process actually bound to.
{code}
2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200
2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
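The usual mini-cluster remedy, as a hedged sketch: bind to port 0 so the OS picks a free ephemeral port, and have the cluster publish the actually-bound address back into the config afterwards. The YarnConfiguration constants below are real; the wrapper class and its use here are illustrative, not the committed fix.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: configure the timeline/history server for random ports.
public class RandomPortTimelineConf {
  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // Port 0 lets the OS choose a free ephemeral port instead of 10200/8188,
    // so concurrent test JVMs on one host cannot collide on fixed ports.
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
    // Once the server starts, the cluster must write the actually-bound
    // host:port back into this config so clients can read the real address.
    return conf;
  }
}
{code}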
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692709#comment-14692709 ] sandflee commented on YARN-2038:

I thought this was the same issue as YARN-3519, but it seems not. I'm also confused about the purpose of this issue now.

Revisit how AMs learn of containers from previous attempts
Key: YARN-2038
URL: https://issues.apache.org/jira/browse/YARN-2038
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

Based on YARN-556, we need to update the way AMs learn about container allocations from previous attempts.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692729#comment-14692729 ] Xuan Gong commented on YARN-3999:

Thanks, Jian. Committed into trunk/branch-2/branch-2.7.

RM hangs on draining events
Key: YARN-3999
URL: https://issues.apache.org/jira/browse/YARN-3999
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
Fix For: 2.7.2
Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch

If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692412#comment-14692412 ] Hadoop QA commented on YARN-4047:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 47s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 1s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 53m 27s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 92m 53s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.TestRMAdminService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749935/YARN-4047.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7c796fd |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8824/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8824/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8824/console |

This message was automatically generated.

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681416#comment-14681416 ] Hadoop QA commented on YARN-3250:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749621/0002-YARN-3250.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fa1d84a |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8820/console |

This message was automatically generated.

Support admin cli interface for Application Priority
Key: YARN-3250
URL: https://issues.apache.org/jira/browse/YARN-3250
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch

The current Application Priority Manager supports configuration only via file. To support runtime configuration via the admin CLI and REST, a common management interface has to be added which can be shared with NodeLabelsManager.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated YARN-3964:

Attachment: YARN-3964.002.patch

Support NodeLabelsProvider at Resource Manager side
Key: YARN-3964
URL: https://issues.apache.org/jira/browse/YARN-3964
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu
Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.1.patch

Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
- The cron job needs to run as the YARN admin user.
- It is a little complicated to maintain, as users have to make sure this service/daemon stays alive.
Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4037) Hadoop - failed redirect for container
[ https://issues.apache.org/jira/browse/YARN-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681316#comment-14681316 ] Mohammad Shahid Khan commented on YARN-4037:

Hi Gagan, use the convention below when configuring the log server URL:
{code}
<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:port/jobhistory/logs</value>
  <description>URL for job history server</description>
</property>
{code}
Note: the port should be the port used for mapreduce.jobhistory.webapp.address. For example, if mapreduce.jobhistory.webapp.address has the value ip1:10988, then the host and port for the log server should be ip1:10988. Please verify your configuration.

Hadoop - failed redirect for container
Key: YARN-4037
URL: https://issues.apache.org/jira/browse/YARN-4037
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.7.1
Environment: Windows 7, Apache Hadoop 2.7.1
Reporter: Gagan

I believe this issue was addressed earlier in https://issues.apache.org/jira/browse/YARN-1473, though I am not sure, because the description of that JIRA does not mention the following message:
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
Could someone look at this and provide detail on the root cause and resolution?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681526#comment-14681526 ] Rohith Sharma K S commented on YARN-3924:

From the user's side, one would expect different exceptions for *connecting to a standby RM* and *connecting to an invalid/not-started ResourceManager address*. But per the RM HA design, both scenarios are treated the same: a standby RM does not open any RPC server for client communication. If the client tries to submit a job, it retries for a certain amount of time against both configured rm.ha-ids and then throws a ConnectionRefused exception. There are 2 scenarios in which the client might see connection refused:
# Configuring wrong/invalid *ha.rm-ids* at the client is a user mistake; this can be rechecked by the user.
# Both RMs being in standby for a long time is a problem in YARN, and we need to find the reason for that state. Ideally, if there is any issue with ZK, the RM will shut down after some time. If you can share logs for both RMs while they are in standby, that would help the analysis.

Submitting an application to standby ResourceManager should respond better than Connection Refused
Key: YARN-3924
URL: https://issues.apache.org/jira/browse/YARN-3924
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Dustin Cote
Assignee: Ajith S
Priority: Minor

When submitting an application directly to a standby resource manager, the resource manager responds with 'Connection Refused' rather than indicating that it is a standby resource manager. Because the resource manager is aware of its own state, I feel like we can have the 8032 port open for standby resource managers and reject the request with something like 'Cannot process application submission from this standby resource manager'. This would be especially helpful for debugging oozie problems when users put in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point to a specific resource manager).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681574#comment-14681574 ] Dian Fu commented on YARN-3964:

Updated the patch with the following changes:
- removed the interface modification to NodeLabelsProvider
- improved the Fetcher implementation to update node labels in batch

Support NodeLabelsProvider at Resource Manager side
Key: YARN-3964
URL: https://issues.apache.org/jira/browse/YARN-3964
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu
Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.1.patch

Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
- The cron job needs to run as the YARN admin user.
- It is a little complicated to maintain, as users have to make sure this service/daemon stays alive.
Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681312#comment-14681312 ] Li Lu commented on YARN-3814:

Thanks [~varun_saxena]!
bq. So if GenericObjectMapper cannot convert the value and throws an Exception, we merely forward whatever came in the request. Hope I am not misunderstanding your comment.
OK, that's fine. Actually I originally meant parseKeyStrValuesStr and parseKeyStrValueStr, not parseKeyStrValueObj, but never mind.
bq. Ok. Will move these helper functions to another file.
Actually my key point here is not about separating the file, but maximizing code reuse among those parse methods. Parts of parseKeyStrValuesStr and parseKeyValue are quite similar. If we can reuse most of the code in parseKeyValue without mixing it into the middle of a sequence of parse helper methods, I'm totally fine with keeping them in the same file.
bq. Would a code comment be fine ?
Sure, but it's certainly not harmful to generate one more log line, especially since the FS storage implementation is for debug only? It's up to you though. :)

REST API implementation for getting raw entities in TimelineReader
Key: YARN-3814
URL: https://issues.apache.org/jira/browse/YARN-3814
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814.reference.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681288#comment-14681288 ] Varun Saxena commented on YARN-3814:

[~gtCarrera9],
bq. Why we're not using if statement but using exception handling to decide the type of pairStrs[1]? I think we can pretty much restrict the type of pairStrs[1] here, so maybe instanceof will do the work?
We are not using exception handling to determine the type. readValue can throw an Exception, which is caught here. So if GenericObjectMapper cannot convert the value and throws an Exception, we merely forward whatever came in the request. I hope I am not misunderstanding your comment.
bq. We may further want to move those parse methods into a helper method collection file?
OK. Will move these helper functions to another file.
bq. Change its name into newEntity or initEntity?
Hmm, well, this function entity is used as shorthand for creating an entity during assertions. Maybe name it newEntity.
bq. But still we need some rationales in the code as comment for why we can swallow the exception. It will be also helpful if we output something about the 404? I think it will be quite hard to debug if we hit an exception which causes a null return value and then a 404.
Would a code comment be fine?
bq. String msg = new String(); instead of using immediate value
OK.
bq. We don't really need a separate JIRA for each set of RESTful APIs since this part is relatively trivial.
I agree. I also meant to do it in a single JIRA.

REST API implementation for getting raw entities in TimelineReader
Key: YARN-3814
URL: https://issues.apache.org/jira/browse/YARN-3814
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814.reference.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681649#comment-14681649 ] Hudson commented on YARN-3873:

FAILURE: Integrated in Hadoop-Yarn-trunk #1014 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1014/])
YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java

pendingApplications in LeafQueue should also use OrderingPolicy
Key: YARN-3873
URL: https://issues.apache.org/jira/browse/YARN-3873
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
Fix For: 2.8.0
Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch

Currently *pendingApplications* in LeafQueue uses the {{applicationComparator}} from CapacityScheduler. This can be changed so that pendingApplications uses the OrderingPolicy configured at the queue level (Fifo/Fair, as configured).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681650#comment-14681650 ] Hudson commented on YARN-3887:

FAILURE: Integrated in Hadoop-Yarn-trunk #1014 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1014/])
YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java

Support for changing Application priority during runtime
Key: YARN-3887
URL: https://issues.apache.org/jira/browse/YARN-3887
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Fix For: 2.8.0
Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch

After YARN-2003, adding support to change the priority of an application after submission. This ticket handles the server-side implementation of the same. A new RMAppEvent will be created to handle this, common to all schedulers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681677#comment-14681677 ] Ajith S commented on YARN-3924:

Hi [~rohithsharma], +1 and thanks for the input; I agree with you regarding the RM HA design. However, I think what [~cotedm] is conveying is that if both RM nodes in HA are in standby (for whatever reason), the client should get back a reasonable StandbyException instead of connection refused. If I may suggest: can we change it so that the RPC server is started in standby too, but before it sends a response we check whether the RM is active, and otherwise throw StandbyException? Any thoughts?

Submitting an application to standby ResourceManager should respond better than Connection Refused
Key: YARN-3924
URL: https://issues.apache.org/jira/browse/YARN-3924
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Dustin Cote
Assignee: Ajith S
Priority: Minor

When submitting an application directly to a standby resource manager, the resource manager responds with 'Connection Refused' rather than indicating that it is a standby resource manager. Because the resource manager is aware of its own state, I feel like we can have the 8032 port open for standby resource managers and reject the request with something like 'Cannot process application submission from this standby resource manager'. This would be especially helpful for debugging oozie problems when users put in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point to a specific resource manager).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
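A sketch of that suggestion, assuming the client-facing RPC server also runs on the standby: each handler gates on HA state and throws StandbyException, which client-side retry policies can recognize and use to fail over to the other rm-id. The HAGate class and its methods are illustrative; only StandbyException and HAServiceState are real Hadoop types.
{code}
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;
import org.apache.hadoop.ipc.StandbyException;

// Illustrative only: a gate that client-facing RPC handlers would consult.
public class HAGate {
  private volatile HAServiceState state = HAServiceState.STANDBY;

  /** Called at the top of every client-facing RPC handler. */
  public void checkActive() throws StandbyException {
    if (state != HAServiceState.ACTIVE) {
      // Clients that recognize StandbyException can fail over to the other
      // rm-id instead of surfacing a bare "Connection refused".
      throw new StandbyException("ResourceManager is in standby state");
    }
  }

  public void setState(HAServiceState s) { state = s; }
}
{code}
The design trade-off is keeping the standby's client port open (a small extra attack/monitoring surface) in exchange for an error clients can act on rather than a refused connection they cannot distinguish from a dead host.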
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681778#comment-14681778 ] Hadoop QA commented on YARN-3964:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 20m 44s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. |
| {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 21s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | checkstyle | 2m 45s | The applied patch generated 9 new checkstyle issues (total was 0, now 9). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 15s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 0m 21s | Tests failed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 6m 38s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests | 56m 43s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 121m 20s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.TestRMAdminService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749814/YARN-3964.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fa1d84a |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8822/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8822/console |

This message was automatically generated.

Support NodeLabelsProvider at Resource Manager side
Key: YARN-3964
URL: https://issues.apache.org/jira/browse/YARN-3964
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681644#comment-14681644 ] Hudson commented on YARN-3873: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/284/]) YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
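For context on the commit above: the essence of YARN-3873 is that LeafQueue stops ordering its pending applications with a single comparator fixed by CapacityScheduler and instead delegates to the OrderingPolicy configured on the queue. Below is a minimal self-contained sketch of that idea; the interface and class names are illustrative stand-ins, not the actual YARN classes.
{code:java}
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

// Illustrative stand-in for the queue-level ordering policy (FIFO/fair).
interface OrderingPolicySketch<T> {
  void add(T entity);
  void remove(T entity);
  Iterator<T> assignmentIterator(); // order in which entities are considered
}

class FifoPolicySketch<T> implements OrderingPolicySketch<T> {
  private final TreeSet<T> entities;
  FifoPolicySketch(Comparator<T> fifoComparator) {
    entities = new TreeSet<>(fifoComparator);
  }
  public void add(T e) { entities.add(e); }
  public void remove(T e) { entities.remove(e); }
  public Iterator<T> assignmentIterator() { return entities.iterator(); }
}

class LeafQueueSketch<A> {
  // Before: TreeSet<A> pendingApplications = new TreeSet<>(schedulerComparator);
  // After: the queue-configured policy owns the ordering of pending apps too.
  private final OrderingPolicySketch<A> pendingOrderingPolicy;

  LeafQueueSketch(OrderingPolicySketch<A> configuredPolicy) {
    this.pendingOrderingPolicy = configuredPolicy;
  }
  void addPendingApplication(A app) { pendingOrderingPolicy.add(app); }
  Iterator<A> activationCandidates() {
    return pendingOrderingPolicy.assignmentIterator();
  }
}
{code}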
[jira] [Commented] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681583#comment-14681583 ] Hadoop QA commented on YARN-3999: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 2s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | checkstyle | 3m 40s | The applied patch generated 7 new checkstyle issues (total was 87, now 88). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 22m 17s | Tests failed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 1m 54s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 9s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 128m 26s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.net.TestNetUtils | | | hadoop.ha.TestZKFailoverController | | | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749775/YARN-3999.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fa1d84a | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8821/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8821/console | This message was automatically generated. RM hangs on draing events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS, or ZK becomes very slow, draining all the events take a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. add a timeout and stop the dispatcher even if not all events are drained. 2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681645#comment-14681645 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/284/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
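A note on why a runtime priority change has to go through the scheduler and its ordering policy: applications sit inside comparator-ordered collections, so a priority field cannot simply be mutated in place; the entity must be removed, updated, and re-inserted, or the collection's ordering invariants break. A small self-contained sketch of that pattern (names are illustrative, not the actual patch):
{code:java}
import java.util.Comparator;
import java.util.TreeSet;

class AppSketch {
  final String id;
  int priority;
  AppSketch(String id, int priority) { this.id = id; this.priority = priority; }
}

class PriorityOrderedQueueSketch {
  private final TreeSet<AppSketch> ordered = new TreeSet<>(
      Comparator.comparingInt((AppSketch a) -> -a.priority) // higher priority first
                .thenComparing(a -> a.id));                 // tie-break for set semantics

  void add(AppSketch app) { ordered.add(app); }

  // Remove, update, re-insert: mutating priority while the app is inside the
  // TreeSet would leave it filed under its old position.
  void updatePriority(AppSketch app, int newPriority) {
    ordered.remove(app);
    app.priority = newPriority;
    ordered.add(app);
  }

  AppSketch next() { return ordered.first(); }
}
{code}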
[jira] [Commented] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681691#comment-14681691 ] Rohith Sharma K S commented on YARN-3999: - Thanks [~jianhe] for updating the patch. One doubt: SystemMetricsPublisher has been moved from RMActiveServices to ResourceManager, so this service will not be reinitialized on every RM switch. This could lead to processing stale events even after the RM is in standby. If the same RM becomes active again, the SystemMetricsPublisher dispatcher publishes the stale events plus the recovered application events; event processing still happens in sequential order in that case. But an issue may occur when a different RM becomes active, i.e. # RM1 is active and publishing events # RM1 is transitioning to standby, and some events are still in the queue to be written to the timeline server # RM2 becomes active and recovers the applications. When an application finishes, RM2's SystemMetricsPublisher publishes the app status as finished. # RM1 is still processing events for that app and processes them a bit late, i.e. after RM2 already has. Doesn't this cause a problem? Any thoughts? RM hangs on draing events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS, or ZK becomes very slow, draining all the events take a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. add a timeout and stop the dispatcher even if not all events are drained. 2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681702#comment-14681702 ] Sunil G commented on YARN-3873: --- Thank you very much [~leftnoteasy] for the review and commit! pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681920#comment-14681920 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #281 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/281/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681916#comment-14681916 ] Hudson commented on YARN-3873: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2230 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2230/]) YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4037) Hadoop - failed redirect for container
[ https://issues.apache.org/jira/browse/YARN-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gagan updated YARN-4037: Attachment: mapred-site.xml yarn-site.xml Hi Mohammad, maybe I am making a fundamental mistake. My goal is to run the wordcount example jar with 2.7.1 on YARN in pseudo-distributed mode (on my laptop). When I run it I get an exception; to check what it is, I navigate from http://garima-pc:8088/cluster/apps/FAILED to http://garima-pc:8088/cluster/app/application_1439303739376_0001 to the logs link, and after implementing what you described it takes me to http://garima-pc:19888/jobhistory/logs/Garima-PC:50415/container_1439303739376_0001_02_01/container_1439303739376_0001_02_01/Garima, which is broken. It seems port 50415 is being generated dynamically. I am attaching my configuration xml files. Hadoop - failed redirect for container -- Key: YARN-4037 URL: https://issues.apache.org/jira/browse/YARN-4037 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Environment: Windows 7, Apache Hadoop 2.7.1 Reporter: Gagan Attachments: mapred-site.xml, yarn-site.xml I believe this issue has been addressed earlier in https://issues.apache.org/jira/browse/YARN-1473 though I am not sure because the description of that JIRA does not mention the following message: Failed while trying to construct the redirect url to the log server. Log Server url may not be configured java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all. Could someone look at this and provide details on the root cause and resolution? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
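For anyone hitting the same broken redirect: the "Failed while trying to construct the redirect url to the log server. Log Server url may not be configured" message quoted in the description usually means log aggregation and the log server URL are not set, and the dynamically chosen port (50415 above) is just the NM address recorded for the container, which is normal. A hedged sketch of the two properties that typically matter, shown programmatically here though they would normally live in yarn-site.xml (the host below is this reporter's machine, used purely as a placeholder):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LogRedirectConfigSketch {
  public static Configuration sketch() {
    Configuration conf = new YarnConfiguration();
    // Ship finished-container logs off the NM's local disk.
    conf.setBoolean("yarn.log-aggregation-enable", true);
    // Tell the NM web UI where the log server lives, so the redirect for
    // completed containers has a valid target.
    conf.set("yarn.log.server.url", "http://garima-pc:19888/jobhistory/logs");
    return conf;
  }
}
{code}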
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681917#comment-14681917 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2230 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2230/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681964#comment-14681964 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2211 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2211/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681998#comment-14681998 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #273 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/273/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681997#comment-14681997 ] Hudson commented on YARN-3873: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #273 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/273/]) YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682011#comment-14682011 ] Jian He commented on YARN-3999: --- I talked to [~zjshen] about this too. I think it's fine, as the event processing order is not that critical. Also, each timeline entity has a timestamp which itself indicates the order of the events. IMO, this is similar to multiple containers writing to ATS at the same time: there's no guarantee that the earliest generated event gets published into ATS first. RM hangs on draing events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS, or ZK becomes very slow, draining all the events take a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. add a timeout and stop the dispatcher even if not all events are drained. 2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
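To make fix (1) from the issue description concrete, here is a minimal sketch of a dispatcher whose stop path drains the queue only up to a deadline and then shuts down even if events remain. This is an illustrative stand-in for the AsyncDispatcher change, not the actual patch:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BoundedDrainDispatcherSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  void dispatch(Runnable event) {
    if (!stopped) {
      eventQueue.offer(event);
    }
  }

  // Returns true only if the queue fully drained before the deadline.
  boolean stop(long drainTimeoutMs) throws InterruptedException {
    stopped = true; // reject new events
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainTimeoutMs);
    while (!eventQueue.isEmpty() && System.nanoTime() < deadline) {
      Runnable event = eventQueue.poll(100, TimeUnit.MILLISECONDS);
      if (event != null) {
        event.run();
      }
    }
    // Without the deadline, a slow downstream system (ATS, ZK) could hold the
    // RM here past the 10-minute application expiry window described above.
    return eventQueue.isEmpty();
  }
}
{code}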
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682044#comment-14682044 ] Junping Du commented on YARN-3212: -- Can someone give it a review? With this patch in, the basic flow for graceful decommission works now. Thanks! RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to “decommissioned” on Resource_Update if there are no running apps on this NM or the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
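The DECOMMISSIONING flow described in this issue can be summarized as a small transition table. The real implementation uses YARN's StateMachineFactory inside RMNodeImpl; the toy table below is only meant to make the proposed transitions easy to scan, and the event names are illustrative:
{code:java}
import java.util.EnumMap;
import java.util.Map;

class NodeStateSketch {
  enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }
  enum Event { GRACEFUL_DECOMMISSION, RECOMMISSION, DECOMMISSION, RESOURCE_UPDATE_NO_RUNNING_APPS }

  private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
  static {
    Map<Event, State> fromRunning = new EnumMap<>(Event.class);
    fromRunning.put(Event.GRACEFUL_DECOMMISSION, State.DECOMMISSIONING);
    fromRunning.put(Event.DECOMMISSION, State.DECOMMISSIONED);
    TABLE.put(State.RUNNING, fromRunning);

    Map<Event, State> fromDecommissioning = new EnumMap<>(Event.class);
    fromDecommissioning.put(Event.RECOMMISSION, State.RUNNING);         // user cancels
    fromDecommissioning.put(Event.DECOMMISSION, State.DECOMMISSIONED);  // timeout from CLI
    fromDecommissioning.put(Event.RESOURCE_UPDATE_NO_RUNNING_APPS, State.DECOMMISSIONED);
    TABLE.put(State.DECOMMISSIONING, fromDecommissioning);
  }

  // Unlisted events keep the current state, mirroring "similar to RUNNING".
  static State transition(State current, Event event) {
    State next = TABLE.getOrDefault(current, Map.of()).get(event);
    return next != null ? next : current;
  }
}
{code}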
[jira] [Created] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
Sunil G created YARN-4044: - Summary: Running applications information changes such as movequeue is not published to TimeLine server Key: YARN-4044 URL: https://issues.apache.org/jira/browse/YARN-4044 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical SystemMetricsPublisher needs to expose an appUpdated API to publish any change for a running application. Events can be - change of queue for a running application. - change of application priority for a running application. This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682033#comment-14682033 ] Sunil G commented on YARN-4044: --- Timeline v2 changes can be tracked in a separate ticket once the API changes are done. I will file a ticket under the v2 umbrella JIRA if there are no objections. Running applications information changes such as movequeue is not published to TimeLine server -- Key: YARN-4044 URL: https://issues.apache.org/jira/browse/YARN-4044 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical SystemMetricsPublisher needs to expose an appUpdated API to publish any change for a running application. Events can be - change of queue for a running application. - change of application priority for a running application. This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
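A rough sketch of the appUpdated hook this ticket asks SystemMetricsPublisher to expose: when a running application's queue or priority changes, publish an update event carrying the new values. Every name below is an illustrative placeholder, not the final API:
{code:java}
import java.util.HashMap;
import java.util.Map;

class SystemMetricsPublisherSketch {
  interface TimelineSink {
    void publish(String entityId, String eventType, Map<String, Object> info);
  }

  private final TimelineSink sink;
  SystemMetricsPublisherSketch(TimelineSink sink) { this.sink = sink; }

  // Hypothetical hook: called by the RM whenever a running app is updated.
  void appUpdated(String appId, String newQueue, int newPriority, long updatedTime) {
    Map<String, Object> info = new HashMap<>();
    info.put("YARN_APPLICATION_QUEUE", newQueue);        // illustrative keys
    info.put("YARN_APPLICATION_PRIORITY", newPriority);
    info.put("YARN_APPLICATION_UPDATED_TIME", updatedTime);
    sink.publish(appId, "YARN_APPLICATION_UPDATED", info);
  }
}
{code}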
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682082#comment-14682082 ] Sunil G commented on YARN-3212: --- Hi [~djp], I have one doubt here. For {{StatusUpdateWhenHealthyTransition}}, if the node is in DECOMMISSIONING state initially, we now move it to DECOMMISSIONED directly. Could we give it a chance to move to UNHEALTHY here, so that after some rounds we can mark it DECOMMISSIONED if it cannot be revived? Your thoughts? RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to “decommissioned” on Resource_Update if there are no running apps on this NM or the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4045) Negative avaialbleMB is being reported for root queue.
Rushabh S Shah created YARN-4045: Summary: Negative avaialbleMB is being reported for root queue. Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212
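The log lines above already show the inconsistency: used memory exceeds the cluster total (usedCapacity > 1.0), so any availableMB derived from them has to go negative. A quick check against the logged numbers (the jmx value of -163328 was presumably captured at a different moment; the point here is only the sign):
{code:java}
public class AvailableMbCheck {
  public static void main(String[] args) {
    long clusterMB = 5_316_608L; // cluster=memory:5316608 from the log above
    long usedMB = 5_334_016L;    // used=memory:5334016 at assignedContainer time
    System.out.printf("usedCapacity = %.7f%n", usedMB / (double) clusterMB); // 1.0032743, matches the log
    System.out.println("availableMB = " + (clusterMB - usedMB));             // -17408 at that instant
  }
}
{code}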
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682118#comment-14682118 ] Rushabh S Shah commented on YARN-4045: -- bq. Thanks Rushabh S Shah for reporting this. One doubt, Which ResourceCalculator is used here? Is it Dominant RC. yes. Negative avaialbleMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root 
usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event
[jira] [Commented] (YARN-4023) Publish Application Priority to TimelineServer
[ https://issues.apache.org/jira/browse/YARN-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682077#comment-14682077 ] Rohith Sharma K S commented on YARN-4023: - +1 for the latest patch. If there are no objections, I will commit it. Publish Application Priority to TimelineServer -- Key: YARN-4023 URL: https://issues.apache.org/jira/browse/YARN-4023 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-4023.patch, 0001-YARN-4023.patch, ApplicationPage.png, TimelineserverMainpage.png Publish Application priority details to Timeline Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
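For reference, a minimal sketch of what publishing the priority to the v1 timeline service can look like: attach it as an "otherinfo" field on the application's TimelineEntity so it can be rendered on the application page. The field name used here is an assumption for illustration; the patch defines the real constant:
{code:java}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class PriorityPublishSketch {
  static TimelineEntity withPriority(String appId, int priority) {
    TimelineEntity entity = new TimelineEntity();
    entity.setEntityId(appId);
    entity.setEntityType("YARN_APPLICATION");
    // otherinfo fields surface on the timeline web UI's application page.
    entity.addOtherInfo("YARN_APPLICATION_PRIORITY", priority); // illustrative key
    return entity;
  }
}
{code}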
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682099#comment-14682099 ] Rohith Sharma K S commented on YARN-3924: - I agree with the concern that the user should be able to get a standby exception. I am not sure whether this point was discussed when RM HA was initially designed. cc: [~ka...@cloudera.com] [~jianhe] [~xgong] [~vinodkv] for more discussion on this. Submitting an application to standby ResourceManager should respond better than Connection Refused -- Key: YARN-3924 URL: https://issues.apache.org/jira/browse/YARN-3924 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Dustin Cote Assignee: Ajith S Priority: Minor When submitting an application directly to a standby resource manager, the resource manager responds with 'Connection Refused' rather than indicating that it is a standby resource manager. Because the resource manager is aware of its own state, I feel like we can have the 8032 port open for standby resource managers and reject the request with something like 'Cannot process application submission from this standby resource manager'. This would be especially helpful for debugging oozie problems when users put in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point to a specific resource manager). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
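A sketch of the behaviour being requested: the standby RM keeps its client port open and fails requests with a descriptive exception instead of letting clients see "Connection refused". Hadoop already has org.apache.hadoop.ipc.StandbyException for this pattern (HDFS HA signals standby NameNodes the same way); the surrounding service structure below is illustrative only:
{code:java}
import java.io.IOException;
import org.apache.hadoop.ipc.StandbyException;

class ClientRMServiceSketch {
  private volatile boolean active = false;

  void setActive(boolean isActive) { this.active = isActive; }

  void submitApplication(String appId) throws IOException {
    if (!active) {
      // The port stays open; the client gets a meaningful error to act on.
      throw new StandbyException(
          "Cannot process application submission: this ResourceManager is standby");
    }
    // ... actual submission path ...
  }
}
{code}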
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682102#comment-14682102 ] Junping Du commented on YARN-3906: -- Thanks [~sjlee0] for the patch work and [~gtCarrera9] for the review! The latest patch LGTM. However, I will wait for our decision on the sequencing of YARN-4025. split the application table from the entity table - Key: YARN-3906 URL: https://issues.apache.org/jira/browse/YARN-3906 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch Per discussions on YARN-3815, we need to split the application entities from the main entity table into their own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
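For readers following along, the gist of the split shows up in the row-key design: application rows move to their own table whose key no longer carries the entity-type and entity-id components needed by the generic entity table. A simplified sketch of the two key shapes (component order and separator here are illustrative; the patch defines the real encoding):
{code:java}
class TimelineRowKeySketch {
  // Generic entity table: one row per entity of any type under an app.
  static String entityRowKey(String cluster, String user, String flow,
      long runId, String appId, String entityType, String entityId) {
    return String.join("!", cluster, user, flow, Long.toString(runId),
        appId, entityType, entityId);
  }

  // Dedicated application table: one row per application, shorter key.
  static String applicationRowKey(String cluster, String user, String flow,
      long runId, String appId) {
    return String.join("!", cluster, user, flow, Long.toString(runId), appId);
  }
}
{code}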
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682110#comment-14682110 ] Sunil G commented on YARN-4045: --- Thanks [~shahrs87] for reporting this. One doubt, Which ResourceCalculator is used here? Is it Dominant RC. Negative avaialbleMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root 
usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682112#comment-14682112 ] Rohith Sharma K S commented on YARN-3999: - Thanks [~jianhe] for the explanation. Overall the patch looks good to me. RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
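To make fix (1) from the YARN-3999 description concrete, here is a minimal sketch of a drain-with-timeout on service stop. This is an illustrative stand-in, not the actual AsyncDispatcher code; the class, field names, and the 10-second default are assumptions:
{code}
// Illustrative sketch only: on stop, wait for the queue to drain, but give
// up after a timeout so a slow ATS/ZK cannot block the RM transition.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainingDispatcherSketch {
  private final BlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;
  private final long drainTimeoutMs = 10_000; // assumed configurable

  public void serviceStop() throws InterruptedException {
    long deadline = System.currentTimeMillis() + drainTimeoutMs;
    // Poll until the pending events are drained or the deadline passes.
    while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    stopped = true; // the dispatcher thread checks this flag and exits
  }
}
{code}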
[jira] [Commented] (YARN-4045) Negative availableMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682115#comment-14682115 ] Thomas Graves commented on YARN-4045: - I remember seeing that this was fixed in branch-2 by some of the capacity scheduler work for labels. I thought this might be fixed by https://issues.apache.org/jira/browse/YARN-3243, but that is already included. This might be fixed as part of https://issues.apache.org/jira/browse/YARN-3361, which is probably too big to backport in its entirety. [~leftnoteasy] Do you remember this issue? Note that it also shows up in the capacity scheduler UI as the root queue going over 100%. I remember that when I was testing YARN-3434 it wasn't occurring for me on branch-2 (2.8), and I thought it was one of the above jiras that fixed it. Negative availableMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our clusters. We are seeing negative availableMB being reported for queue=root. This is from the jmx output:
{noformat}
<clusterMetrics> ... <availableMB>-163328</availableMB> ... </clusterMetrics>
{noformat}
The following is the RM log:
{noformat}
2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:43,056 [ResourceManager Event Processor]
{noformat}
[jira] [Commented] (YARN-3979) AM in ResourceLocalizationService hangs 10 min, causing RM to kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682240#comment-14682240 ] Rohith Sharma K S commented on YARN-3979: - I had a look at the shared RM logs, and I strongly suspect this is due to the same reason as YARN-3990. From the shared log, I see the entries below, which indicate that the AsyncDispatcher is overloaded with unnecessary events. Maybe you can apply the patch from YARN-3990 and test it.
{noformat}
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: BJHC-HERA-18352.hadoop.jd.local:50086 Node Transitioned from RUNNING to LOST
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved BJHC-HADOOP-HERA-17280.jd.local to /rack/rack4065
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2515000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2515000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node BJHC-HADOOP-HERA-17280.jd.local(cmPort: 50086 httpPort: 8042) registered with capability: <memory:57344, vCores:28>, assigned nodeId BJHC-HADOOP-HERA-17280.jd.local:50086
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved BJHC-HERA-164102.hadoop.jd.local to /rack/rack41007
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node BJHC-HERA-164102.hadoop.jd.local(cmPort: 50086 httpPort: 8042) registered with capability: <memory:57344, vCores:28>, assigned nodeId BJHC-HERA-164102.hadoop.jd.local:50086
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2516000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2516000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node not found resyncing BJHC-HERA-18043.hadoop.jd.local:50086
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2517000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2517000
2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2518000
2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2518000
2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2519000
{noformat}
AM in ResourceLocalizationService hangs 10 min, causing RM to kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log
2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_01 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682284#comment-14682284 ] Anubhav Dhoot commented on YARN-4046: - The error in the NodeManager shows:
{noformat}
2015-08-10 15:14:05,567 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e45_1439244348718_0001_01_01
java.io.IOException: Timeout while waiting for exit code from container_e45_1439244348718_0001_01_01
at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:199)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
Looking under the debugger, the actual shell command that checks whether the container is alive fails because the kill syntax kill -0 -20773 is rejected:
{noformat}
this = {org.apache.hadoop.util.Shell$ShellCommandExecutor@6740} kill -0 -20773
builder = {java.lang.ProcessBuilder@6789}
command = {java.util.ArrayList@6813} size = 3
directory = null
environment = null
redirectErrorStream = false
redirects = null
timeOutTimer = null
timeoutTimerTask = null
errReader = {java.io.BufferedReader@6830}
inReader = {java.io.BufferedReader@6833}
errMsg = {java.lang.StringBuffer@6836} kill: invalid option -- '2'\n\nUsage:\n kill [options] pid [...]\n\nOptions:\n pid [...]send signal to every pid listed\n -signal, -s, --signal signal\n specify the signal to be sent\n -l, --list=[signal] list all signal names, or convert one to a name\n -L, --tablelist all signal names in a nice table\n\n -h, --help display this help and exit\n -V, --version output version information and exit\n\nFor more details see kill(1).\n
errThread = {org.apache.hadoop.util.Shell$1@6839} Thread[Thread-102,5,]
line = null
exitCode = 1
completed = {java.util.concurrent.atomic.AtomicBoolean@6806} true
{noformat}
This causes DefaultContainerExecutor#containerIsAlive to catch the ExitCodeException thrown by ShellCommandExecutor.execute, making it assume the container is lost. Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682293#comment-14682293 ] Anubhav Dhoot commented on YARN-4046: - As per the GNU/Linux [documentation|http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html#kill-invocation], -- may not be required, but it appears that not all distros (e.g., Debian) support omitting it. {noformat} If a negative pid argument is desired as the first one, it should be preceded by --. However, as a common extension to POSIX, -- is not required with ‘kill -signal -pid’. {noformat} So the fix is to always prefix --, matching the recommendation. Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
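For illustration, a minimal sketch of the fix described above. This is not the actual ContainerExecutor code; the helper name is hypothetical, and the point is simply that "--" ends option parsing so a negative pid (a process group) is read as an operand rather than an option:
{code}
// Illustrative sketch only: build a kill command that is safe for
// process groups on distros whose kill rejects "kill -0 -PID".
public class KillCommandSketch {
  static String[] signalCommand(String signal, String pid) {
    // "--" terminates option parsing before the (possibly negative) pid.
    return new String[] {"kill", "-" + signal, "--", pid};
  }

  public static void main(String[] args) {
    // Prints: kill -0 -- -20773 (liveness check of process group 20773)
    System.out.println(String.join(" ", signalCommand("0", "-20773")));
  }
}
{code}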
[jira] [Commented] (YARN-4023) Publish Application Priority to TimelineServer
[ https://issues.apache.org/jira/browse/YARN-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682360#comment-14682360 ] Hadoop QA commented on YARN-4023: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 24m 12s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | site | 2m 57s | Site still builds. |
| {color:red}-1{color} | checkstyle | 2m 38s | The applied patch generated 1 new checkstyle issues (total was 16, now 16). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 7m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests | 1m 53s | Tests failed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 3m 13s | Tests passed in hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests | 53m 22s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 123m 59s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.util.TestRackResolver |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749303/0001-YARN-4023.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 1fc3c77 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8823/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8823/console |
This message was automatically generated. Publish Application Priority to TimelineServer -- Key: YARN-4023 URL: https://issues.apache.org/jira/browse/YARN-4023 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-4023.patch, 0001-YARN-4023.patch, ApplicationPage.png, TimelineserverMainpage.png Publish Application priority details to Timeline Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4046) NM container recovery is broken on some linux distro because of syntax of signal
Anubhav Dhoot created YARN-4046: --- Summary: NM container recovery is broken on some linux distro because of syntax of signal Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Summary: Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST (was: NM container recovery is broken on some linux distro because of syntax of signal) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-4047: Assignee: Jason Lowe In OOZIE-1729 Oozie started calling getApplications to look for applications with specific tags. This significantly increases the utilization of this method on a cluster that makes heavy use of Oozie. One quick fix for the Oozie use-case may be to swap the filter order. Rather than doing the expensive checkAccess call first, we can do all the other filtering first and finally verify the user has access before adding the app to the response. In the Oozie scenario most apps will be filtered by the tag check before we ever get to the checkAccess call. ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
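A rough sketch of the reordering described above. The App class and the access-check predicate are simplified stand-ins for RMApp and the real ACL check that takes the scheduler lock; this is not the actual ClientRMService code:
{code}
// Illustrative reordering: cheap filters first, expensive ACL check last.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class GetApplicationsSketch {
  static class App {
    final Set<String> tags;
    App(Set<String> tags) { this.tags = tags; }
  }

  static List<App> getApplications(List<App> apps, Set<String> wantedTags,
      Predicate<App> checkAccess) {
    List<App> result = new ArrayList<>();
    for (App app : apps) {
      // Cheap tag filter first: in the Oozie scenario most apps are
      // rejected here without ever touching the scheduler lock.
      if (!wantedTags.isEmpty() && Collections.disjoint(app.tags, wantedTags)) {
        continue;
      }
      // Expensive access check (scheduler lock) only for the survivors.
      if (checkAccess.test(app)) {
        result.add(app);
      }
    }
    return result;
  }
}
{code}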
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4047: -- Labels: 2.6.1-candidate (was: ) ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Labels: 2.6.1-candidate Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
Jason Lowe created YARN-4047: Summary: ClientRMService getApplications has high scheduler lock contention Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4047: - Attachment: YARN-4047.001.patch Patch that performs the checkAccess filter last rather than first. ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682379#comment-14682379 ] Dustin Cote commented on YARN-2369: --- [~jlowe] thanks for all the input. I'll clean this latest patch up based on these comments this week. Happy to throw this in the MAPREDUCE project instead as well, since basically all the changes are in the MR client. I don't think sub JIRAs would be necessary since it's a pretty small change on the YARN side, but I leave that to the project management experts. I don't see any organizational problem keeping it all in one JIRA here. Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch, YARN-2369-4.patch, YARN-2369-5.patch, YARN-2369-6.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
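To make the YARN-2369 distinction concrete, a hedged sketch of append-for-path-like versus replace-for-everything-else. The whitelist of path-like variables here is an assumption for illustration, not the project's actual policy:
{code}
// Sketch: append only for path-like variables, replace all others.
import java.io.File;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EnvMergeSketch {
  // Assumed whitelist for illustration; the real set is a design decision.
  private static final Set<String> PATH_LIKE = new HashSet<>();
  static {
    PATH_LIKE.add("PATH");
    PATH_LIKE.add("CLASSPATH");
    PATH_LIKE.add("LD_LIBRARY_PATH");
  }

  static void put(Map<String, String> env, String name, String value) {
    String old = env.get(name);
    if (old != null && PATH_LIKE.contains(name)) {
      // Path-like semantics: appending is the intuitive behavior.
      env.put(name, old + File.pathSeparator + value);
    } else {
      // Non-path variables: the new value should win outright.
      env.put(name, value);
    }
  }
}
{code}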
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: (was: YARN-3999.5.patch) RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: YARN-3999.5.patch RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4026: - Attachment: YARN-4026.2.patch Thanks for the comments [~jianhe]; attached ver.2 patch. FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests Key: YARN-4026 URL: https://issues.apache.org/jira/browse/YARN-4026 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-4026.1.patch, YARN-4026.2.patch After YARN-3983, we have an extensible ContainerAllocator which can be used by FiCaSchedulerApp to decide how to allocate resources. While working on YARN-1651 (allocate resources to increase containers), I found one thing in the existing logic that is not flexible enough: ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation (for example, priority as a weight, or whether to skip a priority), it's better to let the ContainerAllocator choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
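For illustration only, the kind of hook being described in YARN-4026 might look like the following. This is a hypothetical interface, not the actual patch; the idea is that the allocator, rather than the scheduler app, supplies the iteration order of pending requests:
{code}
// Hypothetical shape of the ordering hook; not the actual YARN-4026 patch.
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

interface OrderingContainerAllocatorSketch<R> {
  // Each allocator supplies its own policy, e.g. strict priority order,
  // priority-as-weight, or skipping some priorities entirely.
  Comparator<R> pendingRequestOrder();

  default List<R> orderedPendingRequests(Collection<R> pending) {
    List<R> ordered = new ArrayList<>(pending);
    ordered.sort(pendingRequestOrder());
    return ordered;
  }
}
{code}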
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2599: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Standby RM should also expose some jmx and metrics -- Key: YARN-2599 URL: https://issues.apache.org/jira/browse/YARN-2599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Sharma K S YARN-1898 redirects jmx and metrics to the Active. As discussed there, we need to separate out metrics displayed so the Standby RM can also be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2506: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) TimelineClient should NOT be in yarn-common project --- Key: YARN-2506 URL: https://issues.apache.org/jira/browse/YARN-2506 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Priority: Critical YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2037: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Add restart support for Unmanaged AMs - Key: YARN-2037 URL: https://issues.apache.org/jira/browse/YARN-2037 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla It would be nice to allow Unmanaged AMs also to restart in a work-preserving way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Attachment: YARN-4046.002.patch Fixed whitespace Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-4046.002.patch, YARN-4046.002.patch, YARN-4096.001.patch On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error. The attempts are not retried. {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: YARN-3999-branch-2.7.patch Uploaded the branch-2.7 patch. RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3978: -- Labels: 2.6.1-candidate (was: ) Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Affects Versions: 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Labels: 2.6.1-candidate Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3978.001.patch, YARN-3978.002.patch, YARN-3978.003.patch, YARN-3978.004.patch Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692916#comment-14692916 ] Sangjin Lee commented on YARN-2859: --- [~zjshen], can this be done for 2.6.1, or are you OK with deferring it to 2.6.2? ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate In mini cluster, a random port should be used. Also, the config is not updated to the host that the process got bound to. {code} 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
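A sketch of the behavior being asked for in YARN-2859, under stated assumptions: requesting port 0 makes the OS pick a free port, and the mini cluster would then need to publish the actually-bound address back into the config. The config constants are the standard timeline service ones; the surrounding wiring is simplified and not the actual MiniYARNCluster code:
{code}
// Illustrative only: ephemeral ports for the timeline service in tests.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EphemeralTimelinePortSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Port 0 = ephemeral; avoids clashing with a fixed default like 8188.
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
    // After serviceStart(), the cluster should write the real host:port
    // it bound to back into conf, which is the missing piece reported.
  }
}
{code}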
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692511#comment-14692511 ] Vrushali C commented on YARN-4025: -- Yes, +1 Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C Attachments: YARN-4025-YARN-2928.001.patch Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There seem to be some places in the code with conversions from Long to byte[] to String for easier argument passing between function calls. These values then end up being converted back to byte[] while storing in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as adding a few more function calls, like a getColumnQualifier that accepts a pre-encoded byte array, in addition to the existing API which accepts a String, and having ColumnHelper return a byte[] column name instead of a String one. Filing this jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
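A small sketch of the point above, using the standard HBase Bytes utility: encoding the Long directly keeps the fixed-width, byte-sortable representation that the round trip through String loses. The class name is a stand-in for illustration:
{code}
// Encode timestamps as fixed-width byte[] directly; avoid the
// Long -> String -> byte[] round trip discussed above.
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampEncodingSketch {
  public static void main(String[] args) {
    long ts = 1439244348718L;

    byte[] direct = Bytes.toBytes(ts);                    // 8 bytes, sortable
    byte[] viaString = Bytes.toBytes(Long.toString(ts));  // 13 bytes here

    System.out.println(direct.length + " vs " + viaString.length);
  }
}
{code}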
[jira] [Updated] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1848: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Persist ClusterMetrics across RM HA transitions --- Key: YARN-1848 URL: https://issues.apache.org/jira/browse/YARN-1848 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Post YARN-1705, ClusterMetrics are reset on transition to standby. This is acceptable as the metrics show statistics since an RM has become active. Users might want to see metrics since the cluster was ever started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2014: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9 Key: YARN-2014 URL: https://issues.apache.org/jira/browse/YARN-2014 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: patrick white Assignee: Jason Lowe Performance comparison benchmarks of 2.x against 0.23 show that the AM scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend is consistent across later releases in both lines; the latest release numbers are: 2.4.0.0 runtime 255.6 seconds (avg 5 passes) 0.23.9.12 runtime 230.4 seconds (avg 5 passes) Diff: -9.9% The AM scalability test is essentially a sleep job that measures the time to launch and complete a large number of mappers. The diff is consistent and has been reproduced in both a larger (350 node, 100,000 mappers) perf environment and a small (10 node, 2,900 mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing due to AMs getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2055: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Preemption: Jobs are failing due to AMs getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If queue A does not have enough capacity to run an AM, the AM will borrow capacity from queue B. In that case the AM will be killed when queue B reclaims its capacity, then launched and killed again, and the job will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2457: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) FairScheduler: Handle preemption to help starved parent queues -- Key: YARN-2457 URL: https://issues.apache.org/jira/browse/YARN-2457 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't check for parent queue starvation. We need to check that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1856: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) cgroups based memory monitoring for containers -- Key: YARN-1856 URL: https://issues.apache.org/jira/browse/YARN-1856 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692476#comment-14692476 ] Junping Du commented on YARN-3906: -- Ok. Committing this patch now. split the application table from the entity table - Key: YARN-3906 URL: https://issues.apache.org/jira/browse/YARN-3906 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch Per discussions on YARN-3815, we need to split the application entities from the main entity table into its own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692632#comment-14692632 ] Inigo Goiri commented on YARN-313: -- Not critical; I think it can be deferred. I would appreciate ideas on why this change breaks refreshNodes with a graceful period. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources as specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2038: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Revisit how AMs learn of containers from previous attempts -- Key: YARN-2038 URL: https://issues.apache.org/jira/browse/YARN-2038 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Based on YARN-556, we need to update the way AMs learn about containers allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692843#comment-14692843 ] Hadoop QA commented on YARN-313: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 19m 19s | Findbugs (version 3.0.0) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 56s | The applied patch generated 4 new checkstyle issues (total was 229, now 232). |
| {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 5m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 6m 58s | Tests failed in hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 53m 35s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 111m 5s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.cli.TestRMAdminCLI |
| | hadoop.yarn.client.api.impl.TestYarnClient |
| | hadoop.yarn.util.TestRackResolver |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749993/YARN-313-v7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3ae716f |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8828/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8828/console |
This message was automatically generated.
Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources as specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1480: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch Nowadays RM web services getApps() accepts many more filters than ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1681) When banned.users is not set in LCE's container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS produces an unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1681: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) When banned.users is not set in LCE's container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS produces an unclear error message --- Key: YARN-1681 URL: https://issues.apache.org/jira/browse/YARN-1681 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Zhichun Wu Assignee: Zhichun Wu Labels: container, usability Attachments: YARN-1681.patch When using LCE in a secure setup, if banned.users is not set in container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS (mapred, hdfs, bin, 0) produces an unclear error message. For example, if we use hdfs to submit an MR job, we may see the following on the YARN app overview page: {code} appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: Application application_1391353981633_0003 initialization failed (exitCode=139) with output: {code} while the preferred error message would look like: {code} appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: Application application_1391353981633_0003 initialization failed (exitCode=139) with output: Requested user hdfs is banned {code} It's just a minor bug, and I would like to start contributing to hadoop-common with it :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1767: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Windows: Allow a way for users to augment classpath of YARN daemons --- Key: YARN-1767 URL: https://issues.apache.org/jira/browse/YARN-1767 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Karthik Kambatla YARN-1429 adds a way to augment the classpath for *nix-based systems. Need something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692518#comment-14692518 ] Hadoop QA commented on YARN-4046: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 7s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 9s | The applied patch generated 3 new checkstyle issues (total was 97, now 99). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 57s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 42s | Tests failed in hadoop-common. |
| | | | 63m 5s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749949/YARN-4096.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7c796fd |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8825/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8825/console |
This message was automatically generated. Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-4096.001.patch On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error. The attempts are not retried.
{noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
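For context on the failure mode above: container recovery probes liveness by delivering signal 0 to the container's process group, i.e. a negative pid. The sketch below is illustrative only, not the actual Hadoop ContainerExecutor/Shell code; the class and method names are hypothetical. The point it demonstrates is that the "--" guard is what keeps the negative (process-group) pid from being parsed as an option, and a check routed through a shell or kill variant lacking that guard can fail even when the process group is alive.
{code:java}
import java.io.IOException;

public class ProcessGroupLiveness {
  // Illustrative liveness probe: signal 0 delivers nothing but reports
  // whether the target exists. "--" ends option parsing so "-pgid" is
  // read as a process-group id rather than as a flag.
  static boolean isProcessGroupAlive(long pgid)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("kill", "-0", "--", "-" + pgid).start();
    return p.waitFor() == 0;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isProcessGroupAlive(Long.parseLong(args[0])));
  }
}
{code}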
[jira] [Updated] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4026: - Attachment: YARN-4026.3.patch Attached ver.3, added more comments and fixed a findbugs warning. FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests Key: YARN-4026 URL: https://issues.apache.org/jira/browse/YARN-4026 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch After YARN-3983, we have an extensible ContainerAllocator that FiCaSchedulerApp can use to decide how to allocate resources. While working on YARN-1651 (allocating resources to increase a container), I found one part of the existing logic that is not flexible enough: ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation, for example treating priority as a weight, or choosing whether to skip a priority, it is better to let ContainerAllocator choose how to order the pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
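To illustrate the kind of flexibility being proposed, here is a minimal sketch of letting each allocator supply its own ordering over pending requests. The PendingRequest type and both comparators are hypothetical stand-ins for illustration, not classes from the YARN-4026 patch:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pending resource request (not a YARN class).
final class PendingRequest {
  final int priority;
  final int memoryMb;
  PendingRequest(int priority, int memoryMb) {
    this.priority = priority;
    this.memoryMb = memoryMb;
  }
  @Override
  public String toString() {
    return "prio=" + priority + " mem=" + memoryMb;
  }
}

public class RequestOrderingDemo {
  // The scheduler asks the allocator for its ordering instead of
  // hard-coding "iterate strictly by priority".
  static List<PendingRequest> ordered(List<PendingRequest> pending,
      Comparator<PendingRequest> order) {
    List<PendingRequest> copy = new ArrayList<>(pending);
    copy.sort(order);
    return copy;
  }

  public static void main(String[] args) {
    List<PendingRequest> pending = Arrays.asList(
        new PendingRequest(2, 4096), new PendingRequest(1, 1024));

    // Regular allocation: strict priority order (lower value = higher priority).
    System.out.println(ordered(pending,
        Comparator.comparingInt((PendingRequest r) -> r.priority)));

    // A container-increase allocator might prefer the largest request first.
    System.out.println(ordered(pending,
        Comparator.comparingInt((PendingRequest r) -> r.memoryMb).reversed()));
  }
}
{code}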
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692468#comment-14692468 ] Sangjin Lee commented on YARN-4025: --- For the record, we will go ahead with YARN-3906 first. We'll need to update this patch to reflect the changes in YARN-3906. I'll work with [~vrushalic] on that. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C Attachments: YARN-4025-YARN-2928.001.patch Timestamps are stored as Longs in HBase by the HBaseTimelineWriterImpl code. In some places the code converts a Long to a byte[] and then to a String for easier argument passing between function calls; those values then end up being converted back to byte[] when stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as a few additional function calls, such as a getColumnQualifier that accepts a pre-encoded byte array (in addition to the existing API that accepts a String) and a ColumnHelper that returns a byte[] column name instead of a String one. Filing this JIRA to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
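A minimal sketch of the encoding concern, assuming org.apache.hadoop.hbase:hbase-common on the classpath; the class and variable names here are illustrative, not from the patch:
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

public class LongEncodingDemo {
  public static void main(String[] args) {
    long timestamp = System.currentTimeMillis();

    // Round-tripping through String allocates intermediate objects and
    // loses the fixed-width encoding (13 bytes for a millisecond value).
    byte[] viaString = Bytes.toBytes(Long.toString(timestamp));

    // Encoding the Long directly yields a fixed 8-byte array that can be
    // passed between helper methods and written to HBase as-is.
    byte[] direct = Bytes.toBytes(timestamp);

    System.out.println("via String: " + viaString.length + " bytes"); // 13
    System.out.println("direct:     " + direct.length + " bytes");    // 8
  }
}
{code}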
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692465#comment-14692465 ] Sangjin Lee commented on YARN-3906: --- I checked with [~vrushalic], and we decided to land the patch for this JIRA (YARN-3906) first. split the application table from the entity table - Key: YARN-3906 URL: https://issues.apache.org/jira/browse/YARN-3906 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch Per discussions on YARN-3815, we need to split the application entities out of the main entity table into their own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692537#comment-14692537 ] Sangjin Lee commented on YARN-2014: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9 Key: YARN-2014 URL: https://issues.apache.org/jira/browse/YARN-2014 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: patrick white Assignee: Jason Lowe Performance comparison benchmarks of 2.x against 0.23 show that the AM scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend is consistent across later releases in both lines; the latest release numbers are: 2.4.0.0 runtime 255.6 seconds (avg of 5 passes), 0.23.9.12 runtime 230.4 seconds (avg of 5 passes), diff: -9.9%. The AM scalability test is essentially a sleep job that measures the time to launch and complete a large number of mappers. The diff is consistent and has been reproduced in both a larger (350 nodes, 100,000 mappers) perf environment and a small (10 nodes, 2,900 mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
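For reference, the quoted -9.9% is consistent with taking the runtime delta relative to the 2.4.0 number, a reading inferred from the figures above rather than stated in the report: (230.4 - 255.6) / 255.6 ≈ -0.099. Equivalently, 2.4.0 is about 10.9% slower when measured against the 0.23.9.12 baseline (255.6 / 230.4 - 1 ≈ 0.109).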
[jira] [Commented] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692541#comment-14692541 ] Sangjin Lee commented on YARN-1848: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Persist ClusterMetrics across RM HA transitions --- Key: YARN-1848 URL: https://issues.apache.org/jira/browse/YARN-1848 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Post YARN-1705, ClusterMetrics are reset on the transition to standby. This is acceptable in that the metrics show statistics since an RM became active. However, users might want to see metrics since the cluster was first started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692538#comment-14692538 ] Sangjin Lee commented on YARN-1856: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. cgroups based memory monitoring for containers -- Key: YARN-1856 URL: https://issues.apache.org/jira/browse/YARN-1856 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692545#comment-14692545 ] Sangjin Lee commented on YARN-1480: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch The RM web services getApps() call accepts many more filters than the ApplicationCLI list command, which only accepts state and type. Ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
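For comparison, a minimal sketch of the two filters the CLI does support, via the public YarnClient API; the application type "MAPREDUCE" is just an example value, and the cluster configuration is assumed to be on the classpath:
{code:java}
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListAppsByStateAndType {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      // Type and state: the same two filters ApplicationCLI exposes via
      // -appTypes and -appStates; the REST getApps() additionally filters
      // by user, queue, limit, time ranges, and more.
      List<ApplicationReport> apps = client.getApplications(
          Collections.singleton("MAPREDUCE"),
          EnumSet.of(YarnApplicationState.RUNNING));
      for (ApplicationReport app : apps) {
        System.out.println(app.getApplicationId() + " " + app.getName());
      }
    } finally {
      client.stop();
    }
  }
}
{code}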
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692546#comment-14692546 ] Sangjin Lee commented on YARN-313: -- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources as specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692543#comment-14692543 ] Sangjin Lee commented on YARN-1767: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Windows: Allow a way for users to augment classpath of YARN daemons --- Key: YARN-1767 URL: https://issues.apache.org/jira/browse/YARN-1767 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Karthik Kambatla YARN-1429 adds a way to augment the classpath for *nix-based systems. Need something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1480: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch The RM web services getApps() call accepts many more filters than the ApplicationCLI list command, which only accepts state and type. Ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2506: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! TimelineClient should NOT be in yarn-common project --- Key: YARN-2506 URL: https://issues.apache.org/jira/browse/YARN-2506 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Priority: Critical YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It doesn't belong there; we should move it back to the yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2599: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Standby RM should also expose some jmx and metrics -- Key: YARN-2599 URL: https://issues.apache.org/jira/browse/YARN-2599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Sharma K S YARN-1898 redirects jmx and metrics to the Active RM. As discussed there, we need to separate out the metrics displayed so that the Standby RM can also be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)