[jira] [Updated] (YARN-3478) FairScheduler page not rendered because of differences between the YarnApplicationState and RMAppState enums

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3478:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 FairScheduler page not rendered because of differences between the 
 YarnApplicationState and RMAppState enums 
 ---

 Key: YARN-3478
 URL: https://issues.apache.org/jira/browse/YARN-3478
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Xu Chen
 Attachments: YARN-3478.1.patch, YARN-3478.2.patch, YARN-3478.3.patch, 
 screenshot-1.png


 Got the following exception from the log:
 {code}
 java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
     at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
     at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
     at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
     at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
     at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
     at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
     at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79)
     at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
     at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
     at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
     at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
     at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.apache.hadoop.http.lib.DynamicUserWebFilter$DynamicUserFilter.doFilter(DynamicUserWebFilter.java:59)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
     at org.mortbay.jetty.Server.handle(Server.java:326)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
     at ...
 {code}
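For context, here is a minimal, self-contained sketch (using stand-in enums, not the real RMAppState/YarnApplicationState) of how a name-based conversion between two enums that do not share all constants throws at render time and surfaces as the InvocationTargetException above. The state names below are illustrative assumptions.
{code}
// Illustrative only -- not the actual YARN classes.
public class EnumMismatchSketch {
  // Stand-ins: the internal state machine has more states than the public enum.
  enum InternalState { NEW, SUBMITTED, RUNNING, FINAL_SAVING, FINISHED }
  enum PublicState   { NEW, SUBMITTED, RUNNING, FINISHED }

  // Naive conversion: only safe for names present in both enums.
  static PublicState naiveConvert(InternalState s) {
    return PublicState.valueOf(s.name()); // IllegalArgumentException for FINAL_SAVING
  }

  // Safer conversion: map internal-only states explicitly.
  static PublicState safeConvert(InternalState s) {
    switch (s) {
      case FINAL_SAVING: return PublicState.RUNNING; // illustrative mapping
      default:           return PublicState.valueOf(s.name());
    }
  }

  public static void main(String[] args) {
    System.out.println(safeConvert(InternalState.FINAL_SAVING)); // RUNNING
    naiveConvert(InternalState.FINAL_SAVING);                    // throws
  }
}
{code}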
 

[jira] [Updated] (YARN-2076) Minor error in TestLeafQueue files

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2076:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Minor error in TestLeafQueue files
 --

 Key: YARN-2076
 URL: https://issues.apache.org/jira/browse/YARN-2076
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Chen He
Assignee: Chen He
Priority: Minor
  Labels: test
 Attachments: YARN-2076.patch


 numNodes should be 2 instead of 3 in testReservationExchange() since only 
 two nodes are defined.  
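An illustrative sketch (not the actual TestLeafQueue code; the memory size and variable names are assumptions) of why a stale numNodes inflates the expected cluster resource used in the test's assertions:
{code}
// Illustrative only: with 2 registered nodes but numNodes = 3, any expectation
// computed as numNodes * nodeMemory checks against a cluster that doesn't exist.
public class NumNodesSketch {
  public static void main(String[] args) {
    final int nodeMemoryMB = 8 * 1024; // assumed per-node memory
    final int registeredNodes = 2;     // only two nodes are actually set up
    final int numNodes = 3;            // the constant the patch changes to 2

    int actualClusterMB = registeredNodes * nodeMemoryMB; // 16384
    int assumedClusterMB = numNodes * nodeMemoryMB;       // 24576 -- wrong baseline
    System.out.println(actualClusterMB + " MB actual vs " + assumedClusterMB + " MB assumed");
  }
}
{code}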



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1767:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Windows: Allow a way for users to augment classpath of YARN daemons
 ---

 Key: YARN-1767
 URL: https://issues.apache.org/jira/browse/YARN-1767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Karthik Kambatla

 YARN-1429 adds a way to augment the classpath for *nix-based systems. Need 
 something similar for Windows. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2859:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
 --

 Key: YARN-2859
 URL: https://issues.apache.org/jira/browse/YARN-2859
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Hitesh Shah
Assignee: Zhijie Shen
Priority: Critical
  Labels: 2.6.1-candidate

 In the mini cluster, a random (ephemeral) port should be used. 
 Also, the config is not updated with the host and port that the process actually bound to.
 {code}
 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
 (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
 address: localhost:10200
 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
 (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
 web address: 0.0.0.0:8188
 {code}
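A minimal sketch of the usual test-side remedy, assuming the standard YarnConfiguration timeline-service keys (the actual MiniYARNCluster fix may differ):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EphemeralTimelinePortSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Port 0 asks the OS for a free ephemeral port instead of the fixed 10200/8188,
    // avoiding collisions when several test clusters share a host.
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
    // After startup, tests should read the actual bound address back from the
    // (updated) configuration rather than assuming the default port.
    System.out.println(conf.get(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS));
  }
}
{code}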



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2746:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 YARNDelegationTokenID misses serializing version from the common abstract ID
 

 Key: YARN-2746
 URL: https://issues.apache.org/jira/browse/YARN-2746
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He

 I found this during review of YARN-2743.
 bq. AbstractDTId had a version, we dropped that in the protobuf 
 serialization. We should just write it during the serialization and read it 
 back?
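A minimal sketch of the "write the version, read it back" idea described in the quote (class and field names are illustrative, not the actual YARN delegation token identifier code):
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative only: serialize a version field symmetrically on both paths so
// it is not silently dropped.
public class VersionedTokenIdSketch {
  private byte version = 0;
  private String owner = "";

  public void write(DataOutput out) throws IOException {
    out.writeByte(version); // the field that was being dropped
    out.writeUTF(owner);
  }

  public void readFields(DataInput in) throws IOException {
    version = in.readByte(); // restore it on the read path
    owner = in.readUTF();
  }
}
{code}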



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1681) When banned.users is not set in LCE's container-executor.cfg, submitting a job as a user in DEFAULT_BANNED_USERS produces an unclear error message

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1681:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 When banned.users is not set in LCE's container-executor.cfg, submitting a job 
 as a user in DEFAULT_BANNED_USERS produces an unclear error message
 ---

 Key: YARN-1681
 URL: https://issues.apache.org/jira/browse/YARN-1681
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Zhichun Wu
Assignee: Zhichun Wu
  Labels: container, usability
 Attachments: YARN-1681.patch


 When using the LinuxContainerExecutor (LCE) in a secure setup, if banned.users is 
 not set in container-executor.cfg, submitting a job as a user in 
 DEFAULT_BANNED_USERS (mapred, hdfs, bin, 0) produces an unclear error message.
 For example, if we use hdfs to submit an MR job, we may see the following on the 
 YARN app overview page:
 {code}
 appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: 
 Application application_1391353981633_0003 initialization failed 
 (exitCode=139) with output: 
 {code}
 while the preferred error message would look like:
 {code}
 appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: 
 Application application_1391353981633_0003 initialization failed 
 (exitCode=139) with output: Requested user hdfs is banned 
 {code}
 Just a minor bug, and I would like to start contributing to hadoop-common with 
 it. :)
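An illustrative sketch of the desired behavior (the real check lives in the native container-executor, so this Java snippet is only a model of the logic, not the actual code):
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: fall back to a default banned set when banned.users is not
// configured, and name the user in the error instead of emitting a bare exit code.
public class BannedUserCheckSketch {
  private static final Set<String> DEFAULT_BANNED_USERS =
      new HashSet<>(Arrays.asList("mapred", "hdfs", "bin"));

  static void checkUser(String user) {
    if (DEFAULT_BANNED_USERS.contains(user)) {
      throw new IllegalArgumentException("Requested user " + user + " is banned");
    }
  }

  public static void main(String[] args) {
    checkUser("hdfs"); // -> "Requested user hdfs is banned"
  }
}
{code}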



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1856:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 cgroups based memory monitoring for containers
 --

 Key: YARN-1856
 URL: https://issues.apache.org/jira/browse/YARN-1856
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Varun Vasudev





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2657:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 MiniYARNCluster to (optionally) add MicroZookeeper service
 --

 Key: YARN-2657
 URL: https://issues.apache.org/jira/browse/YARN-2657
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: test
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2567-001.patch, YARN-2657-002.patch


 This is needed for testing things like YARN-2646: add an option for the 
 {{MiniYarnCluster}} to start a {{MicroZookeeperService}}.
 This is just another YARN service to create and track the lifecycle. The 
 {{MicroZookeeperService}} publishes its binding information for direct takeup 
 by the registry services...this can address in-VM race conditions.
 The default setting for this service is off.
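A minimal sketch of the optional-service wiring (the config key below is a hypothetical name for illustration; the actual patch may differ):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

// Illustrative only: how a CompositeService such as MiniYARNCluster typically
// adds an optional child service behind a boolean flag that defaults to off.
public class OptionalZkClusterSketch extends CompositeService {
  // Hypothetical key; not an existing YarnConfiguration constant.
  public static final String ENABLE_MICRO_ZK = "yarn.minicluster.micro-zookeeper.enable";

  public OptionalZkClusterSketch() {
    super("OptionalZkClusterSketch");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    if (conf.getBoolean(ENABLE_MICRO_ZK, false)) { // default: off
      // addService(new MicroZookeeperService("zk")); // added only when enabled
    }
    super.serviceInit(conf);
  }
}
{code}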



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2014:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
 

 Key: YARN-2014
 URL: https://issues.apache.org/jira/browse/YARN-2014
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: patrick white
Assignee: Jason Lowe

 Performance comparison benchmarks of 2.x against 0.23 show that the AM 
 scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend 
 is consistent across later releases in both lines; the latest release numbers are:
 2.4.0.0 runtime 255.6 seconds (avg 5 passes)
 0.23.9.12 runtime 230.4 seconds (avg 5 passes)
 Diff: -9.9% 
 AM Scalability test is essentially a sleep job that measures time to launch 
 and complete a large number of mappers.
 The diff is consistent and has been reproduced in both a larger (350 node, 
 100,000 mappers) perf environment, as well as a small (10 node, 2,900 
 mappers) demo cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-08-11 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-313:
-
Attachment: YARN-313-v7.patch

Updated to trunk. It looks like it still breaks the unit test for the graceful 
refresh but I cannot figure out why.

 Add Admin API for supporting node resource configuration in command line
 

 Key: YARN-313
 URL: https://issues.apache.org/jira/browse/YARN-313
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
 YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, 
 YARN-313-v6.patch, YARN-313-v7.patch


 We should provide some admin interface, e.g. yarn rmadmin -refreshResources, 
 to support changing a node's resources as specified in a config file.
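For illustration only (the file format below is an assumption, not the YARN-313 design): the kind of node-to-resource mapping such a config file would carry, which the refresh command would then push to the RM admin service.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: parse "<host:port> <memoryMB> <vcores>" lines into a map
// that a refresh-resources style command could hand to the ResourceManager.
public class NodeResourceConfigSketch {
  static Map<String, int[]> parse(String[] lines) {
    Map<String, int[]> resources = new HashMap<>();
    for (String line : lines) {
      String[] parts = line.trim().split("\\s+");
      resources.put(parts[0],
          new int[] { Integer.parseInt(parts[1]), Integer.parseInt(parts[2]) });
    }
    return resources;
  }

  public static void main(String[] args) {
    Map<String, int[]> r =
        parse(new String[] { "node1:45454 8192 8", "node2:45454 4096 4" });
    System.out.println(r.keySet()); // entries to send via the admin interface
  }
}
{code}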



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2055) Preemption: Jobs are failing because AMs are getting launched and killed multiple times

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2055:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Preemption: Jobs are failing because AMs are getting launched and killed 
 multiple times
 --

 Key: YARN-2055
 URL: https://issues.apache.org/jira/browse/YARN-2055
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal

 If queue A does not have enough capacity to run the AM, the AM will borrow 
 capacity from queue B. In that case the AM will be killed when queue B reclaims 
 its capacity, then launched and killed again, and the job will eventually fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1848) Persist ClusterMetrics across RM HA transitions

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1848:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Persist ClusterMetrics across RM HA transitions
 ---

 Key: YARN-1848
 URL: https://issues.apache.org/jira/browse/YARN-1848
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 Post YARN-1705, ClusterMetrics are reset on transition to standby. This is 
 acceptable because the metrics show statistics since the RM became active, but 
 users might want to see metrics since the cluster was first started.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2037) Add restart support for Unmanaged AMs

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2037:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Add restart support for Unmanaged AMs
 -

 Key: YARN-2037
 URL: https://issues.apache.org/jira/browse/YARN-2037
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 It would be nice to allow Unmanaged AMs also to restart in a work-preserving 
 way. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2457:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 FairScheduler: Handle preemption to help starved parent queues
 --

 Key: YARN-2457
 URL: https://issues.apache.org/jira/browse/YARN-2457
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't 
 check for parent queue starvation. 
 We need to check that. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-313:
-

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Add Admin API for supporting node resource configuration in command line
 

 Key: YARN-313
 URL: https://issues.apache.org/jira/browse/YARN-313
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
 YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, 
 YARN-313-v6.patch, YARN-313-v7.patch


 We should provide some admin interface, e.g. yarn rmadmin -refreshResources, 
 to support changing a node's resources as specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2038) Revisit how AMs learn of containers from previous attempts

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2038:
--

Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it 
to 2.6.2. Let me know if you have comments. Thanks!

 Revisit how AMs learn of containers from previous attempts
 --

 Key: YARN-2038
 URL: https://issues.apache.org/jira/browse/YARN-2038
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 Based on YARN-556, we need to update the way AMs learn about containers 
 allocated in previous attempts. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files

2015-08-11 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692605#comment-14692605
 ] 

Chen He commented on YARN-2076:
---

I will update patch.

 Minor error in TestLeafQueue files
 --

 Key: YARN-2076
 URL: https://issues.apache.org/jira/browse/YARN-2076
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Chen He
Assignee: Chen He
Priority: Minor
  Labels: test
 Attachments: YARN-2076.patch


 numNodes should be 2 instead of 3 in testReservationExchange() since only 
 two nodes are defined.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3999) RM hangs on draining events

2015-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692901#comment-14692901
 ] 

Hudson commented on YARN-3999:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8286 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8286/])
YARN-3999. RM hangs on draining events. Contributed by Jian He (xgong: rev 
3ae716fa696b87e849dae40225dc59fb5ed114cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 RM hangs on draining events
 -

 Key: YARN-3999
 URL: https://issues.apache.org/jira/browse/YARN-3999
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.7.2

 Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, 
 YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, 
 YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch


 If external systems like ATS or ZK become very slow, draining all the 
 events takes a lot of time. If this time exceeds 10 minutes, all 
 applications will expire. Fixes include:
 1. Add a timeout and stop the dispatcher even if not all events are drained 
 (see the sketch after this list).
 2. Move the ATS service out of the RM active services so that the RM doesn't 
 need to wait for ATS to flush the events when transitioning to standby.
 3. Stop client-facing services (ClientRMService etc.) first so that clients 
 get fast notification that the RM is stopping/transitioning.
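A generic sketch of item 1 -- draining an event queue against a deadline instead of waiting indefinitely (illustrative, not the actual AsyncDispatcher change):
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative drain-with-timeout: stop waiting for the queue to empty once the
// deadline passes, so a slow downstream system cannot block shutdown forever.
public class DrainWithTimeoutSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();

  void stop(long drainTimeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + drainTimeoutMs;
    while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100); // poll until drained or the timeout expires
    }
    if (!eventQueue.isEmpty()) {
      System.err.println("Giving up draining; " + eventQueue.size() + " events left");
    }
    // proceed with stopping handler threads here
  }
}
{code}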



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts

2015-08-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692533#comment-14692533
 ] 

Sangjin Lee commented on YARN-2038:
---

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me 
know.

 Revisit how AMs learn of containers from previous attempts
 --

 Key: YARN-2038
 URL: https://issues.apache.org/jira/browse/YARN-2038
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 Based on YARN-556, we need to update the way AMs learn about containers 
 allocated in previous attempts. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project

2015-08-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692527#comment-14692527
 ] 

Sangjin Lee commented on YARN-2506:
---

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me 
know.

 TimelineClient should NOT be in yarn-common project
 ---

 Key: YARN-2506
 URL: https://issues.apache.org/jira/browse/YARN-2506
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
Priority: Critical

 YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't 
 belong there, we should move it back to yarn-client module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files

2015-08-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692529#comment-14692529
 ] 

Sangjin Lee commented on YARN-2076:
---

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me 
know.

 Minor error in TestLeafQueue files
 --

 Key: YARN-2076
 URL: https://issues.apache.org/jira/browse/YARN-2076
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Chen He
Assignee: Chen He
Priority: Minor
  Labels: test
 Attachments: YARN-2076.patch


 numNodes should be 2 instead of 3 in testReservationExchange() since only 
 two nodes are defined.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2037) Add restart support for Unmanaged AMs

2015-08-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692534#comment-14692534
 ] 

Sangjin Lee commented on YARN-2037:
---

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me 
know.

 Add restart support for Unmanaged AMs
 -

 Key: YARN-2037
 URL: https://issues.apache.org/jira/browse/YARN-2037
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 It would be nice to allow Unmanaged AMs also to restart in a work-preserving 
 way. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2055) Preemption: Jobs are failing because AMs are getting launched and killed multiple times

2015-08-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692532#comment-14692532
 ] 

Sangjin Lee commented on YARN-2055:
---

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me 
know.

 Preemption: Jobs are failing because AMs are getting launched and killed 
 multiple times
 --

 Key: YARN-2055
 URL: https://issues.apache.org/jira/browse/YARN-2055
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal

 If queue A does not have enough capacity to run the AM, the AM will borrow 
 capacity from queue B. In that case the AM will be killed when queue B reclaims 
 its capacity, then launched and killed again, and the job will eventually fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4046) Applications fail on NM restart on some Linux distros because NM container recovery declares the AM container as LOST

2015-08-11 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4046:

Attachment: YARN-4046.002.patch

Fixed a new checkstyle issue that was added; the other two are preexisting and 
should not be fixed.

 Applications fail on NM restart on some Linux distros because NM container 
 recovery declares the AM container as LOST
 

 Key: YARN-4046
 URL: https://issues.apache.org/jira/browse/YARN-4046
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Attachments: YARN-4046.002.patch, YARN-4096.001.patch


 On a Debian machine we have seen NodeManager recovery of containers fail 
 because the signal syntax for a process group may not work. We see errors when 
 checking whether the process is alive during container recovery, which causes 
 the container to be declared LOST (exit code 154) on a NodeManager restart.
 The application then fails with an error, and the attempts are not retried.
 {noformat}
 Application application_1439244348718_0001 failed 1 times due to Attempt 
 recovered after RM restartAM Container for 
 appattempt_1439244348718_0001_01 exited with exitCode: 154
 {noformat}
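A minimal sketch of the kind of liveness probe involved: send signal 0 and treat a non-zero exit status as "not alive". The process-group form ("kill -0 -- -<pgid>") is exactly where syntax differs across shells and distros; this is illustrative, not the NodeManager's actual code:
{code}
import java.io.IOException;

// Illustrative only: "kill -0 <pid>" succeeds if the process exists and is
// signalable. Recovery code that probes a *process group* passes a negative id,
// and that "--"/negative-id syntax is the portability trap described above.
public class ProcessAliveSketch {
  static boolean isAlive(String pid) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("kill", "-0", pid).start();
    return p.waitFor() == 0;
  }

  public static void main(String[] args) throws Exception {
    String pid = args.length > 0 ? args[0] : "1"; // PID 1 exists on Linux
    System.out.println("alive(" + pid + ") = " + isAlive(pid));
  }
}
{code}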



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-313:
-
Target Version/s: 2.7.2, 2.6.2  (was: 2.6.1, 2.7.2)

 Add Admin API for supporting node resource configuration in command line
 

 Key: YARN-313
 URL: https://issues.apache.org/jira/browse/YARN-313
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
 YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, 
 YARN-313-v6.patch, YARN-313-v7.patch


 We should provide some admin interface, e.g. yarn rmadmin -refreshResources 
 to support changes of node's resource specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests

2015-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692827#comment-14692827
 ] 

Hadoop QA commented on YARN-4026:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 16s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 18s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 54s | The applied patch generated 3 new checkstyle issues (total was 128, now 128). |
| {color:red}-1{color} | whitespace |   0m  5s | The patch has 30 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 47s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | |  94m 41s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750004/YARN-4026.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7c796fd |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8827/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8827/console |


This message was automatically generated.

 FiCaSchedulerApp: ContainerAllocator should be able to choose how to order 
 pending resource requests
 

 Key: YARN-4026
 URL: https://issues.apache.org/jira/browse/YARN-4026
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch


 After YARN-3983, we have an extensible ContainerAllocator which can be used 
 by FiCaSchedulerApp to decide how to allocate resources.
 While working on YARN-1651 (allocating resources to increase a container), I 
 found one thing in the existing logic that is not flexible enough:
 - ContainerAllocator decides what to allocate for a given node and priority. 
 To support different kinds of resource allocation (for example, treating 
 priority as a weight, or choosing whether to skip a priority), it's better to 
 let ContainerAllocator choose how to order pending resource requests (see the 
 sketch below).
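A minimal sketch of the idea -- letting the allocator supply the ordering over pending requests (class and field names are illustrative, not the FiCaSchedulerApp/ContainerAllocator API):
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a pluggable ordering lets one allocator use strict priority
// while another treats demand (or a weight) as the sort key.
public class RequestOrderingSketch {
  static class PendingRequest {
    final int priority;   // lower value = more important, as with YARN priorities
    final int containers; // outstanding containers for this request
    PendingRequest(int priority, int containers) {
      this.priority = priority;
      this.containers = containers;
    }
  }

  interface AllocatorOrdering {
    Comparator<PendingRequest> ordering();
  }

  static final AllocatorOrdering STRICT_PRIORITY =
      () -> Comparator.comparingInt((PendingRequest r) -> r.priority);

  static final AllocatorOrdering BY_DEMAND =
      () -> Comparator.comparingInt((PendingRequest r) -> -r.containers);

  public static void main(String[] args) {
    List<PendingRequest> pending = new ArrayList<>();
    pending.add(new PendingRequest(1, 3));
    pending.add(new PendingRequest(0, 10));
    pending.sort(STRICT_PRIORITY.ordering()); // swap in BY_DEMAND.ordering() to change policy
    System.out.println(pending.get(0).priority); // 0 under strict priority ordering
  }
}
{code}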



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2657:
--
Target Version/s: 2.7.2, 2.6.2  (was: 2.6.1, 2.7.2)

 MiniYARNCluster to (optionally) add MicroZookeeper service
 --

 Key: YARN-2657
 URL: https://issues.apache.org/jira/browse/YARN-2657
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: test
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2567-001.patch, YARN-2657-002.patch


 This is needed for testing things like YARN-2646: add an option for the 
 {{MiniYarnCluster}} to start a {{MicroZookeeperService}}.
 This is just another YARN service to create and track the lifecycle. The 
 {{MicroZookeeperService}} publishes its binding information for direct takeup 
 by the registry services...this can address in-VM race conditions.
 The default setting for this service is off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID

2015-08-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2746:
--
Target Version/s: 2.7.2, 2.6.2  (was: 2.6.1, 2.7.2)

 YARNDelegationTokenID misses serializing version from the common abstract ID
 

 Key: YARN-2746
 URL: https://issues.apache.org/jira/browse/YARN-2746
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He

 I found this during review of YARN-2743.
 bq. AbstractDTId had a version, we dropped that in the protobuf 
 serialization. We should just write it during the serialization and read it 
 back?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


<    1   2