[jira] [Commented] (YARN-3930) FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown
[ https://issues.apache.org/jira/browse/YARN-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631400#comment-14631400 ] Hudson commented on YARN-3930: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #248 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/248/]) YARN-3930. FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown. (Dian Fu via wangda) (wangda: rev fa2b63ed162410ba05eadf211a1da068351b293a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown - Key: YARN-3930 URL: https://issues.apache.org/jira/browse/YARN-3930 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Dian Fu Assignee: Dian Fu Fix For: 2.8.0 Attachments: YARN-3930.001.patch When I tested the node label feature in my local environment, I encountered the following exception: {code} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) {code} The reason is that HDFS throws an exception while {{ensureAppendEditlogFile}} is being called, which leaves the edit log output stream unclosed. As a result, the next time we call {{ensureAppendEditlogFile}}, lease recovery fails because we are still the lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
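The underlying pattern, closing the edit log stream even when an exception is thrown so that the HDFS lease is released, can be illustrated with a small sketch. This is not the committed YARN-3930 patch; the class, field and method names below are invented for illustration.
{code}
// Illustrative sketch only (not the YARN-3930 patch): always release the append
// stream, and with it the HDFS lease, even if append() or a write fails.
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

class EditLogAppendSketch {
  private static final Log LOG = LogFactory.getLog(EditLogAppendSketch.class);
  private final FileSystem fs;
  private final Path editLogPath;

  EditLogAppendSketch(FileSystem fs, Path editLogPath) {
    this.fs = fs;
    this.editLogPath = editLogPath;
  }

  void appendRecord(byte[] record) throws IOException {
    FSDataOutputStream out = null;
    try {
      out = fs.append(editLogPath);   // may throw, e.g. during lease recovery
      out.write(record);
      out.hflush();
    } finally {
      // Close in finally so an exception never leaves the lease held by this client.
      IOUtils.cleanup(LOG, out);
    }
  }
}
{code}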
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631401#comment-14631401 ] Hudson commented on YARN-3885: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #248 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/248/]) YARN-3885. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level. (Ajith S via wangda) (wangda: rev 3540d5fe4b1da942ea80c9e7ca1126b1abb8a68a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, YARN-3885.07.patch, YARN-3885.08.patch, YARN-3885.patch In the preemption policy, the piece of code in {{ProportionalCapacityPreemptionPolicy.cloneQueues}} that calculates the {{untouchable}} amount doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
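The report reads more clearly with a small illustration: the untouchable amount has to be accumulated over a parent queue's whole subtree, not only its immediate children, or hierarchies deeper than two levels are computed incorrectly. The sketch below shows only that recursive-accumulation idea; the {{QueueNode}} type and its fields are invented and are not the {{ProportionalCapacityPreemptionPolicy}} code.
{code}
// Hypothetical sketch: accumulate the untouchable amount over the whole queue subtree.
import java.util.List;

class QueueNode {
  List<QueueNode> children;
  long guaranteed;  // resource guaranteed to this queue
  long used;        // resource currently used by this queue

  long untouchable() {
    if (children == null || children.isEmpty()) {
      // Leaf queue: usage within its guarantee cannot be preempted.
      return Math.min(used, guaranteed);
    }
    long total = 0;
    for (QueueNode child : children) {
      total += child.untouchable();   // recurse so grandchildren are counted too
    }
    return total;
  }
}
{code}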
[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nam H. Do updated YARN-2681: Attachment: YARN-2681.005.patch Fixed javadoc warnings. Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: Nam H. Do Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: Traffic Control Design.png, YARN-2681.001.patch, YARN-2681.002.patch, YARN-2681.003.patch, YARN-2681.004.patch, YARN-2681.005.patch, YARN-2681.patch To read/write data from HDFS, applications establish TCP/IP connections with the data node. HDFS reads can be controlled by configuring the Linux Traffic Control (TC) subsystem on the data node to apply filters to the appropriate connections. The current cgroups net_cls concept can be applied neither on the node where the container is launched nor on the data node, since: - TC handles outgoing bandwidth only, so it cannot be set on the container node (an HDFS read is incoming data for the container) - Since the HDFS data node is handled by a single process, it is not possible to use net_cls to separate connections from different containers to the data node. Tasks: 1) Extend the Resource model to define a bandwidth enforcement rate 2) Monitor the TCP/IP connections established by the container handling process and its child processes 3) Set Linux Traffic Control rules on the data node based on address:port pairs in order to enforce the bandwidth of outgoing data Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf Implementation: http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl_UML_diagram.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
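Task 3 above can be expressed with plain Linux {{tc}} commands: an HTB class plus a {{u32}} filter keyed on the connection's address:port pair. The Java sketch below is one possible way to issue those commands from a helper; the device name, rate value and the helper class itself are assumptions and are not code from the YARN-2681 patches.
{code}
// Hedged sketch only: shell out to Linux tc to cap outgoing DataNode traffic for one
// container connection, identified by the container side's address:port.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TcShaperSketch {
  private static void tc(String... args) throws IOException, InterruptedException {
    List<String> cmd = new ArrayList<>(Arrays.asList("tc"));
    cmd.addAll(Arrays.asList(args));
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("tc failed: " + cmd);
    }
  }

  static void limitConnection(String dev, String containerIp, int containerPort,
                              String rate) throws IOException, InterruptedException {
    // Root HTB qdisc with a default class for unshaped traffic.
    tc("qdisc", "add", "dev", dev, "root", "handle", "1:", "htb", "default", "10");
    // Class that caps the shaped traffic at the requested rate, e.g. "50mbit".
    tc("class", "add", "dev", dev, "parent", "1:", "classid", "1:20",
       "htb", "rate", rate, "ceil", rate);
    // u32 filter: outgoing packets addressed to the container's ip:port go to class 1:20.
    tc("filter", "add", "dev", dev, "protocol", "ip", "parent", "1:0", "prio", "1",
       "u32", "match", "ip", "dst", containerIp + "/32",
       "match", "ip", "dport", String.valueOf(containerPort), "0xffff",
       "flowid", "1:20");
  }
}
{code}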
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631222#comment-14631222 ] Hadoop QA commented on YARN-2003: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 8m 25s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 21s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 57s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 22s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 8s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 53s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 52m 26s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 101m 32s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745796/0023-YARN-2003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ee36f4f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8571/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8571/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8571/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8571/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8571/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8571/console | This message was automatically generated. 
Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from the Submission Context and store it, so that it can later be used by the Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
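A minimal sketch of the idea in the description, assuming the priority is simply carried as a field on the app-attempt-added event so the scheduler can read it later; the class below is illustrative and is not the actual {{AppAttemptAddedSchedulerEvent}} change.
{code}
// Illustrative only: carry the submission-time priority on the scheduler event.
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.Priority;

class AppAttemptAddedEventSketch {
  private final ApplicationAttemptId attemptId;
  private final Priority appPriority;   // taken from the ApplicationSubmissionContext

  AppAttemptAddedEventSketch(ApplicationAttemptId attemptId, Priority appPriority) {
    this.attemptId = attemptId;
    this.appPriority = appPriority;
  }

  Priority getApplicationPriority() {
    return appPriority;   // the scheduler reads this when ordering applications
  }
}
{code}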
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631223#comment-14631223 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-trunk-Commit #8179 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8179/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During a rolling update of the NM, the AM's container start on that NM failed, and the job then hung. AM logs are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
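The recovery idea named in the summary is to hand the resource requests of a killed-but-never-launched container back to the application, so that the scheduler asks for the capacity again. A sketch of that idea follows; the interfaces and method names are stand-ins, not the code added by this commit.
{code}
// Hedged sketch: on the ALLOCATED -> KILLED transition (e.g. via a container-rescheduled
// event), re-add the container's pending ResourceRequests to the application.
import java.util.List;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

class RescheduleOnKillSketch {
  interface AppAttempt {
    List<ResourceRequest> requestsForContainer(String containerId);
    void recoverResourceRequests(List<ResourceRequest> requests);
  }

  static void containerKilledBeforeLaunch(AppAttempt app, String containerId) {
    List<ResourceRequest> pending = app.requestsForContainer(containerId);
    if (pending != null && !pending.isEmpty()) {
      // Re-add the requests so the scheduler allocates a replacement container.
      app.recoverResourceRequests(pending);
    }
  }
}
{code}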
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631230#comment-14631230 ] Hudson commented on YARN-3885: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #259 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/259/]) YARN-3885. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level. (Ajith S via wangda) (wangda: rev 3540d5fe4b1da942ea80c9e7ca1126b1abb8a68a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, YARN-3885.07.patch, YARN-3885.08.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3930) FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown
[ https://issues.apache.org/jira/browse/YARN-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631229#comment-14631229 ] Hudson commented on YARN-3930: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #259 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/259/]) YARN-3930. FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown. (Dian Fu via wangda) (wangda: rev fa2b63ed162410ba05eadf211a1da068351b293a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown - Key: YARN-3930 URL: https://issues.apache.org/jira/browse/YARN-3930 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Dian Fu Assignee: Dian Fu Fix For: 2.8.0 Attachments: YARN-3930.001.patch When I test the node label feature in my local environment, I encountered the following exception: {code} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) {code} The reason is that HDFS throws an exception when calling {{ensureAppendEditlogFile}} because of some reason which causes the edit log output stream isn't closed. This caused that the next time we call {{ensureAppendEditlogFile}}, lease recovery will failed because we are just the lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3930) FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown
[ https://issues.apache.org/jira/browse/YARN-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631239#comment-14631239 ] Hudson commented on YARN-3930: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #989 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/989/]) YARN-3930. FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown. (Dian Fu via wangda) (wangda: rev fa2b63ed162410ba05eadf211a1da068351b293a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/CHANGES.txt FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown - Key: YARN-3930 URL: https://issues.apache.org/jira/browse/YARN-3930 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Dian Fu Assignee: Dian Fu Fix For: 2.8.0 Attachments: YARN-3930.001.patch When I test the node label feature in my local environment, I encountered the following exception: {code} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) {code} The reason is that HDFS throws an exception when calling {{ensureAppendEditlogFile}} because of some reason which causes the edit log output stream isn't closed. This caused that the next time we call {{ensureAppendEditlogFile}}, lease recovery will failed because we are just the lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631240#comment-14631240 ] Hudson commented on YARN-3885: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #989 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/989/]) YARN-3885. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level. (Ajith S via wangda) (wangda: rev 3540d5fe4b1da942ea80c9e7ca1126b1abb8a68a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, YARN-3885.07.patch, YARN-3885.08.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631390#comment-14631390 ] Hudson commented on YARN-3885: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2186 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2186/]) YARN-3885. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level. (Ajith S via wangda) (wangda: rev 3540d5fe4b1da942ea80c9e7ca1126b1abb8a68a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, YARN-3885.07.patch, YARN-3885.08.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3930) FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown
[ https://issues.apache.org/jira/browse/YARN-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631389#comment-14631389 ] Hudson commented on YARN-3930: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2186 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2186/]) YARN-3930. FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown. (Dian Fu via wangda) (wangda: rev fa2b63ed162410ba05eadf211a1da068351b293a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/CHANGES.txt FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown - Key: YARN-3930 URL: https://issues.apache.org/jira/browse/YARN-3930 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Dian Fu Assignee: Dian Fu Fix For: 2.8.0 Attachments: YARN-3930.001.patch When I test the node label feature in my local environment, I encountered the following exception: {code} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) {code} The reason is that HDFS throws an exception when calling {{ensureAppendEditlogFile}} because of some reason which causes the edit log output stream isn't closed. This caused that the next time we call {{ensureAppendEditlogFile}}, lease recovery will failed because we are just the lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631395#comment-14631395 ] Hadoop QA commented on YARN-3905: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 16s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 23s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 35s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 7s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | | | 40m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745819/YARN-3905.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9b272cc | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8572/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8572/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8572/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8572/console | This message was automatically generated. Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. 
{noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
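The NPE above comes from converting a container for which no history data was stored. A guard of the following shape avoids the 500 error; this is a hedged sketch with simplified stand-in types, not Eric Payne's actual patch.
{code}
// Hedged sketch only: guard the conversion so a container with no stored history data
// does not blow up the Application History UI with a NullPointerException.
class ContainerReportGuardSketch {
  Object convertToContainerReport(Object containerHistoryData, String user) {
    if (containerHistoryData == null) {
      // Apps run after an RM restart may have no container history; report "missing"
      // to the web layer instead of throwing an NPE (which surfaces as HTTP 500).
      return null;
    }
    return buildReport(containerHistoryData, user);
  }

  private Object buildReport(Object data, String user) {
    // Placeholder for the existing field-by-field conversion logic.
    return data;
  }
}
{code}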
[jira] [Updated] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-3905: - Attachment: YARN-3905.002.patch Fixing checkstyle bug. I forgot to remove the now-unused {{ContainerID}} import. Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631298#comment-14631298 ] Hudson commented on YARN-3174: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/256/]) YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation into one. Contributed by Masatake Iwasaki. (ozawa: rev f02dd146f58bcfa0595eec7f2433bafdd857630f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md * hadoop-project/src/site/site.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md * hadoop-yarn-project/CHANGES.txt Consolidate the NodeManager and NodeManagerRestart documentation into one - Key: YARN-3174 URL: https://issues.apache.org/jira/browse/YARN-3174 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.1 Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: YARN-3174.001.patch We really don't need a different document for every individual nodemanager feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631304#comment-14631304 ] Hudson commented on YARN-3885: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/256/]) YARN-3885. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level. (Ajith S via wangda) (wangda: rev 3540d5fe4b1da942ea80c9e7ca1126b1abb8a68a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, YARN-3885.07.patch, YARN-3885.08.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3805) Update the documentation of Disk Checker based on YARN-90
[ https://issues.apache.org/jira/browse/YARN-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631297#comment-14631297 ] Hudson commented on YARN-3805: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/256/]) YARN-3805. Update the documentation of Disk Checker based on YARN-90. Contributed by Masatake Iwasaki. (ozawa: rev 1ba2986dee4bbb64d67ada005f8f132e69575274) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md * hadoop-yarn-project/CHANGES.txt Update the documentation of Disk Checker based on YARN-90 - Key: YARN-3805 URL: https://issues.apache.org/jira/browse/YARN-3805 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.8.0 Attachments: YARN-3805.001.patch, YARN-3805.002.patch With YARN-90, the NodeManager is able to recover the status of a disk that was broken and later fixed, without restarting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631300#comment-14631300 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/256/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631302#comment-14631302 ] Hudson commented on YARN-90: FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/256/]) YARN-3805. Update the documentation of Disk Checker based on YARN-90. Contributed by Masatake Iwasaki. (ozawa: rev 1ba2986dee4bbb64d67ada005f8f132e69575274) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md NodeManager should identify failed disks becoming good again Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.10.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, apache-yarn-90.8.patch, apache-yarn-90.9.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time back). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
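The behaviour requested here, periodically re-testing failed directories and promoting them back to the good list once they pass, can be sketched as follows. Only {{DiskChecker.checkDir}} is a real Hadoop utility; everything else below is invented for illustration and is not the committed YARN-90 patch.
{code}
// Hedged sketch: re-check previously failed local dirs and reuse them once they pass.
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

class DiskRecheckSketch {
  private final List<String> goodDirs = new ArrayList<>();
  private final List<String> failedDirs = new ArrayList<>();

  // Intended to run on a timer, e.g. every disk-health-check interval.
  void recheckFailedDirs() {
    for (Iterator<String> it = failedDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      try {
        DiskChecker.checkDir(new File(dir));   // verifies existence, dir-ness, permissions
        it.remove();
        goodDirs.add(dir);                     // disk is usable again without an NM restart
      } catch (DiskErrorException e) {
        // Still bad; keep it in the failed list and try again next round.
      }
    }
  }
}
{code}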
[jira] [Commented] (YARN-3930) FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown
[ https://issues.apache.org/jira/browse/YARN-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631301#comment-14631301 ] Hudson commented on YARN-3930: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/256/]) YARN-3930. FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown. (Dian Fu via wangda) (wangda: rev fa2b63ed162410ba05eadf211a1da068351b293a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown - Key: YARN-3930 URL: https://issues.apache.org/jira/browse/YARN-3930 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Dian Fu Assignee: Dian Fu Fix For: 2.8.0 Attachments: YARN-3930.001.patch When I test the node label feature in my local environment, I encountered the following exception: {code} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) {code} The reason is that HDFS throws an exception when calling {{ensureAppendEditlogFile}} because of some reason which causes the edit log output stream isn't closed. This caused that the next time we call {{ensureAppendEditlogFile}}, lease recovery will failed because we are just the lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3938) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631317#comment-14631317 ] Bibin A Chundatt commented on YARN-3938: Hi [~leftnoteasy]. As I understand it, {{labelManager.getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource)}} will return {{0}}; that is the reason it goes wrong. Please correct me if I am wrong. Any thoughts? AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel Key: YARN-3938 URL: https://issues.apache.org/jira/browse/YARN-3938 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: Am limit for subqueue.jpg In the case of a leaf queue, the AM resource calculation is based on {{absoluteCapacityResource}}. Below is the calculation of the absolute capacity in {{LeafQueue#updateAbsoluteCapacityResource()}}: {code} private void updateAbsoluteCapacityResource(Resource clusterResource) { absoluteCapacityResource = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} If the default partition resource is zero, the AM resource for all leaf queues will be zero. A snapshot is also attached for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
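The arithmetic described above is easy to reproduce with made-up numbers: when every node carries a label, the DEFAULT (NO_LABEL) partition has zero resources, so the absolute capacity resource and the AM limit derived from it both collapse to zero. The values in this sketch are hypothetical.
{code}
// Worked example with illustrative numbers (not from the JIRA).
class AmLimitArithmeticSketch {
  public static void main(String[] args) {
    long defaultPartitionMemoryMB = 0; // getResourceByLabel(NO_LABEL, clusterResource) when all nodes are labelled
    double absoluteCapacity = 0.5;     // hypothetical leaf-queue absolute capacity
    double maxAmResourcePercent = 0.1; // hypothetical maximum-am-resource-percent

    long absoluteCapacityMemoryMB = (long) (defaultPartitionMemoryMB * absoluteCapacity); // 0
    long amLimitMemoryMB = (long) (absoluteCapacityMemoryMB * maxAmResourcePercent);      // 0

    // With a 0 MB AM limit, no application master can be activated in the queue.
    System.out.println("AM limit (MB) = " + amLimitMemoryMB);
  }
}
{code}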
[jira] [Commented] (YARN-3930) FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown
[ https://issues.apache.org/jira/browse/YARN-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631455#comment-14631455 ] Hudson commented on YARN-3930: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2205 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2205/]) YARN-3930. FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown. (Dian Fu via wangda) (wangda: rev fa2b63ed162410ba05eadf211a1da068351b293a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/CHANGES.txt FileSystemNodeLabelsStore should make sure edit log file closed when exception is thrown - Key: YARN-3930 URL: https://issues.apache.org/jira/browse/YARN-3930 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Dian Fu Assignee: Dian Fu Fix For: 2.8.0 Attachments: YARN-3930.001.patch When I test the node label feature in my local environment, I encountered the following exception: {code} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2426) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2523) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2498) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.handleStoreEvent(CommonNodeLabelsManager.java:196) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:168) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler.handle(CommonNodeLabelsManager.java:163) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) {code} The reason is that HDFS throws an exception when calling {{ensureAppendEditlogFile}} because of some reason which causes the edit log output stream isn't closed. This caused that the next time we call {{ensureAppendEditlogFile}}, lease recovery will failed because we are just the lease holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631456#comment-14631456 ] Hudson commented on YARN-3885: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2205 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2205/]) YARN-3885. ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level. (Ajith S via wangda) (wangda: rev 3540d5fe4b1da942ea80c9e7ca1126b1abb8a68a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, YARN-3885.07.patch, YARN-3885.08.patch, YARN-3885.patch when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} this piece of code, to calculate {{untoucable}} doesnt consider al the children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631454#comment-14631454 ] Hudson commented on YARN-3535: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2205 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2205/]) YARN-3535. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (rohithsharma and peng.zhang via asuresh) (Arun Suresh: rev 9b272ccae78918e7d756d84920a9322187d61eed) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerRescheduledEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630842#comment-14630842 ] Hadoop QA commented on YARN-2306: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 9m 57s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 10m 1s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 28s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 2s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 52m 5s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 77m 36s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745751/YARN-2306-3.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / ee36f4f | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8567/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8567/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8567/console | This message was automatically generated. leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch This only applies to the fair scheduler. The capacity scheduler is OK. When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and the wrong values may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3933: -- Summary: Resources(both core and memory) are being negative (was: Resources(bothe core and memory) are being negative) Resources(both core and memory) are being negative -- Key: YARN-3933 URL: https://issues.apache.org/jira/browse/YARN-3933 Project: Hadoop YARN Issue Type: Bug Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir In our cluster we are seeing available memory and cores going negative. Initial inspection: Scenario no. 1: In the capacity scheduler, the method allocateContainersToNode() checks whether there are excess container reservations for an application that are no longer needed; if so, it calls queue.completedContainer(), which drives the resources negative because those containers were never assigned in the first place. I am still looking through the code. Can somebody suggest how to simulate excess container assignments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
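To illustrate the arithmetic behind the symptom (this is not the actual CapacityScheduler code path): if {{completedContainer()}} is invoked for a container whose resource was never added to the queue's usage, the subtraction drives the tracked resource below zero. A minimal sketch using the public {{Resource}}/{{Resources}} helpers follows; the guard shown at the end is a hypothetical illustration, not a proposed fix.

{code}
// Minimal sketch: releasing a container that was never charged to the queue
// makes the tracked usage negative.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NegativeUsageDemo {
  public static void main(String[] args) {
    Resource queueUsed = Resource.newInstance(1024, 1);     // 1 GB, 1 core currently in use
    Resource neverAllocated = Resource.newInstance(2048, 2);

    // completedContainer()-style release for a container that was never assigned:
    Resources.subtractFrom(queueUsed, neverAllocated);
    System.out.println(queueUsed);                          // <memory:-1024, vCores:-1>

    // Hypothetical guard: only subtract what was actually charged.
    Resource used = Resource.newInstance(1024, 1);
    if (Resources.fitsIn(neverAllocated, used)) {
      Resources.subtractFrom(used, neverAllocated);
    }
    System.out.println(used);                               // unchanged
  }
}
{code}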
[jira] [Created] (YARN-3936) Add metrics for RMStateStore
Ming Ma created YARN-3936: - Summary: Add metrics for RMStateStore Key: YARN-3936 URL: https://issues.apache.org/jira/browse/YARN-3936 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma It might be useful to collect some metrics w.r.t. RMStateStore such as: * Write latency * The ApplicationStateData size distribution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
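One possible shape for such metrics, using Hadoop's metrics2 library, is sketched below. The metric names, the quantile rollover window, and the idea of calling {{recordWrite()}} from the state store's write path are all assumptions for illustration; nothing here is part of this issue yet.

{code}
// Hypothetical sketch of RMStateStore metrics built on Hadoop metrics2.
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;
import org.apache.hadoop.metrics2.lib.MutableRate;

public class RMStateStoreMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("RMStateStore");

  // Latency of state-store write operations (e.g. storing ApplicationStateData).
  private final MutableRate writeLatency =
      registry.newRate("stateStoreWrite", "RMStateStore write latency (ms)");

  // Distribution of serialized ApplicationStateData sizes, 60-second rollover window.
  private final MutableQuantiles stateSize = registry.newQuantiles(
      "appStateSize", "Serialized ApplicationStateData size", "ops", "bytes", 60);

  // Hypothetical hook the store's write path would call after each write.
  public void recordWrite(long latencyMs, int serializedBytes) {
    writeLatency.add(latencyMs);
    stateSize.add(serializedBytes);
  }
}
{code}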
[jira] [Commented] (YARN-3845) [YARN] YARN status in web ui does not show correctly in IE 11
[ https://issues.apache.org/jira/browse/YARN-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630869#comment-14630869 ] Hadoop QA commented on YARN-3845: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 24s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 3 new checkstyle issues (total was 70, now 70). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 10s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745746/YARN-3845.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ee36f4f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8569/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8569/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8569/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8569/console | This message was automatically generated. [YARN] YARN status in web ui does not show correctly in IE 11 - Key: YARN-3845 URL: https://issues.apache.org/jira/browse/YARN-3845 Project: Hadoop YARN Issue Type: Bug Reporter: Jagadesh Kiran N Assignee: Mohammad Shahid Khan Priority: Trivial Attachments: IE11_yarn.gif, YARN-3845.patch In IE 11 , the color display is not proper for the scheduler . In other browser it is showing correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
[ https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630884#comment-14630884 ] Sunil G commented on YARN-3934: --- As of now there are no checks for the size of ApplicationSubmissionContext while processing submitApplication in RM. I feel we can have a check for the size in RMAppManager for this. An upper check with the ZK's max size will be a good solution here. I will check whether we can get the object size from ZK here and will update. Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used -- Key: YARN-3934 URL: https://issues.apache.org/jira/browse/YARN-3934 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Use the following steps to test. 1. Set up ZK as the RM HA store. 2. Submit a job that refers to lots of distributed cache files with long HDFS path, which will cause the app state size to exceed ZK's max object size limit. 3. RM can't write to ZK and exit with the following exception. {noformat} 2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083) {noformat} In this case, RM could have rejected the app during submitApplication RPC if the size of ApplicationSubmissionContext is too large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
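A rough sketch of the kind of guard being discussed above follows. The configuration key is hypothetical, and treating the serialized protobuf size of the submission context as a proxy for what ends up in ZK is an assumption; neither is confirmed by this issue.

{code}
// Hypothetical sketch of rejecting oversized submissions at submitApplication time.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmissionSizeCheck {
  // ZK's default jute.maxbuffer is roughly 1 MB; leave headroom for the rest of the app state.
  private static final String LIMIT_KEY =
      "yarn.resourcemanager.submission-context.max-size";   // hypothetical config key
  private static final long DEFAULT_LIMIT = 512 * 1024;

  public static void check(Configuration conf, ApplicationSubmissionContext ctx)
      throws YarnException {
    long limit = conf.getLong(LIMIT_KEY, DEFAULT_LIMIT);
    // Serialized proto size as a proxy for the eventual ZK record size (assumption).
    long size = ((ApplicationSubmissionContextPBImpl) ctx).getProto().getSerializedSize();
    if (size > limit) {
      throw new YarnException("ApplicationSubmissionContext is " + size
          + " bytes, exceeding the configured limit of " + limit
          + " bytes; rejecting the application at submission time.");
    }
  }
}
{code}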
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630860#comment-14630860 ] Rohith Sharma K S commented on YARN-3543: - Discussed with [~xgong] offline; as per the YARN-1462 [comment|https://issues.apache.org/jira/browse/YARN-1462?focusedCommentId=14568189page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568189] discussion, ApplicationReport should be backward compatible. ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Sharma K S Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can only be done at the time the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not at any time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3932: --- Attachment: TestResult.jpg SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel --- Key: YARN-3932 URL: https://issues.apache.org/jira/browse/YARN-3932 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3932.patch, ApplicationReport.jpg, TestResult.jpg Application Resource Report shown wrong when node Label is used. 1.Submit application with NodeLabel 2.Check RM UI for resources used Allocated CPU VCores and Allocated Memory MB is always {{zero}} {code} public synchronized ApplicationResourceUsageReport getResourceUsageReport() { AggregateAppResourceUsage runningResourceUsage = getRunningAggregateAppResourceUsage(); Resource usedResourceClone = Resources.clone(attemptResourceUsage.getUsed()); Resource reservedResourceClone = Resources.clone(attemptResourceUsage.getReserved()); return ApplicationResourceUsageReport.newInstance(liveContainers.size(), reservedContainers.size(), usedResourceClone, reservedResourceClone, Resources.add(usedResourceClone, reservedResourceClone), runningResourceUsage.getMemorySeconds(), runningResourceUsage.getVcoreSeconds()); } {code} should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
Ming Ma created YARN-3934: - Summary: Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used Key: YARN-3934 URL: https://issues.apache.org/jira/browse/YARN-3934 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Use the following steps to test. 1. Set up ZK as the RM HA store. 2. Submit a job that refers to lots of distributed cache files with long HDFS path, which will cause the app state size to exceed ZK's max object size limit. 3. RM can't write to ZK and exit with the following exception. {noformat} 2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083) {noformat} In this case, RM could have rejected the app during submitApplication RPC if the size of ApplicationSubmissionContext is too large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3932: --- Attachment: 0001-YARN-3932.patch SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel --- Key: YARN-3932 URL: https://issues.apache.org/jira/browse/YARN-3932 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3932.patch, ApplicationReport.jpg Application Resource Report shown wrong when node Label is used. 1.Submit application with NodeLabel 2.Check RM UI for resources used Allocated CPU VCores and Allocated Memory MB is always {{zero}} {code} public synchronized ApplicationResourceUsageReport getResourceUsageReport() { AggregateAppResourceUsage runningResourceUsage = getRunningAggregateAppResourceUsage(); Resource usedResourceClone = Resources.clone(attemptResourceUsage.getUsed()); Resource reservedResourceClone = Resources.clone(attemptResourceUsage.getReserved()); return ApplicationResourceUsageReport.newInstance(liveContainers.size(), reservedContainers.size(), usedResourceClone, reservedResourceClone, Resources.add(usedResourceClone, reservedResourceClone), runningResourceUsage.getMemorySeconds(), runningResourceUsage.getVcoreSeconds()); } {code} should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630867#comment-14630867 ] Varun Saxena commented on YARN-3049: [~zjshen], should cluster ID be mandatory in REST URL ? We can assume it to be belonging to same cluster as where this timeline reader is running and take it from config, if its not supplied by client. Thats how I did it in YARN-3814. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3936) Add metrics for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630878#comment-14630878 ] Sunil G commented on YARN-3936: --- Hi [~mingma] I would like to work on this. Please let me know if you are looking into this. Add metrics for RMStateStore Key: YARN-3936 URL: https://issues.apache.org/jira/browse/YARN-3936 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma It might be useful to collect some metrics w.r.t. RMStateStore such as: * Write latency * The ApplicationStateData size distribution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3935) Support compression for RM HA ApplicationStateData
Ming Ma created YARN-3935: - Summary: Support compression for RM HA ApplicationStateData Key: YARN-3935 URL: https://issues.apache.org/jira/browse/YARN-3935 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma If we use ZK as the RM HA, it is possible for some application state to exceed the max object size imposed by ZK service. We can apply compression before storing the data to ZK. We might want to add the compression functionality at RMStateStore layer so that different store implementations can use it. The design might also want to take care of compatibility issue. After compression is enabled and RM restarts; the older state store should still be loaded properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
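A minimal sketch of the compression idea using plain java.util.zip is below. Relying on the gzip magic bytes to tell compressed records from legacy uncompressed ones is an assumption about how the compatibility concern could be handled, not a decided design, and where this sits (RMStateStore layer vs. individual stores) is left open as in the description.

{code}
// Hypothetical sketch: gzip the serialized state bytes before handing them to the
// store, and fall back to the raw bytes on load for data written before the feature.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StateCompression {
  public static byte[] compress(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(bos);
    try {
      gz.write(raw);
    } finally {
      gz.close();
    }
    return bos.toByteArray();
  }

  public static byte[] maybeDecompress(byte[] stored) throws IOException {
    // Gzip data starts with the magic bytes 0x1f 0x8b; older, uncompressed records
    // are returned unchanged so RM restart can still load pre-compression state.
    if (stored.length < 2 || (stored[0] & 0xff) != 0x1f || (stored[1] & 0xff) != 0x8b) {
      return stored;
    }
    GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(stored));
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try {
      byte[] buf = new byte[4096];
      int n;
      while ((n = gz.read(buf)) != -1) {
        bos.write(buf, 0, n);
      }
    } finally {
      gz.close();
    }
    return bos.toByteArray();
  }
}
{code}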
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630861#comment-14630861 ] Hadoop QA commented on YARN-3535: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 18s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 5 new checkstyle issues (total was 337, now 342). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 21s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745756/0006-YARN-3535.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ee36f4f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8568/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8568/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8568/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8568/console | This message was automatically generated. ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630877#comment-14630877 ] Bibin A Chundatt commented on YARN-3932: [~leftnoteasy] used {{attemptResourceUsage.getAllUsed()}} already available method. SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel --- Key: YARN-3932 URL: https://issues.apache.org/jira/browse/YARN-3932 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3932.patch, ApplicationReport.jpg Application Resource Report shown wrong when node Label is used. 1.Submit application with NodeLabel 2.Check RM UI for resources used Allocated CPU VCores and Allocated Memory MB is always {{zero}} {code} public synchronized ApplicationResourceUsageReport getResourceUsageReport() { AggregateAppResourceUsage runningResourceUsage = getRunningAggregateAppResourceUsage(); Resource usedResourceClone = Resources.clone(attemptResourceUsage.getUsed()); Resource reservedResourceClone = Resources.clone(attemptResourceUsage.getReserved()); return ApplicationResourceUsageReport.newInstance(liveContainers.size(), reservedContainers.size(), usedResourceClone, reservedResourceClone, Resources.add(usedResourceClone, reservedResourceClone), runningResourceUsage.getMemorySeconds(), runningResourceUsage.getVcoreSeconds()); } {code} should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
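Based on the comment above, the change presumably amounts to swapping the label-unaware getter for {{getAllUsed()}} inside the method quoted in the description. A sketch of what that small change could look like (illustrative only, not the actual patch):

{code}
// Sketch only: aggregate usage across all partitions/labels instead of only the
// default (no-label) partition when building the usage report.
Resource usedResourceClone =
    Resources.clone(attemptResourceUsage.getAllUsed());   // was: getUsed()
Resource reservedResourceClone =
    Resources.clone(attemptResourceUsage.getReserved());
{code}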
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631501#comment-14631501 ] Jonathan Eagles commented on YARN-3905: --- +1. Committing this patch [~eepayne]. Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631530#comment-14631530 ] Hudson commented on YARN-3905: -- FAILURE: Integrated in Hadoop-trunk-Commit #8180 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8180/]) YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
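The stack trace above points at the AM-container lookup performed while rendering the application page. As a purely illustrative sketch of the kind of defensive handling that turns the 500 into a degraded-but-usable page (not the committed AppBlock change), with {{fetchAmContainerReport}} and {{LOG}} standing in as hypothetical names:

{code}
// Purely illustrative pseudo-fix: guard the AM container lookup so a missing
// report (e.g. history written after an RM restart) does not NPE the whole page.
ContainerReport amContainer;
try {
  amContainer = fetchAmContainerReport(appAttempt);   // hypothetical helper
} catch (Exception e) {
  LOG.error("Failed to read the AM container of the application attempt "
      + appAttempt.getApplicationAttemptId() + ".", e);
  amContainer = null;
}
if (amContainer != null) {
  // render the AM container details; otherwise leave those cells empty
}
{code}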
[jira] [Created] (YARN-3937) Introducing REMOVE_CONTAINER_FROM_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted
Sunil G created YARN-3937: - Summary: Introducing REMOVE_CONTAINER_FROM_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted Key: YARN-3937 URL: https://issues.apache.org/jira/browse/YARN-3937 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G As discussed in YARN-3784, there are scenarios where other applications have released containers or the same application has revoked its resource requests. In these cases, we may no longer need to preempt a container that was marked for preemption earlier. Introduce a new event to remove such containers, if present, from the scheduler's to-be-preempted list, or to inform the AM about such a scenario. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
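A hypothetical sketch of the shape such an event could take is below. The class name follows the proposal, but nothing here comes from an actual patch; in the real ResourceManager this would presumably extend SchedulerEvent with a new REMOVE_CONTAINER_FROM_PREEMPTION value in SchedulerEventType, mirroring how other scheduler events are dispatched.

{code}
// Illustrative only: an event carrying containers that no longer need to be
// preempted, so the scheduler can drop them from its to-be-preempted set and
// the AM can be told the earlier preemption message is void.
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class RemoveContainerFromPreemptionEvent {
  private final ApplicationAttemptId appAttemptId;
  private final List<ContainerId> containersNoLongerPreempted;

  public RemoveContainerFromPreemptionEvent(ApplicationAttemptId appAttemptId,
      List<ContainerId> containersNoLongerPreempted) {
    this.appAttemptId = appAttemptId;
    this.containersNoLongerPreempted = containersNoLongerPreempted;
  }

  public ApplicationAttemptId getAppAttemptId() {
    return appAttemptId;
  }

  public List<ContainerId> getContainersNoLongerPreempted() {
    return containersNoLongerPreempted;
  }
}
{code}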
[jira] [Updated] (YARN-3938) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3938: --- Summary: AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel (was: AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel Key: YARN-3938 URL: https://issues.apache.org/jira/browse/YARN-3938 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: Am limit for subqueue.jpg In case of leaf queue the AM resource calculation is based on {{absoluteCapacityResource}}. Below is the calculation for absolute capacity {{LeafQueue#updateAbsoluteCapacityResource()}} {code} private void updateAbsoluteCapacityResource(Resource clusterResource) { absoluteCapacityResource = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} If default partition resource is zero for all Leaf queue the resource for AM will be zero Snapshot also attached for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3535: -- Component/s: resourcemanager fairscheduler capacityscheduler ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3535: -- Fix Version/s: 2.8.0 ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631205#comment-14631205 ] Arun Suresh commented on YARN-3535: --- +1, Committing this shortly. Thanks to everyone involved. ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3938) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero
Bibin A Chundatt created YARN-3938: -- Summary: AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero Key: YARN-3938 URL: https://issues.apache.org/jira/browse/YARN-3938 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical In case of leaf queue the AM resource calculation is based on {{absoluteCapacityResource}}. Below is the calculation for absolute capacity {{LeafQueue#updateAbsoluteCapacityResource()}} {code} private void updateAbsoluteCapacityResource(Resource clusterResource) { absoluteCapacityResource = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} If default partition resource is zero for all Leaf queue the resource for AM will be zero Snapshot also attached for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3938) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero
[ https://issues.apache.org/jira/browse/YARN-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3938: --- Attachment: Am limit for subqueue.jpg AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero - Key: YARN-3938 URL: https://issues.apache.org/jira/browse/YARN-3938 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: Am limit for subqueue.jpg In case of leaf queue the AM resource calculation is based on {{absoluteCapacityResource}}. Below is the calculation for absolute capacity {{LeafQueue#updateAbsoluteCapacityResource()}} {code} private void updateAbsoluteCapacityResource(Resource clusterResource) { absoluteCapacityResource = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} If default partition resource is zero for all Leaf queue the resource for AM will be zero Snapshot also attached for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3535: -- Summary: Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED (was: ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0023-YARN-2003.patch Thank you [~leftnoteasy] for the comments. Uploading a new patch addressing these. {{compareTo}} is used with priority of containers where lower integer value is highest in priority. Now its used in opposite context. Hence I added a comment there. Kindly review the same. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
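For context on the {{compareTo}} remark above: container priorities treat a lower integer value as more important, while application priority as used in this patch treats a higher integer value as more important. A small, self-contained illustration of the two opposite conventions (not the actual YARN classes):

{code}
// Illustrative only: the two opposite orderings a reader can trip over.
import java.util.Arrays;
import java.util.Collections;

public class PriorityOrderingDemo {
  public static void main(String[] args) {
    Integer[] priorities = {1, 5, 3};

    // Container-style: lower integer value means higher priority.
    Integer[] containers = priorities.clone();
    Arrays.sort(containers);
    System.out.println(Arrays.toString(containers));   // [1, 3, 5] -> 1 is served first

    // Application-style (this patch): higher integer value means higher priority.
    Integer[] apps = priorities.clone();
    Arrays.sort(apps, Collections.reverseOrder());
    System.out.println(Arrays.toString(apps));          // [5, 3, 1] -> 5 is served first
  }
}
{code}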
[jira] [Updated] (YARN-3937) Introducing REMOVE_CONTAINER_FROM_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted
[ https://issues.apache.org/jira/browse/YARN-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3937: -- Issue Type: Sub-task (was: Bug) Parent: YARN-45 Introducing REMOVE_CONTAINER_FROM_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted --- Key: YARN-3937 URL: https://issues.apache.org/jira/browse/YARN-3937 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G As discussed in YARN-3784, there are scenarios like few other applications released containers or same application has revoked its resource requests. In these cases, we may not have to preempt a container which would have been marked for preemption earlier. Introduce a new event to remove such containers if present in the to-be-preempted list of scheduler or inform AM about such a scenario. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Fix Version/s: 2.8.0 Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Fix For: 2.8.0 Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's calculator. The following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
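To make the dominant-resource point concrete, here is a tiny, self-contained illustration of the difference between a memory-only starvation test and one based on the dominant share. This is plain arithmetic for illustration, not the FairScheduler code; the numbers are invented.

{code}
// Illustrative only: with usage <8 GB, 12 vcores> against a share of <16 GB, 10 vcores>,
// a memory-only test calls the queue starved, while the dominant-share test does not,
// because CPU (the dominant resource) is already above its share.
public class DrfStarvationDemo {
  static boolean starvedByMemoryOnly(long usedMB, long shareMB) {
    return usedMB < shareMB;
  }

  static boolean starvedByDominantShare(long usedMB, long usedVcores,
                                        long shareMB, long shareVcores) {
    double memShare = (double) usedMB / shareMB;
    double cpuShare = (double) usedVcores / shareVcores;
    double dominant = Math.max(memShare, cpuShare);
    return dominant < 1.0;   // starved only if even the dominant share is below the target
  }

  public static void main(String[] args) {
    System.out.println(starvedByMemoryOnly(8192, 16384));             // true
    System.out.println(starvedByDominantShare(8192, 12, 16384, 10));  // false
  }
}
{code}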
[jira] [Commented] (YARN-3938) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631652#comment-14631652 ] Wangda Tan commented on YARN-3938: -- Hi [~bibinchundatt], Thanks for reporting this issue; this is a known issue with node labels. Possible solutions: # Make {{maxAMResource = queue's-total-guaranteed-resource (sum of the queue's guaranteed resource on all partitions) * maxAmResourcePercent}}. This is straightforward, but it can also lead to too many AMs being launched under a single partition. # Compute maxAMResource per queue per partition. This keeps AM usage across partitions more balanced, but it can also make debugging harder (my application gets stuck because the AMResourceLimit for a partition is violated). I prefer the 1st solution since it's easier to understand and debug. Thoughts? And could I take over this issue if you haven't started yet? AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel Key: YARN-3938 URL: https://issues.apache.org/jira/browse/YARN-3938 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: Am limit for subqueue.jpg In case of leaf queue the AM resource calculation is based on {{absoluteCapacityResource}}. Below is the calculation for absolute capacity {{LeafQueue#updateAbsoluteCapacityResource()}} {code} private void updateAbsoluteCapacityResource(Resource clusterResource) { absoluteCapacityResource = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} If the default partition resource is zero, the AM resource for all leaf queues will be zero. Snapshot also attached for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
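A rough sketch of option 1 above follows. How the queue exposes its per-partition guaranteed resource is left abstract; the map parameter here is a hypothetical stand-in for that accessor, and nothing in this sketch is from an actual patch.

{code}
// Hypothetical sketch of option 1: base the AM limit on the queue's guaranteed
// resource summed over every partition it has access to, not just DEFAULT.
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AmLimitSketch {
  // partition name -> queue's guaranteed resource in that partition (assumed input)
  static Resource computeMaxAMResource(Map<String, Resource> guaranteedByPartition,
                                       float maxAmResourcePercent) {
    Resource total = Resource.newInstance(0, 0);
    for (Resource r : guaranteedByPartition.values()) {
      Resources.addTo(total, r);
    }
    return Resources.multiply(total, maxAmResourcePercent);
  }
}
{code}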
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631623#comment-14631623 ] Wangda Tan commented on YARN-2003: -- Latest patch looks good, [~sunilg], could you take a look at failed tests? Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631656#comment-14631656 ] Wangda Tan commented on YARN-3932: -- Thanks for update [~bibinchundatt], could you add a test for this to avoid future regression? SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel --- Key: YARN-3932 URL: https://issues.apache.org/jira/browse/YARN-3932 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3932.patch, ApplicationReport.jpg, TestResult.jpg Application Resource Report shown wrong when node Label is used. 1.Submit application with NodeLabel 2.Check RM UI for resources used Allocated CPU VCores and Allocated Memory MB is always {{zero}} {code} public synchronized ApplicationResourceUsageReport getResourceUsageReport() { AggregateAppResourceUsage runningResourceUsage = getRunningAggregateAppResourceUsage(); Resource usedResourceClone = Resources.clone(attemptResourceUsage.getUsed()); Resource reservedResourceClone = Resources.clone(attemptResourceUsage.getReserved()); return ApplicationResourceUsageReport.newInstance(liveContainers.size(), reservedContainers.size(), usedResourceClone, reservedResourceClone, Resources.add(usedResourceClone, reservedResourceClone), runningResourceUsage.getMemorySeconds(), runningResourceUsage.getVcoreSeconds()); } {code} should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart
[ https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3905: -- Fix Version/s: 3.0.0 2.7.2 2.8.0 Application History Server UI NPEs when accessing apps run after RM restart --- Key: YARN-3905 URL: https://issues.apache.org/jira/browse/YARN-3905 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.7.0, 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3905.001.patch, YARN-3905.002.patch From the Application History URL (http://RmHostName:8188/applicationhistory), clicking on the application ID of an app that was run after the RM daemon has been restarted results in a 500 error: {noformat} Sorry, got error 500 Please consult RFC 2616 for meanings of the error code. {noformat} The stack trace is as follows: {code} 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_01. java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272) at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS
[ https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631638#comment-14631638 ] Hadoop QA commented on YARN-2681: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 23m 21s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 13 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 32s | The applied patch generated 1 new checkstyle issues (total was 221, now 221). | | {color:green}+1{color} | whitespace | 0m 46s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 8m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 16s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | mapreduce tests | 1m 46s | Tests passed in hadoop-mapreduce-client-core. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 37s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 51m 49s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 129m 16s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745824/YARN-2681.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9b272cc | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8573/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8573/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8573/console | This message was automatically generated. Support bandwidth enforcement for containers while reading from HDFS Key: YARN-2681 URL: https://issues.apache.org/jira/browse/YARN-2681 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.5.1 Environment: Linux Reporter: Nam H. Do Labels: BB2015-05-TBR Fix For: 2.7.0 Attachments: Traffic Control Design.png, YARN-2681.001.patch, YARN-2681.002.patch, YARN-2681.003.patch, YARN-2681.004.patch, YARN-2681.005.patch, YARN-2681.patch To read/write data from HDFS on a data node, applications establish TCP/IP connections with the datanode. The HDFS read can be controlled by setting up the Linux Traffic Control (TC) subsystem on the data node to create filters on the appropriate connections. The current cgroups net_cls concept cannot be applied on the node where the container is launched, nor on the data node, since: - TC handles outgoing
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631667#comment-14631667 ] Hadoop QA commented on YARN-1645: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 2s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 17s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 14s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 42m 45s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745467/YARN-1645-YARN-1197.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-1197 / 8041fd8 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8575/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8575/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8575/console | This message was automatically generated. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631732#comment-14631732 ] Hadoop QA commented on YARN-2003: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 8m 1s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 49s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 23s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 3s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | tools/hadoop tests | 0m 23s | Tests failed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 52m 4s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 99m 1s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.sls.nodemanager.TestNMSimulator | | | hadoop.yarn.sls.appmaster.TestAMSimulator | | | hadoop.yarn.sls.TestSLSRunner | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745796/0023-YARN-2003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7faae0e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8574/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8574/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8574/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8574/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8574/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8574/console | This message was automatically generated. 
Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
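As a side note for readers following YARN-2003, here is a purely illustrative sketch of what carrying the priority on the app-attempt-added event could look like; the class name and constructor are hypothetical and not the actual patch:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.Priority;

// Illustrative only: an app-attempt-added event that also carries the
// application priority taken from the ApplicationSubmissionContext.
public class AppAttemptAddedWithPrioritySketch {
  private final ApplicationAttemptId attemptId;
  private final Priority appPriority;

  public AppAttemptAddedWithPrioritySketch(ApplicationAttemptId attemptId, Priority appPriority) {
    this.attemptId = attemptId;
    this.appPriority = appPriority;  // the scheduler can later order applications by this value
  }

  public Priority getAppPriority() {
    return appPriority;
  }
}
{code}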
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3878: Attachment: YARN-3878.09_reprorace.pat_h Attaching a patch just to demonstrate the race. Since its trying to demonstrate the race it injects an artificial delay, hence not making it an official patch. Run the test testBlockNewEvents to show that an event can be in the queue while serviceStop happens. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at
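To make the reported hang easier to follow, here is a condensed sketch of the drain-on-stop wait pattern described above; it is a simplification, not the exact AsyncDispatcher source:
{code}
// Simplified sketch of the drain-on-stop pattern (not the actual AsyncDispatcher
// code). "drained" is only ever set to true by the event-handling thread; if that
// thread is interrupted while the queue is still non-empty, nothing signals
// progress and serviceStop() never returns.
public class DrainOnStopSketch {
  private final Object waitForDrained = new Object();
  private volatile boolean drained = false;

  public void serviceStopSketch() throws InterruptedException {
    synchronized (waitForDrained) {
      while (!drained) {
        waitForDrained.wait(1000);  // relies on the handler thread to make progress
      }
    }
  }
}
{code}
If the handler thread has already been interrupted, nothing ever sets drained, so the loop above spins forever, which is the behavior captured in the jstack.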
[jira] [Commented] (YARN-3937) Introducing REMOVE_CONTAINER_FROM_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted
[ https://issues.apache.org/jira/browse/YARN-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631677#comment-14631677 ] Wangda Tan commented on YARN-3937: -- [~sunilg], I agree with adding a separate event to the API/scheduler. Adding it to the scheduler is probably more important, since YARN-3769 can potentially leverage it. I don't have a solid design for YARN-3769 yet, but I think that if a container is removed from the to-be-preempted list, we shouldn't do lazy preemption for such containers. For API changes, I'm not sure we need them: a container can go on and off the list frequently, so we cannot guarantee that once a container is removed from the list it won't be marked again. Personally, I think we can make this an internal event first to avoid too much noise. Introducing REMOVE_CONTAINER_FROM_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted --- Key: YARN-3937 URL: https://issues.apache.org/jira/browse/YARN-3937 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G As discussed in YARN-3784, there are scenarios where other applications have released containers or the same application has revoked its resource requests. In these cases, we may not have to preempt a container that was marked for preemption earlier. Introduce a new event to remove such containers from the scheduler's to-be-preempted list, if present, or to inform the AM about such a scenario. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631720#comment-14631720 ] Colin Patrick McCabe commented on YARN-3844: +1. Thanks, Alan. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.007.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631671#comment-14631671 ] Wangda Tan commented on YARN-3784: -- Hi [~sunilg], Thanks for your comments, I will post cancel-preemption event related comments to YARN-3937 soon. Indicate preemption timeout along with the list of containers to AM (preemption message) --- Key: YARN-3784 URL: https://issues.apache.org/jira/browse/YARN-3784 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch Currently during preemption, the AM is notified with a list of containers which are marked for preemption. Introduce a timeout duration along with this container list so that the AM knows how much time it has to gracefully shut down its containers (assuming a preemption policy is loaded in the AM). This will help in NM decommissioning scenarios, where the NM is decommissioned after a timeout (also killing the containers on it). The timeout indicates to the AM that those containers can be killed forcefully by the RM once it expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631691#comment-14631691 ] Wangda Tan commented on YARN-3784: -- Also, about this patch, same as commented by [~chris.douglas]: I found that the timeout sent to the AM is maxWaitTime, which I think should instead be how much time remains until the container is preempted. One solution could be to compute an absolute time for each to-be-preempted container, and compute the timeout when the AM pulls this information. Indicate preemption timeout along with the list of containers to AM (preemption message) --- Key: YARN-3784 URL: https://issues.apache.org/jira/browse/YARN-3784 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch Currently during preemption, the AM is notified with a list of containers which are marked for preemption. Introduce a timeout duration along with this container list so that the AM knows how much time it has to gracefully shut down its containers (assuming a preemption policy is loaded in the AM). This will help in NM decommissioning scenarios, where the NM is decommissioned after a timeout (also killing the containers on it). The timeout indicates to the AM that those containers can be killed forcefully by the RM once it expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
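A minimal sketch of that suggestion, assuming a hypothetical map from container to its absolute preemption deadline; the remaining time is derived only when the AM pulls the preemption message:
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerId;

// Illustrative only: store an absolute wall-clock deadline per to-be-preempted
// container and derive the remaining timeout at the moment the AM asks for it.
public class PreemptionDeadlineSketch {
  private final Map<ContainerId, Long> preemptionDeadlines = new HashMap<ContainerId, Long>();

  public void markForPreemption(ContainerId id, long maxWaitMs) {
    preemptionDeadlines.put(id, System.currentTimeMillis() + maxWaitMs);
  }

  /** Remaining time until forced preemption, computed when the AM heartbeats. */
  public long remainingMs(ContainerId id) {
    Long deadline = preemptionDeadlines.get(id);
    return deadline == null ? 0L : Math.max(0L, deadline - System.currentTimeMillis());
  }
}
{code}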
[jira] [Commented] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
[ https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631783#comment-14631783 ] Karthik Kambatla commented on YARN-3934: Are we sure this is because of the size of a single ASC and not the number of applications at all? The latter can be fixed by setting the max-completed-applications. Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used -- Key: YARN-3934 URL: https://issues.apache.org/jira/browse/YARN-3934 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Use the following steps to test. 1. Set up ZK as the RM HA store. 2. Submit a job that refers to lots of distributed cache files with long HDFS path, which will cause the app state size to exceed ZK's max object size limit. 3. RM can't write to ZK and exit with the following exception. {noformat} 2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083) {noformat} In this case, RM could have rejected the app during submitApplication RPC if the size of ApplicationSubmissionContext is too large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
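For illustration only, a guard of the kind suggested in the description might look like the sketch below; the 1 MB limit mirrors ZooKeeper's default znode size, and the class name and exception message are assumptions, not the actual fix:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative only: estimate the serialized size of the ASC and reject the
// submission before it ever reaches the ZK state store.
public class SubmissionSizeCheckSketch {
  private static final int MAX_ASC_BYTES = 1024 * 1024; // roughly ZK's default znode limit

  public static void checkSize(ApplicationSubmissionContext context) throws YarnException {
    int size = ((ApplicationSubmissionContextPBImpl) context).getProto().getSerializedSize();
    if (size > MAX_ASC_BYTES) {
      throw new YarnException("ApplicationSubmissionContext is " + size
          + " bytes, which exceeds the " + MAX_ASC_BYTES + " byte limit");
    }
  }
}
{code}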
[jira] [Commented] (YARN-3853) Add docker container runtime support to LinuxContainterExecutor
[ https://issues.apache.org/jira/browse/YARN-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632026#comment-14632026 ] Varun Vasudev commented on YARN-3853: - Thanks for the patch [~sidharta-s]. One question - Can you explain what the purpose of {code} whitelist.add(YarnConfiguration.NM_DOCKER_CONTAINER_EXECUTOR_IMAGE_NAME); {code} is? Some feedback on the patch: # Can we rename the DefaultLinuxContainerRuntime to ProcessContainerRuntime and rename DockerLinuxContainerRuntime to DockerContainerRuntime - both are already in the nodemanager.containermanager.linux.runtime package so the Linux seems redundant and I think Process is better than Default. # In LinuxContainerExecutor {code} + + public LinuxContainerExecutor() { + } + + // created primarily for testing + public LinuxContainerExecutor(LinuxContainerRuntime linuxContainerRuntime) { +this.linuxContainerRuntime = linuxContainerRuntime; + } {code} Maybe these should be protected? In addition, the VisibleForTesting annotation should be used # In LinuxContainerExecutor {code} - containerSchedPriorityIsSet = true; - containerSchedPriorityAdjustment = conf - .getInt(YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY, - YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_SCHED_PRIORITY); + containerSchedPriorityIsSet = true; + containerSchedPriorityAdjustment = conf + .getInt(YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY, + YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_SCHED_PRIORITY); } {code} Looks like the formatting is messed up. # In LinuxContainerExecutor, we've removed some debug statements; we should put them back in {code} -if (LOG.isDebugEnabled()) { - LOG.debug(Output from LinuxContainerExecutor's launchContainer follows:); - logOutput(shExec.getOutput()); -} {code} and {code} -if (LOG.isDebugEnabled()) { - LOG.debug(signalContainer: + Arrays.toString(command)); -} {code} # In ContainerLaunch.java {code} @Override +public void whitelistedEnv(String key, String value) throws IOException { + lineWithLenCheck(@set , key, =, value); + errorCheck(); +} {code} This code is exactly the same as the env() function. Maybe it should just call the env() function instead? # There are some unused imports in DockerLinuxContainerRuntime, DockerClient and TestDockerContainerRuntime Add docker container runtime support to LinuxContainterExecutor --- Key: YARN-3853 URL: https://issues.apache.org/jira/browse/YARN-3853 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3853.001.patch Create a new DockerContainerRuntime that implements support for docker containers via container-executor. LinuxContainerExecutor should default to current behavior when launching containers but switch to docker when requested. Overview === The current mechanism of launching/signaling containers is moved to its own (default) container runtime. In order to use docker container runtime a couple of environment variables have to be set. This will have to be revisited when we have a first class client side API to specify different container types and associated parameters. 
Using ‘pi’ as an example and using a custom docker image, this is how you could use the docker container runtime (LinuxContainerExecutor must be in use and the docker daemon needs to be running) : {code} export YARN_EXAMPLES_JAR=./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar bin/yarn jar $YARN_EXAMPLES_JAR pi -Dmapreduce.map.env=YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk -Dyarn.app.mapreduce.am.env=YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk -Dmapreduce.reduce.env=YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk 4 1000 {code} LinuxContainerExecutor can delegate to either runtime on a per container basis. If the docker container type is selected, LinuxContainerExecutor delegates to the DockerContainerRuntime which in turn uses docker support in the container-executor binary to launch/manage docker containers ( see YARN-3852 ) . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
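To summarize the delegation model in code, a simplified, hypothetical sketch of picking a runtime per container from its launch environment follows; the real patch's class and method names may differ:
{code}
import java.util.Map;

// Illustrative only: choose the container runtime per container from its
// launch environment, defaulting to the plain process-based runtime.
public class RuntimeSelectionSketch {
  public static String pickRuntime(Map<String, String> env) {
    String type = env.get("YARN_CONTAINER_RUNTIME_TYPE");
    if ("docker".equals(type)) {
      return "docker";   // delegate to the Docker runtime (container-executor docker support)
    }
    return "default";    // existing process-based launch path
  }
}
{code}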
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632094#comment-14632094 ] Hadoop QA commented on YARN-3908: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 5s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 43m 4s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745903/YARN-3908-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / eb1932d | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8578/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8578/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8578/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8578/console | This message was automatically generated. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2964: -- Labels: 2.6.1-candidate (was: ) RM prematurely cancels tokens for jobs that submit jobs (oozie) --- Key: YARN-2964 URL: https://issues.apache.org/jira/browse/YARN-2964 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Jian He Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched 10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632280#comment-14632280 ] zhihai xu commented on YARN-3798: - Thanks for the new patch [~ozawa]! the patch looks good to me except two nits: # Using {{rc == Code.OK.intValue()}} instead of {{rc == 0}} may be more maintainable and readable when checking the return value from AsyncCallback. # It may be better to add {{Thread.currentThread().interrupt();}} to restore the interrupted status after catching InterruptedException from {{syncInternal}}. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) 
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at
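The two nits above can be shown in a short, illustrative sketch; the surrounding class and the commented-out syncInternal call are hypothetical:
{code}
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException.Code;

// Illustrative only: compare the async return code against Code.OK instead of
// the bare literal 0, and restore the interrupt flag after catching
// InterruptedException so callers still see the interruption.
public class ZkCallbackSketch implements AsyncCallback.VoidCallback {
  @Override
  public void processResult(int rc, String path, Object ctx) {
    if (rc == Code.OK.intValue()) {        // nit 1: more readable and maintainable than rc == 0
      // success handling...
    }
  }

  void syncAndWait() {
    try {
      // syncInternal(path);               // hypothetical blocking call that may be interrupted
      Thread.sleep(10);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // nit 2: restore the interrupted status
    }
  }
}
{code}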
[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2890: -- Labels: 2.6.1-candidate (was: ) MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Labels: 2.6.1-candidate Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
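Conceptually the fix is a configuration check before wiring up the timeline service; a minimal sketch (not the committed patch) follows:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: start the timeline service in the mini cluster only when
// the configuration actually enables it.
public class MiniClusterTimelineCheckSketch {
  public static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}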
[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2859: -- Labels: 2.6.1-candidate (was: ) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate In mini cluster, a random port should be used. Also, the config is not updated to the host that the process got bound to. {code} 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
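The usual mini-cluster pattern is to request an ephemeral port and then publish the address the server actually bound to; a hedged sketch using the timeline web-app key follows (the helper names are made up):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: ask for an ephemeral port up front, then (after the web
// server has started) write back the address it actually bound to so tests can
// find it, instead of the hard-coded defaults shown in the description.
public class MiniClusterPortSketch {
  public static void useEphemeralPort(Configuration conf) {
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
  }

  public static void publishBoundAddress(Configuration conf, String hostPort) {
    // hostPort would come from the started web server, e.g. "localhost:54321"
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, hostPort);
  }
}
{code}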
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631832#comment-14631832 ] Jason Lowe commented on YARN-3535: -- Should this go in to 2.7.2? It's been seen by multiple users and seems appropriate for that release. Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED --- Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Fix For: 2.8.0 Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631911#comment-14631911 ] Wangda Tan commented on YARN-2003: -- It seems latest tests are all passed. But [~sunilg], for the failed test of previous build, it reports: {code} Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 60.123 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority testPriorityWithPendingApplications(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority) Time elapsed: 48.422 sec FAILURE! java.lang.AssertionError: Attempt state is not correct (timedout): expected: ALLOCATED actual: SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:98) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:573) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testPriorityWithPendingApplications(TestApplicationPriority.java:315) {code} Is it caused by your patch or implementation of MockRM since it is related to your changes. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3934) Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
[ https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632017#comment-14632017 ] Ming Ma commented on YARN-3934: --- This is due to a single ASC object size. You can repro this with RM starting with empty state. So it is different from YARN-2962. Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used -- Key: YARN-3934 URL: https://issues.apache.org/jira/browse/YARN-3934 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Use the following steps to test. 1. Set up ZK as the RM HA store. 2. Submit a job that refers to lots of distributed cache files with long HDFS path, which will cause the app state size to exceed ZK's max object size limit. 3. RM can't write to ZK and exit with the following exception. {noformat} 2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083) {noformat} In this case, RM could have rejected the app during submitApplication RPC if the size of ApplicationSubmissionContext is too large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3700: -- Labels: 2.6.1-candidate 2.7.2-candidate (was: ) ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Labels: 2.6.1-candidate, 2.7.2-candidate Fix For: 2.8.0 Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2816) NM fail to start with NPE during container recovery
[ https://issues.apache.org/jira/browse/YARN-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2816: -- Labels: 2.6.1-candidate (was: ) NM fail to start with NPE during container recovery --- Key: YARN-2816 URL: https://issues.apache.org/jira/browse/YARN-2816 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-2816.000.patch, YARN-2816.001.patch, YARN-2816.002.patch, leveldb_records.txt NM fail to start with NPE during container recovery. We saw the following crash happen: 2014-10-30 22:22:37,211 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492) The reason is some DB files used in NMLeveldbStateStoreService are accidentally deleted to save disk space at /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state. This leaves some incomplete container record which don't have CONTAINER_REQUEST_KEY_SUFFIX(startRequest) entry in the DB. When container is recovered at ContainerManagerImpl#recoverContainer, The NullPointerException at the following code cause NM shutdown. {code} StartContainerRequest req = rcs.getStartRequest(); ContainerLaunchContext launchContext = req.getContainerLaunchContext(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
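One defensive option consistent with the description is to treat a record without a start request as unrecoverable instead of dereferencing it; the sketch below is illustrative, not the committed fix:
{code}
import org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest;

// Illustrative only: guard against a partially written recovery record before
// dereferencing its start request, instead of letting the NPE take the NM down.
public class RecoveryGuardSketch {
  public static boolean isRecoverable(StartContainerRequest startRequest) {
    if (startRequest == null) {
      // A record missing its CONTAINER_REQUEST_KEY_SUFFIX entry is incomplete;
      // skip it (or remove it from the state store) rather than crash on recovery.
      return false;
    }
    return startRequest.getContainerLaunchContext() != null;
  }
}
{code}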
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632029#comment-14632029 ] Hadoop QA commented on YARN-1645: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 13s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 19s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 9s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 19s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 42m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745884/YARN-1645-YARN-1197.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-1197 / 8041fd8 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8577/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8577/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8577/console | This message was automatically generated. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645-YARN-1197.4.patch, YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3908: -- Attachment: YARN-3908-YARN-2928.005.patch v.5 patch posted. The {{readTimeseriesResults()}} method has been renamed to {{readResultsWithTimestamps()}}. Hopefully it's a bit more appropriate. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh
[ https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3654: -- Labels: 2.7.2-candidate (was: ) ContainerLogsPage web UI should not have meta-refresh - Key: YARN-3654 URL: https://issues.apache.org/jira/browse/YARN-3654 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Reporter: Xuan Gong Assignee: Xuan Gong Labels: 2.7.2-candidate Fix For: 2.8.0 Attachments: YARN-3654.1.patch, YARN-3654.2.patch Currently, when we try to find the container logs for a finished application, the page redirects to the URL configured for yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using meta-refresh: {code} set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID))); html.meta_http("refresh", "1; url=" + redirectUrl); {code} This does not work well in browsers that require meta-refresh to be enabled in their security settings, especially IE, where meta-refresh is considered a security hole. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632079#comment-14632079 ] Sunil G commented on YARN-2003: --- Hi Wangda. This MockRM issue was an intermittent one we faced earlier. This random failure was supposed to be fixed in YARN-3533. It did not happen because of my change, as I have not added any new API in MockRM now. YARN-3533 fixed this issue in launchAM; maybe the issue still exists for sendAMLaunched. I'll check this and, if needed, will open a test ticket to track it. Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2340: -- Labels: 2.6.1-candidate (was: ) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby -- Key: YARN-2340 URL: https://issues.apache.org/jira/browse/YARN-2340 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.1 Environment: Capacityscheduler with Queue a, b Reporter: Nishan Shetty Assignee: Rohith Sharma K S Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: 0001-YARN-2340.patch While a job is in progress, set the queue state to STOPPED and then restart the RM. Observe that the standby RM fails to come up as active, throwing the NPE below: 2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED 2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) at java.lang.Thread.run(Thread.java:662) 2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2414: -- Labels: 2.6.1-candidate (was: ) RM web UI: app page will crash if app is failed before any attempt has been created --- Key: YARN-2414 URL: https://issues.apache.org/jira/browse/YARN-2414 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Zhijie Shen Assignee: Wangda Tan Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-2414.20141104-1.patch, YARN-2414.20141104-2.patch, YARN-2414.patch {code} 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/app/application_1407887030038_0001 java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at
[jira] [Updated] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired
[ https://issues.apache.org/jira/browse/YARN-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3227: -- Labels: 2.6.1-candidate (was: ) Timeline renew delegation token fails when RM user's TGT is expired --- Key: YARN-3227 URL: https://issues.apache.org/jira/browse/YARN-3227 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-3227.1.patch, YARN-3227.test.patch When the RM user's kerberos TGT is expired, the RM renew delegation token operation fails as part of job submission. Expected behavior is that RM will relogin to get a new TGT. {quote} 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: timelineserver.example.com:4080, Ident: (owner=user, renewer=rmuser, realUser=oozie, issueDate=1423248845528, maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: HTTP status [401], message [Unauthorized] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at 
org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
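The expected relogin can be forced from the renewer side by refreshing the RM login user's TGT from its keytab before the timeline renew call. A minimal sketch of that idea (an assumption about where such a fix could go, not the attached YARN-3227.1.patch), using the existing UserGroupInformation API:
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class TimelineTokenRenewHelper {
  // Re-acquire the RM login user's TGT from its keytab if it has expired, so that
  // the subsequent SPNEGO-authenticated renew request does not fail with HTTP 401.
  public static void reloginIfNeeded() throws IOException {
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    if (loginUser.isFromKeytab()) {
      loginUser.checkTGTAndReloginFromKeytab();  // no-op if the TGT is still valid
    }
  }
}
{code}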
[jira] [Updated] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
[ https://issues.apache.org/jira/browse/YARN-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3393: -- Labels: 2.6.1-candidate (was: ) Getting application(s) goes wrong when app finishes before starting the attempt --- Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-3393.1.patch When generating the app report in ApplicationHistoryManagerOnTimelineStore, it checks if appAttempt == null. {code} ApplicationAttemptReport appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId()); if (appAttempt != null) { app.appReport.setHost(appAttempt.getHost()); app.appReport.setRpcPort(appAttempt.getRpcPort()); app.appReport.setTrackingUrl(appAttempt.getTrackingUrl()); app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl()); } {code} However, {{getApplicationAttempt}} doesn't return null but throws ApplicationAttemptNotFoundException: {code} if (entity == null) { throw new ApplicationAttemptNotFoundException("The entity for application attempt " + appAttemptId + " doesn't exist in the timeline store"); } else { return convertToApplicationAttemptReport(entity); } {code} The two code paths aren't coupled well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
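A minimal sketch of how the caller could tolerate the exception instead of relying on a null return (a hypothetical illustration, not the attached YARN-3393.1.patch; variable names follow the snippet above):
{code}
ApplicationAttemptReport appAttempt = null;
try {
  appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
} catch (ApplicationAttemptNotFoundException e) {
  // The app finished before any attempt was written to the timeline store;
  // leave the host/RPC/tracking fields of the report unset.
}
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}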
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631816#comment-14631816 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | pre-patch | 15m 1s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 7m 37s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 1m 57s | Tests failed in hadoop-yarn-common. | | | | 38m 34s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.event.TestAsyncDispatcher | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745865/YARN-3878.09_reprorace.pat_h | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 419c51d | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8576/artifact/patchprocess/diffJavacWarnings.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8576/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8576/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8576/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. 
On {{serviceStop}}, we will check if all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain its queue on stop). # This condition never becomes true and AsyncDispatcher keeps waiting for the dispatcher event queue to drain until the JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at
[jira] [Updated] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3216: - Description: Currently, max-am-resource-percentage considers default_partition only. When a queue can access multiple partitions, we should be able to compute max-am-resource-percentage based on that. Max-AM-Resource-Percentage should respect node labels - Key: YARN-3216 URL: https://issues.apache.org/jira/browse/YARN-3216 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Currently, max-am-resource-percentage considers default_partition only. When a queue can access multiple partitions, we should be able to compute max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632125#comment-14632125 ] Jian He commented on YARN-2005: --- It seems the patch will blacklist a node immediately once the AM container fails; I think we should blacklist a node only after a configurable threshold? Some apps may still want to be restarted on the same node for reasons like data locality - the AM does not want to transfer its local data to a different machine when restarted. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch, YARN-2005.004.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
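A rough sketch of the threshold idea raised in the comment above (purely hypothetical; the class, method, and threshold handling are illustrative and not taken from any attached patch):
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AMBlacklistTracker {
  private final int failureThreshold;  // configurable number of AM failures per node
  private final Map<String, Integer> amFailuresPerNode = new HashMap<String, Integer>();
  private final Set<String> blacklistedNodes = new HashSet<String>();

  public AMBlacklistTracker(int failureThreshold) {
    this.failureThreshold = failureThreshold;
  }

  // Record an AM container failure on a node; blacklist the node only once its
  // failure count reaches the configured threshold, not on the first failure.
  public void onAMContainerFailed(String nodeId) {
    Integer count = amFailuresPerNode.get(nodeId);
    int failures = (count == null) ? 1 : count + 1;
    amFailuresPerNode.put(nodeId, failures);
    if (failures >= failureThreshold) {
      blacklistedNodes.add(nodeId);
    }
  }

  public boolean isBlacklisted(String nodeId) {
    return blacklistedNodes.contains(nodeId);
  }
}
{code}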
[jira] [Updated] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1645: Attachment: YARN-1645-YARN-1197.5.patch I think you are right that these functions don't need to be synchronized. Originally I was directly modifying container sizes in {{changeContainerResourceInternal}}, so I thought I needed to synchronize functions that may potentially access the same containers. This is no longer the case, as container sizes are now changed in ContainerImpl via events, and access to a container is already properly synchronized in ContainerImpl. Thanks for catching this. Attaching the updated patch. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645-YARN-1197.4.patch, YARN-1645-YARN-1197.5.patch, YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632037#comment-14632037 ] Jian He commented on YARN-1645: --- Looks good overall; one question: why are these changed to be synchronized? {code} private synchronized void stopContainerInternal( private synchronized ContainerStatus getContainerStatusInternal( {code} ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645-YARN-1197.4.patch, YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
[ https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632042#comment-14632042 ] Jian He commented on YARN-3900: --- lgtm Protobuf layout of yarn_security_token causes errors in other protos that include it - Key: YARN-3900 URL: https://issues.apache.org/jira/browse/YARN-3900 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3900.001.patch, YARN-3900.001.patch, YARN-3900.002.patch Because of the {{server}} subdirectory used in {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}}, there are errors in other protos that include it. As per the docs http://sergei-ivanov.github.io/maven-protoc-plugin/usage.html {noformat} Any subdirectories under src/main/proto/ are treated as package structure for protobuf definition imports.{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632200#comment-14632200 ] Hadoop QA commented on YARN-1645: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 29s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 20s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 17s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 43m 26s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745917/YARN-1645-YARN-1197.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-1197 / 8041fd8 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8579/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8579/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8579/console | This message was automatically generated. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645-YARN-1197.4.patch, YARN-1645-YARN-1197.5.patch, YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3938) AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-3938. -- Resolution: Duplicate I just found that I had already filed a JIRA for this issue, YARN-3216. Closing this as a duplicate. Thanks for reporting, [~bibinchundatt]. AM Resources for leaf queues zero when DEFAULT PARTITION resource is zero with NodeLabel Key: YARN-3938 URL: https://issues.apache.org/jira/browse/YARN-3938 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: Am limit for subqueue.jpg For a leaf queue, the AM resource calculation is based on {{absoluteCapacityResource}}. Below is the absolute capacity calculation in {{LeafQueue#updateAbsoluteCapacityResource()}} {code} private void updateAbsoluteCapacityResource(Resource clusterResource) { absoluteCapacityResource = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(RMNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} If the DEFAULT_PARTITION resource is zero, the AM resource for every leaf queue will be zero. A snapshot is attached showing the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2246: -- Labels: 2.6.1-candidate (was: ) Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3239: -- Labels: 2.6.1-candidate (was: ) WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3207) secondary filter matches entities which do not have the key being filtered for.
[ https://issues.apache.org/jira/browse/YARN-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3207: -- Labels: 2.6.1-candidate (was: ) secondary filter matches entities which do not have the key being filtered for. -- Key: YARN-3207 URL: https://issues.apache.org/jira/browse/YARN-3207 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran Assignee: Zhijie Shen Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-3207.1.patch In the leveldb implementation of the TimelineStore, the secondary filter matches entities where the key being searched for is not present. For example, the query from the Tez UI http://uvm:8188/ws/v1/timeline/TEZ_DAG_ID/?limit=1&secondaryFilter=foo:bar will match and return the entity even though there is no entity with otherinfo.foo defined. The issue seems to be in {code:title=LeveldbTimelineStore:675} if (vs != null && !vs.contains(filter.getValue())) { filterPassed = false; break; } {code} IMHO this should be vs == null || !vs.contains(filter.getValue()) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
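Spelled out, the correction suggested in the report above would look like the following (a sketch of the proposed change, not necessarily the attached YARN-3207.1.patch):
{code}
// Treat a missing key the same as a value mismatch: the entity should not pass the filter.
if (vs == null || !vs.contains(filter.getValue())) {
  filterPassed = false;
  break;
}
{code}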
[jira] [Assigned] (YARN-3936) Add metrics for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3936: - Assignee: Sunil G Add metrics for RMStateStore Key: YARN-3936 URL: https://issues.apache.org/jira/browse/YARN-3936 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Sunil G It might be useful to collect some metrics w.r.t. RMStateStore such as: * Write latency * The ApplicationStateData size distribution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
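A purely illustrative sketch of what such metrics might look like using Hadoop's metrics2 library (the class, metric, and method names are hypothetical and not from any YARN-3936 patch):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Metrics for RMStateStore", context = "yarn")
public class RMStateStoreMetrics {
  @Metric("State store write latency") MutableRate writeLatency;
  @Metric("Size of last serialized ApplicationStateData") MutableGaugeLong lastAppStateDataSize;

  public static RMStateStoreMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "RMStateStoreMetrics", "Metrics for RMStateStore", new RMStateStoreMetrics());
  }

  // Record one state-store write: elapsed time in milliseconds and payload size in bytes.
  public void recordWrite(long elapsedMillis, long sizeBytes) {
    writeLatency.add(elapsedMillis);
    lastAppStateDataSize.set(sizeBytes);
  }
}
{code}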
[jira] [Updated] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1645: Attachment: YARN-1645-YARN-1197.4.patch The {{testChangeContainerResource}} has dependency on YARN-3867 and YARN-1643. Will move the test case to YARN-1643. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645-YARN-1197.4.patch, YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631928#comment-14631928 ] Wangda Tan commented on YARN-3216: -- There are two approaches to doing that: - Make maxAMResource = queue's total guaranteed resource (sum of the queue's guaranteed resource over all partitions) * maxAMResourcePercent. This is straightforward, but it can also lead to too many AMs launched under a single partition. - Compute maxAMResource per queue per partition. This keeps AM usage across partitions more balanced, but it can make debugging harder ("my application is stuck because the AMResourceLimit of a partition is violated"). I prefer the first solution since it's easier to understand and debug. Max-AM-Resource-Percentage should respect node labels - Key: YARN-3216 URL: https://issues.apache.org/jira/browse/YARN-3216 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Currently, max-am-resource-percentage considers default_partition only. When a queue can access multiple partitions, we should be able to compute max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
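A rough sketch of the first approach (hypothetical; helpers such as {{getAccessibleNodeLabels()}} and the per-partition {{getAbsoluteCapacity(partition)}} are assumptions, loosely based on the snippet quoted in YARN-3938 above):
{code}
// Approach 1 (sketch): sum the queue's guaranteed resource across all partitions it
// can access, then apply the AM resource percentage to that total.
Resource totalGuaranteed = Resources.createResource(0, 0);
for (String partition : getAccessibleNodeLabels()) {
  Resources.addTo(totalGuaranteed,
      Resources.multiply(
          labelManager.getResourceByLabel(partition, clusterResource),
          queueCapacities.getAbsoluteCapacity(partition)));
}
Resource maxAMResource = Resources.multiply(totalGuaranteed, maxAMResourcePercent);
{code}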
[jira] [Updated] (YARN-2905) AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted
[ https://issues.apache.org/jira/browse/YARN-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2905: -- Labels: 2.6.1-candidate (was: ) AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted Key: YARN-2905 URL: https://issues.apache.org/jira/browse/YARN-2905 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-2905.patch If the AggregatedLogsBlock page tries to serve up a portion of a log file that has been corrupted (e.g.: like the case that was fixed by YARN-2724) then it can spin forever trying to seek to the targeted log segment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
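The usual defensive pattern for this class of bug is to stop as soon as the underlying stream makes no forward progress, instead of retrying forever. A generic, hypothetical sketch (not the attached YARN-2905.patch):
{code}
import java.io.IOException;
import java.io.InputStream;

public final class SkipUtil {
  // Skip up to 'toSkip' bytes, but bail out instead of spinning forever when the
  // stream cannot advance (e.g. a corrupted or truncated aggregated log segment).
  public static long skipSafely(InputStream in, long toSkip) throws IOException {
    long remaining = toSkip;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped <= 0) {
        break;  // no progress: stop rather than loop indefinitely
      }
      remaining -= skipped;
    }
    return toSkip - remaining;  // number of bytes actually skipped
  }
}
{code}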
[jira] [Updated] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2917: -- Labels: 2.6.1-candidate (was: ) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631826#comment-14631826 ] Sangjin Lee commented on YARN-3908: --- Thanks for the comment [~gtCarrera9]. I agree the name is a bit awkward. Let me see if I can rename it to something more appropriate. Will update. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631849#comment-14631849 ] Jian He commented on YARN-3878: --- Anubhav, thanks for reviewing the patch. I think given that we cannot guarantee shutdown will process all events - main dispatcher may also have some events pending which are not drained - in any case we are going to lose those events, to keep it simple, it's ok to not handle this rare condition. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at