[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2015-12-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048436#comment-15048436
 ] 

Karthik Kambatla commented on YARN-2975:


Makes sense. [~sjlee0] - want to do the honors? 

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 
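
For context, a minimal sketch of the locked-access pattern at issue - handing 
out a copy under the queue's read lock so callers never iterate the live list 
unguarded (class, field, and method names here are illustrative, not the 
actual FSLeafQueue code):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LeafQueueSketch {
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final Lock readLock = rwLock.readLock();
  private final List<Object> runnableApps = new ArrayList<>();

  // Hand out a snapshot instead of the live list, so callers cannot race
  // with writers that mutate runnableApps under the write lock.
  public List<Object> getCopyOfRunnableApps() {
    readLock.lock();
    try {
      return new ArrayList<>(runnableApps);
    } finally {
      readLock.unlock();
    }
  }
}
{code}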



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4427) NPE on handleNMContainerStatus when NM is registering to RM

2015-12-09 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048397#comment-15048397
 ] 

Brahma Reddy Battula commented on YARN-4427:


[~sunilg] Thanks for taking a look at this issue. The final state was null.

 *Resource Manager Log:* 
{noformat}
2015-12-05 15:22:00,780 | INFO  | main-EventThread | Recovering app: 
application_1449190041506_0175 with 0 attempts and final state = null | 
RMAppImpl.java:790
2015-12-05 15:26:12,433 | INFO  | main-EventThread | Recovering app: 
application_1449190041506_0175 with 0 attempts and final state = null | 
RMAppImpl.java:790
2015-12-05 15:32:46,810 | INFO  | main-EventThread | Recovering app: 
application_1449190041506_0175 with 0 attempts and final state = null | 
RMAppImpl.java:790
2015-12-05 15:45:33,199 | INFO  | main-EventThread | Recovering app: 
application_1449190041506_0175 with 0 attempts and final state = null | 
RMAppImpl.java:790
2015-12-05 16:11:41,556 | INFO  | main-EventThread | Recovering app: 
application_1449190041506_0175 with 0 attempts and final state = null | 
RMAppImpl.java:790
2015-12-05 16:14:28,228 | INFO  | AsyncDispatcher event handler | Storing 
attempt: AppId: application_1449190041506_0175 AttemptId: 
appattempt_1449190041506_0175_01 MasterContainer: Container: [ContainerId: 
container_1449190041506_0175_01_01, NodeId: 9-91-8-220:26009, 
NodeHttpAddress: 9-91-8-220:26010, Resource: , Priority: 
0, Token: Token { kind: ContainerToken, service: **.**.**.220:26009 }, ] | 
RMAppAttemptImpl.java:1959
2015-12-05 16:17:26,811 | INFO  | main-EventThread | Recovering app: 
application_1449190041506_0175 with 0 attempts and final state = null | 
RMAppImpl.java:790
{noformat}

 *ZNode Creation:* 

{noformat}
2015-12-05 16:14:28,241 | INFO  | CommitProcWorkThread-38 | 
session=0x130006c883cb000d  ip=**.**.**.217 operation=create znode  
target=ZooKeeperServer  
znode=/rmstore/ZKRMStateRoot/RMAppRoot/application_1449190041506_0175/appattempt_1449190041506_0175_01
  result=success | 
org.apache.zookeeper.ZKAuditLogger$LogLevel$5.printLog(ZKAuditLogger.java:70)
{noformat}

 *RM Frequent connections to different ZK:* 

{noformat}
2015-12-05 16:15:46,240 | INFO  | main-SendThread(9-91-8-217:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:53843, 
server: host-217/**.**.**.217:24002 | ClientCnxn.java:1021
2015-12-05 16:15:46,810 | INFO  | main-SendThread(host-208:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:46137, 
server: host-208/**.**.**.208:24002 | ClientCnxn.java:1021
2015-12-05 16:15:47,021 | INFO  | main-SendThread(host-220:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:39488, 
server: host-220/**.**.**.220:24002 | ClientCnxn.java:1021
2015-12-05 16:15:47,265 | INFO  | main-SendThread(host-220:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:39491, 
server: host-220/**.**.**.220:24002 | ClientCnxn.java:1021
2015-12-05 16:15:55,632 | INFO  | main-SendThread(host-218:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:56809, 
server: host-218/**.**.**.218:24002 | ClientCnxn.java:1021
2015-12-05 16:16:33,619 | INFO  | main-SendThread(host-219:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:48605, 
server: host-219/**.**.**.219:24002 | ClientCnxn.java:1021
2015-12-05 16:17:18,369 | INFO  | main-SendThread(host-220:24002) | Socket 
connection established, initiating session, client: /**.**.**.217:40376, 
server: host-220/**.**.**.220:24002 | ClientCnxn.java:1021
{noformat}

 *NPE In the RM:* 

{noformat}
2015-12-05 16:17:29,385 | WARN  | IPC Server handler 1 on 26003 | IPC Server 
handler 1 on 26003, call 
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.registerNodeManager from 
**.**.**.220:49704 Call#4003 Retry#0 | Server.java:2107
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.handleNMContainerStatus(ResourceTrackerService.java:286)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:395)
at 
org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54)
at 
org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$2.callBlockingMethod(ResourceTracker.java:79)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
at java.security.AccessController.doPrivileged(Native Method)
at 

[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-12-09 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048444#comment-15048444
 ] 

Carlo Curino commented on YARN-4358:


Thanks [~ajisakaa], we will get to this tomorrow.

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, 
> YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but 
> has no visibility into extra constraints imposed by the SharingPolicy. While 
> not all constraints are easily represented, some (e.g., max-instantaneous 
> resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-09 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048280#comment-15048280
 ] 

Gera Shegalov commented on YARN-2934:
-

Thanks [~Naganarasimha]! I skimmed the patch; it is in pretty good shape. I am 
aiming to give you more detailed feedback over the next few days.

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch
>
>
> Most YARN applications redirect stderr to some file. That's why, when a 
> container launch fails with an {{ExitCodeException}}, the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048367#comment-15048367
 ] 

Naganarasimha G R commented on YARN-2934:
-

Thanks [~jira.shegalov] !

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch
>
>
> Most YARN applications redirect stderr to some file. That's why, when a 
> container launch fails with an {{ExitCodeException}}, the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048313#comment-15048313
 ] 

Hudson commented on YARN-4434:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8947 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8947/])
YARN-4434. NodeManager Disk Checker parameter documentation is not (aajisaka: 
rev 50edcb947ccbb736924c43735d23f3c156961049)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
* hadoop-yarn-project/CHANGES.txt


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout

2015-12-09 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048245#comment-15048245
 ] 

Tsuyoshi Ozawa commented on YARN-4301:
--

{quote}
It may change the behaviour of NM_MIN_HEALTHY_DISKS_FRACTION. Could we add a 
timeout to mkdir? If mkdir times out, the disk is treated as a failed disk.
{quote}

+1 for the suggestion by [~sandflee]. 

> NM disk health checker should have a timeout
> 
>
> Key: YARN-4301
> URL: https://issues.apache.org/jira/browse/YARN-4301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akihiro Suda
>Assignee: Akihiro Suda
> Attachments: YARN-4301-1.patch, YARN-4301-2.patch, 
> concept-async-diskchecker.txt
>
>
> The disk health checker [verifies a 
> disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385]
>  by executing {{mkdir}} and {{rmdir}} periodically.
> If these operations do not return within a moderate timeout, the disk should 
> be marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}.
> I confirmed that current YARN does not have an implicit timeout (on JDK7, 
> Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our 
> fault injector for distributed systems.
> (I'll introduce the reproduction script in a while)
> I think we can fix this issue by making 
> [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73]
>  return {{false}} if the value of {{this.getLastHealthReportTime()}} is too 
> old.
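
A minimal sketch of that idea - treating a stale health report as unhealthy - 
assuming a staleness bound (the constant and the dirsHandler field are 
illustrative, not the actual NodeHealthCheckerService code):
{code}
// If the periodic mkdir/rmdir check hangs on a bad disk, the last health
// report stops advancing; flag the node unhealthy once it is too old.
private static final long HEALTH_REPORT_TIMEOUT_MS = 10 * 60 * 1000L;

public boolean isHealthy() {
  long sinceLastReport =
      System.currentTimeMillis() - getLastHealthReportTime();
  if (sinceLastReport > HEALTH_REPORT_TIMEOUT_MS) {
    return false; // disk checker appears hung; report the node as unhealthy
  }
  return dirsHandler.areDisksHealthy();
}
{code}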



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4431) Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM

2015-12-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048359#comment-15048359
 ] 

Hudson commented on YARN-4431:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #678 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/678/])
YARN-4431. Not necessary to do unRegisterNM() if NM get stop due to 
(rohithsharmaks: rev 15c3e7ffe3d1c57ad36afd993f09fc47889c93bd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt


> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> 
>
> Key: YARN-4431
> URL: https://issues.apache.org/jira/browse/YARN-4431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl will retry 
> the connection with the proper retry policy. After retrying the maximum 
> number of times (15 minutes by default), it will send 
> NodeManagerEventType.SHUTDOWN to shut down the NM. But the NM shutdown will 
> call NodeStatusUpdaterImpl.serviceStop(), which will call unRegisterNM() to 
> unregister the NM from the RM and retry again (another 15 minutes). This is 
> completely unnecessary, and we should skip unRegisterNM() when the NM is 
> shut down because of connection issues.
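
A minimal sketch of the fix described above, assuming a flag recorded when the 
shutdown was triggered by connection failure (the field names are 
illustrative, not the actual patch):
{code}
@Override
protected void serviceStop() throws Exception {
  // If we are stopping precisely because the RM is unreachable,
  // unRegisterNM() would just block in another full retry cycle
  // (15 minutes by default), so skip it.
  if (registeredWithRM && !shutdownTriggeredByConnectionFailure) {
    unRegisterNM();
  }
  super.serviceStop();
}
{code}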



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4407) Support Resource oversubscription in YARN scheduler

2015-12-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-4407.

Resolution: Duplicate

> Support Resource oversubscription in YARN scheduler
> ---
>
> Key: YARN-4407
> URL: https://issues.apache.org/jira/browse/YARN-4407
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> Many long running services do not fully use their allocated resources all the 
> time. We could take advantage of temporarily unused resources to execute 
> low-priority jobs such as background analytics. This will definitely improve 
> resource utilization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048671#comment-15048671
 ] 

Junping Du commented on YARN-4356:
--

I would like to review this, as much of this patch relates to my previous 
work. [~gtCarrera9], would you hold off on the commit?

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact, from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048635#comment-15048635
 ] 

Hudson commented on YARN-4434:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #679 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/679/])
YARN-4434. NodeManager Disk Checker parameter documentation is not (aajisaka: 
rev 50edcb947ccbb736924c43735d23f3c156961049)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
* hadoop-yarn-project/CHANGES.txt


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4421) Remove dead code in RmAppImpl.RMAppRecoveredTransition

2015-12-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048633#comment-15048633
 ] 

Hudson commented on YARN-4421:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #679 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/679/])
YARN-4421. Remove dead code in RmAppImpl.RMAppRecoveredTransition. 
(rohithsharmaks: rev a5e2e1ecb06a3942903cb79f61f0f4bb02480f19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Remove dead code in RmAppImpl.RMAppRecoveredTransition
> --
>
> Key: YARN-4421
> URL: https://issues.apache.org/jira/browse/YARN-4421
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4421.001.patch
>
>
> The {{transition()}} method contains the following:
> {code}
>   // Last attempt is in final state, return ACCEPTED waiting for last
>   // RMAppAttempt to send finished or failed event back.
>   if (app.currentAttempt != null
>   && (app.currentAttempt.getState() == RMAppAttemptState.KILLED
>   || app.currentAttempt.getState() == RMAppAttemptState.FINISHED
>   || (app.currentAttempt.getState() == RMAppAttemptState.FAILED
>   && app.getNumFailedAppAttempts() == app.maxAppAttempts))) {
> return RMAppState.ACCEPTED;
>   }
>   // YARN-1507 is saving the application state after the application is
>   // accepted. So after YARN-1507, an app is saved meaning it is accepted.
>   // Thus we return ACCECPTED state on recovery.
>   return RMAppState.ACCEPTED;
> {code}
> The {{if}} statement is fully redundant and can be eliminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4431) Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM

2015-12-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048734#comment-15048734
 ] 

Junping Du commented on YARN-4431:
--

Thanks [~rohithsharma] for review and commit!

> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> 
>
> Key: YARN-4431
> URL: https://issues.apache.org/jira/browse/YARN-4431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl will retry 
> the connection with the proper retry policy. After retrying the maximum 
> number of times (15 minutes by default), it will send 
> NodeManagerEventType.SHUTDOWN to shut down the NM. But the NM shutdown will 
> call NodeStatusUpdaterImpl.serviceStop(), which will call unRegisterNM() to 
> unregister the NM from the RM and retry again (another 15 minutes). This is 
> completely unnecessary, and we should skip unRegisterNM() when the NM is 
> shut down because of connection issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4340) Add "list" API to reservation system

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048684#comment-15048684
 ] 

Hadoop QA commented on YARN-4340:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
20s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
11s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 29m 27s 
{color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new 
issues (was 746, now 746). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
55s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 21s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66
 with JDK v1.8.0_66 generated 1 new issues (was 100, now 100). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 54s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | 

[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048970#comment-15048970
 ] 

Sangjin Lee commented on YARN-4356:
---

The Jenkins run above is suspect, as it doesn't seem to match v.3. And the 
next run for v.4 was aborted.

Kicking off a new run for v.4.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact, from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049039#comment-15049039
 ] 

Junping Du commented on YARN-4356:
--

Thanks [~sjlee0] for delivering the patch. The 004 patch looks good to me 
overall, but I have some minor comments:
In JobHistoryEventHandler.java,
{code}
LOG.info("Emitting job history data to the timeline server is enabled");
{code}
server => service, given that we don't provide a centralized server in v2.

{code} 
+LOG.info("Timeline service is enabled; version: " +
+(timelineServiceV2Enabled? "v2" : "v1"));
{code}
Shall we get the version from the configuration defined in YARN-3623? 
Especially since we have other versions, like v1.5.

TestMRTimelineEventHandling.java,
{code}
// enable new timeline serivce
{code}
typo: serivce => service

YarnConfiguration.java,
{code}
@return whether the timelien service is enabled.
{code}
typo: timelien => timeline

{code}
+  public static boolean timelineServiceV2Enabled(Configuration conf) {
+return timelineServiceEnabled(conf) && getTimelineServiceVersion(conf) == 
2;
+  }
{code}
Would it be possible to have ATS v2.5 in the future? If so, maybe we should 
cast the float we get from the version config before comparing it with 2.
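
Something along these lines, i.e. comparing on the integer part of the 
version (a sketch, not the final patch):
{code}
public static boolean timelineServiceV2Enabled(Configuration conf) {
  // Truncate so a hypothetical v2.5 would still enable the v2 code path.
  return timelineServiceEnabled(conf)
      && (int) getTimelineServiceVersion(conf) == 2;
}
{code}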


In NodeHeartbeatRequestPBImpl.java,
{code}
+  this.registeredCollectors = new HashMap<>();
Update new HashMap<>() => new HashMap ()
{code}

and the same problem in NodeHeartbeatResponsePBImpl.java
{code}
this.appCollectorsMap = new HashMap<>();
{code}

In NodeManager.java,
{code}
this.registeredCollectors = new ConcurrentHashMap<>();
{code}
We should also add back the types, as required by our convention for generics.

{code}
+  if (this.registeredCollectors != null) {
+this.registeredCollectors.putAll(newRegisteredCollectors);
+  }
{code}
This null check is unnecessary, as the only caller - NMCollectorService - runs 
only when v2 is enabled. If for some reason we get an NPE here, that is still 
better than ignoring it silently.

In NodeStatusUpdaterImpl.java,
{code}
-  /**
-   * Caller should take care of sending non null nodelabels for both
-   * arguments
-   * 
-   * @param nodeLabelsNew
-   * @param nodeLabelsOld
-   * @return if the New node labels are diff from the older one.
-   */
-  private boolean areNodeLabelsUpdated(Set nodeLabelsNew,
-  Set nodeLabelsOld) {
-if (nodeLabelsNew.size() != nodeLabelsOld.size()
-|| !nodeLabelsOld.containsAll(nodeLabelsNew)) {
-  return true;
-}
-return false;
-  }
{code}
Please move this unrelated change out, for more focus and better tracking.

{code}
+!context.getRegisteredCollectors().containsKey(appId)) {
{code}
I think this logic could be problematic if the collector address gets updated 
due to an NM restart or a collector service failure. However, this shouldn't 
be addressed in this JIRA; kindly adding a TODO would be a good reminder. 

In ContainerManagerImpl.java,
{code}
+} else {
+  flowContext = null;
 }
{code}
This else branch is not necessary, as we can define flowContext to be null at 
the beginning (see the sketch below).
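
That is, something like the following (a sketch; the names are assumed from 
the patch context):
{code}
FlowContext flowContext = null; // default when timeline v2 is off
if (timelineServiceV2Enabled) {
  flowContext = new FlowContext(flowName, flowVersion, flowRunId);
}
{code}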

In ResourceManager.java,
{code}
+  if (version < 2 &&
...
+  } else if (version == 2 &&
{code}
Can we consistently use YarnConfiguration.timelineServiceV2Enabled() in every 
place? Otherwise we could miss these places if we need to change the version 
logic in the future.

In ResourceTrackerService.java,
{code}
 List keepAliveApps =
 remoteNodeStatus.getKeepAliveApplications();
-if (keepAliveApps != null) {
+if (timelineV2Enabled && keepAliveApps != null) {
{code}
Just a reminder: keepAliveApps is the wrong list for identifying running apps 
on a specific node. YARN-3586 (with a patch) has already been filed to fix 
this. We can either merge that patch in or rebase it once this patch is done.

In TimelineServiceV2Publisher.java,
{code}
- * This class is responsible for posting application, appattempt & Container
+ * This class is responsible for posting application, appattempt &amp; 
Container
{code}
Why do we need this change?

In PerNodeTimelineCollectorsAuxService.java,
{code}
+  // enable timeline service v.2
+  conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
+  conf.setFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 2.0f);
{code}
We should disable PerNodeTimelineCollectorsAuxService if we don't enable 
timeline service v2, shouldn't we? If so, I think this change is not necessary 
and we should remove it.

In TimelineReaderServer.java,
{code}
+YarnConfiguration conf = new YarnConfiguration();
+// enable timeline service v.2
+conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
+conf.setFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 2.0f);
{code}
The same question as above.

In addition, I think we should split the change that duplicated 

[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2015-12-09 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4389:
--
Attachment: 0003-YARN-4389.patch

The patch has gone stale; updating it as per the latest trunk. [~djp], please 
help check it.

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting in cluster level, or we should't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.
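
A sketch of the per-app override idea - carrying the two settings in the 
submission context instead of cluster-wide config. The setters shown are 
hypothetical, not the final API:
{code}
ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
// Hypothetical per-app fields mirroring the two cluster-wide settings:
ctx.setAMBlacklistingEnabled(true);
ctx.setAMBlacklistingDisableFailureThreshold(0.8f);
{code}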



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2015-12-09 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049050#comment-15049050
 ] 

Varun Vasudev commented on YARN-3998:
-

I'm thinking of re-opening this issue because we've seen a use case for this - 
long running services which (ideally) shouldn't lose local data if the service 
crashes.

Some pros about supporting some forms of restart policies on the NM -

1. Retry policies can be unified instead of every application having to 
re-implement their own.

2. Faster restarts - instead of the NM reaching out to the AM and then 
deciding what to do (and maintaining the container work dir), it can make an 
immediate decision. It's also an easier change to make - if the NMs need to 
talk to the AMs to decide whether to restart a container, we'll probably need 
a new state transition. Instead, if we allow the AMs to specify a restart 
policy, the NM can make an immediate decision as soon as the container exits.

3. Similar to what Jun mentioned - when running Docker containers, it's useful 
to be able to restart containers that exit with an error code.

When I say restart policies - off the top of my head - I can think of three 
policies: never restart (default), restart on all errors, and restart on 
specific error codes (see the sketch below).
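
As a sketch, the three policies could be modeled roughly like this (the names 
are illustrative, not a proposed API):
{code}
public enum ContainerRetryPolicy {
  NEVER_RETRY,                  // default: a container failure is final
  RETRY_ON_ALL_ERRORS,          // relaunch on any non-zero exit code
  RETRY_ON_SPECIFIC_ERROR_CODES // relaunch only for a configured set of codes
}
{code}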

[~jlowe], [~steve_l] - do you guys still feel that this should be done at the 
app level (and essentially re-implemented by every app)?

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. Then the NM will re-launch 
> the container 'retry-times' times when it fails to run (e.g. the exit code 
> is not 0). It will save a lot of time: it avoids container localization, and 
> the RM does not need to re-schedule the container. And local files in the 
> container's working directory will be left for re-use. (If the container has 
> downloaded some big files, it does not need to re-download them when running 
> again.) 
> We find it is useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049204#comment-15049204
 ] 

Naganarasimha G R commented on YARN-4356:
-

Hi [~sjlee0]

A few points from my side:
# I feel it would be better to extract an interface from NMTimelinePublisher, 
with a no-op impl and a regular impl, so that callers can avoid null checks 
every time (see the sketch below).
# We would not require two different configs to indicate enabling of the 
SystemMetricsPublisher; we can use 
YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED and deprecate 
YarnConfiguration.RM_SYSTEM_METRICS_PUBLISHER_ENABLED (at the same time, 
mapping the value from the previous key to the new key).
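
A rough sketch of the null-object idea in point 1 (the interface, method, and 
event type are illustrative, not the actual patch):
{code}
interface NMTimelinePublisher {
  void publishContainerEvent(Object event);
}

// Installed when timeline service v.2 is off, so callers can invoke the
// publisher unconditionally instead of null-checking it every time.
class NoOpNMTimelinePublisher implements NMTimelinePublisher {
  @Override
  public void publishContainerEvent(Object event) {
    // intentionally a no-op
  }
}
{code}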


> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact, from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049241#comment-15049241
 ] 

Wangda Tan commented on YARN-4309:
--

Thanks for the reviews, [~ivanmi].

I've tried to deploy Hadoop locally with this patch, and it works. The only 
comment from my side is:

Do you think it is better to rename 
{{yarn.nodemanager.log-container-debug-info}} to 
{{yarn.nodemanager.log-container-debug-info.enabled}}? As it stands, 
{{yarn.nodemanager.log-container-debug-info}} doesn't show that it's a boolean.



> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the 
> files in the container local dir and dump the contents of 
> launch_container.sh (into the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049115#comment-15049115
 ] 

Eric Payne commented on YARN-4225:
--

I'd like to address the issues raised by the above pre-commit build:

- Unit Tests: The following unit tests failed during the above pre-commit 
build, but they all pass for me in my local build environment:

||Test Name||Modified by this patch||Pre-commit failure||
|hadoop.yarn.client.api.impl.TestAMRMClient|No|Java HotSpot(TM) 64-Bit Server 
VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0|
|hadoop.yarn.client.api.impl.TestNMClient|No|Java HotSpot(TM) 64-Bit Server VM 
warning: ignoring option MaxPermSize=768m; support was removed in 8.0|
|hadoop.yarn.client.api.impl.TestYarnClient|No|TEST TIMED OUT|
|hadoop.yarn.client.cli.TestYarnCLI|Yes|Java HotSpot(TM) 64-Bit Server VM 
warning: ignoring option MaxPermSize=768m; support was removed in 8.0|
|hadoop.yarn.client.TestGetGroups|No|java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "48cbb2d33ebc":8033; 
java.net.UnknownHostException|
|hadoop.yarn.server.resourcemanager.TestAMAuthorization|No|java.net.UnknownHostException:
 Invalid host name: local host is: (unknown); destination host is: 
"48cbb2d33ebc":8030; java.net.UnknownHostException|
|hadoop.yarn.server.resourcemanager.TestClientRMTokens|No|java.lang.NullPointerException:|

- Findbugs warnings:
{{org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getPreemptionDisabled()
 has Boolean return type and returns explicit null At QueueInfoPBImpl.java:and 
returns explicit null At QueueInfoPBImpl.java:[line 402]}}
This is a result of {{QueueInfo#getPreemptionDisabled}} returning a Boolean. 
Again, we could expose the {{hasPreemptionDisabled}} method and use that 
instead (see the sketch after this list).
- JavaDocs warnings/failures: I don't think these are caused by this patch:
{{[WARNING] The requested profile "docs" could not be activated because it does 
not exist.}}
{{[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:javadoc (default-cli) on 
project hadoop-yarn-server-resourcemanager: An error has occurred in JavaDocs 
report generation:}}
{{...}}
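
For the Findbugs item, a sketch of the {{hasPreemptionDisabled}} alternative 
(the PBImpl plumbing is simplified and assumed, not the actual patch):
{code}
// Let callers test for presence explicitly instead of interpreting an
// explicit null from a Boolean-returning getter.
public boolean hasPreemptionDisabled() {
  QueueInfoProtoOrBuilder p = viaProto ? proto : builder;
  return p.hasPreemptionDisabled();
}

// Caller side:
if (queueInfo.hasPreemptionDisabled() && queueInfo.getPreemptionDisabled()) {
  writer.println("Preemption : disabled");
}
{code}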

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049169#comment-15049169
 ] 

Li Lu commented on YARN-3623:
-

Hi folks, it seems like we're reaching agreement on the config itself, but we 
still have some concerns about the timeline v2 rolling upgrade. How about 
fixing the description as Junping suggested and putting this patch in, while 
we keep our v2 discussion in YARN-3196? It looks like the only action item for 
this JIRA is to update the description.

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed new configs to enable the 
> feature of writing timeline data to the v2 server. It would be good to have 
> a YARN timeline-service.version config, like timeline-service.enable, to 
> indicate the version of the timeline service running with the given YARN 
> cluster. It's beneficial for users to move more smoothly from v1 to v2, as 
> they don't need to change the existing config but just switch this config 
> from v1 to v2. And each framework doesn't need to have its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2015-12-09 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2975:
--
Fix Version/s: 2.6.4

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.7.0, 2.6.4
>
> Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049188#comment-15049188
 ] 

Sangjin Lee commented on YARN-2975:
---

I cherry-picked the fix to branch-2.6 (2.6.4).

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Fix For: 2.7.0, 2.6.4
>
> Attachments: yarn-2975-1.patch, yarn-2975-2.patch, yarn-2975-3.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-12-09 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049364#comment-15049364
 ] 

Subru Krishnan commented on YARN-4358:
--

[~asuresh], you need not update the Javadoc of _getReservationById_. The 
problem is caused because we are specifying *Set* inside _{@ link}_, so the 
fix should just be to update the Javadoc of the return parameter of 
_getReservations_ to:
bq @return set of active {@link ReservationAllocation}s for the specified user 
at the requested time
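
Concretely, the corrected Javadoc would look something like this (the method 
signature is abbreviated):
{code}
/**
 * @return set of active {@link ReservationAllocation}s for the specified
 *         user at the requested time
 */
Set<ReservationAllocation> getReservations(/* parameters elided */);
{code}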

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, 
> YARN-4358.addendum.patch, YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but 
> has no visibility into extra constraints imposed by the SharingPolicy. While 
> not all constraints are easily represented, some (e.g., max-instantaneous 
> resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049488#comment-15049488
 ] 

Sangjin Lee commented on YARN-3623:
---

I'm obviously +1 as that's exactly what I said. :) [~djp]?

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed new configs to enable the 
> feature of writing timeline data to the v2 server. It would be good to have 
> a YARN timeline-service.version config, like timeline-service.enable, to 
> indicate the version of the timeline service running with the given YARN 
> cluster. It's beneficial for users to move more smoothly from v1 to v2, as 
> they don't need to change the existing config but just switch this config 
> from v1 to v2. And each framework doesn't need to have its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-09 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015-12-09.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in TimelineClient to let 
> clients/applications have the option to use ATS v1.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-09 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015-12-09.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4438) Implement RM leader election with curator

2015-12-09 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4438:
--
Attachment: YARN-4438.1.patch

A flag is now introduced to enable Curator-based leader election; eventually 
I'd like to remove the embeddedLeaderElector and keep only the Curator-based one.
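
For context, a minimal sketch of the Curator leader-election recipe 
(LeaderLatch) that such an elector could build on; the ZK connect string, path, 
and id are invented for illustration:

{noformat}
// Sketch only. Imports: org.apache.curator.framework.*,
// org.apache.curator.framework.recipes.leader.*,
// org.apache.curator.retry.ExponentialBackoffRetry
void startElector() throws Exception {
  CuratorFramework client = CuratorFrameworkFactory.newClient(
      "zk1:2181", new ExponentialBackoffRetry(1000, 3));
  client.start();
  // Each RM creates a latch on the same path; Curator elects one leader.
  LeaderLatch latch = new LeaderLatch(client, "/yarn-leader-election", "rm1");
  latch.addListener(new LeaderLatchListener() {
    @Override public void isLeader()  { /* e.g. transition RM to active */ }
    @Override public void notLeader() { /* e.g. transition RM to standby */ }
  });
  latch.start();
}
{noformat}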

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch
>
>
> This is to implement the leader election with Curator instead of the 
> ActiveStandbyElector from the common package; this also avoids adding more 
> configs in common to suit the RM's own needs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-12-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049364#comment-15049364
 ] 

Chris Douglas edited comment on YARN-4358 at 12/9/15 9:15 PM:
--

[~asuresh], you need not update the Javadoc of {{getReservationById}}. The 
problem is caused because we are specifying *Set* inside {{\{@ link\}}}, so the 
fix should just be to update the Javadoc of the return parameter of 
{{getReservations}} to:
{{@return set of active \{\@link ReservationAllocation\}s for the specified 
user at the requested time}}


was (Author: subru):
[~asuresh], you need not update the Javadoc of _getReservationById_. The 
problem is caused because we are specifying *Set* inside _{@ link}_ so the fix 
should be just be to update the Javadoc of the return parameter of 
_getReservations_ to:
bq @return set of active {@link ReservationAllocation}s for the specified user 
at the requested time
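
For illustration, the corrected Javadoc in context; the enclosing method 
signature is assumed here, not taken from the patch:

{noformat}
/**
 * Gets the active reservations at the specified point in time for the
 * given user.
 *
 * @return set of active {@link ReservationAllocation}s for the specified
 *         user at the requested time
 */
Set<ReservationAllocation> getReservations(long when, String user);
{noformat}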

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, 
> YARN-4358.addendum.patch, YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but has no 
> visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., max-instantaneous resources) 
> are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3623:

Attachment: YARN-3623-2015-12-09.patch

fix the description.

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed new configs to enable the 
> feature of writing timeline data to the v2 server. It would be good to have a 
> YARN timeline-service.version config, like timeline-service.enable, to indicate 
> the version of the timeline service running with the given YARN cluster. It 
> would help users move more smoothly from v1 to v2, as they wouldn't need to 
> change the existing configs, but only switch this config from v1 to v2. And 
> each framework wouldn't need its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049428#comment-15049428
 ] 

Wangda Tan commented on YARN-3946:
--

[~Naganarasimha],

Thanks for the update. I tried the patch locally and the latest patch looks 
good. Could you check whether the javadoc and test failures are related?

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch
>
>
> Currently there is no direct way to get the exact reason why a submitted app 
> is still in the ACCEPTED state. It should be possible to know through the RM 
> REST API which aspect is not being met - say, queue limits being reached, 
> core/memory requirements not being met, or the AM limit being reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049449#comment-15049449
 ] 

Li Lu commented on YARN-3623:
-

The new description LGTM. [~sjlee0] any suggestions? Thanks! 

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed new configs to enable the 
> feature of writing timeline data to the v2 server. It would be good to have a 
> YARN timeline-service.version config, like timeline-service.enable, to indicate 
> the version of the timeline service running with the given YARN cluster. It 
> would help users move more smoothly from v1 to v2, as they wouldn't need to 
> change the existing configs, but only switch this config from v1 to v2. And 
> each framework wouldn't need its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-12-09 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4358:
--
Attachment: YARN-4358.addendum.patch

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, 
> YARN-4358.addendum.patch, YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but has no 
> visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., max-instantaneous resources) 
> are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4437) JobEndNotification info logs are missing in AM container syslog

2015-12-09 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-4437:
---

 Summary: JobEndNotification info logs are missing in AM container 
syslog
 Key: YARN-4437
 URL: https://issues.apache.org/jira/browse/YARN-4437
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.7.0
Reporter: Prabhu Joseph
Priority: Minor


JobEndNotification logs are not written by the MRAppMaster and JobEndNotifier 
classes even though Log.info calls are present. The reason is that 
MRAppMaster.this.stop() is called before the JobEndNotification, and somewhere 
during the stop the log appenders are set to null as well.

The AM container syslog is missing the following logs from JobEndNotifier:

   Job end notification trying + urlToNotify
   Job end notification to + urlToNotify + succeeded / failed
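
A hypothetical sketch of the ordering problem described above (method and 
variable names simplified; this is not the actual MRAppMaster code):

{noformat}
// Sketch: if the AM is stopped first, its log appenders are torn down,
// so the Log.info calls inside the notifier produce no syslog output.
void shutDownJob() {
  MRAppMaster.this.stop();            // appenders become null/closed here
  JobEndNotifier.notify(jobReport);   // "Job end notification trying ..."
                                      // is logged into the void
}
// Notifying (or at least logging) before stop() would keep these
// lines in the AM container syslog.
{noformat}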



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-12-09 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049328#comment-15049328
 ] 

Carlo Curino commented on YARN-4358:


[~asuresh] just uploaded a fix, and he will commit it after QA approval. 

Interestingly, for both of us {{mvn package}} works just fine (maybe a 
different JDK?). In fact, there was a similar use of @link in @return before 
our changes, in PlanView; I was following that template. 

In any case, both should be and are being fixed. Thanks for spotting this. 

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, 
> YARN-4358.addendum.patch, YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but has no 
> visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., max-instantaneous resources) 
> are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049353#comment-15049353
 ] 

Hadoop QA commented on YARN-4389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 0s 
{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 44s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit 

[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049383#comment-15049383
 ] 

Sangjin Lee commented on YARN-3623:
---

I'm +1. Thanks.

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed new configs to enable the 
> feature of writing timeline data to the v2 server. It would be good to have a 
> YARN timeline-service.version config, like timeline-service.enable, to indicate 
> the version of the timeline service running with the given YARN cluster. It 
> would help users move more smoothly from v1 to v2, as they wouldn't need to 
> change the existing configs, but only switch this config from v1 to v2. And 
> each framework wouldn't need its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049392#comment-15049392
 ] 

Hadoop QA commented on YARN-4356:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
55s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
52s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} 
|
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
21s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} 
|
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
33s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 14s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 3m 
27s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 53s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
feature-YARN-2928 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 53s 
{color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 43s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 17s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
55s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 37m 33s 
{color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 5 new 
issues (was 779, now 779). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 50m 57s 
{color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 5 new 
issues (was 772, now 772). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s 
{color} | {color:red} Patch generated 7 new checkstyle issues in root (total 
was 1970, now 1942). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 3m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 15m 
7s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 7m 36s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 1 new issues (was 100, now 100). {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 7m 36s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66
 with JDK v1.8.0_66 generated 1 new issues (was 100, now 100). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 16s 
{color} | {color:green} the patch passed 

[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049468#comment-15049468
 ] 

Wangda Tan commented on YARN-4293:
--

[~kasha],

Could you take a look at my comment if you have a chance? 
https://issues.apache.org/jira/browse/YARN-4293?focusedCommentId=15036630=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036630.
 I am asking because I want to understand whether this has been discussed before.

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049455#comment-15049455
 ] 

Wangda Tan commented on YARN-4418:
--

[~sunilg],

Thanks for explaining. I feel we shouldn't calculate the am-resource-limit 
based on available resource.

It is possible that a queue's available resource is only temporarily more than 
the queue's configured capacity, which could over-allocate AM containers. Do 
you agree?
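
A toy example of the concern (all numbers invented):

{noformat}
queue configured capacity = 10 GB, maximum-am-resource-percent = 0.1
  am-limit from configured capacity: 10 GB * 0.1 = 1 GB
  am-limit from available resource:  20 GB * 0.1 = 2 GB
      (while the queue temporarily sees 20 GB available)
Once availability falls back to 10 GB, 2 GB worth of AMs is twice the
intended cap.
{noformat}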

> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that 
> the REST framework can easily avail itself of this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049493#comment-15049493
 ] 

Wangda Tan commented on YARN-4416:
--

Thanks for sharing your thoughts, [~Naganarasimha]/[~sunilg].

Having looked at the code, I think we need to be very careful with the locking 
changes to OrderingPolicy, since they could easily cause a 
ConcurrentModificationException (CME) in the future.

I would prefer to split the JIRA into two parts:
- Remove redundant locks, such as getAbsoluteCapacity.
- Improve the locking of OrderingPolicy. Even though it is closely related to 
LeafQueue, I think we should try our best to decouple it from LeafQueue for 
better API design. Potentially we need to rethink the OrderingPolicy API.

I suggest converting both JIRAs to sub-JIRAs of YARN-3091.
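
As a sketch of the read/write-lock direction suggested in the issue 
description below (the field name is assumed for illustration):

{noformat}
// Sketch: replace synchronized getters with a read lock so that
// monitoring/toString() reads cannot deadlock against scheduler writes.
// Import: java.util.concurrent.locks.ReentrantReadWriteLock
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

public float getAbsoluteUsedCapacity() {
  lock.readLock().lock();
  try {
    return absoluteUsedCapacity;   // assumed field
  } finally {
    lock.readLock().unlock();
  }
}
{noformat}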

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> Looking at the stack I realized there was a deadlock, but on analysis found 
> that it occurred only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and are better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049478#comment-15049478
 ] 

Hadoop QA commented on YARN-3623:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 214, now 214). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 28s {color} 
| {color:black} 

[jira] [Updated] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-09 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4439:
--
Attachment: YARN-4439.1.patch

Uploaded a patch to clarify the NMContainerStatus#toString method

> Clarify NMContainerStatus#toString method.
> --
>
> Key: YARN-4439
> URL: https://issues.apache.org/jira/browse/YARN-4439
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4439.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4438) Implement RM leader election with curator

2015-12-09 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4438:
-
Issue Type: Improvement  (was: Bug)

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch
>
>
> This is to implement the leader election with Curator instead of the 
> ActiveStandbyElector from the common package; this also avoids adding more 
> configs in common to suit the RM's own needs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4341) add doc about timeline performance tool usage

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049718#comment-15049718
 ] 

Sangjin Lee commented on YARN-4341:
---

One remaining issue:
- in usage, "Each mappe" -> "Each mapper"

> add doc about timeline performance tool usage
> -
>
> Key: YARN-4341
> URL: https://issues.apache.org/jira/browse/YARN-4341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4341.2.patch, YARN-4341.3.patch, YARN-4341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-09 Thread Jian He (JIRA)
Jian He created YARN-4439:
-

 Summary: Clarify NMContainerStatus#toString method.
 Key: YARN-4439
 URL: https://issues.apache.org/jira/browse/YARN-4439
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4356:
--
Attachment: YARN-4356-feature-YARN-2928.005.patch

Posted patch v.5.

Addressed most of Junping's feedback.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-09 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049809#comment-15049809
 ] 

Tsuyoshi Ozawa commented on YARN-4439:
--

[~jianhe] should we also add Priority to the printed string?
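
For illustration, a sketch of what that could look like (accessor names are 
assumed; the actual fields and format are whatever the patch defines):

{noformat}
@Override
public String toString() {
  return "NMContainerStatus: ["
      + "ContainerId: " + getContainerId()
      + ", State: " + getContainerState()
      + ", Priority: " + getPriority()     // the addition suggested above
      + ", Diagnostics: " + getDiagnostics()
      + "]";
}
{noformat}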

> Clarify NMContainerStatus#toString method.
> --
>
> Key: YARN-4439
> URL: https://issues.apache.org/jira/browse/YARN-4439
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4439.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049854#comment-15049854
 ] 

Hadoop QA commented on YARN-4439:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
(total was 13, now 14). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 36s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776680/YARN-4439.1.patch |
| JIRA Issue | YARN-4439 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e84412033ba0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 

[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049581#comment-15049581
 ] 

Wangda Tan commented on YARN-4415:
--

[~Naganarasimha]/[~xinxianyin].

Let me try to summarize what we were discussing.

There are two different configurations:
1) Accessible-node-labels for a queue
2) Maximum-capacity for partitions

There are four different combinations of default values:
a. 1)=*, 2)=100
Pros:
- Users don't need to update configurations much when new labels are added 
(assuming the partition will be shared with all queues)
Cons:
- Users have to change configurations a lot when new labels are added (assuming 
the partition will be shared with only a few queues)

b. 1)=*, 2)=0
Pros:
- Users don't need to update configurations much when new labels are added 
(assuming the partition will be shared with only a few queues)
Cons:
- Users have to change configurations a lot when new labels are added (assuming 
the partition will be shared with all queues)

c. 1)=, 2)=100
Same as b.

d. 1)=, 2)=0
Same as b.

You can see that there are different pros and cons to each choice of default 
values for the two options. Frankly, I don't have a strong preference among 
these choices. But since the default values have been in place since 2.6, I 
would suggest not changing them.

But I think there is one thing we need to fix:
When queue.accessible-node-labels == *, 
{{QueueCapacitiesInfo#QueueCapacitiesInfo(QueueCapacities)}} should call 
RMNodeLabelsManager.getClusterNodeLabelNames to get all labels instead of 
calling {{getExistingNodeLabels}}, so that after we add/remove labels, the 
queue's capacities in the web UI/REST response are updated as well.
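
For concreteness, a sketch of the two configurations in capacity-scheduler.xml, 
using queue {{default}} and partition {{xxx}} from the scenarios below (treat 
the exact property names as an assumption based on the node-labels 
documentation):

{noformat}
<!-- 1) which partitions the queue may access -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>*</value>
</property>
<!-- 2) the queue's maximum capacity on a specific partition -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.maximum-capacity</name>
  <value>100</value>
</property>
{noformat}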

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, 
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue:
> Scenario 1:
> # Configure a queue (default) with accessible node labels as *
> # Create an exclusive partition *xxx* and map an NM to it
> # Ensure no capacities are configured for the default queue for label xxx
> # Start an RM app with queue as default and label as xxx
> # The application is stuck, but the scheduler UI shows 100% as max capacity 
> for that queue
> Scenario 2:
> # Create a nonexclusive partition *sharedPartition* and map an NM to it
> # Ensure no capacities are configured for the default queue
> # Start an RM app with queue as *default* and label as *sharedPartition*
> # The application is stuck, but the scheduler UI shows 100% as max capacity 
> for that queue for *sharedPartition*
> For both scenarios the cause is the same: the default max capacity and 
> absolute max capacity are set to zero percent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-09 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4418:
--
Attachment: 0004-YARN-4418.patch

Thank you [~leftnoteasy] for the explanation. That sounds good.
So as you mentioned earlier, we will have this metric updated only when 
{{activateApplications}} is called, and this covers all cases. Uploading a 
patch addressing the same. Kindly help to check.

> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch, 0004-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that 
> the REST framework can easily avail itself of this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4417) Make RM and Timeline-server REST APIs more consistent

2015-12-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049641#comment-15049641
 ] 

Jian He commented on YARN-4417:
---

lgtm 

> Make RM and Timeline-server REST APIs more consistent
> -
>
> Key: YARN-4417
> URL: https://issues.apache.org/jira/browse/YARN-4417
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4417.1.patch, YARN-4417.2.patch
>
>
> There are some differences between the RM and timeline-server REST APIs. For 
> example, the RM REST API doesn't support getting application attempt info by 
> app-id and attempt-id, but the timeline server does. We could make them more 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-12-09 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049536#comment-15049536
 ] 

Li Lu commented on YARN-3816:
-

Thanks for the update [~djp]! I went through an earlier version of the patch a 
while ago, and I can see that most of the problems got addressed. Just a few 
things to check here:
- There are 3 types of aggregation basis, but only application aggregation has 
its own entity type. How do we represent the result entity of the other 2 types?
- In TimelineMetricCalculator, the name "delta" looks a little awkward. Is it 
actually the delta between the areas of two numbers over a time interval? (See 
the sketch below.)
- By the way, as [~varun_saxena] pointed out earlier, we need to decide whether 
calculating the area is a useful use case in itself. I remember we had some 
discussion on this a few months ago. I noticed the accumulateTo method is 
expandable, so probably we can add more functions in the future? 
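
One possible reading of the "area" notion, purely as a gloss (this is an 
interpretation, not a definition from the patch): a metric accumulates as the 
area under its time series, and the "delta" between two readings is the 
increment of that area:

{noformat}
area(t_n) = sum over i < n of  v_i * (t_{i+1} - t_i)
delta(n)  = v_n * (t_{n+1} - t_n)      // one step's contribution to the area
{noformat}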

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-feature-YARN-2928-v4.1.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other-level (Flow/User/Queue) aggregation can be more efficient when based 
> on application-level aggregations rather than raw entity-level data, as far 
> fewer rows need to be scanned (after filtering out non-aggregated entities, 
> like events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page

2015-12-09 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049740#comment-15049740
 ] 

Ming Ma commented on YARN-4422:
---

Thanks! Will this fix address MAPREDUCE-5502 or MAPREDUCE-4428? It doesn't seem 
so, but I would like to confirm.

> Generic AHS sometimes doesn't show started, node, or logs on App page
> -
>
> Key: YARN-4422
> URL: https://issues.apache.org/jira/browse/YARN-4422
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0, 2.8.0, 2.7.3
>
> Attachments: AppAttemptPage no container or node.jpg, AppPage no logs 
> or node.jpg, YARN-4422.001.patch
>
>
> Sometimes the AM container for an app isn't able to start the JVM. This can 
> happen if bogus JVM options are given to the AM container ( 
> {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when 
> misconfiguring the AM container's environment variables 
> ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}})
> When the AM container for an app isn't able to start the JVM, the Application 
> page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and 
> {{Logs}} columns. It _does_ have links for each app attempt, and if you click 
> on one of them, you go to the Application Attempt page, where you can see all 
> containers with links to their logs and nodes, including the AM container. 
> But none of that shows up for the app attempts on the Application page.
> Also, on the Application Attempt page, in the {{Application Attempt 
> Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} 
> value is {{N/A}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049817#comment-15049817
 ] 

Sangjin Lee commented on YARN-4356:
---

Thanks [~djp] for your comments. I addressed most of your comments in the new 
patch. The following is a response to your comments.

bq. We should also add back types as a requirement/convention for generics.

This is the Java 7 diamond operator (<>), which is shorthand for inferring 
types. The type information is NOT removed; it is inferred by the compiler, and 
the compiler produces the same bytecode as if the types were specified.
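
A minimal example of the shorthand (the types here are invented for 
illustration):

{noformat}
// Both declarations compile to identical bytecode; the diamond (<>)
// just lets the compiler infer the type arguments.
Map<ApplicationId, String> a = new HashMap<ApplicationId, String>();
Map<ApplicationId, String> b = new HashMap<>();
{noformat}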

bq. This check for null is unnecessary, as the only caller - NMCollectorService 
- only runs when v2 is enabled. If for some reason we get an NPE here, that is 
still better than ignoring it silently.

That's a good catch. I agree that it's a little better not to check for null 
here. It's changed in the latest patch.

bq. Please take this unrelated change out, for more focus and better tracking.

Agreed. I originally removed it because it was an unused private method, but I 
put it back in.

bq. Just a reminder: keepAliveApps is the wrong list for identifying running 
apps on a specific node. YARN-3586 (with a patch) is already filed to fix this. 
We can either merge that patch in, or rebase that patch when this patch is done.

Got it. Can we proceed with the current patch and get that fix once YARN-3586 
goes in?

{quote}
In TimelineServiceV2Publisher.java,
- * This class is responsible for posting application, appattempt & Container
+ * This class is responsible for posting application, appattempt &amp; 
Container
Why do we need this change?
{quote}

This is addressing a javadoc error. The ampersand ("&") is a special character 
for javadoc, and it breaks javadoc. It needs to be entity-escaped:
{noformat}
[ERROR] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV2Publisher.java:61:
 error: bad HTML entity
[ERROR] * This class is responsible for posting application, appattempt & 
Container
{noformat}

bq. We should disable PerNodeTimelineCollectorsAuxService if we don't enable 
timeline service v2, shouldn't we? If so, I think this is not a necessary 
change and we should remove it.

This is used for the test method launchServer(). This method is invoked 
directly by a unit test (thus the @VisibleForTesting annotation). The same for 
TimelineReaderServer.

bq. In addition, I think we should split out the change that duplicates 
YARN-3623 and cherry-pick it from trunk/branch-2 when that patch gets committed.

That's fine. I still put up the patch that includes a version of that because 
without it things won't even compile. I will wait until YARN-3623 goes in 
before I remove that piece from this patch, then this can get committed.

Let me know if this answers your questions.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be taken. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it did before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4438) Implement RM leader election with curator

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049835#comment-15049835
 ] 

Hadoop QA commented on YARN-4438:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s 
{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 19 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 365, now 383). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
21s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 25s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 34s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 30s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 4s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 167m 2s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4340) Add "list" API to reservation system

2015-12-09 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049724#comment-15049724
 ] 

Subru Krishnan commented on YARN-4340:
--

Thanks [~seanpo03] for addressing my comments. This patch looks much better. 
Please find my feedback below.

The Javadocs for the *listReservations* API in *ApplicationClientProtocol* look 
pretty good. A few nits (the same holds for the Javadocs of 
*ReservationListRequest* and *YarnClient*):

  * We are still not explicitly calling out that the *ResourceAllocation* we 
return in *ReservationListResponse* is based on the current state of the Plan, 
and that it can change for different reasons (like replanning), subject to the 
constraints of the user contract as described by *ReservationDefinition*
  * Specify that the queue refers to the reservable queue in the scheduler. 
Refer to _ReservationSubmissionRequest_
  * Remove username as it's no longer used
  * Typo in startTime - it should say reservations that _start_, not end, 
after the startTime will be selected
  * Typo in endTime - it should say reservations that _end_, not start, after 
the endTime will be selected
  
Minor comments for the rest of the patch:
  * Replace all occurrences of _plan_ with _queue_ in *ReservationListRequest*
  * Typo in the Javadoc of *ReservationAllocationState::getReservationDefinition*: 
replace _set_ with _get_. Also, you can link _ReservationDefinition_ in the 
return param. This is cosmetic, but I see it in a few places, so if possible 
kindly fix those too.
  * Can we order the arguments of *ResourceAllocationRequest::newInstance* as 
_startTime, endTime and capability_ to improve readability
  * In *ClientRMService::listReservations*, _includeResourceAllocations_ can be 
of primitive boolean type
  * Shouldn't we check for requestInfo.getEndTime() <= -1 in 
*ClientRMService::listReservations*? (See the sketch after this list.)
  * Can we rename _info_ to _reservationInfo_, or something more descriptive, 
to make *ClientRMService::listReservations* more readable
  * I am confused by the Javadocs of *PlanView::getReservations*; it looks like 
the start and end times are swapped
  * You can revert the change to *PlanView::getReservationByUserAtTime* as 
that's being addressed in YARN-4358
  * In *ReservationInputValidator*, the error strings can also be created in 
*getPlanFromQueue*, as they can be made consistent and redundant code can be 
reduced
  * Can we rename _info_ to _reservationInfo_, or something more descriptive, 
to make *ReservationSystemUtil::convertAllocationsToReservationInfo* more 
readable
  * In *ReservationSystemUtil::convertAllocationsToReservationInfo*, we need 
not create local variables for _acceptanceTime, user, id, definition_ as we 
don't use them locally
  * The _Map_ can be defined outside of the for 
loops in *ReservationSystemUtil::convertAllocationsToReservationInfo*
  * In *TestClientRMService*, can you add a query for _listReservations_ based 
on arrival and duration, like you have done in *TestYarnClient*
  * The test case additions to *TestInMemoryPlan* are great. Can we cover a few 
more scenarios, like more than one reservation, an invalid reservation id, and 
boundary conditions of the time interval
  * Looks like there are some whitespace issues in *TestInMemoryPlan*
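To make the endTime guard above concrete, here is a minimal sketch of the kind 
of check being suggested (the accessor names and the default are assumptions 
based on this discussion, not the actual patch):

{noformat}
// Hypothetical guard in ClientRMService::listReservations: keep the flag as
// a primitive boolean, and treat a missing/negative endTime as "no upper
// bound" rather than passing -1 downstream.
boolean includeResourceAllocations =
    requestInfo.getIncludeResourceAllocations();
long endTime = (requestInfo.getEndTime() <= -1)
    ? Long.MAX_VALUE : requestInfo.getEndTime();
{noformat}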

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, 
> YARN-4340.v6.patch, YARN-4340.v7.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system on which reservation exists by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4341) add doc about timeline performance tool usage

2015-12-09 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4341:
---
Attachment: YARN-4341.4.patch

Thanks for detecting the issue, [~sjlee0]. Updated the .4 patch to fix it.

> add doc about timeline performance tool usage
> -
>
> Key: YARN-4341
> URL: https://issues.apache.org/jira/browse/YARN-4341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4341.2.patch, YARN-4341.3.patch, YARN-4341.4.patch, 
> YARN-4341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4341) add doc about timeline performance tool usage

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049871#comment-15049871
 ] 

Hadoop QA commented on YARN-4341:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 42s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776690/YARN-4341.4.patch |
| JIRA Issue | YARN-4341 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux b49a5e7be3a6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 132478e |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 30MB |
| Powered by | Apache Yetus http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9919/console |


This message was automatically generated.



> add doc about timeline performance tool usage
> -
>
> Key: YARN-4341
> URL: https://issues.apache.org/jira/browse/YARN-4341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4341.2.patch, YARN-4341.3.patch, YARN-4341.4.patch, 
> YARN-4341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1505#comment-1505
 ] 

Naganarasimha G R commented on YARN-4416:
-

Thanks [~wangda].
I will convert this JIRA and raise a new one under YARN-3091.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario in which I had to find 
> out the name of the queue, but every time I tried to inspect the queue it 
> hung. On seeing the stack I realized there was a deadlock, but on analysis I 
> found that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks (a sketch follows).
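
As an illustration of the direction suggested in the description (a sketch 
only; the field name is assumed and this is not the actual patch):

{noformat}
// Sketch: a read lock instead of synchronized, so toString()-style reads
// (e.g. from a debugger) cannot deadlock against a writer holding the
// queue monitor.
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

public float getAbsoluteUsedCapacity() {
  lock.readLock().lock();
  try {
    return absoluteUsedCapacity;
  } finally {
    lock.readLock().unlock();
  }
}
{noformat}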



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049982#comment-15049982
 ] 

Naganarasimha G R commented on YARN-4356:
-

Hi [~sjlee0],
bq. Right now those null checks are still limited to a couple of files. 
Although it may not be ideal, it is still manageable?
It's not a blocker; it just helps keep things clean. So no issues, we can go 
ahead with this patch; I will try to take care of it in one of my other patches 
related to the NMMetricsPublisher.

bq. Hmm, I know it's not great having 2 similar-sounding config params, but 
there is also RM_SYSTEM_METRICS_PUBLISHER_DISPATCHER_POOL_SIZE, so this might 
also make this patch a little bigger.
OK, we can take care of it in another JIRA, but note that 
YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED and 
YarnConfiguration.RM_SYSTEM_METRICS_PUBLISHER_ENABLED were introduced because a 
version config was not there.



> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.005.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-09 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: YARN-4340.v8.patch

Thanks Subru for the comments. I have taken all your recommendations, except 
for the following:

The startTime and endTime weren't typos - I actually intended to select all 
reservations active within the time interval. That's why only reservations that 
end after the selection startTime, and that start before or on the selection 
endTime, are included in the response (see the sketch below).
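In other words, the selection is a standard interval-overlap test; roughly 
(accessor names assumed):

{noformat}
// A reservation is "active in [queryStart, queryEnd]" iff it overlaps the
// interval: it ends after the interval starts, and starts on or before the
// interval ends.
boolean activeInInterval =
    reservation.getEndTime() > queryStart
        && reservation.getStartTime() <= queryEnd;
{noformat}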

In ReservationInputValidator, I didn't want to create the error strings in 
getPlanFromQueue because it almost seems like we are bending over backwards to 
avoid repeated code. The reason is that validateReservation uses this method 
and has a drastically different error message, which doesn't seem easy to 
reword to fit the pattern of the other messages. Instead, I have overloaded the 
method so that callers can optionally pass in the error messages; otherwise, 
the defaults are used.
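A rough sketch of that overloading (signatures and messages here are 
assumptions, not the patch itself):

{noformat}
// Default error message used when the caller does not supply one.
private Plan getPlanFromQueue(ReservationSystem rs, String queue)
    throws YarnException {
  return getPlanFromQueue(rs, queue, "The specified queue: " + queue
      + " is not managed by reservation system.");
}

// Callers with a drastically different message pass it in explicitly.
private Plan getPlanFromQueue(ReservationSystem rs, String queue,
    String errMsg) throws YarnException {
  Plan plan = rs.getPlan(queue);
  if (plan == null) {
    throw RPCUtil.getRemoteException(errMsg);
  }
  return plan;
}
{noformat}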

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, 
> YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system on which reservation exists by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050079#comment-15050079
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user bwtakacy commented on the pull request:

https://github.com/apache/hadoop/pull/62#issuecomment-163488815
  
OK.
I will close this PR.

Thanks!



> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.
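
(For reference, the property can be pinned explicitly in yarn-site.xml; the 
value shown matches the actual default described above. A sketch:)

{noformat}
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
{noformat}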



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050080#comment-15050080
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user bwtakacy closed the pull request at:

https://github.com/apache/hadoop/pull/62


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050163#comment-15050163
 ] 

Hadoop QA commented on YARN-4418:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
35s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
45s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 35s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 2s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
32s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 26s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| 

[jira] [Updated] (YARN-4341) add doc about timeline performance tool usage

2015-12-09 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4341:
---
Attachment: YARN-4341.3.patch

Thanks a lot [~sjlee0] for the review! Updated the .3 patch and addressed your 
suggestions there.

> add doc about timeline performance tool usage
> -
>
> Key: YARN-4341
> URL: https://issues.apache.org/jira/browse/YARN-4341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4341.2.patch, YARN-4341.3.patch, YARN-4341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4341) add doc about timeline performance tool usage

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049288#comment-15049288
 ] 

Hadoop QA commented on YARN-4341:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
42s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 33s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776620/YARN-4341.3.patch |
| JIRA Issue | YARN-4341 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux c50fc9bdce27 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 50edcb9 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 29MB |
| Powered by | Apache Yetus http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9915/console |


This message was automatically generated.



> add doc about timeline performance tool usage
> -
>
> Key: YARN-4341
> URL: https://issues.apache.org/jira/browse/YARN-4341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4341.2.patch, YARN-4341.3.patch, YARN-4341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049294#comment-15049294
 ] 

Hadoop QA commented on YARN-4418:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
31s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 3s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 168m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776602/0003-YARN-4418.patch |
| JIRA 

[jira] [Commented] (YARN-110) AM releases too many containers due to the protocol

2015-12-09 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049301#comment-15049301
 ] 

Arun Suresh commented on YARN-110:
--

[~ka...@cloudera.com], [~vinodkv], I understand from MAPREDUCE-4671 that the 
accounting burden for this has been pushed to the AM and it will not pose a 
latency issue for the AM requesting the resources, but it looks like this 
increases latencies for competing AMs (they might have to wait for a subsequent 
allocate call for the resources). Also, custom AMs would need to be cognizant 
of this.

It also looks like [~giovanni.fumarola] is hitting this on some of the clusters 
he is working on. If [~acmurthy] is not actively looking into this, he would 
like to volunteer a patch.

Thoughts?

> AM releases too many containers due to the protocol
> ---
>
> Key: YARN-110
> URL: https://issues.apache.org/jira/browse/YARN-110
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: YARN-110.patch
>
>
> - AM sends request asking 4 containers on host H1.
> - Asynchronously, host H1 reaches RM and gets assigned 4 containers. RM at 
> this point, sets the value against H1 to
> zero in its aggregate request-table for all apps.
> - In the mean-while AM gets to need 3 more containers, so a total of 7 
> including the 4 from previous request.
> - Today, AM sends the absolute number of 7 against H1 to RM as part of its 
> request table.
> - RM seems to be overriding its earlier value of zero against H1 to 7 against 
> H1. And thus allocating 7 more
> containers.
> - AM already gets 4 in this scheduling iteration, but gets 7 more, a total of 
> 11 instead of the required 7.
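
(A compact restatement of the over-allocation described above, as pseudocode 
over the RM's per-host ask table; illustrative only:)

{noformat}
// The AM sends absolute totals, not deltas, so a stale total re-inflates
// the RM's ask table after the RM has already zeroed it:
//   RM ask table for H1:            4
//   RM assigns 4 on H1, then sets:  0
//   AM now needs 7 total, sends:    ask[H1] = 7   (absolute)
//   RM overwrites 0 with 7      ->  7 more containers; AM holds 11, not 7.
{noformat}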



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048968#comment-15048968
 ] 

Naganarasimha G R commented on YARN-4415:
-

Sure [~xinxianyin], I will start on the patch once the scenario is at least 
clear to others. If all acknowledge the issue then we can go ahead; otherwise 
the effort may be wasted if my understanding or view is wrong.

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, 
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create a exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both scenarios the cause is the same: the default max capacity and 
> absolute max capacity are set to zero % (see the config sketch below).
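
(For completeness, giving the queue a capacity on the partition involves 
capacity-scheduler.xml entries like the following; a sketch using the label 
name from the scenario, with illustrative values. The bug is about what 
happens when the capacity entry is omitted:)

{noformat}
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>*</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.capacity</name>
  <value>100</value>
</property>
{noformat}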



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-09 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4418:
--
Attachment: 0003-YARN-4418.patch

Uploading a patch that handles only the NO_LABEL scenario. [~leftnoteasy], 
please help check this corner case; if it's not valid I will remove the check.

> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that 
> the REST framework can easily expose this information.
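
(A hypothetical sketch of what tracking the per-partition AM limit inside 
{{ResourceUsage}} could look like; the field and method names are assumptions, 
not the patch:)

{noformat}
// Hypothetical: keep the computed AM resource limit per node label next to
// the other per-partition usages, guarded by the existing write lock.
private final Map<String, Resource> amLimitByLabel = new HashMap<>();

public void setAMLimit(String nodeLabel, Resource limit) {
  writeLock.lock();
  try {
    amLimitByLabel.put(nodeLabel, limit);
  } finally {
    writeLock.unlock();
  }
}
{noformat}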



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049005#comment-15049005
 ] 

Sangjin Lee commented on YARN-3623:
---

Thanks for your comment, [~djp]. Per the comments above, let's move the 
v.1-v.2 compatibility discussion to YARN-3196. I just want to clarify my 
comments to the extent they are relevant to this JIRA, as I think they might 
have been misunderstood.

{quote}
I would question this. If "yarn.timeline-service.version" is 2 (after the 
cluster has been upgraded), and we don't serve the 1/1.5 ATS service any more, 
how can existing running applications survive with respect to timeline 
services? Unless we have a clear answer in v2 that we will continue to 
maintain an ATS v1/v1.5 service as a legacy daemon (I don't prefer this way), 
I don't think we should mark this config as indicating a unique version of the 
ATS service running on the server side.
{quote}

When I said the cluster should bring up that exact version of the timeline 
service, I didn't mean that we will not support any compatibility. I definitely 
agree that the compatibility and support for a smooth rolling upgrade should be 
an objective, and that's why we want to continue the discussion and work on 
YARN-3196. What I meant to do is to separate the compatibility support (or 
rolling upgrade support) from the main interpretation of this config on the 
cluster.

It is true that the main mode of operation will be on the version that is 
declared via timeline-service.version. Also, I think there are many options in 
the way we can implement the rolling upgrade support. Supporting rolling 
upgrade does not necessarily mean that the v.1 write/read endpoints must be up 
in parallel with the v.2 write/read endpoints. We talked about having some kind 
of a temporary proxy or something in the timeline client itself. There may be 
other ways, but we're not mandating that the old endpoints must be up to 
implement the rolling upgrade support. My point was that setting 
timeline-service.version = 2 doesn't *automatically* mean we must still bring 
up endpoints of the previous version (or versions), as that's more of an 
implementation choice for how to support rolling upgrade. I hope that clarifies 
my earlier comments.

{quote}
This works if we don't consider the rolling upgrade case. For rolling 
upgrades, a running application/framework cannot switch its client version 
config while the YARN cluster is upgrading to a new ATS version. We shouldn't 
claim that an application's clients are expected to get no response if the 
version mismatches the server's, or users would misunderstand that they have 
to kill these applications after an upgrade. Instead, we should state that 
clients are not supposed to override this config, which varies with the 
cluster config, unless they are pretty sure what the cluster side is doing 
(like an upgrade process, etc.).
{quote}

Again, I hope it is clear what I meant was NOT that we will not consider the 
rolling upgrade use case. Even if the cluster is running with version = 2, with 
a proper rolling upgrade support, it should be prepared to handle (during the 
transition) calls that are coming in from running apps with version = 1 or 1.5. 
That's why I said "depending on how robust the compatibility story is".

Let me know if this helps in any way.
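
As a concrete illustration of the config under discussion (the exact key and 
value format are whatever YARN-3623 settles on; this is only a sketch):

{noformat}
<!-- yarn-site.xml: declare the timeline service version the cluster runs -->
<property>
  <name>yarn.timeline-service.version</name>
  <value>2</value>
</property>
{noformat}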

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far the RM, MR AM, and DA AM have added/changed configs to enable the 
> feature of writing timeline data to the v2 server. It's good to have a YARN 
> timeline-service.version config, like timeline-service.enable, to indicate 
> the version of the timeline service running in the given YARN cluster. It's 
> beneficial for users moving from v1 to v2, as they don't need to change the 
> existing config, just switch this config from v1 to v2. And each framework 
> doesn't need to have its own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)