[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-08-31 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3798:
-
Target Version/s: 2.6.1, 2.7.2  (was: 2.7.2)

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
> YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
> YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
> YARN-3798-branch-2.7.patch
>
>
> The RM goes down with a NoNode exception while creating the znode for an appattempt.
> *Exception logs:*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at 

[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-08-31 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723023#comment-14723023
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

zhihai, Thanks a lot.

[~vinodkv] cc: [~jianhe] please notify us if we need to update the patch. I 
think it's ready.


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-31 Thread Kishore Chaliparambil (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723049#comment-14723049
 ] 

Kishore Chaliparambil commented on YARN-2884:
-

Thanks [~jianhe]. I will address these comments and upload the patch. Also as 
you suggested, I think I will create a new Jira for simulating the token 
renewal behavior in the proxy service since it might take more time.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
> YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
> YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask the access to a federation of RMs
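
As a rough illustration of item 2 (throttling misbehaving AMs), a proxy that sits between the AM and the RM could count requests per application within a time window and reject the excess before forwarding. The sketch below is not taken from any YARN-2884 patch: `RmEndpoint`, `ThrottlingProxy`, the `"THROTTLED"` reply, and the window and limit parameters are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the central RM endpoint that the proxy forwards to. */
interface RmEndpoint {
  String allocate(String appId, String request);
}

/** Hypothetical per-node proxy that throttles AMs before forwarding to the RM. */
class ThrottlingProxy implements RmEndpoint {
  private final RmEndpoint rm;
  private final int maxRequestsPerWindow;
  private final long windowMillis;
  // Per-application state: [0] = start of the current window, [1] = requests seen in it.
  private final Map<String, long[]> windows = new HashMap<>();

  ThrottlingProxy(RmEndpoint rm, int maxRequestsPerWindow, long windowMillis) {
    this.rm = rm;
    this.maxRequestsPerWindow = maxRequestsPerWindow;
    this.windowMillis = windowMillis;
  }

  @Override
  public synchronized String allocate(String appId, String request) {
    return allocateAt(appId, request, System.currentTimeMillis());
  }

  /** Same as allocate, but with an explicit clock so the behavior can be tested. */
  synchronized String allocateAt(String appId, String request, long nowMillis) {
    long[] w = windows.computeIfAbsent(appId, k -> new long[] {nowMillis, 0});
    if (nowMillis - w[0] >= windowMillis) {
      // The previous window has elapsed: start a fresh one.
      w[0] = nowMillis;
      w[1] = 0;
    }
    if (w[1] >= maxRequestsPerWindow) {
      // Over the limit: reject locally without contacting the RM.
      return "THROTTLED";
    }
    w[1]++;
    return rm.allocate(appId, request);
  }
}
```

The limit is tracked per application ID, so one noisy AM does not affect requests from other AMs on the same node.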



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723053#comment-14723053
 ] 

Hudson commented on YARN-2945:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2252 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2252/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
>     if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>       continue;
>     }
>     assigned = sched.assignContainer(node);
>     if (!assigned.equals(Resources.none())) {
>       break;
>     }
>   }
> } finally {
>   readLock.unlock();
> }
> {code}
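
The snippet above releases the write lock before taking the read lock, so another thread can modify `runnableApps` in the gap between the two critical sections. One way to close that gap is lock downgrading, which `java.util.concurrent.locks.ReentrantReadWriteLock` documents as supported: acquire the read lock while still holding the write lock, then release the write lock. This is shown only as a sketch and is not what the YARN-2945 patch does. `LeafQueueSketch`, its fields, and `assignFirstPositive` are hypothetical names, and the integers stand in for schedulable apps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical queue that sorts and then scans its apps with no unlocked gap. */
class LeafQueueSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final List<Integer> runnableApps = new ArrayList<>();

  void add(int app) {
    rwLock.writeLock().lock();
    try {
      runnableApps.add(app);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  /** Sorts the apps, then returns the first positive one, or null if there is none. */
  Integer assignFirstPositive() {
    rwLock.writeLock().lock();
    try {
      Collections.sort(runnableApps);
      // Downgrade: take the read lock before giving up the write lock,
      // so no writer can slip in between the sort and the scan.
      rwLock.readLock().lock();
    } finally {
      rwLock.writeLock().unlock();
    }
    try {
      for (Integer app : runnableApps) {
        if (app > 0) {
          return app;
        }
      }
      return null;
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
```

Other readers can still proceed concurrently during the scan; only writers are held off until the read lock is released.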





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724601#comment-14724601
 ] 

Hudson commented on YARN-4092:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1058 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1058/])
YARN-4092. Fixed UI redirection to print useful messages when both RMs are in 
standby mode. Contributed by Xuan Gong (jianhe: rev 
a3fd2ccc869dfc1f04d1cf0a8678d4d90a43a80f)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
Move YARN-4092 to 2.7.2 (jianhe: rev 4eaa7fd3eae4412ac0b964c617b1bbb17a39d8be)
* hadoop-yarn-project/CHANGES.txt


> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs are in standby mode, the RM UI is not 
> accessible. It keeps redirecting between the two RMs.





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724595#comment-14724595
 ] 

Hudson commented on YARN-4092:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #331 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/331/])
YARN-4092. Fixed UI redirection to print useful messages when both RMs are in 
standby mode. Contributed by Xuan Gong (jianhe: rev 
a3fd2ccc869dfc1f04d1cf0a8678d4d90a43a80f)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
Move YARN-4092 to 2.7.2 (jianhe: rev 4eaa7fd3eae4412ac0b964c617b1bbb17a39d8be)
* hadoop-yarn-project/CHANGES.txt







[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724606#comment-14724606
 ] 

Hadoop QA commented on YARN-1651:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 15s | Findbugs (version ) appears to be broken on YARN-1197. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 20 new or modified test files. |
| {color:red}-1{color} | javac |   9m  5s | The applied patch generated 1 additional warning messages. |
| {color:red}-1{color} | javadoc |  12m  3s | The applied patch generated 2 additional warning messages. |
| {color:red}-1{color} | release audit |   0m 22s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  3s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace |  26m 37s | The patch has 151 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   2m  3s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 47s | The patch appears to introduce 8 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |  10m 17s | Tests passed in hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   0m 57s | Tests passed in hadoop-sls. |
| {color:red}-1{color} | yarn tests |   7m 26s | Tests failed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  49m 36s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 149m 45s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.yarn.client.TestApplicationClientProtocolOnHA |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler |
|   | hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions |
|   | hadoop.yarn.server.resourcemanager.webapp.dao.TestFairSchedulerQueueInfo |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12753408/YARN-1651-1.YARN-1197.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-1197 / f35a945 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/diffJavacWarnings.txt |
| javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/diffJavadocWarnings.txt |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/patchReleaseAuditProblems.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8955/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-server-common test log | 

[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-08-31 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723493#comment-14723493
 ] 

MENG DING commented on YARN-1651:
-

[~leftnoteasy], I just realized that there is one issue not discussed in the 
protocol design regarding the {{ContainerResourceChangeRequestProto}} that may 
affect the scheduler:

{code}
message ContainerResourceChangeRequestProto {
  optional ContainerIdProto container_id = 1;
  optional ResourceProto capability = 2;
} 
{code}

Should we add a priority field to {{ContainerResourceChangeRequestProto}}? 
Without one, how does the scheduler decide the priority between an 
increase/decrease request and a new allocation request within the same 
application? Does it simply assume that the increase/decrease request has the 
highest priority within the same application? If so, that may not be the 
correct thing to do. What do you think?
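
To make the question concrete, the sketch below contrasts the two policies in plain Java. It is not taken from any YARN-1651 patch: `PendingRequest`, its `priority` field, and both comparators are hypothetical, and lower numbers are assumed to mean higher priority. Without a priority on the change request, the scheduler can only apply a fixed rule such as "increases first". With one, increase requests and new allocations can be interleaved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical pending request within a single application. */
class PendingRequest {
  final String name;
  final boolean isIncrease; // true for a container resource increase request
  final int priority;       // assumption: a lower value means more urgent

  PendingRequest(String name, boolean isIncrease, int priority) {
    this.name = name;
    this.isIncrease = isIncrease;
    this.priority = priority;
  }
}

class RequestOrdering {
  /** Fixed rule: every increase request precedes every new allocation request. */
  static final Comparator<PendingRequest> INCREASE_FIRST = (a, b) -> {
    int c = Boolean.compare(b.isIncrease, a.isIncrease);
    return c != 0 ? c : Integer.compare(a.priority, b.priority);
  };

  /** With a priority field: order by priority; an increase only wins ties. */
  static final Comparator<PendingRequest> BY_PRIORITY = (a, b) -> {
    int c = Integer.compare(a.priority, b.priority);
    return c != 0 ? c : Boolean.compare(b.isIncrease, a.isIncrease);
  };

  /** Returns the request names in the order the given policy would serve them. */
  static List<String> order(List<PendingRequest> in, Comparator<PendingRequest> policy) {
    List<PendingRequest> copy = new ArrayList<>(in);
    copy.sort(policy);
    List<String> names = new ArrayList<>();
    for (PendingRequest r : copy) {
      names.add(r.name);
    }
    return names;
  }
}
```

Under `INCREASE_FIRST`, a low-priority increase is served ahead of an urgent new allocation, which is the behavior the question above is concerned about.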

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>






[jira] [Created] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-08-31 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4096:


 Summary: App local logs are leaked if log aggregation fails to 
initialize for the app
 Key: YARN-4096
 URL: https://issues.apache.org/jira/browse/YARN-4096
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe


If log aggregation fails to initialize for an application then the local logs 
will never be deleted.  This is similar to YARN-3476 except this is a failure 
when log aggregation tries to initialize the app-specific log aggregator rather 
than a failure during a log upload.





[jira] [Updated] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-08-31 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4096:
-
Attachment: YARN-4096.001.patch

Patch that lets the app-specific aggregator continue to monitor the application 
in a disabled mode so it can delete the app logs when the app completes to 
prevent the leak.
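
The idea can be sketched as follows. This is a simplified illustration, not the contents of YARN-4096.001.patch: `AppLogAggregatorSketch`, `LogUploader`, and `LocalLogDeleter` are hypothetical names. If initialization fails, the aggregator switches to a disabled mode in which it skips the upload but still deletes the local logs once the application finishes.

```java
/** Hypothetical remote side of log aggregation. */
interface LogUploader {
  void init(String appId) throws Exception;
  void upload(String appId);
}

/** Hypothetical hook that removes an application's local log directories. */
interface LocalLogDeleter {
  void delete(String appId);
}

/** Hypothetical app-specific aggregator with a disabled mode. */
class AppLogAggregatorSketch {
  private final String appId;
  private final LogUploader uploader;
  private final LocalLogDeleter deleter;
  private boolean disabled;

  AppLogAggregatorSketch(String appId, LogUploader uploader, LocalLogDeleter deleter) {
    this.appId = appId;
    this.uploader = uploader;
    this.deleter = deleter;
    try {
      uploader.init(appId);
    } catch (Exception e) {
      // Initialization failed: keep tracking the app, but never attempt an upload.
      disabled = true;
    }
  }

  boolean isDisabled() {
    return disabled;
  }

  /** Called when the application completes. */
  void onAppFinished() {
    if (!disabled) {
      uploader.upload(appId);
    }
    // Runs in both modes, so the local logs are not leaked when init failed.
    deleter.delete(appId);
  }
}
```

The point of the design is that a failed initialization no longer removes the only component that knows when to clean up the application's local logs.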

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.





[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-08-31 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723572#comment-14723572
 ] 

MENG DING commented on YARN-1651:
-

Correction: the concern applies only to resource increase requests. Decrease 
requests should be irrelevant in this context.







[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-WIP.YARN-1197.patch

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-WIP.YARN-1197.patch
>
>






[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-08-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723608#comment-14723608
 ] 

Wangda Tan commented on YARN-1651:
--

Hi [~mding],
I think for now we can assume that an increase has higher priority. We can add 
a priority field if people think it's important.

Attaching a WIP patch. I think most of the functionality is complete; a few 
pending items still need tests, and some parts of the code should be polished. 
The patch assumes an increase request has higher priority than a regular 
request.







[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723618#comment-14723618
 ] 

Hadoop QA commented on YARN-4081:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 58s | Findbugs (version ) appears to be broken on YARN-3926. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 29s | The applied patch generated 87 new checkstyle issues (total was 10, now 97). |
| {color:red}-1{color} | whitespace |   0m 19s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 37s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  52m 45s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | |  98m 42s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-api |
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12753296/YARN-4081-YARN-3926.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-3926 / c95993c |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8950/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8950/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8950/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8950/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8950/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8950/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8950/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8950/console |


This message was automatically generated.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.





[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723628#comment-14723628
 ] 

Hadoop QA commented on YARN-4096:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 40s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 21s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 31s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  45m  4s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753307/YARN-4096.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8951/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8951/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8951/console |


This message was automatically generated.

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.





[jira] [Updated] (YARN-2928) YARN Timeline Service: Next generation

2015-08-31 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-2928:
-
Assignee: Sangjin Lee  (was: Vrushali C)

> YARN Timeline Service: Next generation
> --
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723646#comment-14723646
 ] 

Varun Saxena commented on YARN-4074:


Moreover, do we return metrics at all times for a flow run? Or do we make 
returning the metrics field conditional? Returning them always should be fine 
as it is a single flow run. Just confirming because then I would not need 
fields as a query parameter in REST.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-31 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723657#comment-14723657
 ] 

Varun Vasudev commented on YARN-4085:
-

How about both? When running Slider, for example, the env variables will be 
visible only to the Slider agent and to the process that is launched by Slider.

> Generate file with container resource limits in the container work dir
> --
>
> Key: YARN-4085
> URL: https://issues.apache.org/jira/browse/YARN-4085
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
>
> Currently, a container doesn't know what resource limits are being imposed on 
> it. It would be helpful if the NM generated a simple file in the container 
> work dir with the resource limits specified.





[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-31 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723659#comment-14723659
 ] 

Varun Vasudev commented on YARN-4085:
-

[~hitesh] - let me know what else you would like exposed and I'll put all of it 
in one patch.

> Generate file with container resource limits in the container work dir
> --
>
> Key: YARN-4085
> URL: https://issues.apache.org/jira/browse/YARN-4085
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
>
> Currently, a container doesn't know what resource limits are being imposed on 
> it. It would be helpful if the NM generated a simple file in the container 
> work dir with the resource limits specified.





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723678#comment-14723678
 ] 

Vrushali C commented on YARN-4074:
--


bq. Should we support filtering on the basis of flow start time and probably 
end time as well?

Going forward, yes, we do need a time range parameter for queries. But for the 
PoC we can work with whatever is being fetched and returned.

bq. Moreover, do we return metrics at all times for a flow run? Or make 
returning the metrics field conditional? Just confirming because then I would 
not need fields as a query parameter in REST.

For the PoC we can return everything that is being fetched, or whatever is 
easier. In the longer term we do need to filter metrics and return subsets of 
them, so these query parameters will need to be worked out. We have to think 
about how to allow filtering of metrics, but we would need a basic API that 
returns everything for a flow run anyway, so I think further enhancements can 
be added after the PoC. What do you think?

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723691#comment-14723691
 ] 

Wangda Tan commented on YARN-4081:
--

[~vvasudev]:

Some comments/questions so far:

1) Use String instead of URI for the resource key? I think String may be more 
efficient: it is easier to construct and uses fewer resources (I haven't 
tested this, but I expect it to hold given the number of fields in String and 
URI). I understand the motivation of avoiding resource namespace conflicts, 
but I don't think namespace conflicts are the major use case, and a String can 
carry a namespace as well.

2) Relationship between ResourceInformation and ResourceMapEntry: currently 
it's a 1-1 mapping, and a ResourceInformation takes its value/unit from 
ResourceMapEntry, so the two overlap and are confusing. I think it's better to 
have one ResourceInformation per resource type. ResourceMapEntry would contain 
runtime information, and ResourceInformation would contain configured 
information. This would also avoid creating a ResourceInformation instance 
when invoking Resource.getResourceInformation().

3) Resource unit: I like the design, which can easily convert an internal value 
to a human-readable value. But I think we may not need to support defining a 
unit in ResourceMapEntry. There are some cons to it:
- When comparing resources, we have to convert units, which is extra overhead.
- It doesn't make a lot of sense to me to keep an internal unit per resource: 
we should handle it when constructing the Resource (something like 
Resource.newInstance("memory", 12, "GB")) and use the standard unit for 
internal computations.
- We can define the standard unit in each "ResourceInformation" if you agree 
with #2.

4) Do you think it's better to have a global ResourceInformation map instead of 
storing it in each Resource instance?

5) Resource#compareTo/hashCode has debug logging.

6) It doesn't seem necessary to instantiate an ArrayList in Resource#compareTo. 
Just traversing the set avoids creating the temporary ArrayList.
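
The suggestion in point 3 (convert units once, when the Resource is 
constructed, and compare in a single standard unit afterwards) could look 
roughly like the sketch below. This is not code from the patch: the class name 
NormalizedResource, the choice of MB as the standard unit, and the multiplier 
table are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; this is not the YARN Resource class. */
final class NormalizedResource {
  // Assumed multipliers from a user-facing unit to the standard unit (MB).
  private static final Map<String, Long> TO_MB = new HashMap<>();
  static {
    TO_MB.put("MB", 1L);
    TO_MB.put("GB", 1024L);
    TO_MB.put("TB", 1024L * 1024L);
  }

  private final String name;
  private final long valueInMb; // always stored in the standard unit

  private NormalizedResource(String name, long valueInMb) {
    this.name = name;
    this.valueInMb = valueInMb;
  }

  /** Converts to the standard unit once, at construction time. */
  static NormalizedResource newInstance(String name, long value, String unit) {
    Long factor = TO_MB.get(unit);
    if (factor == null) {
      throw new IllegalArgumentException("Unknown unit: " + unit);
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return new NormalizedResource(name, Math.multiplyExact(value, factor));
  }

  String getName() {
    return name;
  }

  long getValueInMb() {
    return valueInMb;
  }

  /** No unit conversion is needed here because both values are in MB. */
  int compareValue(NormalizedResource other) {
    return Long.compare(valueInMb, other.valueInMb);
  }
}
```

With this shape, the comparison path never looks at units, which is the 
overhead the first bullet under point 3 is concerned about.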

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723733#comment-14723733
 ] 

Varun Saxena commented on YARN-4074:


Ok...For PoC, this should be fine.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Updated] (YARN-3970) REST api support for Application Priority

2015-08-31 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3970:

Attachment: YARN-3970.20150831-1.patch

Updated the following:
# Test case name modified to *testUpdateAppPriority*
# More information in the console output of the Application CLI for updatePriority
# Formatting of the logged messages in CapacityScheduler
# Set a proper error message in the response content when the application is 
not in the desired state for updating the app's priority

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch, YARN-3970.20150829-1.patch, 
> YARN-3970.20150831-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723773#comment-14723773
 ] 

Sangjin Lee commented on YARN-4074:
---

Just to clarify on the flow run "end time". Note that there is no formal 
definition of the "end time" of a flow run, in the absence of a formal flow API 
in YARN. The "end time" in the flow run is the latest end time of an 
application that is part of that flow run. Just so that we're clear on that 
definition.

On a related note, there is no formal definition of the state of a flow run 
(i.e. we cannot say with certainty whether a flow run has ended). The only 
definitive thing we can say is whether a flow run is still running, which is 
the case when it has a running app.
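
The two definitions above can be sketched as follows. This is only an 
illustration, not code from the patch: the FlowRunSketch and App names and 
their fields are hypothetical, and returning 0 when no application has 
finished yet is an assumption of this sketch.

```java
import java.util.List;

/** Hypothetical sketch; not part of the timeline service code. */
final class FlowRunSketch {

  /** Minimal stand-in for an application that belongs to a flow run. */
  static final class App {
    final long endTimeMs;   // only meaningful once the app has finished
    final boolean running;

    App(long endTimeMs, boolean running) {
      this.endTimeMs = endTimeMs;
      this.running = running;
    }
  }

  /** "End time" of a flow run: the latest end time among its finished apps. */
  static long endTime(List<App> apps) {
    long latest = 0L; // assumption: 0 means "no app has finished yet"
    for (App app : apps) {
      if (!app.running) {
        latest = Math.max(latest, app.endTimeMs);
      }
    }
    return latest;
  }

  /** A flow run is known to be running if at least one of its apps is. */
  static boolean isRunning(List<App> apps) {
    for (App app : apps) {
      if (app.running) {
        return true;
      }
    }
    return false;
  }
}
```

Note that isRunning returning false does not prove the flow run has ended, 
since another application could still be submitted to it later.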

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4082:
-
Attachment: YARN-4082.3.patch

Thanks [~vvasudev]! Uploaded the ver.3 patch, which addresses all comments.

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4082.1.patch, YARN-4082.2.patch, YARN-4082.3.patch
>
>
> From YARN-2920, containers will be killed if the partition of a node changes. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when the node's partition is updated.





[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-31 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723784#comment-14723784
 ] 

Varun Vasudev commented on YARN-3970:
-

Couple of things -
1. Can you add a GET method for the priority as well?
2. The JSON produced converts the integer to a string(\{"priority":"8"}). You 
can fix this by adding the AppPriority class to 
org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver.

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch, YARN-3970.20150829-1.patch, 
> YARN-3970.20150831-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723790#comment-14723790
 ] 

Sangjin Lee commented on YARN-4074:
---

Also, while we're at it, I find the name {{FlowEntity}} quite confusing as it 
really encapsulates a flow run. I find myself having to comment that it is a 
flow run constantly.

Would there be an appetite for renaming this class to {{FlowRunEntity}} as part 
of this? The impact should be minimal. Let me know your thoughts.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-08-31 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-3769:
-
Attachment: YARN-3769.001.branch-2.8.patch
YARN-3769.001.branch-2.7.patch

{quote}
One thing I've thought about for a while is adding a "lazy preemption" 
mechanism: when a container is marked preempted and has waited for 
max_wait_before_time, it becomes a "can_be_killed" container. If another queue 
can allocate on a node with a "can_be_killed" container, that container will 
be killed immediately to make room for the new containers.

I will upload a design doc shortly for review.
{quote}
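
For readers following along, here is a minimal sketch of the "lazy 
preemption" state transitions described in the quote. It is an illustration 
under stated assumptions, not the proposed design: the class, enum, and 
method names are hypothetical, and only "can_be_killed" and 
max_wait_before_time come from the quote itself.

```java
/** Hypothetical sketch of the quoted idea; not CapacityScheduler code. */
final class LazyPreemptionSketch {

  enum State { RUNNING, MARKED_PREEMPTED, CAN_BE_KILLED, KILLED }

  static final class Container {
    State state = State.RUNNING;
    long markedAtMs;

    /** The preemption monitor marks the container but does not kill it. */
    void markPreempted(long nowMs) {
      if (state == State.RUNNING) {
        state = State.MARKED_PREEMPTED;
        markedAtMs = nowMs;
      }
    }

    /** After waiting maxWaitMs, the container becomes "can_be_killed". */
    void tick(long nowMs, long maxWaitMs) {
      if (state == State.MARKED_PREEMPTED && nowMs - markedAtMs >= maxWaitMs) {
        state = State.CAN_BE_KILLED;
      }
    }

    /**
     * Kills the container only if it is "can_be_killed" AND another queue
     * can actually allocate on this node. Returns true if it was killed.
     */
    boolean killIfNeeded(boolean otherQueueCanAllocateOnNode) {
      if (state == State.CAN_BE_KILLED && otherQueueCanAllocateOnNode) {
        state = State.KILLED;
        return true;
      }
      return false;
    }
  }
}
```

The point of the last step is that a container which has timed out still 
keeps running until some other queue can really use the space on its node.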

[~leftnoteasy], because it's been a couple of months since the last activity on 
this JIRA, would it be better to use this JIRA for the purpose of making the 
preemption monitor "user-limit" aware and open a separate JIRA to address a 
redesign?

Towards that end, I am uploading a couple of patches:
- {{YARN-3769.001.branch-2.7.patch}} is a patch to 2.7 (and also 2.6) which we 
have been using internally. This fix has dramatically reduced the instances of 
"ping-pong"-ing as I outlined in [the comment 
above|https://issues.apache.org/jira/browse/YARN-3769?focusedCommentId=14573619=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14573619].
 
- {{YARN-3769.001.branch-2.8.patch}} is similar to the fix made in 2.7, but it 
also takes into consideration node label partitions.
Thanks for your help and please let me know what you think.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.





[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4092:

Attachment: YARN-4092.4.patch

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch, 
> YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby RMs, the RM UI will not 
> be accessible. It will keep redirecting between both RMs.





[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723927#comment-14723927
 ] 

Hadoop QA commented on YARN-4082:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 57s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  5s | The patch has 24  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 54s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753328/YARN-4082.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8953/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8953/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8953/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8953/console |


This message was automatically generated.

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4082.1.patch, YARN-4082.2.patch, YARN-4082.3.patch
>
>
> From YARN-2920, containers will be killed if the partition of a node changes. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when the node's partition is updated.





[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723934#comment-14723934
 ] 

Hadoop QA commented on YARN-3970:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  13m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 31s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 36s | The applied patch generated  3 
new checkstyle issues (total was 164, now 167). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 45s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 14s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |  58m 36s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 114m 50s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753325/YARN-3970.20150831-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8952/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8952/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8952/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8952/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8952/console |


This message was automatically generated.

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch, YARN-3970.20150829-1.patch, 
> YARN-3970.20150831-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834

2015-08-31 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723953#comment-14723953
 ] 

Anubhav Dhoot commented on YARN-4032:
-

If fail-fast is false, we would still need to take some corrective action to 
prevent a corrupted app in the state. To me, that means failing the app 
attempts if the app is not present in this case.
Let me know if you meant something else.

> Corrupted state from a previous version can still cause RM to fail with NPE 
> due to same reasons as YARN-2834
> 
>
> Key: YARN-4032
> URL: https://issues.apache.org/jira/browse/YARN-4032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
>  Labels: 2.6.1-candidate
>
> YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if 
> someone is upgrading from a previous version, the state can still be 
> inconsistent and then RM will still fail with NPE after upgrade to 2.6.0.





[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-31 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724003#comment-14724003
 ] 

Jian He commented on YARN-4087:
---

bq. In yarn-default.xml the default value for RM_FAIL_FAST is true.
Didn't get you. Isn't the default value set to YARN_FAIL_FAST too?

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch, YARN-4087.2.patch
>
>
> Increasingly, I feel setting this property to false makes more sense, 
> especially in production environments.





[jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3769:
-
Assignee: Eric Payne  (was: Wangda Tan)

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.





[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-08-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724031#comment-14724031
 ] 

Wangda Tan commented on YARN-3769:
--

Sorry [~eepayne], I didn't make any progress on this :(, so I have assigned it to you. 
I will create a new JIRA for a long-term solution.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724100#comment-14724100
 ] 

Hadoop QA commented on YARN-4092:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  23m  5s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  7s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 26s | The applied patch generated  4 new checkstyle issues (total was 0, now 4). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m  7s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 52s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  54m 32s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 113m 10s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12753355/YARN-4092.4.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / caa04de |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8954/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8954/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8954/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8954/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8954/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8954/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8954/console |


This message was automatically generated.

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch, 
> YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby RMs, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Created] (YARN-4097) Create POC timeline web UI with new YARN web UI framework

2015-08-31 Thread Li Lu (JIRA)
Li Lu created YARN-4097:
---

 Summary: Create POC timeline web UI with new YARN web UI framework
 Key: YARN-4097
 URL: https://issues.apache.org/jira/browse/YARN-4097
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu


As planned, we need to try out the new YARN web UI framework and implement 
timeline v2 web UI on top of it. This JIRA proposes to build the basic active 
flow and application lists of the timeline data. We can add more content after 
we get used to this framework.





[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization

2015-08-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3011:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. There were a few conflicts in 
ResourceLocalizationService.java, which I resolved. Ran compilation and 
TestResourceLocalizationService before the push.

> NM dies because of the failure of resource localization
> ---
>
> Key: YARN-3011
> URL: https://issues.apache.org/jira/browse/YARN-3011
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3011.001.patch, YARN-3011.002.patch, 
> YARN-3011.003.patch, YARN-3011.004.patch
>
>
> NM dies because of an IllegalArgumentException when localizing a resource.
> 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null }
> 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null }
> 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> at org.apache.hadoop.fs.Path.<init>(Path.java:135)
> at org.apache.hadoop.fs.Path.<init>(Path.java:94)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
> 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header...





[jira] [Updated] (YARN-3103) AMRMClientImpl does not update AMRM token properly

2015-08-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3103:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestAMRMClient before the push. 
Patch applied cleanly.

> AMRMClientImpl does not update AMRM token properly
> --
>
> Key: YARN-3103
> URL: https://issues.apache.org/jira/browse/YARN-3103
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3103.001.patch
>
>
> AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it 
> to the credentials, so the token is mapped using the newly updated service 
> rather than the empty service that was used when the RM created the original 
> AMRM token.  This leads to two AMRM tokens in the credentials and can still 
> fail if the AMRMTokenSelector picks the wrong one.
> In addition the AMRMClientImpl grabs the login user rather than the current 
> user when security is enabled, so it's likely the UGI being updated is not 
> the UGI that will be used when reconnecting to the RM.
> The end result is that AMs can fail with invalid token errors when trying to 
> reconnect to an RM after a new AMRM secret has been activated.
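The ordering problem described above can be pictured with a small stand-alone model. This is only a sketch under assumptions: `TokenUpdateSketch`, its `Token` and `Credentials` classes, and both method names are hypothetical stand-ins, not the Hadoop security classes or the change in the attached patch. The only behaviour it models is that a token is stored under whatever service name it carries at the moment it is added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the ordering issue; not the Hadoop classes or the real patch. */
final class TokenUpdateSketch {

  /** Simplified token: a secret plus a mutable service name (empty when first created). */
  static final class Token {
    final String secret;
    String service = "";
    Token(String secret) { this.secret = secret; }
  }

  /** Simplified credentials: tokens are keyed by the service they carry when added. */
  static final class Credentials {
    final Map<String, Token> tokens = new LinkedHashMap<>();
    void addToken(Token t) { tokens.put(t.service, t); }
  }

  /** Problematic order: the service is set first, so the new token gets a new key. */
  static void updateServiceFirst(Credentials creds, Token fresh, String service) {
    fresh.service = service;
    creds.addToken(fresh);   // stored under the new service; the old token stays under ""
  }

  /** Alternative order: store under the empty service first, then set the service. */
  static void storeFirst(Credentials creds, Token fresh, String service) {
    creds.addToken(fresh);   // replaces the original token stored under ""
    fresh.service = service;
  }

  public static void main(String[] args) {
    Credentials creds = new Credentials();
    creds.addToken(new Token("original"));
    updateServiceFirst(creds, new Token("rolled"), "rm-host:8030");
    System.out.println("tokens after service-first update: " + creds.tokens.size());
  }
}
```

In this model the service-first order leaves two tokens behind, which matches the symptom in the description, while the store-first order keeps a single entry.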





[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-1.YARN-1197.patch

Attached ver.1 patch for review.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-WIP.YARN-1197.patch
>
>






[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724371#comment-14724371
 ] 

Wangda Tan commented on YARN-4024:
--

Latest patch LGTM, [~adhoot], any comments?

> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
> --
>
> Key: YARN-4024
> URL: https://issues.apache.org/jira/browse/YARN-4024
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Hong Zhiguo
> Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, 
> YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, 
> YARN-4024-v6.patch, YARN-4024-v7.patch
>
>
> Currently, the YARN RM NodesListManager resolves the IP address every time a 
> node sends a heartbeat. When the DNS server becomes slow, NM heartbeats are 
> blocked and cannot make progress.
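One way to picture avoiding a DNS lookup on every heartbeat is a resolver that remembers earlier answers for a while. The sketch below is an illustration only: `CachedResolverSketch`, the injected lookup function, the injected clock, and the expiry interval are all hypothetical, and nothing here is taken from the attached YARN-4024 patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical caching resolver; an assumption-laden sketch, not the patch's class. */
final class CachedResolverSketch {

  /** A resolved address plus the time at which it was resolved. */
  private static final class Entry {
    final String ip;
    final long resolvedAtMs;
    Entry(String ip, long resolvedAtMs) {
      this.ip = ip;
      this.resolvedAtMs = resolvedAtMs;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final Function<String, String> lookup; // the potentially slow DNS call
  private final LongSupplier clockMs;            // injected so expiry can be tested
  private final long expiryMs;

  CachedResolverSketch(Function<String, String> lookup, LongSupplier clockMs, long expiryMs) {
    this.lookup = lookup;
    this.clockMs = clockMs;
    this.expiryMs = expiryMs;
  }

  /** Returns the cached address, calling the slow lookup only on a miss or after expiry. */
  String resolve(String host) {
    long now = clockMs.getAsLong();
    Entry e = cache.get(host);
    if (e == null || now - e.resolvedAtMs >= expiryMs) {
      e = new Entry(lookup.apply(host), now);
      cache.put(host, e);
    }
    return e.ip;
  }

  /** Drops a host from the cache, for example when a node is removed. */
  void remove(String host) {
    cache.remove(host);
  }
}
```

With such a cache on the heartbeat path, a slow DNS server is consulted only when an entry is missing or stale, at the cost of serving a possibly outdated address until the entry expires.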





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724393#comment-14724393
 ] 

Li Lu commented on YARN-4074:
-

bq. Would there be an appetite for renaming this class to FlowRunEntity as part 
of this? The impact should be minimal. Let me know your thoughts.
LGTM. IMO we only have flow run objects, but no actual flow objects? Even in 
flow-based offline aggregation we do not instantiate "flow" objects. This JIRA 
appears to be the right place to fix it. 

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724416#comment-14724416
 ] 

Li Lu commented on YARN-3901:
-

Hi [~vrushalic], I thought you might be working on a new version of the patch, so 
perhaps I should wait for another version and then post my detailed comments? 
Back to the previous discussions:

bq. During table creation time, we specify the coprocessor class. This can also 
be done later by alter table command as desired.
This is totally fine for our POC. However, we do need to think about deployment 
in the future, together with many other challenges like Phoenix and/or offline 
aggregation. 

bq. There are some differences between the two aggregations, I think. Not sure 
if the classes can be reused without complicating development efforts. For the 
PoC I would like to focus on these tables independently. We could file follow 
up jiras to refactor the code as we see fit when the whole picture emerges, 
does that sound good?
Sure. Actually I was wondering if the strategy we're using here is also 
applicable to app level aggregations, since both of them receive online data 
and store it in HBase (our online storage, compared to Phoenix). Our 
time-based aggregator works in a quite different way, where it reads data from 
online storage and aggregates it in a batched fashion. That said, maybe we 
should use the approach in this patch as a general "online aggregation" 
approach, and provide aggregate APIs in the timeline metric class for offline 
aggregators? 

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> - RM’s collector writes to it on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 
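The read and compaction rules listed above can be sketched as plain Java over an in-memory list of cells. Everything in this block is a hypothetical model: `FlowRunCellsSketch`, `Cell`, and the method names are made up for illustration, and treating untagged (type 0) metric cells as already-collapsed totals is an assumption, not something the description states. It is not the HBase coprocessor from the patch. The max_end_time case would mirror `readMin`/`compactMin` with a max.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of flow-run cells; not the coprocessor from the patch. */
final class FlowRunCellsSketch {
  static final int TAG_NONE = 0;     // tag type not set
  static final int TAG_RUNNING = 1;  // written for a running application
  static final int TAG_COMPLETE = 2; // written for a completed application

  static final class Cell {
    final int tagType;
    final long value;
    Cell(int tagType, long value) {
      this.tagType = tagType;
      this.value = value;
    }
  }

  /** Read path for min_start_time: the smallest of all written values. */
  static long readMin(List<Cell> cells) {
    long min = Long.MAX_VALUE;
    for (Cell c : cells) {
      min = Math.min(min, c.value);
    }
    return min;
  }

  /** Flush/compaction for min_start_time: keep one untagged cell holding the min. */
  static List<Cell> compactMin(List<Cell> cells) {
    List<Cell> out = new ArrayList<>();
    if (!cells.isEmpty()) {
      out.add(new Cell(TAG_NONE, readMin(cells)));
    }
    return out;
  }

  /** Read path for a metric column: values are summed across cells. */
  static long readSum(List<Cell> cells) {
    long sum = 0;
    for (Cell c : cells) {
      sum += c.value;
    }
    return sum;
  }

  /**
   * Compaction for a metric column: cells of running apps are kept as they are;
   * completed-app cells (and, by assumption, earlier untagged totals) are
   * folded into a single untagged cell.
   */
  static List<Cell> compactMetric(List<Cell> cells) {
    List<Cell> out = new ArrayList<>();
    long collapsed = 0;
    boolean sawCollapsible = false;
    for (Cell c : cells) {
      if (c.tagType == TAG_RUNNING) {
        out.add(c);
      } else {
        collapsed += c.value;
        sawCollapsible = true;
      }
    }
    if (sawCollapsible) {
      out.add(new Cell(TAG_NONE, collapsed));
    }
    return out;
  }
}
```

The property worth checking in such a model is that a read returns the same answer before and after compaction, since compaction is only meant to reduce the number of stored cells.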





[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-08-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724431#comment-14724431
 ] 

Eric Payne commented on YARN-3769:
--

bq. I didn't make any progress on this, assigned this to you.
No problem. Thanks [~leftnoteasy].

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.





[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: YARN-3368.poc.1.patch

Uploaded a simple POC YARN-UI patch for discussion, and sample screenshots.

It includes:
- An applications page and a queue page (WIP, now can only show a hierarchy of 
queues).
- Based on Ember.js 2.0.

It doesn't include:
- Integration with Hadoop's build system.
- Hosting in the RM web UI.

Try it:
- Follow the {{README.md}} under the project directory.

The purpose of the POC patch is:
- Show how to use a front-end JS framework to retrieve data via the REST API and 
render it in the browser.
- Unblock timeline server v2 UI POC: YARN-4097.
- *This is a WIP project, nobody should use it in production.*

Thanks a lot to [~pramachandran] and [~Sreenath], who spent a lot of time 
explaining how to use the Ember.js framework, and thanks for the 
suggestions/thoughts from [~vinodkv], [~jianhe] and [~gtCarrera9].

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: YARN-3368.poc.1.patch
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI continues to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella jira to track the tasks. We can do this in a 
> branch.





[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: Queue-Hierarchy-Screenshot.png

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: Queue-Hierarchy-Screenshot.png, YARN-3368.poc.1.patch
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI continues to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella jira to track the tasks. We can do this in a 
> branch.





[jira] [Updated] (YARN-3368) Improve YARN web UI

2015-08-31 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Attachment: Applications-table-Screenshot.png

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: Applications-table-Screenshot.png, 
> Queue-Hierarchy-Screenshot.png, YARN-3368.poc.1.patch
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI continues to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella jira to track the tasks. We can do this in a 
> branch.





[jira] [Updated] (YARN-3094) reset timer for liveness monitors after RM recovery

2015-08-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3094:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestAMLivelinessMonitor before the 
push. Patch applied cleanly.

> reset timer for liveness monitors after RM recovery
> ---
>
> Key: YARN-3094
> URL: https://issues.apache.org/jira/browse/YARN-3094
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.4.patch, 
> YARN-3094.5.patch, YARN-3094.patch
>
>
> When the RM restarts, it recovers RMAppAttempts and registers them with the 
> AMLivenessMonitor if they are not in a final state. AMs will time out in the RM 
> if the recovery process takes a long time for some reason (e.g. too many apps). 
> In our system, we found the recovery process took about 3 minutes, and all AMs 
> timed out.
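The effect of resetting the timers once recovery has finished can be shown with a tiny model of a liveness monitor. This is a sketch under assumptions: `LivenessMonitorSketch`, its injected clock, and the method names are hypothetical and do not reproduce the RM's monitor classes or the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical liveness monitor model; not the RM's actual monitor or the patch. */
final class LivenessMonitorSketch {
  private final Map<String, Long> lastSeenMs = new HashMap<>();
  private final LongSupplier clockMs;   // injected clock, so time can be controlled
  private final long expireIntervalMs;

  LivenessMonitorSketch(LongSupplier clockMs, long expireIntervalMs) {
    this.clockMs = clockMs;
    this.expireIntervalMs = expireIntervalMs;
  }

  /** Starts tracking an attempt; its timer begins at the current time. */
  synchronized void register(String attemptId) {
    lastSeenMs.put(attemptId, clockMs.getAsLong());
  }

  /** Restarts every timer, e.g. once a long recovery has finished. */
  synchronized void resetTimers() {
    final long now = clockMs.getAsLong();
    lastSeenMs.replaceAll((id, seen) -> now);
  }

  /** Returns the attempts whose timer has exceeded the expiry interval. */
  synchronized List<String> expired() {
    long now = clockMs.getAsLong();
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastSeenMs.entrySet()) {
      if (now - e.getValue() > expireIntervalMs) {
        out.add(e.getKey());
      }
    }
    return out;
  }
}
```

In this model, an attempt registered at the start of a three-minute recovery is already past a one-minute expiry interval when recovery ends; calling `resetTimers()` at that point gives the AM a full interval to report in.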





[jira] [Commented] (YARN-4097) Create POC timeline web UI with new YARN web UI framework

2015-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724458#comment-14724458
 ] 

Li Lu commented on YARN-4097:
-

We may want to make use of the latest progress in YARN-3368 and try it here. 
This can be the starting point of our experimental UI. 

> Create POC timeline web UI with new YARN web UI framework
> -
>
> Key: YARN-4097
> URL: https://issues.apache.org/jira/browse/YARN-4097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>
> As planned, we need to try out the new YARN web UI framework and implement 
> timeline v2 web UI on top of it. This JIRA proposes to build the basic active 
> flow and application lists of the timeline data. We can add more content 
> after we get used to this framework.





[jira] [Updated] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2015-08-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2246:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestRMAppAttemptTransitions before 
the push. Patch applied cleanly.


> Job History Link in RM UI is redirecting to the URL which contains Job Id 
> twice
> ---
>
> Key: YARN-2246
> URL: https://issues.apache.org/jira/browse/YARN-2246
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Devaraj K
>Assignee: Devaraj K
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, 
> YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch
>
>
> {code:xml}
> http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
> {code}





[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4092:
--
Fix Version/s: 2.7.2

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch, 
> YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby RMs, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4092:
--
Target Version/s: 2.8.0, 2.7.2  (was: 2.8.0)

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch, 
> YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby RMs, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724493#comment-14724493
 ] 

Jian He commented on YARN-4092:
---

Committed to branch-2.7.

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch, 
> YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby RMs, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Updated] (YARN-3207) secondary filter matches entites which do not have the key being filtered for.

2015-08-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3207:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestLeveldbTimelineStore, 
TestMemoryTimelineStore, TestTimelineDataManager before the push. Patch applied 
cleanly.



> secondary filter matches entites which do not have the key being filtered for.
> --
>
> Key: YARN-3207
> URL: https://issues.apache.org/jira/browse/YARN-3207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Prakash Ramachandran
>Assignee: Zhijie Shen
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3207.1.patch
>
>
> In the leveldb implementation of the TimelineStore, the secondary filter 
> matches entities where the key being searched for is not present.
> Example query from the Tez UI:
> http://uvm:8188/ws/v1/timeline/TEZ_DAG_ID/?limit=1=foo:bar
> This will match and return the entity even though there is no entity with 
> otherinfo.foo defined.
> The issue seems to be in 
> {code:title=LeveldbTimelineStore:675}
> if (vs != null && !vs.contains(filter.getValue())) {
>   filterPassed = false;
>   break;
> }
> {code}
> this should be IMHO
> vs == null || !vs.contains(filter.getValue())
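The difference between the two conditions quoted above can be checked in isolation. The sketch below is a stand-alone illustration: `SecondaryFilterCheckSketch` and the use of plain maps are hypothetical stand-ins for the store's internal structures, and only the two boolean conditions come from the description.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-alone check of the two conditions; not the LeveldbTimelineStore code. */
final class SecondaryFilterCheckSketch {

  /** Condition as quoted from the store: a missing key (vs == null) is not rejected. */
  static boolean passesOriginal(Map<String, Set<Object>> entityFilters,
                                Map<String, Object> wanted) {
    for (Map.Entry<String, Object> filter : wanted.entrySet()) {
      Set<Object> vs = entityFilters.get(filter.getKey());
      if (vs != null && !vs.contains(filter.getValue())) {
        return false;
      }
    }
    return true;
  }

  /** Condition proposed in the description: a missing key fails the filter. */
  static boolean passesProposed(Map<String, Set<Object>> entityFilters,
                                Map<String, Object> wanted) {
    for (Map.Entry<String, Object> filter : wanted.entrySet()) {
      Set<Object> vs = entityFilters.get(filter.getKey());
      if (vs == null || !vs.contains(filter.getValue())) {
        return false;
      }
    }
    return true;
  }
}
```

For an entity that has no `foo` key at all, the quoted condition lets a `foo:bar` filter pass, while the proposed condition rejects it; entities that do carry the key behave the same under both.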





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724506#comment-14724506
 ] 

Hudson commented on YARN-4092:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8374 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8374/])
YARN-4092. Fixed UI redirection to print useful messages when both RMs are in 
standby mode. Contributed by Xuan Gong (jianhe: rev 
a3fd2ccc869dfc1f04d1cf0a8678d4d90a43a80f)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
Move YARN-4092 to 2.7.2 (jianhe: rev 4eaa7fd3eae4412ac0b964c617b1bbb17a39d8be)
* hadoop-yarn-project/CHANGES.txt


> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch, 
> YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby RMs, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-31 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724518#comment-14724518
 ] 

Vrushali C commented on YARN-3901:
--

Hey [~gtCarrera9]

Yes, I am working on an update to the patch. Had one question though.

bq. This is totally fine for our POC. However, we do need to think about 
deployment in the future, together with many other challenges like Phoenix 
and/or offline aggregation
I am not sure I understand. HBase Coprocessor classes are added to table 
schemas at table creation time or by altering the table. This isn't a shortcut 
being done for this PoC. This is the way a coprocessor class is added. 
Are you saying this is not fine? I am open to recommendations.

thanks
Vrushali

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 
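
The min_start_time handling described above can be sketched as follows, assuming a simplified in-memory model rather than real HBase cells or coprocessor hooks. The Cell class and the readMin and compact methods are hypothetical names introduced for illustration; the sketch only shows the idea of returning the minimum on read and collapsing to a single untagged cell on flush or compaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one flow-run column; not HBase API code.
final class MinStartTimeSketch {

    // One written value, tagged with the application id that wrote it.
    static final class Cell {
        final String tag;   // empty string stands for "no tag"
        final long value;

        Cell(String tag, long value) {
            this.tag = tag;
            this.value = value;
        }
    }

    // Read path: return the minimum of all written values.
    static long readMin(List<Cell> cells) {
        long min = Long.MAX_VALUE;
        for (Cell c : cells) {
            min = Math.min(min, c.value);
        }
        return min;
    }

    // Flush/compaction path: keep a single untagged cell holding the minimum,
    // discard all other cells.
    static List<Cell> compact(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        if (!cells.isEmpty()) {
            out.add(new Cell("", readMin(cells)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("application_1", 1500L));
        cells.add(new Cell("application_2", 1200L));
        cells.add(new Cell("application_3", 1800L));

        System.out.println(readMin(cells));          // 1200
        System.out.println(compact(cells).size());   // 1
    }
}
```

The same shape would apply to max_end_time with Math.max in place of Math.min.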





[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724527#comment-14724527
 ] 

Li Lu commented on YARN-3901:
-

bq. HBase Coprocessor classes are added to table schemas at table creation time 
or by altering the table. This isn't a shortcut being done for this PoC. This 
is the way a coprocessor class is added. Are you saying this is not fine?
Oh, this is totally fine. I'm just trying to make sure we're considering 
deployment issues. I'm OK with the proposal here.

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724534#comment-14724534
 ] 

Li Lu commented on YARN-4074:
-

bq. As a general question, since we're returning our timeline entities as JSON 
in our web service, we need to somehow "rebuild" those entities on the JS 
client side, right? If this is the case, we need to provide some JS object 
model to be consistent with our TimelineEntity object model? I'm not a 
front-end expert so I'd like to learn the typical practice on this problem.
bq. I'm not intimately familiar with that either. I hope someone who's familiar 
could comment?

I was told by the HDFS community that they are using Dust 
(github.com/linkedin/dustjs) templates to do this. They have code available in 
our codebase as well. I'm planning to look into this framework in our POC 
(YARN-4097). 

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Comment Edited] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724548#comment-14724548
 ] 

Xuan Gong edited comment on YARN-4092 at 9/1/15 1:34 AM:
-

Created a patch for branch-2.6


was (Author: xgong):
Create a patch for branch-2.6

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4092:

Attachment: YARN-4092-branch-2.6.patch

Created a patch for branch-2.6

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

2015-08-31 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724556#comment-14724556
 ] 

Vrushali C commented on YARN-3901:
--

bq. I'm just trying to make sure we're considering deployment issues. 

Yes, right on. Deployment and upgrade of coprocessors is something that 
needs to be thought through, and we need to have guidelines set up for those. It 
will need a restart of region servers, and in production we need to think this 
through carefully.

> Populate flow run data in the flow_run table
> 
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 





[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-08-31 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724577#comment-14724577
 ] 

zhihai xu commented on YARN-4096:
-

Hi [~jlowe], it is a good catch. It is nice to make a useless variable, 
{{logAggregationDisabled}}, become useful. {{initApp}} will catch 
{{YarnRuntimeException}} and send the event {{APPLICATION_LOG_HANDLING_FAILED}} to 
{{ApplicationImpl}}; {{ApplicationImpl}} will then send the event 
{{LogHandlerAppFinishedEvent}} to clean up the local application logs via 
{{doAppLogAggregationPostCleanUp}} when the application finishes.
+1 for the patch.

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.





[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-31 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723242#comment-14723242
 ] 

Bibin A Chundatt commented on YARN-4087:


In yarn-default.xml the default value for RM_FAIL_FAST is true.
In code the default value for RM_FAIL_FAST is taken from YARN_FAIL_FAST whose 
value is false.

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch, YARN-4087.2.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 





[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-31 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723307#comment-14723307
 ] 

Varun Saxena commented on YARN-4074:


Just had a cursory glance at the patch. A couple of points.
# We will be returning a set of {{FlowActivityEntity}} to the user. Should this 
class be in hadoop-yarn-api instead, then? The client will have to parse the 
JSON as a set of {{FlowActivityEntity}} objects.
# Should we support filtering on the basis of flow start time and probably end 
time as well?


> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.





[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-31 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723297#comment-14723297
 ] 

Rohith Sharma K S commented on YARN-3893:
-

+1 lgtm. Will commit it tomorrow if there are no objections/comments from other 
folks.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM





[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-31 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724764#comment-14724764
 ] 

Naganarasimha G R commented on YARN-3970:
-

Thanks [~vvasudev], for the review.
bq. Can you add a GET method for the priority as well?
Application's Priority is currently available as part of the application report 
({{/apps}} or {{/apps/\{appid\}}}). Is an explicit separate API for getting 
Application Priority required?

bq. The JSON produced converts the integer to a string({"priority":"8"})
Adding it to JAXBContextResolver.rootUnWrappedTypes worked as you suggested, 
but I wanted to know when to put a type in {{rootUnWrappedTypes}} and when in 
{{cTypes}}, i.e. when rootUnwrapping needs to be set to true and when to false? 

Only the 3rd checkstyle issue is valid; once [~vvasudev] confirms, I will provide 
the corrections in the next patch.

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch, YARN-3970.20150829-1.patch, 
> YARN-3970.20150831-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724797#comment-14724797
 ] 

Hudson commented on YARN-4092:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2255 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2255/])
YARN-4092. Fixed UI redirection to print useful messages when both RMs are in 
standby mode. Contributed by Xuan Gong (jianhe: rev 
a3fd2ccc869dfc1f04d1cf0a8678d4d90a43a80f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
Move YARN-4092 to 2.7.2 (jianhe: rev 4eaa7fd3eae4412ac0b964c617b1bbb17a39d8be)
* hadoop-yarn-project/CHANGES.txt


> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-31 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724769#comment-14724769
 ] 

Sunil G commented on YARN-3970:
---

Hi [~Naganarasimha]
Thanks for the updated patch.

One more point.
{{@XmlRootElement(name = "apppriority")}}
In AppPriority, I feel the root name could be {{applicationpriority}}. The current one 
may cause typo problems in real use cases.


> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch, YARN-3970.20150829-1.patch, 
> YARN-3970.20150831-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724783#comment-14724783
 ] 

Hudson commented on YARN-4092:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #316 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/316/])
YARN-4092. Fixed UI redirection to print useful messages when both RMs are in 
standby mode. Contributed by Xuan Gong (jianhe: rev 
a3fd2ccc869dfc1f04d1cf0a8678d4d90a43a80f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
Move YARN-4092 to 2.7.2 (jianhe: rev 4eaa7fd3eae4412ac0b964c617b1bbb17a39d8be)
* hadoop-yarn-project/CHANGES.txt


> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-31 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724793#comment-14724793
 ] 

Varun Vasudev commented on YARN-3970:
-

bq. Application's Priority is currently available as part of the application 
report (/apps or /apps/{appid}). Is an explicit separate API for getting 
Application Priority required?
It's useful to have.

bq. when rootUnwrapping needs to be set to true and when to false? 

If you want the AppPriority dao JSON to be { "apppriority": { "priority": 8 } }, 
set root unwrapping to be false. If you want it to be { "priority": 8 }, set 
it to true.
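
To make the two shapes concrete, here is a small sketch that builds both JSON strings by hand. It does not use JAXB or Jersey; the class RootUnwrappingSketch and its toJson helper are hypothetical and exist only to show what the rootUnwrapping flag changes in the output.

```java
// Hypothetical illustration of the two JSON shapes; no JAXB/Jersey calls involved.
final class RootUnwrappingSketch {

    // Builds the JSON for a single integer "priority" field, with or without
    // the enclosing root element.
    static String toJson(String rootName, int priority, boolean rootUnwrapping) {
        String body = "{\"priority\":" + priority + "}";
        if (rootUnwrapping) {
            return body;                                  // root element dropped
        }
        return "{\"" + rootName + "\":" + body + "}";     // root element kept
    }

    public static void main(String[] args) {
        System.out.println(toJson("apppriority", 8, false)); // {"apppriority":{"priority":8}}
        System.out.println(toJson("apppriority", 8, true));  // {"priority":8}
    }
}
```

Clients that expect the flat form would need the type registered for root unwrapping; clients that key on the root name need the wrapped form.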

I also forgot to mention, please update the web services documentation 
providing examples of how to use the REST service.

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-3970.20150828-1.patch, YARN-3970.20150829-1.patch, 
> YARN-3970.20150831-1.patch
>
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority





[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724770#comment-14724770
 ] 

Hudson commented on YARN-4092:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2274 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2274/])
YARN-4092. Fixed UI redirection to print useful messages when both RMs are in 
standby mode. Contributed by Xuan Gong (jianhe: rev 
a3fd2ccc869dfc1f04d1cf0a8678d4d90a43a80f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
Move YARN-4092 to 2.7.2 (jianhe: rev 4eaa7fd3eae4412ac0b964c617b1bbb17a39d8be)
* hadoop-yarn-project/CHANGES.txt


> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be 
> accessible. It will keep redirecting between both RMs.





[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-31 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4081:

Attachment: YARN-4081-YARN-3926.002.patch

Uploaded a new patch to get rid of unnecessary formatting changes.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch, 
> YARN-4081-YARN-3926.002.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.





[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-31 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723459#comment-14723459
 ] 

Varun Vasudev commented on YARN-4081:
-

So I ran a benchmark for the Resource.newInstance call using the OpenJDK JMH 
framework.

The current implementation returned a performance of 0.033 ±(99.9%) 0.003 
ops/ns [Average]
The Map based implementation returned a performance of 0.005 ±(99.9%) 0.001 
ops/ns [Average]

So a significant drop (roughly 6.6x), but that still translates to being able to 
create 5M Resource objects per second per thread (the test was run in a single 
thread). This was on my MacBook Pro (i7, 2.3 GHz). I'm unsure of how to test the GC 
performance.
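
The unit conversion behind the 5M figure can be checked with a short sketch. It only redoes the arithmetic on the two scores quoted above; the class ThroughputConversion and its method names are hypothetical and are not part of the benchmark itself.

```java
// Converts the quoted benchmark scores from ops/ns to ops/s and compares them.
final class ThroughputConversion {

    static final double NANOS_PER_SECOND = 1_000_000_000.0;

    // ops/ns multiplied by ns/s gives ops/s; rounded to a whole number of operations.
    static long opsPerSecond(double opsPerNanosecond) {
        return Math.round(opsPerNanosecond * NANOS_PER_SECOND);
    }

    // How many times slower the candidate is than the baseline.
    static double slowdown(double baselineOpsPerNs, double candidateOpsPerNs) {
        return baselineOpsPerNs / candidateOpsPerNs;
    }

    public static void main(String[] args) {
        System.out.println(opsPerSecond(0.033)); // 33000000 (current implementation)
        System.out.println(opsPerSecond(0.005)); // 5000000  (Map based implementation)
        System.out.println(slowdown(0.033, 0.005)); // about 6.6
    }
}
```

The error bounds quoted with the scores (±0.003 and ±0.001 ops/ns) are ignored here, so the ratio is only approximate.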

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.


