[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-30 Thread Kishore Chaliparambil (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723049#comment-14723049
 ] 

Kishore Chaliparambil commented on YARN-2884:
-

Thanks, [~jianhe]. I will address these comments and upload the patch. Also, as 
you suggested, I will create a new JIRA for simulating the token renewal 
behavior in the proxy service, since it might take more time.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
> YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
> YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask access to a federation of RMs
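A minimal sketch of the forwarding idea described above, using only the public 
ApplicationMasterProtocol interface; the class name, constructor wiring, and hook 
comments are illustrative assumptions, not taken from the YARN-2884 patches:

{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical NM-side proxy: implements the same AM-RM protocol the AM already
// speaks and forwards every call to a delegate client of the central RM. Each
// method is a natural interception point for throttling, request rewriting, or
// routing to one RM in a federation.
public class AMRMProxySketch implements ApplicationMasterProtocol {

  private final ApplicationMasterProtocol rmDelegate; // client talking to the real RM

  public AMRMProxySketch(ApplicationMasterProtocol rmDelegate) {
    this.rmDelegate = rmDelegate;
  }

  @Override
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException, IOException {
    // hook point: record the AM, swap tokens, pick a target RM, then forward
    return rmDelegate.registerApplicationMaster(request);
  }

  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    // hook point: throttle a misbehaving AM or rewrite its ask list before forwarding
    return rmDelegate.allocate(request);
  }

  @Override
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException, IOException {
    return rmDelegate.finishApplicationMaster(request);
  }
}
{code}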



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-08-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723023#comment-14723023
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

zhihai, Thanks a lot.

[~vinodkv], cc: [~jianhe], please let us know if we need to update the patch. I 
think it's ready.

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
> YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
> YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
> YARN-3798-branch-2.7.patch
>
>
> RM goes down with a NoNode exception during creation of the znode for an app 
> attempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at o

[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-08-30 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3798:
-
Target Version/s: 2.6.1, 2.7.2  (was: 2.7.2)

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
> YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
> YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
> YARN-3798-branch-2.7.patch
>
>
> RM goes down with a NoNode exception during creation of the znode for an app 
> attempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java

[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722954#comment-14722954
 ] 

Hadoop QA commented on YARN-4095:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 46s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 21s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   7m 29s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  56m  7s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753223/YARN-4095.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8949/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8949/console |


This message was automatically generated.

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 

[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722841#comment-14722841
 ] 

Hudson commented on YARN-2945:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #322 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/322/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}
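One generic way to avoid the gap between the write-locked sort and the 
read-locked iteration shown above is to take a snapshot under a single lock and 
then sort and iterate the copy lock-free. The sketch below uses plain JDK types 
and placeholder element names; it is not the FSLeafQueue implementation:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SnapshotIterationSketch {

  private final List<String> runnableApps = new ArrayList<String>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addApp(String app) {
    lock.writeLock().lock();
    try {
      runnableApps.add(app);
    } finally {
      lock.writeLock().unlock();
    }
  }

  void assignFromSnapshot() {
    final List<String> snapshot;
    lock.readLock().lock();
    try {
      // copy while holding the lock so the iteration below cannot race a writer
      snapshot = new ArrayList<String>(runnableApps);
    } finally {
      lock.readLock().unlock();
    }
    Collections.sort(snapshot); // sort the private copy, no lock needed
    for (String app : snapshot) {
      // ... attempt container assignment for 'app', break when one succeeds ...
    }
  }
}
{code}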



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722839#comment-14722839
 ] 

Hudson commented on YARN-2945:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1055 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1055/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722713#comment-14722713
 ] 

Hudson commented on YARN-2945:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #328 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/328/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722709#comment-14722709
 ] 

Hudson commented on YARN-2945:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #313 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/313/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4095:

Attachment: YARN-4095.000.patch

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.
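A small stand-alone illustration of the shared-context behaviour described above, 
assuming hadoop-common on the classpath; the /tmp paths and class name are made 
up for the demo. Two allocators built with the same configuration *name* share 
one AllocatorPerContext, so alternating calls whose Configuration objects carry 
different values for that name keep re-triggering confChanged:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

public class SharedContextDemo {
  public static void main(String[] args) throws Exception {
    final String key = "yarn.nodemanager.local-dirs";

    Configuration confA = new Configuration();
    confA.set(key, "/tmp/demo-dirs/d1,/tmp/demo-dirs/d2");
    Configuration confB = new Configuration();
    confB.set(key, "/tmp/demo-dirs/d1"); // same key, different value ("bad dir" excluded)

    // Same context name, so both share one static AllocatorPerContext.
    LocalDirAllocator allocA = new LocalDirAllocator(key); // stands in for ShuffleHandler
    LocalDirAllocator allocB = new LocalDirAllocator(key); // stands in for LocalDirsHandlerService

    // Alternating calls with different values for the same key force the shared
    // context to be re-initialized on every call.
    Path p1 = allocA.getLocalPathForWrite("demo/file1", confA);
    Path p2 = allocB.getLocalPathForWrite("demo/file2", confB);
    System.out.println(p1 + " , " + p2);
  }
}
{code}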



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4095:

Attachment: (was: YARN-4095.000.patch)

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722304#comment-14722304
 ] 

Hadoop QA commented on YARN-4095:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 33s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 24s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 53s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   0m 22s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   7m 34s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  50m 35s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753220/YARN-4095.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8948/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8948/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8948/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8948/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8948/console |


This message was automatically generated.

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722303#comment-14722303
 ] 

Hudson commented on YARN-2945:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2271 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2271/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are removed from context

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2997:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestNodeStatusUpdater before the 
push. Patch applied cleanly.


> NM keeps sending already-sent completed containers to RM until containers are 
> removed from context
> --
>
> Key: YARN-2997
> URL: https://issues.apache.org/jira/browse/YARN-2997
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
> YARN-2997.5.patch, YARN-2997.patch
>
>
> We have seen a lot of the following in the RM log:
> {quote}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {quote}
> It is caused by NM sending completed containers repeatedly until the app is 
> finished. On the RM side, the container is already released, hence 
> {{getRMContainer}} returns null.
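A conceptual sketch of one common pattern for this kind of report/acknowledge 
loop (illustrative names, not the actual NodeStatusUpdater change): a completed 
container is resent only until the RM acknowledges it, after which it is dropped 
from the pending set:

{code}
import java.util.HashSet;
import java.util.Set;

public class CompletedContainerReporter {

  private final Set<String> pending = new HashSet<String>(); // completed, not yet acked

  synchronized void containerCompleted(String containerId) {
    pending.add(containerId);
  }

  synchronized Set<String> buildHeartbeatPayload() {
    // resend everything still pending; duplicates stop once the RM acks
    return new HashSet<String>(pending);
  }

  synchronized void onHeartbeatAck(Set<String> ackedByRm) {
    pending.removeAll(ackedByRm); // acknowledged containers are never resent
  }
}
{code}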



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2922) ConcurrentModificationException in CapacityScheduler's LeafQueue

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2922:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestLeafQueue before the push. 
Patch applied cleanly.


> ConcurrentModificationException in CapacityScheduler's LeafQueue
> 
>
> Key: YARN-2922
> URL: https://issues.apache.org/jira/browse/YARN-2922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager, scheduler
>Affects Versions: 2.5.1
>Reporter: Jason Tufo
>Assignee: Rohith Sharma K S
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: 0001-YARN-2922.patch, 0001-YARN-2922.patch
>
>
> java.util.ConcurrentModificationException
> at 
> java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
> at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2992) ZKRMStateStore crashes due to session expiry

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2992:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


> ZKRMStateStore crashes due to session expiry
> 
>
> Key: YARN-2992
> URL: https://issues.apache.org/jira/browse/YARN-2992
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: yarn-2992-1.patch
>
>
> We recently saw the RM crash with the following stacktrace. On session 
> expiry, we should gracefully transition to standby. 
> {noformat}
> 2014-12-18 06:28:42,689 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired 
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) 
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) 
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
>  
> {noformat}
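A minimal sketch of the behaviour argued for above: treat only a ZooKeeper 
session expiry as a signal to step down, and keep every other state-store 
failure fatal. The FailoverHook interface is a placeholder, not the RM's HA API:

{code}
import org.apache.zookeeper.KeeperException;

public class SessionExpirySketch {

  interface FailoverHook {
    void transitionToStandby();
  }

  static void handleStateStoreError(Exception e, FailoverHook hook) {
    if (e instanceof KeeperException.SessionExpiredException) {
      // session expiry: give up leadership gracefully instead of crashing
      hook.transitionToStandby();
    } else {
      // anything else is still treated as a fatal state-store error
      throw new RuntimeException("Fatal state store error", e);
    }
  }
}
{code}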



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2340:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestWorkPreservingRMRestart before 
the push. Patch applied cleanly.

> NPE thrown when RM restart after queue is STOPPED. There after RM can not 
> recovery application's and remain in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: Capacityscheduler with Queue a, b
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: 0001-YARN-2340.patch
>
>
> While a job is in progress, change the Queue state to STOPPED and then restart 
> the RM. Observe that the standby RM fails to come up as active, throwing the 
> NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2952) Incorrect version check in RMStateStore

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2952:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and the tests TestFSRMStateStore, 
TestZKRMStateStore before the push. Patch applied cleanly.



> Incorrect version check in RMStateStore
> ---
>
> Key: YARN-2952
> URL: https://issues.apache.org/jira/browse/YARN-2952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith Sharma K S
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: 0001-YARN-2952.patch
>
>
> In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, it 
> will still store the version as 1.0, which is incorrect. The same thing might 
> happen to the NM store and the timeline store.
> {code}
> // if there is no version info, treat it as 1.0;
> if (loadedVersion == null) {
>   loadedVersion = Version.newInstance(1, 0);
> }
> if (loadedVersion.isCompatibleTo(getCurrentVersion())) {
>   LOG.info("Storing RM state version info " + getCurrentVersion());
>   storeVersion();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1984:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1 as a dependency for YARN-2952. Ran compilation and 
TestLeveldbTimelineStore before the push. Patch applied cleanly.


> LeveldbTimelineStore does not handle db exceptions properly
> ---
>
> Key: YARN-1984
> URL: https://issues.apache.org/jira/browse/YARN-1984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch
>
>
> The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
> rather than IOException, which can easily leak up the stack and kill threads 
> (e.g. the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1984:
--
Labels: 2.6.1-candidate  (was: )

> LeveldbTimelineStore does not handle db exceptions properly
> ---
>
> Key: YARN-1984
> URL: https://issues.apache.org/jira/browse/YARN-1984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch
>
>
> The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions 
> rather than IOException, which can easily leak up the stack and kill threads 
> (e.g. the deletion thread).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721925#comment-14721925
 ] 

zhihai xu commented on YARN-4095:
-

I attached a patch, YARN-4095.000.patch, which uses a new configuration, 
NM_GOOD_LOCAL_DIRS, to create the {{LocalDirAllocator}} in 
{{LocalDirsHandlerService}} and to store the good local dirs. This way we avoid 
using the same configuration name for the {{LocalDirAllocator}} in both 
{{ShuffleHandler}} and {{LocalDirsHandlerService}}. I also created a new 
configuration, NM_GOOD_LOG_DIRS, to match NM_GOOD_LOCAL_DIRS.
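A rough sketch of that approach with assumed key names (the property string used 
for NM_GOOD_LOCAL_DIRS below is hypothetical): publishing the filtered directory 
list under its own configuration key gives {{LocalDirsHandlerService}} a 
{{LocalDirAllocator}} context separate from the one {{ShuffleHandler}} uses:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

public class SeparateContextSketch {

  // hypothetical property name standing in for the proposed NM_GOOD_LOCAL_DIRS constant
  static final String NM_GOOD_LOCAL_DIRS = "yarn.nodemanager.good-local-dirs";

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("yarn.nodemanager.local-dirs", "/tmp/demo-dirs/d1,/tmp/demo-dirs/d2");
    // the dirs handler would copy only the healthy dirs under the new key
    conf.set(NM_GOOD_LOCAL_DIRS, "/tmp/demo-dirs/d1");

    // ShuffleHandler keeps using the original key and its context ...
    LocalDirAllocator shuffleAlloc =
        new LocalDirAllocator("yarn.nodemanager.local-dirs");
    // ... while the dirs handler allocates against its own, separate context
    LocalDirAllocator dirsHandlerAlloc = new LocalDirAllocator(NM_GOOD_LOCAL_DIRS);

    Path shufflePath = shuffleAlloc.getLocalPathForWrite("demo/shuffle.out", conf);
    Path nmPath = dirsHandlerAlloc.getLocalPathForWrite("demo/container.tmp", conf);
    System.out.println(shufflePath + " , " + nmPath);
  }
}
{code}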

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-842) Resource Manager & Node Manager UI's doesn't work with IE

2015-08-30 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved YARN-842.

Resolution: Not A Problem

It is working fine in the latest code, so I am closing this now. Please reopen 
if you still see this issue. Thanks.

> Resource Manager & Node Manager UI's doesn't work with IE
> -
>
> Key: YARN-842
> URL: https://issues.apache.org/jira/browse/YARN-842
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.4-alpha
>Reporter: Devaraj K
>
> {code:xml}
> Webpage error details
> User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; 
> SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media 
> Center PC 6.0)
> Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
> Message: 'JSON' is undefined
> Line: 41
> Char: 218
> Code: 0
> URI: http://10.18.40.24:8088/cluster/apps
> {code}
> RM & NM UIs are not working with IE and show the above error for every 
> link on the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4095:

Description: 
Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
{{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
{{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a static 
TreeMap keyed by the configuration name:
{code}
  private static Map<String, AllocatorPerContext> contexts = 
 new TreeMap<String, AllocatorPerContext>();
{code}
{{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
{{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} value 
in its {{Configuration}} object to exclude full and bad local dirs, while 
{{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
{{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
{{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
value has changed. This causes some overhead.
{code}
  String newLocalDirs = conf.get(contextCfgItemName);
  if (!newLocalDirs.equals(savedLocalDirs)) {
{code}
So it will be a good improvement to not share the same {{AllocatorPerContext}} 
instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}.


  was:
Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
{{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
{{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}}s are stored in a static 
TreeMap keyed by the configuration name:
{code}
  private static Map<String, AllocatorPerContext> contexts = 
 new TreeMap<String, AllocatorPerContext>();
{code}
{{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
{{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} value 
in its {{Configuration}} object to exclude full and bad local dirs, while 
{{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
{{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
{{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
value has changed. This causes some overhead.
{code}
  String newLocalDirs = conf.get(contextCfgItemName);
  if (!newLocalDirs.equals(savedLocalDirs)) {
{code}
So it will be a good improvement to not share the same {{AllocatorPerContext}} 
instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}.



> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a 
> static TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2015-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721915#comment-14721915
 ] 

Hudson commented on YARN-2945:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8371 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8371/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev 
cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt


> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.7.0
>
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the 
> ReadLock while referencing runnableApps. This can cause interrupted assignment 
> of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4095:

Attachment: YARN-4095.000.patch

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -
>
> Key: YARN-4095
> URL: https://issues.apache.org/jira/browse/YARN-4095
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4095.000.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}}s are stored in a static 
> TreeMap keyed by the configuration name:
> {code}
>   private static Map<String, AllocatorPerContext> contexts = 
>  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
> same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
> object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} 
> value in its {{Configuration}} object to exclude full and bad local dirs, while 
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
> called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, the 
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
> value has changed. This causes some overhead.
> {code}
>   String newLocalDirs = conf.get(contextCfgItemName);
>   if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it will be a good improvement to not share the same 
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and 
> {{LocalDirsHandlerService}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2964:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestDelegationTokenRenewer before 
the push. Patch applied cleanly.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (the NM liveness interval) after log aggregation completes.  As a result, 
> an oozie job, e.g. pig, that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after another sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1556) NPE getting application report with a null appId

2015-08-30 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721900#comment-14721900
 ] 

Weiwei Yang commented on YARN-1556:
---

Thanks [~djp]

> NPE getting application report with a null appId
> 
>
> Key: YARN-1556
> URL: https://issues.apache.org/jira/browse/YARN-1556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.1
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-1556.patch
>
>
> If you accidentally pass in a null appId when getting an application report, 
> you get an NPE back. This is arguably as intended, but a guard statement 
> could report the problem in a way that makes it easy for callers to track 
> down the cause.
> {code}
> java.lang.NullPointerException: java.lang.NullPointerException
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
>   ... 28 more
> {code}
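
A minimal, hypothetical guard sketch (not the committed patch) of the kind of check the 
description suggests, so callers get a clear message instead of a remote NPE:
{code}
public final class NullAppIdGuard {

    // Hypothetical helper: fail fast with a descriptive message instead of an NPE.
    static <T> T checkNotNull(T value, String what) {
        if (value == null) {
            throw new IllegalArgumentException(what + " must not be null");
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            checkNotNull(null, "ApplicationId passed to getApplicationReport");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}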



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.

2015-08-30 Thread zhihai xu (JIRA)
zhihai xu created YARN-4095:
---

 Summary: Avoid sharing AllocatorPerContext object in 
LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
 Key: YARN-4095
 URL: https://issues.apache.org/jira/browse/YARN-4095
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu


Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the 
{{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration 
{{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}}s are stored in a static 
TreeMap with the configuration name as the key:
{code}
  private static Map<String, AllocatorPerContext> contexts =
 new TreeMap<String, AllocatorPerContext>();
{code}
{{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a 
{{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the 
same {{Configuration}} object, they will use the same {{AllocatorPerContext}} 
object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} value 
in its {{Configuration}} object to exclude full and bad local dirs, while 
{{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its 
{{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is 
called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, 
{{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} 
value has changed. This causes some overhead.
{code}
  String newLocalDirs = conf.get(contextCfgItemName);
  if (!newLocalDirs.equals(savedLocalDirs)) {
{code}
So it will be a good improvement to not share the same {{AllocatorPerContext}} 
instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}.
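
A hypothetical sketch of the sharing described above (it assumes hadoop-common on the 
classpath and uses the literal key {{yarn.nodemanager.local-dirs}} for {{NM_LOCAL_DIRS}}). 
Both allocators are created with the same configuration name, so they share one static 
{{AllocatorPerContext}}; alternating calls with {{Configuration}} objects that disagree 
on the dir list make {{confChanged}} re-initialize the context every time:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;

public class SharedAllocatorContextSketch {
    // Literal value of YarnConfiguration.NM_LOCAL_DIRS.
    private static final String NM_LOCAL_DIRS = "yarn.nodemanager.local-dirs";

    public static void main(String[] args) throws Exception {
        Configuration shuffleConf = new Configuration();
        shuffleConf.set(NM_LOCAL_DIRS, "/tmp/nm-local-a,/tmp/nm-local-b");

        Configuration dirsHandlerConf = new Configuration();
        // Simulates LocalDirsHandlerService excluding a full or bad dir.
        dirsHandlerConf.set(NM_LOCAL_DIRS, "/tmp/nm-local-a");

        LocalDirAllocator shuffleAlloc = new LocalDirAllocator(NM_LOCAL_DIRS);
        LocalDirAllocator dirsHandlerAlloc = new LocalDirAllocator(NM_LOCAL_DIRS);

        // Each call goes through the same shared AllocatorPerContext. Because the
        // two Configurations disagree on NM_LOCAL_DIRS, every alternation triggers
        // the confChanged() re-initialization described above.
        for (int i = 0; i < 3; i++) {
            System.out.println(dirsHandlerAlloc.getLocalPathForWrite("probe", dirsHandlerConf));
            System.out.println(shuffleAlloc.getLocalPathForWrite("probe", shuffleConf));
        }
    }
}
{code}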




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721860#comment-14721860
 ] 

Hadoop QA commented on YARN-4092:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 55s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 55s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 56s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 57s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  53m 31s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 106m 57s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753079/YARN-4092.3.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 837fb75 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8947/console |


This message was automatically generated.

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible; it will keep redirecting between the two RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment

2015-08-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721780#comment-14721780
 ] 

Wangda Tan commented on YARN-2801:
--

Sorry I missed this comment. Thanks [~Naganarasimha] for addressing them, and 
thanks [~ozawa] for the review!

> Documentation development for Node labels requirment
> 
>
> Key: YARN-2801
> URL: https://issues.apache.org/jira/browse/YARN-2801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Gururaj Shetty
>Assignee: Wangda Tan
> Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch, 
> YARN-2801.4.patch
>
>
> Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2917) Potential deadlock in AsyncDispatcher when system.exit called in AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2917:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


> Potential deadlock in AsyncDispatcher when system.exit called in 
> AsyncDispatcher#dispatch and AsyscDispatcher#serviceStop from shutdown hook
> 
>
> Key: YARN-2917
> URL: https://issues.apache.org/jira/browse/YARN-2917
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch
>
>
> I encountered a scenario where the RM hung while shutting down and kept on 
> logging 
> {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Waiting for AsyncDispatcher to drain.}}
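
A minimal, hypothetical reproduction of the pattern behind such a hang (not the RM code): 
a worker thread calls {{System.exit}}, which waits for shutdown hooks to finish, while the 
shutdown hook waits for that same worker to drain, so both sides block forever.
{code}
public class ExitHookDeadlockSketch {
    public static void main(String[] args) throws InterruptedException {
        final Thread dispatcher = new Thread(new Runnable() {
            @Override
            public void run() {
                // Simulates AsyncDispatcher#dispatch hitting a fatal error:
                // exit() blocks until all shutdown hooks have completed.
                System.exit(1);
            }
        }, "dispatcher");

        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                // Simulates serviceStop() "waiting for AsyncDispatcher to drain".
                try {
                    dispatcher.join(); // never returns: dispatcher is stuck inside exit()
                } catch (InterruptedException ignored) {
                }
            }
        }, "shutdown-hook"));

        dispatcher.start();
        dispatcher.join(); // the JVM hangs here, mirroring the log loop above
    }
}
{code}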



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2910:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestFSLeafQueue before the push. 
Patch applied cleanly.

> FSLeafQueue can throw ConcurrentModificationException
> -
>
> Key: YARN-2910
> URL: https://issues.apache.org/jira/browse/YARN-2910
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: FSLeafQueue_concurrent_exception.txt, 
> YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, 
> YARN-2910.4.patch, YARN-2910.5.patch, YARN-2910.6.patch, YARN-2910.7.patch, 
> YARN-2910.8.patch, YARN-2910.patch
>
>
> The lists that maintain the runnable and the non-runnable apps are standard 
> ArrayLists, but there is no guarantee that they will only be manipulated by 
> one thread in the system. This can lead to the following exception:
> {noformat}
> 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
> CONTACTING RM.
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
> {noformat}
> Full stack trace in the attached file.
> We should guard against that by using a thread-safe implementation such as 
> java.util.concurrent.CopyOnWriteArrayList.
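
A minimal sketch of the proposed direction (not the final patch): iterating a 
{{CopyOnWriteArrayList}} while it is being modified works on a snapshot and never throws, 
whereas the same pattern on a plain {{ArrayList}} fails exactly like the trace above.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CmeSketch {
    public static void main(String[] args) {
        List<String> plain = new ArrayList<String>(Arrays.asList("app1", "app2", "app3"));
        try {
            for (String app : plain) {
                plain.remove(app); // structural change while iterating
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("ArrayList: " + e);
        }

        List<String> cow =
            new CopyOnWriteArrayList<String>(Arrays.asList("app1", "app2", "app3"));
        for (String app : cow) {
            cow.remove(app); // the iterator reads a snapshot, so no exception
        }
        System.out.println("CopyOnWriteArrayList after loop: " + cow); // []
    }
}
{code}
The usual trade-off is that every mutation copies the backing array, which tends to be 
acceptable for lists that are read far more often than they change.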



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2874:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


> Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further 
> apps
> -
>
> Key: YARN-2874
> URL: https://issues.apache.org/jira/browse/YARN-2874
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch
>
>
> When token renewal fails and the application finishes, this deadlock can occur.
> Jstack dump:
> {quote}
> Found one Java-level deadlock:
> =
> "DelegationTokenRenewer #181865":
>   waiting to lock monitor 0x00900918 (object 0xc18a9998, a 
> java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
> "DelayedTokenCanceller":
>   waiting to lock monitor 0x04141718 (object 0xc7eae720, a 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask),
>   which is held by "Timer-4"
> "Timer-4":
>   waiting to lock monitor 0x00900918 (object 0xc18a9998, a 
> java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
>  
> Java stack information for the threads listed above:
> ===
> "DelegationTokenRenewer #181865":
> at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
> - waiting to lock <0xc18a9998> (a 
> java.util.Collections$SynchronizedSet)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "DelayedTokenCanceller":
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443)
> - waiting to lock <0xc7eae720> (a 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558)
> - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599)
> at java.lang.Thread.run(Thread.java:745)
> "Timer-4":
> at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
> - waiting to lock <0xc18a9998> (a 
> java.util.Collections$SynchronizedSet)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437)
> - locked <0xc7eae720> (a 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
>  
> Found 1 deadlock.
> {quote}
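
A minimal, hypothetical illustration of the lock-order inversion shown in the jstack (the 
real code uses a synchronized set and a timer task; plain monitors stand in for them here). 
One thread takes the set monitor and then the timer-task monitor, the other takes them in 
the opposite order, and both hang:
{code}
public class LockOrderDeadlockSketch {
    private static final Object tokenSet = new Object();   // stands in for the SynchronizedSet
    private static final Object timerTask = new Object();  // stands in for the RenewalTimerTask

    public static void main(String[] args) {
        Thread canceller = new Thread(new Runnable() {
            @Override
            public void run() {
                synchronized (tokenSet) {            // like removeApplicationFromRenewal()
                    pause();
                    synchronized (timerTask) {       // like RenewalTimerTask.cancel()
                        System.out.println("canceller finished");
                    }
                }
            }
        }, "DelayedTokenCanceller");

        Thread timer = new Thread(new Runnable() {
            @Override
            public void run() {
                synchronized (timerTask) {           // like RenewalTimerTask.run()
                    pause();
                    synchronized (tokenSet) {        // like removeFailedDelegationToken()
                        System.out.println("timer finished");
                    }
                }
            }
        }, "Timer-4");

        canceller.start();
        timer.start();
    }

    private static void pause() {
        try {
            Thread.sleep(200); // widen the race window so the deadlock shows reliably
        } catch (InterruptedException ignored) {
        }
    }
}
{code}
The standard fix for this class of bug is to make both paths acquire the monitors in the 
same order, or to avoid holding one while acquiring the other.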



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2894) When ACL's are enabled, if RM switches then application can not be viewed from web.

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2894:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran into a couple of minor import issues in a few 
classes and fixed them.

Pushed the patch after running compilation and the tests 
TestRMWebServices, TestRMWebServicesApps, TestRMWebServicesAppsModification, 
TestRMWebServicesCapacitySched, TestRMWebServicesDelegationTokens, 
TestRMWebServicesFairScheduler, TestRMWebServicesNodeLabels and 
TestRMWebServicesNodes.

> When ACL's are enabled, if RM switches then application can not be viewed 
> from web.
> ---
>
> Key: YARN-2894
> URL: https://issues.apache.org/jira/browse/YARN-2894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2894.1.patch, YARN-2894.patch
>
>
> Binding the aclManager to RMWebApp would cause problems if the RM is switched; 
> some validation checks may fail.
> I think we should not bind the aclManager in RMWebApp; instead we should get 
> it from the RM instance.
> In RMWebApp,
> {code}
> if (rm != null) {
>   bind(ResourceManager.class).toInstance(rm);
>   bind(RMContext.class).toInstance(rm.getRMContext());
>   bind(ApplicationACLsManager.class).toInstance(
>   rm.getApplicationACLsManager());
>   bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager());
> }
> {code}
> and in AppBlock#render the check below may fail (need to test and confirm):
> {code}
>if (callerUGI != null
> && !(this.aclsManager.checkAccess(callerUGI,
> ApplicationAccessType.VIEW_APP, app.getUser(), appID) ||
>  this.queueACLsManager.checkAccess(callerUGI,
> QueueACL.ADMINISTER_QUEUE, app.getQueue()))) {
>   puts("You (User " + remoteUser
>   + ") are not authorized to view application " + appID);
>   return;
> }
> {code}
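
A framework-free, hypothetical sketch of the point above (no Guice, no RM classes): 
capturing the manager instance when the web app is built keeps serving the old object 
after a switch, while looking it up through the RM handle on every request picks up the 
replacement.
{code}
import java.util.concurrent.atomic.AtomicReference;

public class BindVersusLookupSketch {
    static class AclsManager {
        final String owner;
        AclsManager(String owner) { this.owner = owner; }
    }

    public static void main(String[] args) {
        // Stand-in for the RM instance that owns the current ACLs manager.
        AtomicReference<AclsManager> rm =
            new AtomicReference<AclsManager>(new AclsManager("rm-before-switch"));

        // "bind(...).toInstance(...)" style: the instance is captured once.
        AclsManager bound = rm.get();

        // Simulate the RM switch replacing its ACLs manager.
        rm.set(new AclsManager("rm-after-switch"));

        System.out.println("bound instance:  " + bound.owner);     // still rm-before-switch (stale)
        System.out.println("lookup from RM:  " + rm.get().owner);  // rm-after-switch
    }
}
{code}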



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721647#comment-14721647
 ] 

Hadoop QA commented on YARN-2729:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 52s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 51s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 21s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 47s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  55m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753202/YARN-2729.20150830-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 837fb75 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8946/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8946/console |


This message was automatically generated.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-08-30 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2729:

Attachment: YARN-2729.20150830-1.patch

Attaching a patch to sync with the changes of YARN-2923.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4094) Add Configration to support encryption of Distributed Cache Data

2015-08-30 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4094:
---
Fix Version/s: (was: 2.7.2)

> Add Configration to support encryption of Distributed Cache Data
> 
>
> Key: YARN-4094
> URL: https://issues.apache.org/jira/browse/YARN-4094
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Vijay Singh
>
> Currently the Distributed Cache does not provide a mechanism to encrypt the data 
> that gets copied over during processing. One attack vector is to use this 
> mechanism to access the contents of small files that contain sensitive data. 
> This request aims to counter that by providing a service-level configuration 
> that lets YARN encrypt all the data that gets cached on each node. 
> YARN components should encrypt while copying the data onto disk and decrypt 
> during processing. Let's start by leveraging a symmetric key mechanism similar 
> to the DEK (Data Encryption Key) used by HDFS transparent encryption, which 
> could be generated as part of the process.
> The next step could be to set up an encryption zone key, similar to the 
> transparent encryption mechanism.
> Please suggest if there is a better way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4094) Add Configration to support encryption of Distributed Cache Data

2015-08-30 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4094:
---
Target Version/s:   (was: 2.6.0, 2.7.1)

> Add Configration to support encryption of Distributed Cache Data
> 
>
> Key: YARN-4094
> URL: https://issues.apache.org/jira/browse/YARN-4094
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Vijay Singh
>
> Currently the Distributed Cache does not provide a mechanism to encrypt the data 
> that gets copied over during processing. One attack vector is to use this 
> mechanism to access the contents of small files that contain sensitive data. 
> This request aims to counter that by providing a service-level configuration 
> that lets YARN encrypt all the data that gets cached on each node. 
> YARN components should encrypt while copying the data onto disk and decrypt 
> during processing. Let's start by leveraging a symmetric key mechanism similar 
> to the DEK (Data Encryption Key) used by HDFS transparent encryption, which 
> could be generated as part of the process.
> The next step could be to set up an encryption zone key, similar to the 
> transparent encryption mechanism.
> Please suggest if there is a better way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment

2015-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721448#comment-14721448
 ] 

Hadoop QA commented on YARN-2801:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   2m 57s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  0s | Site still builds. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   6m 20s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12753173/YARN-2801.4.patch |
| Optional Tests | site |
| git revision | trunk / 837fb75 |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8945/console |


This message was automatically generated.

> Documentation development for Node labels requirment
> 
>
> Key: YARN-2801
> URL: https://issues.apache.org/jira/browse/YARN-2801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Gururaj Shetty
>Assignee: Wangda Tan
> Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch, 
> YARN-2801.4.patch
>
>
> Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4094) Add Configration to support encryption of Distributed Cache Data

2015-08-30 Thread Vijay Singh (JIRA)
Vijay Singh created YARN-4094:
-

 Summary: Add Configration to support encryption of Distributed 
Cache Data
 Key: YARN-4094
 URL: https://issues.apache.org/jira/browse/YARN-4094
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0, 2.6.0
Reporter: Vijay Singh
 Fix For: 2.7.2


Currently the Distributed Cache does not provide a mechanism to encrypt the data 
that gets copied over during processing. One attack vector is to use this 
mechanism to access the contents of small files that contain sensitive data. 
This request aims to counter that by providing a service-level configuration that 
lets YARN encrypt all the data that gets cached on each node. YARN components 
should encrypt while copying the data onto disk and decrypt during processing. 
Let's start by leveraging a symmetric key mechanism similar to the DEK (Data 
Encryption Key) used by HDFS transparent encryption, which could be generated as 
part of the process.
The next step could be to set up an encryption zone key, similar to the 
transparent encryption mechanism.
Please suggest if there is a better way.
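
Purely as an illustration of the direction described (no such YARN feature exists today), 
a self-contained sketch that encrypts a file while writing it to local disk and decrypts 
it while reading, using a generated AES key in the role of a per-application DEK:
{code}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class EncryptedLocalCopySketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey dek = kg.generateKey();          // plays the role of a per-app DEK
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        File localCopy = File.createTempFile("dist-cache-", ".bin");

        // "Copy to the node's local disk": encrypt while writing.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, dek, new IvParameterSpec(iv));
        OutputStream out = new CipherOutputStream(new FileOutputStream(localCopy), enc);
        out.write("sensitive cache payload".getBytes("UTF-8"));
        out.close();

        // "Process the cached file": decrypt while reading.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, dek, new IvParameterSpec(iv));
        BufferedReader in = new BufferedReader(new InputStreamReader(
            new CipherInputStream(new FileInputStream(localCopy), dec), "UTF-8"));
        System.out.println(in.readLine());          // prints the original payload
        in.close();
        localCopy.delete();
    }
}
{code}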



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2801) Documentation development for Node labels requirment

2015-08-30 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2801:

Attachment: YARN-2801.4.patch

Thanks [~ozawa] for the comments. [~leftnoteasy], since this was pending for a 
while and only needed small corrections, and I had to update the documentation 
for distributed NodeLabels on top of it, I have provided a patch for this jira.
[~ozawa], I have addressed most of your comments, except for a few places where 
the user was addressed in the singular ("A User"), as I felt the existing text 
was fine.

> Documentation development for Node labels requirment
> 
>
> Key: YARN-2801
> URL: https://issues.apache.org/jira/browse/YARN-2801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Gururaj Shetty
>Assignee: Wangda Tan
> Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch, 
> YARN-2801.4.patch
>
>
> Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)