[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723049#comment-14723049 ]

Kishore Chaliparambil commented on YARN-2884:
---------------------------------------------

Thanks [~jianhe]. I will address these comments and upload the patch. As you suggested, I will also create a new JIRA for simulating the token-renewal behavior in the proxy service, since it might take more time.

> Proxying all AM-RM communications
> ---------------------------------
>
>                 Key: YARN-2884
>                 URL: https://issues.apache.org/jira/browse/YARN-2884
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Kishore Chaliparambil
>         Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
> We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM.
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle misbehaving AMs
> 3) mask access to a federation of RMs

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
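The intercept-and-forward idea behind the RMProxy (including use case 2, throttling a misbehaving AM) can be sketched with a plain `java.lang.reflect.Proxy`. This is a toy illustration only, not the actual YARN AMRMProxy code: `AmRmProtocol`, `allocate`, and the call-count limit are made-up stand-ins for the real `ApplicationMasterProtocol`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy intercept-and-forward proxy: every AM call passes through a local
 *  handler that can throttle (or rewrite) it before reaching the "RM". */
public class ThrottlingProxy {

  /** Hypothetical mini protocol standing in for ApplicationMasterProtocol. */
  public interface AmRmProtocol {
    String allocate(String request);
  }

  /** Wraps a real endpoint; rejects calls beyond maxCalls. */
  public static AmRmProtocol proxy(AmRmProtocol rm, int maxCalls) {
    AtomicInteger calls = new AtomicInteger();
    InvocationHandler h = (p, m, args) -> {
      if (calls.incrementAndGet() > maxCalls) {
        throw new IllegalStateException("AM throttled");
      }
      return m.invoke(rm, args); // forward to the real RM endpoint
    };
    return (AmRmProtocol) Proxy.newProxyInstance(
        AmRmProtocol.class.getClassLoader(),
        new Class<?>[] {AmRmProtocol.class}, h);
  }
}
```

Because the AM only ever sees the proxy's address (enforced via tokens and configuration in the real design), the handler is a single choke point where distributed scheduling, throttling, or RM-federation routing can be slotted in.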
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723023#comment-14723023 ]

Tsuyoshi Ozawa commented on YARN-3798:
--------------------------------------

zhihai, thanks a lot. [~vinodkv] cc: [~jianhe], please notify us if we need to update the patch. I think it's ready.

> ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
> -------------------------------------------------------------------------------
>
>                 Key: YARN-3798
>                 URL: https://issues.apache.org/jira/browse/YARN-3798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: Suse 11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: 2.6.1-candidate
>         Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, YARN-3798-branch-2.7.patch
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at o
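The policy the issue title asks for — reconnect on the same session for transient failures, and create a brand-new session only when the server actually reports SESSIONEXPIRED — can be sketched as a small decision table. This is an illustrative toy, not ZKRMStateStore's real retry code; the class and return strings are made up.

```java
/** Toy sketch of the retry policy requested by the issue title: a new
 *  ZooKeeper session is justified only on SESSIONEXPIRED; other errors
 *  either retry on the existing session or surface to the caller. */
public class ZkRetryPolicy {
  public enum ZkError { CONNECTIONLOSS, SESSIONEXPIRED, NONODE }

  /** Returns "RETRY_SAME_SESSION", "NEW_SESSION", or "FAIL". */
  public static String react(ZkError e) {
    switch (e) {
      case CONNECTIONLOSS:
        return "RETRY_SAME_SESSION"; // transient: the session may still be alive
      case SESSIONEXPIRED:
        return "NEW_SESSION";        // only here is a fresh session justified
      default:
        return "FAIL";               // e.g. NONODE: surface the error, don't reconnect
    }
  }
}
```

Creating a new session on every hiccup (as the bug describes) silently discards the old session's state, which is how the NoNode error above can follow a "Session restored" message.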
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-3798:
---------------------------------
    Target Version/s: 2.6.1, 2.7.2  (was: 2.7.2)
[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722954#comment-14722954 ]

Hadoop QA commented on YARN-4095:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 20s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 46s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 7m 29s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | | 56m 7s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12753223/YARN-4095.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8949/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8949/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8949/console |

This message was automatically generated.

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4095
>                 URL: https://issues.apache.org/jira/browse/YARN-4095
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-4095.000.patch
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a static TreeMap keyed by configuration name:
> {code}
> private static Map<String, AllocatorPerContext> contexts =
>     new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the same {{Configuration}} object, they will use the same {{AllocatorPerContext}} object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} value in its {{Configuration}} object to exclude full and bad local dirs, while {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} value has changed:
> {code}
> String newLocalDirs = conf.get(contextCfgItemName);
> if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> This causes some overhead, so it would be a good improvement not to share the same {{AllocatorPerContext}} instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}.
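The thrashing described in the YARN-4095 report can be reproduced with a self-contained toy model: a static map keyed by configuration name, where two callers alternate between different dir lists under the same key and keep invalidating each other's cached state. This is an illustration only; `SharedContextDemo`, `Context`, and `access` are made-up names, not Hadoop's real `LocalDirAllocator` classes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-config-key context sharing: two holders passing
 *  different dir lists under the same key reinitialize the shared
 *  context on every alternation (the confChanged() analogue). */
public class SharedContextDemo {
  static class Context {
    String savedLocalDirs;
    int reinitCount;
  }

  // static map keyed by configuration item name, as in the issue description
  private static final Map<String, Context> contexts = new TreeMap<>();

  /** Simulates one allocator touching the shared context for cfgKey. */
  static int access(String cfgKey, String localDirs) {
    Context ctx = contexts.computeIfAbsent(cfgKey, k -> new Context());
    if (!localDirs.equals(ctx.savedLocalDirs)) { // confChanged() analogue
      ctx.savedLocalDirs = localDirs;
      ctx.reinitCount++;                         // costly reinitialization
    }
    return ctx.reinitCount;
  }
}
```

With one caller excluding a bad dir (`"/d1"`) and the other still passing the original list (`"/d1,/d2"`), every alternation bumps the reinit count — exactly the overhead the issue proposes to remove by giving each service its own context instance.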
[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks
[ https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722841#comment-14722841 ]

Hudson commented on YARN-2945:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #322 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/322/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt

> FSLeafQueue#assignContainer - document the reason for using both write and read locks
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-2945
>                 URL: https://issues.apache.org/jira/browse/YARN-2945
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Tsuyoshi Ozawa
>            Assignee: Tsuyoshi Ozawa
>             Fix For: 2.7.0
>
>         Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
> After YARN-2910, assignContainer holds the WriteLock while sorting and the ReadLock while iterating over runnableApps. This can cause interrupted assignment of containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
>     if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>       continue;
>     }
>     assigned = sched.assignContainer(node);
>     if (!assigned.equals(Resources.none())) {
>       break;
>     }
>   }
> } finally {
>   readLock.unlock();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
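The sort-under-write-lock, iterate-under-read-lock pattern quoted in the issue description can be shown as a runnable, self-contained sketch. This is not FSLeafQueue itself — `SortThenScan` and its integer "apps" are illustrative stand-ins — but the lock discipline is the same: sorting mutates the list, so it excludes all readers; iteration only reads, so concurrent readers may proceed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Minimal sketch of the locking pattern in the snippet above. */
public class SortThenScan {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Integer> apps = new ArrayList<>();

  public void add(int demand) {
    lock.writeLock().lock();
    try {
      apps.add(demand);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns the first entry exceeding the threshold, or -1. */
  public int assignFirstOver(int threshold) {
    // write lock: sorting mutates the list, so readers must be excluded
    lock.writeLock().lock();
    try {
      Collections.sort(apps);
    } finally {
      lock.writeLock().unlock();
    }
    // read lock: iteration only reads; other readers may run in parallel
    lock.readLock().lock();
    try {
      for (int d : apps) {
        if (d > threshold) {
          return d;
        }
      }
      return -1;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Note the gap between releasing the write lock and taking the read lock: another thread may mutate the list in between, which is exactly the "interrupted assignment" window the issue asks to document.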
[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks
[ https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722839#comment-14722839 ]

Hudson commented on YARN-2945:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #1055 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1055/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks
[ https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722713#comment-14722713 ]

Hudson commented on YARN-2945:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #328 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/328/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks
[ https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722709#comment-14722709 ]

Hudson commented on YARN-2945:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #313 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/313/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-4095:
----------------------------
    Attachment: YARN-4095.000.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-4095:
----------------------------
    Attachment: (was: YARN-4095.000.patch)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722304#comment-14722304 ]

Hadoop QA commented on YARN-4095:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 33s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 17s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 24s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 0m 22s | Tests failed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 7m 34s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | | 50m 35s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
| | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12753220/YARN-4095.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cf83156 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8948/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8948/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8948/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8948/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8948/console |

This message was automatically generated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks
[ https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722303#comment-14722303 ]

Hudson commented on YARN-2945:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2271 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2271/])
YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev cf831565e8344523e1bd0eaf686ed56a2b48b920)
* hadoop-yarn-project/CHANGES.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are removed from context
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2997: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation and TestNodeStatusUpdater before the push. Patch applied cleanly. > NM keeps sending already-sent completed containers to RM until containers are > removed from context > -- > > Key: YARN-2997 > URL: https://issues.apache.org/jira/browse/YARN-2997 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, > YARN-2997.5.patch, YARN-2997.patch > > > We have seen in RM log a lot of > {quote} > INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {quote} > It is caused by NM sending completed containers repeatedly until the app is > finished. On the RM side, the container is already released, hence > {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2922) ConcurrentModificationException in CapacityScheduler's LeafQueue
[ https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2922: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation and TestLeafQueue before the push. Patch applied cleanly. > ConcurrentModificationException in CapacityScheduler's LeafQueue > > > Key: YARN-2922 > URL: https://issues.apache.org/jira/browse/YARN-2922 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager, scheduler >Affects Versions: 2.5.1 >Reporter: Jason Tufo >Assignee: Rohith Sharma K S > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: 0001-YARN-2922.patch, 0001-YARN-2922.patch > > > java.util.ConcurrentModificationException > at > java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) > at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
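The failure mode in the stack trace above can be reproduced in isolation: iterating a TreeMap's key set while another code path structurally modifies the map trips the fail-fast iterator. The sketch below is generic Java, not the LeafQueue code; the safe variant snapshots the keys first (in the scheduler the equivalent fix is to guard the collection with the appropriate lock).

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal reproduction of a ConcurrentModificationException on a TreeMap.
class CmeSketch {
    static boolean throwsCme() {
        Map<String, Integer> apps = new TreeMap<>();
        apps.put("app1", 1);
        apps.put("app2", 2);
        try {
            for (String key : apps.keySet()) {
                apps.remove("app2"); // structural change during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast iterator detected the modification
        }
    }

    // Safe variant: iterate over a snapshot, mutate the live map freely.
    static List<String> safeCollect() {
        Map<String, Integer> apps = new TreeMap<>();
        apps.put("app1", 1);
        apps.put("app2", 2);
        List<String> snapshot = new ArrayList<>(apps.keySet());
        for (String key : snapshot) {
            apps.remove("app2"); // no longer disturbs the iteration
        }
        return snapshot;
    }
}
```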
[jira] [Updated] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2992: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly. > ZKRMStateStore crashes due to session expiry > > > Key: YARN-2992 > URL: https://issues.apache.org/jira/browse/YARN-2992 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: yarn-2992-1.patch > > > We recently saw the RM crash with the following stacktrace. On session > expiry, we should gracefully transition to standby. > {noformat} > 2014-12-18 06:28:42,689 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode > = Session expired > at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927) > > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687) > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
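The gist of the requested fix ("gracefully transition to standby" on session expiry) can be expressed as a small decision sketch. The enum, method, and exception type below are stand-ins, not the actual ZKRMStateStore API; they only illustrate routing SESSIONEXPIRED to a step-down instead of a fatal store event.

```java
// Hedged sketch: treat ZK session expiry as a trigger to step down to
// standby in an HA setup, rather than as a fatal state-store error.
class SessionExpirySketch {
    enum Action { FATAL, TRANSITION_TO_STANDBY }

    static class SessionExpiredException extends Exception {}

    static Action onStoreError(Exception e, boolean haEnabled) {
        if (e instanceof SessionExpiredException && haEnabled) {
            // Losing the ZK session means losing store fencing, so the
            // safe reaction for an HA RM is to give up active status.
            return Action.TRANSITION_TO_STANDBY;
        }
        // Any other store failure (or a non-HA deployment) stays fatal.
        return Action.FATAL;
    }
}
```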
[jira] [Updated] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter, RM cannot recover applications and remains in standby
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2340: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation and TestWorkPreservingRMRestart before the push. Patch applied cleanly. > NPE thrown when RM restarts after queue is STOPPED. Thereafter, RM cannot > recover applications and remains in standby > -- > > Key: YARN-2340 > URL: https://issues.apache.org/jira/browse/YARN-2340 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.4.1 > Environment: Capacityscheduler with Queue a, b >Reporter: Nishan Shetty >Assignee: Rohith Sharma K S >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: 0001-YARN-2340.patch > > > While a job is in progress, make the Queue state STOPPED and then restart the RM. > Observe that the standby RM fails to come up as active, throwing the below NPE: > 2014-07-23 18:43:24,432 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED > 2014-07-23 18:43:24,433 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_ADDED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) > at java.lang.Thread.run(Thread.java:662) > 2014-07-23 18:43:24,434 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2952) Incorrect version check in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2952: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation and the tests TestFSRMStateStore, TestZKRMStateStore before the push. Patch applied cleanly. > Incorrect version check in RMStateStore > --- > > Key: YARN-2952 > URL: https://issues.apache.org/jira/browse/YARN-2952 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Rohith Sharma K S > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: 0001-YARN-2952.patch > > > In RMStateStore#checkVersion: if we modify tCURRENT_VERSION_INFO to 2.0, > it'll still store the version as 1.0 which is incorrect; The same thing might > happen to NM store, timeline store. > {code} > // if there is no version info, treat it as 1.0; > if (loadedVersion == null) { > loadedVersion = Version.newInstance(1, 0); > } > if (loadedVersion.isCompatibleTo(getCurrentVersion())) { > LOG.info("Storing RM state version info " + getCurrentVersion()); > storeVersion(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly
[ https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1984: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1 as a dependency for YARN-2952. Ran compilation and TestLeveldbTimelineStore before the push. Patch applied cleanly. > LeveldbTimelineStore does not handle db exceptions properly > --- > > Key: YARN-1984 > URL: https://issues.apache.org/jira/browse/YARN-1984 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch > > > The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions > rather than IOException which can easily leak up the stack and kill threads > (e.g.: the deletion thread). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
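The fix the YARN-1984 description implies is to convert leveldb's unchecked exceptions into checked IOExceptions at the store boundary, so they cannot silently kill background threads. The sketch below uses a local stand-in for org.iq80.leveldb.DBException and an invented rawGet helper; only the wrapping pattern is the point.

```java
import java.io.IOException;

// Sketch: wrap leveldb's RuntimeException subtype in a checked IOException
// at the store boundary instead of letting it leak up the stack.
class LeveldbWrapSketch {
    // Local stand-in for org.iq80.leveldb.DBException (a RuntimeException).
    static class DBException extends RuntimeException {
        DBException(String msg) { super(msg); }
    }

    static byte[] get(byte[] key) throws IOException {
        try {
            return rawGet(key);
        } catch (DBException e) {
            // Surface as a checked exception that callers must handle.
            throw new IOException("leveldb get failed", e);
        }
    }

    // Hypothetical raw accessor standing in for a real DB.get() call.
    private static byte[] rawGet(byte[] key) {
        if (key == null) {
            throw new DBException("null key");
        }
        return key; // echo the key back, for the sketch only
    }
}
```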
[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly
[ https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1984: -- Labels: 2.6.1-candidate (was: ) > LeveldbTimelineStore does not handle db exceptions properly > --- > > Key: YARN-1984 > URL: https://issues.apache.org/jira/browse/YARN-1984 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-1984.001.patch, YARN-1984.002.patch, YARN-1984.patch > > > The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions > rather than IOException which can easily leak up the stack and kill threads > (e.g.: the deletion thread). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721925#comment-14721925 ] zhihai xu commented on YARN-4095: - I attached a patch YARN-4095.000.patch, which used a new configuration NM_GOOD_LOCAL_DIRS to create {{LocalDirAllocator}} in {{LocalDirsHandlerService}} to store the good local dirs. So we can avoid using the same configuration name to create {{LocalDirAllocator}} between {{ShuffleHandler}} and {{LocalDirsHandlerService}}. I also created a new configuration NM_GOOD_LOG_DIRS to match NM_GOOD_LOCAL_DIRS. > Avoid sharing AllocatorPerContext object in LocalDirAllocator between > ShuffleHandler and LocalDirsHandlerService. > - > > Key: YARN-4095 > URL: https://issues.apache.org/jira/browse/YARN-4095 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4095.000.patch > > > Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share > {{AllocatorPerContext}} object in {{LocalDirAllocator}} for configuration > {{NM_LOCAL_DIRS}} because {{AllocatorPerContext}} are stored in a static > TreeMap with configuration name as key > {code} > private static Map contexts = > new TreeMap(); > {code} > {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a > {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even they don't use the same > {{Configuration}} object, but they will use the same {{AllocatorPerContext}} > object. Also {{LocalDirsHandlerService}} may change {{NM_LOCAL_DIRS}} value > in its {{Configuration}} object to exclude full and bad local dirs, > {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its > {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} > is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, > {{AllocatorPerContext}} need be reinitialized because {{NM_LOCAL_DIRS}} value > is changed. 
This will cause some overhead. > {code} > String newLocalDirs = conf.get(contextCfgItemName); > if (!newLocalDirs.equals(savedLocalDirs)) { > {code} > So it will be a good improvement to not share the same > {{AllocatorPerContext}} instance between {{ShuffleHandler}} and > {{LocalDirsHandlerService}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
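The reinitialization churn described above can be modeled with a toy version of the confChanged check: contexts are keyed only by configuration *name*, so two components that pass different dir lists under the same key keep invalidating each other. The map, key string, and counter below are illustrative, not the Hadoop code.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of AllocatorPerContext sharing keyed by configuration name.
class AllocatorContextSketch {
    static final Map<String, String> savedDirsByConfigName = new TreeMap<>();
    static int reinitCount = 0;

    // Mirrors the quoted confChanged() check: reinitialize whenever the
    // configured value differs from the last value seen for this key.
    static void confChanged(String configName, String localDirs) {
        String saved = savedDirsByConfigName.get(configName);
        if (!localDirs.equals(saved)) {
            savedDirsByConfigName.put(configName, localDirs);
            reinitCount++; // the overhead the JIRA wants to avoid
        }
    }
}
```
With a single shared key, an exclude-bad-dirs update followed by a lookup with the original value forces a reinitialization on every alternation; distinct keys per component (as the patch proposes) would leave each context stable after its first initialization.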
[jira] [Resolved] (YARN-842) Resource Manager & Node Manager UIs don't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved YARN-842. Resolution: Not A Problem It is working fine in the latest, closing it now. Please reopen if you still see this issue. Thanks. > Resource Manager & Node Manager UI's doesn't work with IE > - > > Key: YARN-842 > URL: https://issues.apache.org/jira/browse/YARN-842 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.4-alpha >Reporter: Devaraj K > > {code:xml} > Webpage error details > User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; > SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media > Center PC 6.0) > Timestamp: Mon, 17 Jun 2013 12:06:03 UTC > Message: 'JSON' is undefined > Line: 41 > Char: 218 > Code: 0 > URI: http://10.18.40.24:8088/cluster/apps > {code} > RM & NM UI's are not working with IE and showing the above error for every > link on the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-4095: Description: Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share {{AllocatorPerContext}} object in {{LocalDirAllocator}} for configuration {{NM_LOCAL_DIRS}} because {{AllocatorPerContext}} are stored in a static TreeMap with configuration name as key {code} private static Map contexts = new TreeMap(); {code} {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even they don't use the same {{Configuration}} object, but they will use the same {{AllocatorPerContext}} object. Also {{LocalDirsHandlerService}} may change {{NM_LOCAL_DIRS}} value in its {{Configuration}} object to exclude full and bad local dirs, {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, {{AllocatorPerContext}} need be reinitialized because {{NM_LOCAL_DIRS}} value is changed. This will cause some overhead. {code} String newLocalDirs = conf.get(contextCfgItemName); if (!newLocalDirs.equals(savedLocalDirs)) { {code} So it will be a good improvement to not share the same {{AllocatorPerContext}} instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}. was: Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share {{AllocatorPerContext}} object in {{LocalDirAllocator}} for configuration {{NM_LOCAL_DIRS}} because {{AllocatorPerContext}}s are stored in a static TreeMap with configuration name as key {code} private static Map contexts = new TreeMap(); {code} {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even they don't use the same {{Configuration}} object, but they will use the same {{AllocatorPerContext}} object. 
Also {{LocalDirsHandlerService}} may change {{NM_LOCAL_DIRS}} value in its {{Configuration}} object to exclude full and bad local dirs, {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, {{AllocatorPerContext}} need be reinitialized because {{NM_LOCAL_DIRS}} value is changed. This will cause some overhead. {code} String newLocalDirs = conf.get(contextCfgItemName); if (!newLocalDirs.equals(savedLocalDirs)) { {code} So it will be a good improvement to not share the same {{AllocatorPerContext}} instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}. > Avoid sharing AllocatorPerContext object in LocalDirAllocator between > ShuffleHandler and LocalDirsHandlerService. > - > > Key: YARN-4095 > URL: https://issues.apache.org/jira/browse/YARN-4095 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4095.000.patch > > > Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share > {{AllocatorPerContext}} object in {{LocalDirAllocator}} for configuration > {{NM_LOCAL_DIRS}} because {{AllocatorPerContext}} are stored in a static > TreeMap with configuration name as key > {code} > private static Map contexts = > new TreeMap(); > {code} > {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a > {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even they don't use the same > {{Configuration}} object, but they will use the same {{AllocatorPerContext}} > object. Also {{LocalDirsHandlerService}} may change {{NM_LOCAL_DIRS}} value > in its {{Configuration}} object to exclude full and bad local dirs, > {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its > {{Configuration}} object. 
So every time {{AllocatorPerContext#confChanged}} > is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, > {{AllocatorPerContext}} need be reinitialized because {{NM_LOCAL_DIRS}} value > is changed. This will cause some overhead. > {code} > String newLocalDirs = conf.get(contextCfgItemName); > if (!newLocalDirs.equals(savedLocalDirs)) { > {code} > So it will be a good improvement to not share the same > {{AllocatorPerContext}} instance between {{ShuffleHandler}} and > {{LocalDirsHandlerService}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks
[ https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721915#comment-14721915 ] Hudson commented on YARN-2945: -- FAILURE: Integrated in Hadoop-trunk-Commit #8371 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8371/]) YARN-2945. Fixing the CHANGES.txt to have the right JIRA number. (vinodkv: rev cf831565e8344523e1bd0eaf686ed56a2b48b920) * hadoop-yarn-project/CHANGES.txt > FSLeafQueue#assignContainer - document the reason for using both write and > read locks > - > > Key: YARN-2945 > URL: https://issues.apache.org/jira/browse/YARN-2945 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > Fix For: 2.7.0 > > Attachments: YARN-2945.001.patch, YARN-2945.002.patch > > > After YARN-2910, assignContainer hold WriteLock while sorting and ReadLock > while referencing runnableApps. This can cause interrupted assignment of > containers regardless of the policy. > {code} > writeLock.lock(); > try { > Collections.sort(runnableApps, comparator); > } finally { > writeLock.unlock(); > } > readLock.lock(); > try { > for (FSAppAttempt sched : runnableApps) { > if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) { > continue; > } > assigned = sched.assignContainer(node); > if (!assigned.equals(Resources.none())) { > break; > } >} > } finally { > readLock.unlock(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-4095: Attachment: YARN-4095.000.patch > Avoid sharing AllocatorPerContext object in LocalDirAllocator between > ShuffleHandler and LocalDirsHandlerService. > - > > Key: YARN-4095 > URL: https://issues.apache.org/jira/browse/YARN-4095 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4095.000.patch > > > Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share > {{AllocatorPerContext}} object in {{LocalDirAllocator}} for configuration > {{NM_LOCAL_DIRS}} because {{AllocatorPerContext}}s are stored in a static > TreeMap with configuration name as key > {code} > private static Map contexts = > new TreeMap(); > {code} > {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a > {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even they don't use the same > {{Configuration}} object, but they will use the same {{AllocatorPerContext}} > object. Also {{LocalDirsHandlerService}} may change {{NM_LOCAL_DIRS}} value > in its {{Configuration}} object to exclude full and bad local dirs, > {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its > {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} > is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, > {{AllocatorPerContext}} need be reinitialized because {{NM_LOCAL_DIRS}} value > is changed. This will cause some overhead. > {code} > String newLocalDirs = conf.get(contextCfgItemName); > if (!newLocalDirs.equals(savedLocalDirs)) { > {code} > So it will be a good improvement to not share the same > {{AllocatorPerContext}} instance between {{ShuffleHandler}} and > {{LocalDirsHandlerService}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)
[ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2964: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation and TestDelegationTokenRenewer before the push. Patch applied cleanly. > RM prematurely cancels tokens for jobs that submit jobs (oozie) > --- > > Key: YARN-2964 > URL: https://issues.apache.org/jira/browse/YARN-2964 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Jian He >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch > > > The RM used to globally track the unique set of tokens for all apps. It > remembered the first job that was submitted with the token. The first job > controlled the cancellation of the token. This prevented completion of > sub-jobs from canceling tokens used by the main job. > As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no > notion of the first/main job. This results in sub-jobs canceling tokens and > failing the main job and other sub-jobs. It also appears to schedule > multiple redundant renewals. > The issue is not immediately obvious because the RM will cancel tokens ~10 > min (NM livelyness interval) after log aggregation completes. The result is > an oozie job, ex. pig, that will launch many sub-jobs over time will fail if > any sub-jobs are launched >10 min after any sub-job completes. If all other > sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
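The pre-YARN-2704 behavior the report wants restored — global token tracking where cancellation waits for the last referencing app — can be sketched as a reference-counted map. Class and method names below are invented for illustration; the real renewer works with DelegationToken objects and renewal timers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: cancel a token only when no running app still references it,
// so a finishing oozie sub-job cannot invalidate the launcher's token.
class TokenTrackerSketch {
    private final Map<String, Set<String>> tokenToApps = new HashMap<>();
    private final Set<String> cancelled = new HashSet<>();

    void appSubmitted(String app, String token) {
        tokenToApps.computeIfAbsent(token, t -> new HashSet<>()).add(app);
    }

    void appFinished(String app, String token) {
        Set<String> apps = tokenToApps.get(token);
        if (apps != null) {
            apps.remove(app);
            if (apps.isEmpty()) {
                cancelled.add(token); // last user gone: now safe to cancel
                tokenToApps.remove(token);
            }
        }
    }

    boolean isCancelled(String token) {
        return cancelled.contains(token);
    }
}
```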
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721900#comment-14721900 ] Weiwei Yang commented on YARN-1556: --- Thanks [~djp] > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. > {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) > ... 28 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
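A guard of the kind the report asks for would fail fast with a clear message instead of letting a null appId surface as a bare NPE deep inside a ConcurrentHashMap lookup. The method name and message below are illustrative, not the actual ClientRMService code.

```java
// Sketch of a null-argument guard for getApplicationReport.
class AppIdGuardSketch {
    static String getApplicationReport(Object appId) {
        if (appId == null) {
            // Report the misuse at the entry point, where the caller can see it.
            throw new IllegalArgumentException(
                "applicationId must not be null: pass the id returned at submission");
        }
        return "report-for-" + appId; // stand-in for the real report lookup
    }

    static String describeFailure(Object appId) {
        try {
            return getApplicationReport(appId);
        } catch (IllegalArgumentException e) {
            return e.getMessage(); // the caller gets an actionable message
        }
    }
}
```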
[jira] [Created] (YARN-4095) Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService.
zhihai xu created YARN-4095: --- Summary: Avoid sharing AllocatorPerContext object in LocalDirAllocator between ShuffleHandler and LocalDirsHandlerService. Key: YARN-4095 URL: https://issues.apache.org/jira/browse/YARN-4095 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share the {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}}s are stored in a static TreeMap keyed by configuration name {code} private static Map contexts = new TreeMap(); {code} {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the same {{Configuration}} object, they will use the same {{AllocatorPerContext}} object. Also, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} value in its {{Configuration}} object to exclude full and bad local dirs, while {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}} is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}}, {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} value has changed. This will cause some overhead. {code} String newLocalDirs = conf.get(contextCfgItemName); if (!newLocalDirs.equals(savedLocalDirs)) { {code} So it would be a good improvement to not share the same {{AllocatorPerContext}} instance between {{ShuffleHandler}} and {{LocalDirsHandlerService}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721860#comment-14721860 ] Hadoop QA commented on YARN-4092: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 55s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 56s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 53m 31s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 106m 57s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753079/YARN-4092.3.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 837fb75 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8947/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8947/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8947/console | This message was automatically generated. > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721780#comment-14721780 ] Wangda Tan commented on YARN-2801: -- Sorry I missed this comment; thanks [~Naganarasimha] for addressing them, and thanks [~ozawa] for the review! > Documentation development for Node labels requirement > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch, > YARN-2801.4.patch > > > Documentation needs to be developed for the node label requirements.
[jira] [Updated] (YARN-2917) Potential deadlock in AsyncDispatcher when System.exit called in AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2917: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly. > Potential deadlock in AsyncDispatcher when System.exit called in > AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook > > > Key: YARN-2917 > URL: https://issues.apache.org/jira/browse/YARN-2917 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: 0001-YARN-2917.patch, 0002-YARN-2917.patch > > > I encountered a scenario where the RM hung while shutting down and kept logging > {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Waiting for AsyncDispatcher to drain.}}
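The hang above arises because the dispatch thread calls System.exit (which blocks while shutdown hooks run) at the same time a shutdown hook calls serviceStop, which waits forever for the dispatcher to drain. A minimal hedged sketch of the defensive pattern, not YARN's actual code (`MiniDispatcher` and its method names are hypothetical): stop() waits only a bounded time for the queue to drain, so a shutdown hook can never hang on it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical mini-dispatcher illustrating a bounded drain on stop.
public class MiniDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;
    private final Thread worker = new Thread(() -> {
        while (!stopped) {
            try {
                Runnable event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event != null) event.run();
            } catch (InterruptedException e) {
                return; // interrupted during stop
            }
        }
    }, "dispatcher");

    public void start() { worker.start(); }

    public void post(Runnable event) { queue.offer(event); }

    // Waits a bounded time for the queue to drain; never spins forever,
    // so a shutdown hook calling stop() cannot deadlock even if the
    // worker thread is stuck (e.g. blocked inside System.exit).
    public boolean stop(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        boolean drained = queue.isEmpty();
        stopped = true;
        worker.interrupt();
        worker.join(timeoutMillis);
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        MiniDispatcher d = new MiniDispatcher();
        d.start();
        d.post(() -> System.out.println("event handled"));
        System.out.println("drained=" + d.stop(1000));
    }
}
```

The key design choice is that the waiter gives up after a deadline instead of looping on "Waiting for AsyncDispatcher to drain" indefinitely.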
[jira] [Updated] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2910: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation and TestFSLeafQueue before the push. Patch applied cleanly. > FSLeafQueue can throw ConcurrentModificationException > - > > Key: YARN-2910 > URL: https://issues.apache.org/jira/browse/YARN-2910 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: FSLeafQueue_concurrent_exception.txt, > YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch, > YARN-2910.4.patch, YARN-2910.5.patch, YARN-2910.6.patch, YARN-2910.7.patch, > YARN-2910.8.patch, YARN-2910.patch > > > The lists that maintain the runnable and the non-runnable apps are standard > ArrayLists, but there is no guarantee that they will only be manipulated by one > thread in the system. This can lead to the following exception: > {noformat} > 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN > CONTACTING RM. 
> java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) > at java.util.ArrayList$Itr.next(ArrayList.java:831) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516) > {noformat} > Full stack trace in the attached file. > We should guard against that by using a thread safe version from > java.util.concurrent.CopyOnWriteArrayList -- This message was sent by Atlassian JIRA (v6.3.4#6332)
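The proposed fix can be illustrated with a minimal, self-contained sketch (this is not FSLeafQueue itself; the integer "apps" are stand-ins): an ArrayList iterator fails fast when the list is modified mid-iteration, while a CopyOnWriteArrayList iterator walks a snapshot and is unaffected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Demonstrates why an unguarded ArrayList can throw
// ConcurrentModificationException, and why CopyOnWriteArrayList does not.
public class CmeDemo {
    static String iterateWhileModifying(List<Integer> apps) {
        try {
            for (Integer app : apps) {
                // Simulates another thread adding an app mid-iteration.
                if (app == 1) apps.add(99);
            }
            return "ok";
        } catch (java.util.ConcurrentModificationException e) {
            return "ConcurrentModificationException";
        }
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println("ArrayList: " + iterateWhileModifying(plain));
        System.out.println("CopyOnWriteArrayList: " + iterateWhileModifying(cow));
    }
}
```

The trade-off is that CopyOnWriteArrayList copies the backing array on every write, which is acceptable for read-mostly structures like a queue's app lists.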
[jira] [Updated] (YARN-2874) Deadlock in "DelegationTokenRenewer" which blocks the RM from executing any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2874: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly. > Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further > apps > - > > Key: YARN-2874 > URL: https://issues.apache.org/jira/browse/YARN-2874 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0, 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch > > > When token renewal fails and the application finishes this dead lock can occur > Jstack dump : > {quote} > Found one Java-level deadlock: > = > "DelegationTokenRenewer #181865": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > "DelayedTokenCanceller": > waiting to lock monitor 0x04141718 (object 0xc7eae720, a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), > which is held by "Timer-4" > "Timer-4": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > > Java stack information for the threads listed above: > === > "DelegationTokenRenewer #181865": > at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) > at > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "DelayedTokenCanceller": > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) > - waiting to lock <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) > - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) > at java.lang.Thread.run(Thread.java:745) > "Timer-4": > at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) > at > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) > - locked <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > > Found 1 deadlock. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
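The jstack above is a classic lock-ordering cycle: one path locks the SynchronizedSet and then waits for the RenewalTimerTask monitor, while the other path holds the task monitor and waits for the set. A hedged sketch of the standard remedy, with illustrative lock names rather than the actual YARN-2874 patch: make every code path acquire the two monitors in the same global order, so the cyclic wait can never form.

```java
// Two threads touching two locks; both acquire setLock before taskLock,
// so the cyclic wait seen in the jstack cannot occur.
public class LockOrderDemo {
    private static final Object setLock = new Object();   // stands in for the SynchronizedSet
    private static final Object taskLock = new Object();  // stands in for the RenewalTimerTask

    static Thread worker(String name) {
        return new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                synchronized (setLock) {        // always first
                    synchronized (taskLock) {   // always second
                        // critical section touching both structures
                    }
                }
            }
        }, name);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = worker("renewer");
        Thread b = worker("canceller");
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("no deadlock");
    }
}
```

If the original inverted ordering were restored in one of the threads, the two workers could block each other forever, exactly as in the reported dump.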
[jira] [Updated] (YARN-2894) When ACLs are enabled, if the RM switches then applications cannot be viewed from the web
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2894: -- Fix Version/s: 2.6.1 Pulled this into 2.6.1. Ran into a couple of minor import issues in a couple of classes, fixed them. Pushed the patch after running compilation and running the tests TestRMWebServices, TestRMWebServicesApps, TestRMWebServicesAppsModification, TestRMWebServicesCapacitySched, TestRMWebServicesDelegationTokens, TestRMWebServicesFairScheduler, TestRMWebServicesNodeLabels and TestRMWebServicesNodes. > When ACLs are enabled, if the RM switches then applications cannot be viewed > from the web. > --- > > Key: YARN-2894 > URL: https://issues.apache.org/jira/browse/YARN-2894 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2894.1.patch, YARN-2894.patch > > > Binding the aclManager to RMWebApp can cause problems if the RM is switched; > some validation checks may fail. > I think we should not bind the aclManager in RMWebApp; instead we should get > it from the RM instance. 
> In RMWebApp,
> {code}
> if (rm != null) {
>   bind(ResourceManager.class).toInstance(rm);
>   bind(RMContext.class).toInstance(rm.getRMContext());
>   bind(ApplicationACLsManager.class).toInstance(
>       rm.getApplicationACLsManager());
>   bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager());
> }
> {code}
> and in AppBlock#render the check below may fail (need to test and confirm):
> {code}
> if (callerUGI != null
>     && !(this.aclsManager.checkAccess(callerUGI,
>         ApplicationAccessType.VIEW_APP, app.getUser(), appID) ||
>         this.queueACLsManager.checkAccess(callerUGI,
>             QueueACL.ADMINISTER_QUEUE, app.getQueue()))) {
>   puts("You (User " + remoteUser
>       + ") are not authorized to view application " + appID);
>   return;
> }
> {code}
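The suggestion in the description, resolving the ACL manager from the live RM instance on each request instead of freezing it into the injector at startup, can be sketched roughly as follows. All class and method names here are illustrative stand-ins, not the actual patch:

```java
// Hypothetical sketch: look the ACLs manager up through the live RM on each
// request, so an RM failover is always reflected in web-UI access checks.
interface AclsManager { boolean checkAccess(String user, String appId); }

class ResourceManagerStub {
    private volatile AclsManager acls;
    ResourceManagerStub(AclsManager acls) { this.acls = acls; }
    AclsManager getApplicationACLsManager() { return acls; }
    void failoverTo(AclsManager fresh) { this.acls = fresh; }  // simulates RM switch
}

public class AppBlockSketch {
    private final ResourceManagerStub rm;  // inject the RM, not the manager

    AppBlockSketch(ResourceManagerStub rm) { this.rm = rm; }

    boolean canView(String user, String appId) {
        // Fetched per call: survives an RM switch, unlike a startup-time binding.
        return rm.getApplicationACLsManager().checkAccess(user, appId);
    }

    public static void main(String[] args) {
        ResourceManagerStub rm = new ResourceManagerStub((u, a) -> false);
        AppBlockSketch block = new AppBlockSketch(rm);
        System.out.println("before failover: " + block.canView("alice", "app_1"));
        rm.failoverTo((u, a) -> true);
        System.out.println("after failover: " + block.canView("alice", "app_1"));
    }
}
```

The point of the sketch is only the indirection: binding the RM and dereferencing it per request keeps the ACL check consistent with whichever manager the current RM holds.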
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721647#comment-14721647 ] Hadoop QA commented on YARN-2729: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 51s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 7m 47s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 55m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753202/YARN-2729.20150830-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 837fb75 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8946/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8946/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8946/console | This message was automatically generated. > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, > YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, > YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, > YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, > YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup . 
[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2729: Attachment: YARN-2729.20150830-1.patch Attaching a patch to sync with the changes of YARN-2923. > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, > YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, > YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, > YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, > YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
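The idea of a script-based provider, as far as the description goes, is to run an admin-supplied script on the NodeManager and take its output as the node's labels. A rough, hypothetical sketch of that pattern (the real patch's interface, configuration keys, and output format may differ):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical: run a configured script and read its first output line
// as the node label. The output format and trimming rules are assumptions.
public class ScriptLabelProviderSketch {
    static String fetchLabel(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line == null ? "" : line.trim();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an admin script that would probe the node's hardware.
        System.out.println("label=" + fetchLabel("echo", "GPU"));
    }
}
```

In a real provider this would run on a timer and report the parsed labels through the NM heartbeat rather than printing them.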
[jira] [Updated] (YARN-4094) Add Configuration to support encryption of Distributed Cache Data
[ https://issues.apache.org/jira/browse/YARN-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-4094: --- Fix Version/s: (was: 2.7.2) > Add Configuration to support encryption of Distributed Cache Data > > > Key: YARN-4094 > URL: https://issues.apache.org/jira/browse/YARN-4094 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0, 2.7.0 >Reporter: Vijay Singh > > Currently the distributed cache does not provide a mechanism to encrypt the data > that gets copied over during processing. One attack vector is to use this > mechanism to access the contents of small files that hold sensitive data. > This request aims to counter that with a service-level configuration that lets > YARN encrypt all the data cached on each node. YARN components should encrypt > the data while copying it to disk and decrypt it during processing. Let's start > by leveraging a symmetric-key mechanism similar to the DEK (Data Encryption Key) > used by HDFS transparent encryption, generated as part of the process. > A next step could be to set up an encryption zone key similar to the transparent > encryption mechanism. > Please suggest if there is a better way.
[jira] [Updated] (YARN-4094) Add Configuration to support encryption of Distributed Cache Data
[ https://issues.apache.org/jira/browse/YARN-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-4094: --- Target Version/s: (was: 2.6.0, 2.7.1) > Add Configuration to support encryption of Distributed Cache Data > > > Key: YARN-4094 > URL: https://issues.apache.org/jira/browse/YARN-4094 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0, 2.7.0 >Reporter: Vijay Singh > > Currently the distributed cache does not provide a mechanism to encrypt the data > that gets copied over during processing. One attack vector is to use this > mechanism to access the contents of small files that hold sensitive data. > This request aims to counter that with a service-level configuration that lets > YARN encrypt all the data cached on each node. YARN components should encrypt > the data while copying it to disk and decrypt it during processing. Let's start > by leveraging a symmetric-key mechanism similar to the DEK (Data Encryption Key) > used by HDFS transparent encryption, generated as part of the process. > A next step could be to set up an encryption zone key similar to the transparent > encryption mechanism. > Please suggest if there is a better way.
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721448#comment-14721448 ] Hadoop QA commented on YARN-2801: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 2m 57s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 0s | Site still builds. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 6m 20s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753173/YARN-2801.4.patch | | Optional Tests | site | | git revision | trunk / 837fb75 | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8945/console | This message was automatically generated. > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch, > YARN-2801.4.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4094) Add Configuration to support encryption of Distributed Cache Data
Vijay Singh created YARN-4094: - Summary: Add Configuration to support encryption of Distributed Cache Data Key: YARN-4094 URL: https://issues.apache.org/jira/browse/YARN-4094 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.7.0, 2.6.0 Reporter: Vijay Singh Fix For: 2.7.2 Currently the distributed cache does not provide a mechanism to encrypt the data that gets copied over during processing. One attack vector is to use this mechanism to access the contents of small files that hold sensitive data. This request aims to counter that with a service-level configuration that lets YARN encrypt all the data cached on each node. YARN components should encrypt the data while copying it to disk and decrypt it during processing. Let's start by leveraging a symmetric-key mechanism similar to the DEK (Data Encryption Key) used by HDFS transparent encryption, generated as part of the process. A next step could be to set up an encryption zone key similar to the transparent encryption mechanism. Please suggest if there is a better way.
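The DEK idea in the report can be sketched with the JDK's own crypto APIs. This is a hedged illustration of per-file symmetric encryption only, not a design for the actual YARN change; key distribution and encryption zone keys are out of scope here, and the class name is hypothetical:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Sketch: generate a per-file DEK, encrypt bytes on the way to the local
// cache, and decrypt them when the container reads the file back.
public class CacheDekSketch {
    public static void main(String[] args) throws Exception {
        byte[] plaintext = "sensitive cached file".getBytes(StandardCharsets.UTF_8);

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey dek = kg.generateKey();          // the per-file DEK

        byte[] iv = new byte[12];                  // GCM nonce, must be unique per key
        new SecureRandom().nextBytes(iv);

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal(plaintext); // what would be written to disk

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, iv));
        byte[] roundTrip = dec.doFinal(ciphertext);

        System.out.println("roundtrip ok: " + Arrays.equals(plaintext, roundTrip));
    }
}
```

In the proposed scheme the DEK itself would in turn be protected by a zone key, mirroring how HDFS transparent encryption wraps DEKs with an encryption zone key.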
[jira] [Updated] (YARN-2801) Documentation development for Node labels requirement
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2801: Attachment: YARN-2801.4.patch Thanks [~ozawa] for the comments. [~leftnoteasy], since this was pending for a while and only needed small corrections, and I had to update the distributed NodeLabels documentation on top of it, I have uploaded a patch for this JIRA. [~ozawa], I have addressed most of your comments, except a few where the user was addressed in the singular ("A User"), as I felt the existing text was fine. > Documentation development for Node labels requirement > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch, > YARN-2801.4.patch > > > Documentation needs to be developed for the node label requirements.