[jira] [Updated] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2494: - Attachment: YARN-2494.patch

[YARN-796] Node label manager API and storage implementations
Key: YARN-2494
URL: https://issues.apache.org/jira/browse/YARN-2494
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch

This JIRA includes the APIs and storage implementations of the node label manager. NodeLabelManager is an abstract class used to manage labels of nodes in the cluster. It has APIs to query/modify:
- Nodes according to a given label
- Labels according to a given hostname
- Add/remove labels
- Set labels of nodes in the cluster
- Persist/recover changes of labels/labels-on-nodes to/from storage

And it has two implementations to store modifications:
- Memory-based storage: it does not persist changes, so all labels are lost when the RM restarts
- FileSystem-based storage: it persists/recovers to/from a FileSystem (like HDFS), so all labels and labels-on-nodes are recovered upon RM restart
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
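For illustration of the NodeLabelManager description above, here is a minimal sketch of what such an abstract manager could look like. The class shape, method names, and signatures are assumptions based on the description, not the actual patch contents:

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only -- names and signatures are assumptions.
public abstract class NodeLabelManager {
  // Query APIs
  public abstract Set<String> getNodesWithLabel(String label);
  public abstract Set<String> getLabelsOnNode(String hostname);

  // Modification APIs
  public abstract void addLabels(Set<String> labels) throws IOException;
  public abstract void removeLabels(Set<String> labels) throws IOException;
  public abstract void setLabelsOnNodes(Map<String, Set<String>> nodeToLabels)
      throws IOException;

  // Persistence hook: the memory-based implementation makes persistence a
  // no-op (labels are lost on RM restart), while the FileSystem-based
  // implementation writes changes to HDFS and replays them here on restart.
  public abstract void recover() throws IOException;
}
{code}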
[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136872#comment-14136872 ] Wangda Tan commented on YARN-2494: -- Hi [~cwelch], thanks for your comments.
bq. the only change is the (?automated) removal of an import, I think you should just drop it from the change set.
Good catch, reverted this file.
bq. why force all to lower case? Discussion favored dropping that...
Updated according to our discussion.
bq. checks for valid labels, there must be an easier way/stringlib/regex
Good suggestion, updated.
bq. also in updateLableResource - it looks like if node1 has label a b and queue q1 has label a b it's resources will be added 2x and removed 2x, while present it will have a 2x value (1x too many)
It should not; please check the test {{TestNodeLabelManager#testGetQueueResource}}, which covers the case you described.
bq. line 603 exception message needs to include "or not present"
I found no exception message around line 603; could you please update this comment against the latest patch?
bq. pls rename activeNode deactiveNode to activateNode and deactivateNode
Renamed. Attached a new patch addressing your comments; please kindly review. Thanks!
Wangda
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2558: - Attachment: YARN-2558.2.patch
[~jianhe], thanks for your suggestion and review. Updated to add a test confirming that serialization and deserialization work correctly.

Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
Key: YARN-2558
URL: https://issues.apache.org/jira/browse/YARN-2558
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Blocker
Attachments: YARN-2558.1.patch, YARN-2558.2.patch

We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
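To see why {{getId}} (an int) loses information that {{getContainerId}} (a long) preserves, here is a small self-contained illustration. The bit layout shown is an illustrative assumption about how YARN-2182 packs the epoch into the 64-bit id:

{code}
import java.io.*;

// Minimal illustration (a hedged sketch, not the patch itself): writing only
// the low 32 bits of the container id drops the epoch packed into the high
// bits of the 64-bit id.
public class ContainerIdRoundTrip {
  public static void main(String[] args) throws IOException {
    long containerIdWithEpoch = (1L << 40) | 5L; // epoch 1, id 5 (illustrative encoding)

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeLong(containerIdWithEpoch);          // getContainerId(): keeps the epoch
    out.writeInt((int) containerIdWithEpoch);     // old getId(): truncates to 32 bits

    DataInputStream in = new DataInputStream(
        new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(in.readLong() == containerIdWithEpoch); // true
    System.out.println(in.readInt() == containerIdWithEpoch);  // false -- epoch lost
  }
}
{code}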
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136878#comment-14136878 ] Wangda Tan commented on YARN-2496: -- Hi [~cwelch], I'm not sure I quite understand this. Did you mean we need to calculate consumed resource for each label (or label-expression) under each queue? Could you give me an example of how to avoid job starvation with it? What confuses me is: if we have resource per label/label-expression, should we also have resource per host/rack (we can ask for resource only on a host/rack by specifying relax-locality)? Thanks, Wangda

[YARN-796] Changes for capacity scheduler to support allocate resource respect labels
Key: YARN-2496
URL: https://issues.apache.org/jira/browse/YARN-2496
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch

This JIRA includes:
- Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other queue options like capacity/maximum-capacity, etc. (see the sketch after this message)
- Include a default-label-expression option in the queue config; if an app doesn't specify a label-expression, the queue's default-label-expression is used.
- Check whether the labels can be accessed by the queue when an app is submitted to the queue with a label-expression, or when a ResourceRequest is updated with a label-expression.
- Check labels on the NM when trying to allocate a ResourceRequest with a label-expression on that NM.
- Respect labels when calculating headroom/user-limit.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
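As an illustration of the first item, the queue label options in {{capacity-scheduler.xml}} might look like the following. The property names and label values are assumptions based on the description above, not the actual patch:

{code}
# Hypothetical queue label configuration -- property names are assumptions:
yarn.scheduler.capacity.root.queues=a,b
yarn.scheduler.capacity.root.a.capacity=50
yarn.scheduler.capacity.root.a.labels=GPU,LARGE_MEM
# Used when an app submitted to root.a gives no label-expression:
yarn.scheduler.capacity.root.a.default-label-expression=GPU
{code}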
[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136885#comment-14136885 ] Hadoop QA commented on YARN-2494: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669353/YARN-2494.patch against trunk revision c0c7e6f.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4986//console
This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) [YARN-796] Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136891#comment-14136891 ] Wangda Tan commented on YARN-2505: -- Hi Craig, I've reviewed this patch; some comments:
1) I think it's better to rename /labels/all-nodes-to-labels to /labels/nodes-to-labels, because it does not always return all nodes-to-labels. And I think the filter had better be renamed to node-filter; my feeling is that it's not very natural to apply a filter on values instead of keys. Or we can support both node-filter and label-filter. (A sketch of the resulting endpoints follows this message.)
2) Some lines exceed 80 chars; you can run this regex in vim to check: /^+.\{80,}
3) The test looks very good to me, thanks!
Regards, Wangda

[YARN-796] Support get/add/remove/change labels in RM REST API
Key: YARN-2505
URL: https://issues.apache.org/jira/browse/YARN-2505
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
Attachments: YARN-2505.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
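For reference, the endpoint shapes being discussed might look like this. Beyond the resource and parameter names quoted in the comment, the paths are assumptions, not the actual patch:

{code}
# Hypothetical endpoint sketch -- exact paths and parameters are assumptions:
GET /ws/v1/cluster/labels                                   # list all labels
GET /ws/v1/cluster/labels/nodes-to-labels                   # full node->labels map
GET /ws/v1/cluster/labels/nodes-to-labels?node-filter=h1    # filter by node (key)
GET /ws/v1/cluster/labels/nodes-to-labels?label-filter=gpu  # filter by label (value)
{code}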
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136900#comment-14136900 ] Hadoop QA commented on YARN-2558: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669356/YARN-2558.2.patch against trunk revision c0c7e6f.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4987//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4987//console
This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2561: - Attachment: YARN-2561-v2.patch
Updated the patch to fix the test failure.

MR job client cannot reconnect to AM after NM restart.
Key: YARN-2561
URL: https://issues.apache.org/jira/browse/YARN-2561
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Tassapol Athiapinya
Assignee: Junping Du
Priority: Blocker
Attachments: YARN-2561-v2.patch, YARN-2561.patch

Work-preserving NM restart is disabled. Submit a job, then restart the only NM; the job hangs with connect retries.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2498: - Attachment: yarn-2498-implementation-notes.pdf

[YARN-796] Respect labels in preemption policy of capacity scheduler
Key: YARN-2498
URL: https://issues.apache.org/jira/browse/YARN-2498
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-2498.patch, YARN-2498.patch, yarn-2498-implementation-notes.pdf

There are 3 stages in ProportionalCapacityPreemptionPolicy:
# Recursively calculate {{ideal_assigned}} for each queue. This depends on the available resource, the resource used/pending in each queue, and the guaranteed capacity of each queue.
# Mark to-be-preempted containers: for each over-satisfied queue, mark some containers to be preempted.
# Notify the scheduler about the to-be-preempted containers.

We need to respect labels in the cluster for both #1 and #2 (a sketch of the #1 check follows this message):
For #1, when there is resource available in the cluster, we shouldn't assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot access those labels.
For #2, when deciding whether to preempt a container, we need to make sure the resource used by this container is *possibly* usable by a queue that is under-satisfied and has pending resource.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
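A minimal sketch of how the #1 restriction could be enforced. The variable names, queue accessors, and label bookkeeping are illustrative assumptions; only the Resources/Resource helpers are real YARN utilities (org.apache.hadoop.yarn.util.resource):

{code}
// Hedged sketch only -- not the actual patch. Grow a queue's ideal_assigned
// using only resources whose labels the queue can access.
Resource accessible = Resource.newInstance(0, 0);
for (String label : clusterLabels) {                    // assumed input
  if (queue.getAccessibleLabels().contains(label)) {    // illustrative accessor
    Resources.addTo(accessible, availableResourceByLabel.get(label));
  }
}
// ideal_assigned may grow toward (used + pending), but never beyond what the
// queue's accessible labels make reachable:
Resource idealAssigned = Resources.componentwiseMin(
    Resources.add(queue.getUsed(), queue.getPending()), accessible);
{code}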
[jira] [Commented] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136984#comment-14136984 ] Hadoop QA commented on YARN-2498: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669372/yarn-2498-implementation-notes.pdf against trunk revision c0c7e6f.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4989//console
This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136987#comment-14136987 ] Wangda Tan commented on YARN-2498: -- Attached implementation notes. [~curino], [~sunilg], [~mayank_bansal], I would appreciate it if you could take a look. Thanks a lot! Wangda
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136993#comment-14136993 ] Steve Loughran commented on YARN-2562: -- +1 for text making it clear what the values are, but please make it lower case for consistency.

ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
Key: YARN-2562
URL: https://issues.apache.org/jira/browse/YARN-2562
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical

The ContainerID string format is unreadable for RMs that have restarted at least once (epoch > 0) after YARN-2182, e.g. container_1410901177871_0001_01_05_17.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
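To see the readability problem and one possible remedy, here is a small illustration. The bit layout and the output format string are assumptions for illustration; the actual encoding and final format are defined by YARN-2182 and whatever patch this JIRA lands:

{code}
// Hedged illustration -- encoding and format strings are assumptions.
public class ContainerIdToStringDemo {
  public static void main(String[] args) {
    long id = (17L << 40) | 5L;          // assume epoch packed into high bits
    long epoch = id >>> 40;
    long sequence = id & ((1L << 40) - 1);
    // The packed form "container_..._05_17" hides what 05 and 17 mean.
    // An explicit lower-case epoch prefix is self-describing:
    System.out.println(String.format("container_e%02d_%d_%04d_%02d_%06d",
        epoch, 1410901177871L, 1, 1, sequence));
    // -> container_e17_1410901177871_0001_01_000005
  }
}
{code}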
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136998#comment-14136998 ] Hadoop QA commented on YARN-2561: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669364/YARN-2561-v2.patch against trunk revision c0c7e6f.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4988//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4988//console
This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137013#comment-14137013 ] Sunil G commented on YARN-2308: --- +1 for this approach. I also feel that returning the application state as FAILED is not a complete solution.

NPE happened when RM restart after CapacityScheduler queue configuration changed
Key: YARN-2308
URL: https://issues.apache.org/jira/browse/YARN-2308
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
Attachments: jira2308.patch, jira2308.patch, jira2308.patch

I encountered an NPE when the RM restarted:
{code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
{code}
The RM then fails to restart. This is caused by a queue configuration change: I removed some queues and added new ones. When the RM restarts, it tries to recover historical applications, and if the queue of any of those applications has been removed, the NPE is raised. (A sketch of a defensive check follows this message.)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
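A minimal sketch of the kind of guard being discussed for CapacityScheduler#addApplicationAttempt. The field and helper names are illustrative assumptions taken from the surrounding method, and the committed fix may route the rejection differently:

{code}
// Hedged sketch only -- reject recovery into a removed queue instead of NPE-ing.
CSQueue queue = queues.get(queueName);   // queue map maintained by the scheduler
if (queue == null) {
  LOG.error("Queue " + queueName + " no longer exists; rejecting recovered "
      + "attempt " + applicationAttemptId + " instead of raising an NPE");
  // Route through the normal app-rejected path (event plumbing elided here).
  return;
}
{code}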
[jira] [Commented] (YARN-1250) Generic history service should support application-acls
[ https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137085#comment-14137085 ] Hudson commented on YARN-1250: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #683 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/683/]) YARN-1250. Generic history service should support application-acls. (Contributed by Zhijie Shen) (junping_du: rev 90a0c03f0a696d32e871a5da4560828edea8cfa9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationACLsUpdatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java YARN-1250. Addendum (junping_du: rev 0e7d1dbf9ab732dd04dccaacbf273e9ac437eba5) * hadoop-yarn-project/CHANGES.txt Generic history service should support application-acls --- Key: YARN-1250 URL: https://issues.apache.org/jira/browse/YARN-1250 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: GenericHistoryACLs.pdf, YARN-1250.1.patch, YARN-1250.2.patch, YARN-1250.3.patch, YARN-1250.4.patch, YARN-1250.5.patch, YARN-1250.6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits
[ https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137084#comment-14137084 ] Hudson commented on YARN-2531: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #683 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/683/])
YARN-2531. Added a configuration for admins to be able to override app-configs and enforce/not-enforce strict control of per-container cpu usage. Contributed by Varun Vasudev. (vinodkv: rev 9f6891d9ef7064d121305ca783eb62586c8aa018)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java

CGroups - Admins should be allowed to enforce strict cpu limits
Key: YARN-2531
URL: https://issues.apache.org/jira/browse/YARN-2531
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.6.0
Attachments: apache-yarn-2531.0.patch

From YARN-2440:
{quote}
The other dimension to this is determinism w.r.t performance. Limiting to allocated cores overall (as well as per container later) helps orgs run workloads and reason about them deterministically. One of the examples is benchmarking apps, but deterministic execution is a desired option beyond benchmarks too.
{quote}
It would be nice to have an option to let admins enforce strict cpu limits for apps for things like benchmarking, etc. By default this flag should be off so that containers can use available cpu, but an admin can turn the flag on to determine worst-case performance, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell
[ https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137087#comment-14137087 ] Hudson commented on YARN-2557: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #683 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/683/]) YARN-2557. Add a parameter attempt_Failures_Validity_Interval into (xgong: rev 8e5d6713cf16473d791c028cecc274fd2c7fd10b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSSleepingAppMaster.java Add a parameter attempt_Failures_Validity_Interval in DistributedShell - Key: YARN-2557 URL: https://issues.apache.org/jira/browse/YARN-2557 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2557.1.patch, YARN-2557.2.patch Change Distributed shell to enable attemptFailuresValidityInterval -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2565) ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
Karam Singh created YARN-2565: - Summary: ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
Key: YARN-2565
URL: https://issues.apache.org/jira/browse/YARN-2565
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, timelineserver
Affects Versions: 2.6.0
Environment: Secure cluster with ATS (timeline server) enabled and yarn.resourcemanager.system-metrics-publisher.enabled=true, so that the RM can send application history to the timeline store
Reporter: Karam Singh

Observed that the RM fails to start in secure mode when the GenericHistoryService is enabled and the ResourceManager is set to use the timeline store.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2565) ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137204#comment-14137204 ] Karam Singh commented on YARN-2565: --- Observed that the RM fails to start in secure mode when the GenericHistoryService is enabled and the ResourceManager is set to use the timeline store:
{code}
yarn.resourcemanager.keytab=RM_HOST
yarn.resourcemanager.principal=RM_PRINCIPAL
yarn.timeline-service.enabled=true
yarn.timeline-service.hostname=ATS_HOST
yarn.timeline-service.address=ATS_HOST:10200
yarn.timeline-service.webapp.address=ATS_HOST:8188
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.ttl-enable=true
yarn.timeline-service.ttl-ms=60480
yarn.timeline-service.leveldb-timeline-store.path=/tm/timeline
yarn.timeline-service.keytab=ATS_KEYTAB
yarn.timeline-service.principal=ATS_PRINCIPAL
yarn.timeline-service.webapp.spnego-principal=ATS_SPNEGO_PRINICPAL
yarn.timeline-service.webapp.spnego-keytab-file=ATS_SPNEGO_KETAB
yarn.timeline-service.http-authentication.type=kerberos
yarn.timeline-service.http-authentication.kerberos.principal=ATS_SPNEGO_PRINICPAL
yarn.timeline-service.http-authentication.kerberos.keytab=ATS_SPNEGO_KETAB
yarn.timeline-service.generic-application-history.enabled=true
yarn.timeline-service.generic-application-history.store-class=''
yarn.resourcemanager.system-metrics-publisher.enabled=true
yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size=10
{code}
Stop the ResourceManager and the TimelineServer, then start the TimelineServer. After the ATS restarts successfully, start the ResourceManager. The RM fails to start with the following exception:
{code}
2014-09-15 10:58:57,735 WARN ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2014-09-15 10:58:57,740 ERROR applicationhistoryservice.FileSystemApplicationHistoryStore (FileSystemApplicationHistoryStore.java:serviceInit(132)) - Error when initializing FileSystemHistoryStorage
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: RM_HOST; destination host is: NN_HOST:8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1423)
at org.apache.hadoop.ipc.Client.call(Client.java:1372)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:219)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:748)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1918)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1101)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1101)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1413)
at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:126)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.serviceInit(RMApplicationHistoryWriter.java:99)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490)
at
[jira] [Commented] (YARN-1250) Generic history service should support application-acls
[ https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137236#comment-14137236 ] Hudson commented on YARN-1250: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1899 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1899/])
YARN-1250. Generic history service should support application-acls. (Contributed by Zhijie Shen) (junping_du: rev 90a0c03f0a696d32e871a5da4560828edea8cfa9)
YARN-1250. Addendum (junping_du: rev 0e7d1dbf9ab732dd04dccaacbf273e9ac437eba5)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell
[ https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137238#comment-14137238 ] Hudson commented on YARN-2557: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1899 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1899/])
YARN-2557. Add a parameter attempt_Failures_Validity_Interval into (xgong: rev 8e5d6713cf16473d791c028cecc274fd2c7fd10b)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits
[ https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137235#comment-14137235 ] Hudson commented on YARN-2531: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1899 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1899/])
YARN-2531. Added a configuration for admins to be able to override app-configs and enforce/not-enforce strict control of per-container cpu usage. Contributed by Varun Vasudev. (vinodkv: rev 9f6891d9ef7064d121305ca783eb62586c8aa018)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits
[ https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137262#comment-14137262 ] Hudson commented on YARN-2531: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1874 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1874/])
YARN-2531. Added a configuration for admins to be able to override app-configs and enforce/not-enforce strict control of per-container cpu usage. Contributed by Varun Vasudev. (vinodkv: rev 9f6891d9ef7064d121305ca783eb62586c8aa018)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1250) Generic history service should support application-acls
[ https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137263#comment-14137263 ] Hudson commented on YARN-1250: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1874 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1874/])
YARN-1250. Generic history service should support application-acls. (Contributed by Zhijie Shen) (junping_du: rev 90a0c03f0a696d32e871a5da4560828edea8cfa9)
YARN-1250. Addendum (junping_du: rev 0e7d1dbf9ab732dd04dccaacbf273e9ac437eba5)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell
[ https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137265#comment-14137265 ] Hudson commented on YARN-2557: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1874 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1874/])
YARN-2557. Add a parameter attempt_Failures_Validity_Interval into (xgong: rev 8e5d6713cf16473d791c028cecc274fd2c7fd10b)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137385#comment-14137385 ] Jason Lowe commented on YARN-2561: -- Thanks for the patch, Junping! I'm not sure it's best for the RM to examine its local config for NM recovery and assume the same applies to the remote NodeManager. I think it would be better if the RM cross-checked the list of running containers reported in the registration request against the containers it thinks are running on the node and acted accordingly: if the NM doesn't report a container, we should kill it. (A sketch of this cross-check follows this message.)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
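A minimal sketch of the suggested cross-check on NM registration. The request accessor, the RMNode accessor, and the helper are illustrative assumptions, not necessarily the APIs the eventual patch would use:

{code}
// Hedged sketch only -- names are assumptions, not the eventual patch.
Set<ContainerId> reported = new HashSet<ContainerId>();
for (NMContainerStatus status : registrationRequest.getNMContainerStatuses()) {
  reported.add(status.getContainerId());
}
for (ContainerId liveContainer : rmNode.getLaunchedContainers()) {
  if (!reported.contains(liveContainer)) {
    // The RM thought this container was running, but the restarted NM did
    // not report it: mark it complete/killed so AMs are notified promptly.
    markContainerLostOnNode(liveContainer, rmNode); // hypothetical helper
  }
}
{code}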
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137431#comment-14137431 ] Benoy Antony commented on YARN-2527: [~zjshen], any comments? Since the ACLs are provided by the ApplicationMaster, isn't null a valid value? If not, we could log a warning. Any other suggestions are welcome. (A sketch of a possible guard follows this message.)

NPE in ApplicationACLsManager
Key: YARN-2527
URL: https://issues.apache.org/jira/browse/YARN-2527
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Benoy Antony
Assignee: Benoy Antony
Attachments: YARN-2527.patch, YARN-2527.patch

An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is below:
{code}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{code}
This issue was reported by [~miguenther].
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
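A minimal sketch of the guard being discussed in ApplicationACLsManager#checkAccess. The field name, helper, and fallback policy are assumptions; the actual patch may choose a different behavior for missing ACLs:

{code}
// Hedged sketch only. If the AM registered no ACLs, fall back to the most
// restrictive policy (admin or owner) and warn, instead of dereferencing null.
Map<ApplicationAccessType, AccessControlList> acls =
    this.applicationACLS.get(applicationId);   // field name is an assumption
if (acls == null) {
  LOG.warn("ACL configuration for " + applicationId + " not found, "
      + "allowing access only to admins and the application owner");
  return isAdmin(callerUGI)                    // isAdmin(): illustrative check
      || callerUGI.getShortUserName().equals(applicationOwner);
}
{code}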
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137496#comment-14137496 ] Maysam Yabandeh commented on YARN-1963: --- I was wondering what the long-term plan for this jira is? It does not seem to have had any activity in the past 4 months, and I was wondering if we have a rough estimate of which release this feature is planned for?

Support priorities across applications within the same queue
Key: YARN-1963
URL: https://issues.apache.org/jira/browse/YARN-1963
Project: Hadoop YARN
Issue Type: New Feature
Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G

It would be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained control without forcing admins to create a multitude of queues, and allows existing applications to continue using existing queues, which are usually part of institutional memory.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-2554: - Attachment: YARN-2554.1.patch

Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
Key: YARN-2554
URL: https://issues.apache.org/jira/browse/YARN-2554
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
Attachments: YARN-2554.1.patch

If the HTTP policy enabling HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. To forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient used is not initialized with the certs necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. (A sketch follows this message.) The symptoms of this issue are:
AM: displays an unknown_certificate exception.
RM: displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
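A minimal sketch of that suggestion, assuming Hadoop's org.apache.hadoop.security.ssl.SSLFactory; the HttpClient wiring and exception handling are left illustrative:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

// Hedged sketch (fragment) -- how the proxy servlet might build a socket
// factory that trusts the certs from ssl-client.xml.
Configuration conf = new Configuration();
SSLFactory clientSslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
clientSslFactory.init(); // loads the truststore named in ssl-client.xml
javax.net.ssl.SSLSocketFactory socketFactory =
    clientSslFactory.createSSLSocketFactory();
// ... register socketFactory with the HttpClient used by WebAppProxyServlet
{code}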
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137544#comment-14137544 ] Hadoop QA commented on YARN-2554: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669439/YARN-2554.1.patch against trunk revision c0c7e6f.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4990//console
This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137564#comment-14137564 ] Sunil G commented on YARN-1963: --- Hi [~maysamyabandeh], We are putting together a design doc for this that captures all the details, and will publish it soon. [~vinodkv], could we discuss the doc offline and then publish it? Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-2554: - Attachment: YARN-2554.2.patch Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the necessary certs to allow for successful one way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: AM: Displays unknown_certificate exception RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2561: - Attachment: YARN-2561-v3.patch MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job. Restart the only NM and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137590#comment-14137590 ] Junping Du commented on YARN-2561: -- Thanks [~jlowe] for the comments! Yes, checking the running containers reported by the NM seems to be a better way, given that each NM's recovery configuration can be different (although we don't encourage that configuration, do we?). The v3 patch uses the new approach. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job. Restart the only NM and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
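For readers following along, here is a hypothetical sketch of the approach Junping describes, i.e. reconciling against the containers the NM actually reports at re-registration rather than trusting its recovery configuration; all names below are assumptions, not the patch:
{code}
import java.util.List;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ContainerId;

public class NodeRejoinReconciler {
  /** Hypothetical callback that completes a container the NM lost. */
  public interface LostContainerHandler {
    void containerLost(ContainerId id);
  }

  /**
   * @param rmTracked  containers the RM still considers live on this node
   * @param nmReported containers the restarted NM reported at registration
   */
  public static void reconcile(Set<ContainerId> rmTracked,
      List<ContainerId> nmReported, LostContainerHandler handler) {
    for (ContainerId id : rmTracked) {
      if (!nmReported.contains(id)) {
        // The NM did not preserve this container across its restart, so
        // mark it finished; for an AM container this lets the job client
        // stop retrying connections to a dead AM.
        handler.containerLost(id);
      }
    }
  }
}
{code}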
[jira] [Assigned] (YARN-2565) ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-2565: - Assignee: Zhijie Shen ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn - Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that RM can send Application history to Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Observed that RM fails to start in Secure mode when GenericHistoryService is enabled and ResourceManager is set to use Timeline Store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137620#comment-14137620 ] Hadoop QA commented on YARN-2561: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669449/YARN-2561-v3.patch against trunk revision c0c7e6f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4993//console This message is automatically generated. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job. Restart the only NM and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137629#comment-14137629 ] Hadoop QA commented on YARN-2554: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669448/YARN-2554.2.patch against trunk revision c0c7e6f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4991//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4991//console This message is automatically generated. Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the necessary certs to allow for successful one way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: AM: Displays unknown_certificate exception RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2561: - Attachment: YARN-2561-v4.patch Fix the compile issues in v3. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job. Restart the only NM and found that Job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2559) ResourceManager sometimes becomes unresponsive due to NPE in SystemMetricsPublisher
[ https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137636#comment-14137636 ] Jian He commented on YARN-2559: --- To be consistent with the FinalApplicationStatus exposed on the RM web UI and CLI, we may publish the UNDEFINED state as well in case finalStatus is unavailable? ResourceManager sometimes becomes unresponsive due to NPE in SystemMetricsPublisher -- Key: YARN-2559 URL: https://issues.apache.org/jira/browse/YARN-2559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Generic History Service is enabled in the Timeline Server with yarn.resourcemanager.system-metrics-publisher.enabled=true so that the ResourceManager uses the Timeline Store for recording application history information Reporter: Karam Singh Assignee: Zhijie Shen Attachments: YARN-2559.1.patch ResourceManager sometimes becomes unresponsive due to NPE in SystemMetricsPublisher -- This message was sent by Atlassian JIRA (v6.3.4#6332)
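A minimal sketch of the guard being discussed, with illustrative names; this is not the actual SystemMetricsPublisher code:
{code}
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;

public final class FinalStatusUtil {
  // Fall back to UNDEFINED when the app never reported a final status,
  // matching what the RM web UI and CLI display and avoiding the NPE.
  public static FinalApplicationStatus orUndefined(FinalApplicationStatus s) {
    return s == null ? FinalApplicationStatus.UNDEFINED : s;
  }
}
{code}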
[jira] [Assigned] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie
[ https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-2563: - Assignee: Zhijie Shen On secure clusters call to timeline server fails with authentication errors when running a job via oozie Key: YARN-2563 URL: https://issues.apache.org/jira/browse/YARN-2563 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Zhijie Shen Priority: Blocker During our nightlies on a secure cluster we have seen oozie jobs fail with authentication errors to the timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137637#comment-14137637 ] Li Lu commented on YARN-2446: - Hi [~zjshen], yes the latest patch looks good to me. Still, similar to YARN-2102, maybe you want some more committers to review the patch? Using TimelineNamespace to shield the entities of a user Key: YARN-2446 URL: https://issues.apache.org/jira/browse/YARN-2446 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2446.1.patch, YARN-2446.2.patch Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the entities, preventing them from being accessed or affected by other users' operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137644#comment-14137644 ] Vinod Kumar Vavilapalli commented on YARN-2001: --- This looks close except for the logging - we don't have any indication of this wait in the RM logs. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137661#comment-14137661 ] Xuan Gong commented on YARN-2468: - bq. If LogContext is not specified, we're running into the traditional log handling case, right? We will still have a combined log file identified by the node id? Or node id will always be the directory, and there exists only one file under it? The node id will always be the directory, and there exists only one file under it. bq. Let's say work-preserving NM restart happens; the NM is going to forget all the uploaded log files and redo everything, right? If an NM restart happens, it will re-upload all logs which were previously uploaded but not deleted. I think we can solve this problem in a separate ticket, because this ticket is the first step in solving log handling for LRS. bq. LogContext doesn't need to be in ApplicationSubmissionContext, because ApplicationSubmissionContext contains ContainerLaunchContext. LogContext is container-related, so ContainerLaunchContext should be the best place for it. Currently, we can have one context for all containers. Maybe in the future we can think of setting a different LogContext for each individual container. DONE bq. In getFilteredLogFiles, the logic is that if the log file matches the include pattern, it will be added first, and then, if it matches the exclude pattern, it will be removed. Shall we do a sanity check to make sure we cannot include and exclude the same pattern? Otherwise the semantics are a bit weird. Added more explanation in the Javadoc. Uploaded a new patch to address all comments. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
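An illustrative sketch of the include-then-exclude filtering semantics described in the last point, where the exclude pattern wins; this is a standalone example, not the actual getFilteredLogFiles implementation:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public final class LogFileFilter {
  // A file is selected iff it matches at least one include pattern and
  // does not match any exclude pattern (exclusion wins over inclusion).
  public static List<String> filter(List<String> fileNames,
      List<Pattern> includes, List<Pattern> excludes) {
    List<String> result = new ArrayList<String>();
    for (String name : fileNames) {
      if (matchesAny(name, includes) && !matchesAny(name, excludes)) {
        result.add(name);
      }
    }
    return result;
  }

  private static boolean matchesAny(String name, List<Pattern> patterns) {
    for (Pattern p : patterns) {
      if (p.matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }
}
{code}
A sanity check that the same pattern string appears in both lists would catch the contradictory configuration the reviewer mentions before any file is silently dropped.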
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.5.patch Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover
[ https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137668#comment-14137668 ] Hadoop QA commented on YARN-1779: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669450/YARN-1779.3.patch against trunk revision c0c7e6f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4992//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4992//console This message is automatically generated. Handle AMRMTokens across RM failover Key: YARN-1779 URL: https://issues.apache.org/jira/browse/YARN-1779 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Jian He Priority: Blocker Labels: ha Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch Verify if AMRMTokens continue to work against RM failover. If not, we will have to do something along the lines of YARN-986. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2001: -- Attachment: YARN-2001.5.patch Fixed logging to add the wait message. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch, YARN-2001.5.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
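A hypothetical sketch of what such a wait with logging could look like; the method names, LOG field, and one-second poll are assumptions, not the actual patch:
{code}
// Block new AM requests after failover until enough NMs re-register or a
// timeout expires, logging progress so the wait is visible in the RM logs.
private void waitForNodesToRejoin(long timeoutMs, int minNodes)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (getRegisteredNodeCount() < minNodes
      && System.currentTimeMillis() < deadline) {
    LOG.info("Waiting for NMs to rejoin before accepting AM requests: "
        + getRegisteredNodeCount() + "/" + minNodes + " registered");
    Thread.sleep(1000); // poll once a second
  }
}
{code}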
[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover
[ https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137685#comment-14137685 ] Xuan Gong commented on YARN-1779: - +1 LGTM Handle AMRMTokens across RM failover Key: YARN-1779 URL: https://issues.apache.org/jira/browse/YARN-1779 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Jian He Priority: Blocker Labels: ha Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch Verify if AMRMTokens continue to work against RM failover. If not, we will have to do something along the lines of YARN-986. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1453) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/YARN-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1453: --- Attachment: YARN-1453-02.patch Rebased. It doesn't fix all the javadoc errors, but it at least applies now. [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: YARN-1453 URL: https://issues.apache.org/jira/browse/YARN-1453 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 1453-branch-2.patch, 1453-branch-2.patch, 1453-trunk.patch, 1453-trunk.patch, YARN-1453-02.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137696#comment-14137696 ] Jian He commented on YARN-2558: --- the test looks good to me, thanks Tsuyoshi! Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-2558.1.patch, YARN-2558.2.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137725#comment-14137725 ] Karthik Kambatla commented on YARN-2179: Review comments: # Rename yarn.sharedcache.nested.level to yarn.sharedcache.nested-level? # Rename AppChecker#appIsActive to either isActive(AppId) or isAppActive. # Nit (okay with not changing): Should AppChecker methods throw YarnException instead of IOException, since they are strictly used within SCM code? # CacheStructureUtil: remove empty line in class javadoc # sharedcache-pom: my understanding of Maven is pretty sparse, so please correct me if I am wrong. Looks like sharedcache depends on the RM. If we were to embed the sharedcache in the RM, wouldn't that lead to a circular dependency? How do we plan to solve it? # RemoteAppChecker: Just thinking out loud - in a non-embedded case, what happens if we upgrade other daemons/clients but not the SCM and add a new completed state? There might not be a solution here though, the worst case appears to be that we wouldn't clear the cache when apps end up in that state. One alternative is to query the RM for active states or an app being active. I am open to adding these APIs (Private for now) to the RM.
{code}
private static final EnumSet<YarnApplicationState> ACTIVE_STATES =
    EnumSet.complementOf(EnumSet.of(YarnApplicationState.FINISHED,
        YarnApplicationState.FAILED, YarnApplicationState.KILLED));
{code}
# RemoteAppChecker#create should use ClientRMProxy instead of YarnRPC for it to work in an HA-RM setting. # As per offline discussions, we don't need the SCMContext outside of the store implementations. Can we move it out? Initial cache manager structure and context --- Key: YARN-2179 URL: https://issues.apache.org/jira/browse/YARN-2179 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, YARN-2179-trunk-v5.patch Implement the initial shared cache manager structure and context. The SCMContext will be used by a number of manager services (i.e. the backing store and the cleaner service). The AppChecker is used to gather the currently running applications on SCM startup (necessary for an scm that is backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
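For context, this is how the quoted ACTIVE_STATES set would presumably be used (the method name is an assumption):
{code}
// An application is active iff its state is non-terminal. Because
// EnumSet.complementOf picks up any state added in a future release, a new
// completed state would be misclassified as active; the failure mode is
// retaining cache entries, never evicting a live app's resources.
public boolean isApplicationActive(YarnApplicationState state) {
  return ACTIVE_STATES.contains(state);
}
{code}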
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137738#comment-14137738 ] Hadoop QA commented on YARN-2561: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669455/YARN-2561-v4.patch against trunk revision c0c7e6f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4994//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4994//console This message is automatically generated. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job. Restart the only NM and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137740#comment-14137740 ] Hadoop QA commented on YARN-2468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669460/YARN-2468.5.patch against trunk revision d9a8603. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4995//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4995//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4995//console This message is automatically generated. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137743#comment-14137743 ] Hadoop QA commented on YARN-2001: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669463/YARN-2001.5.patch against trunk revision 8a7671d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4996//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4996//console This message is automatically generated. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch, YARN-2001.5.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137757#comment-14137757 ] Jason Lowe commented on YARN-2558: -- bq. Jason Lowe, without changing containerTokenIdentifier, we can't support rolling upgrades. does the current approach look good to you? Yes, this should be fine for the short term. I just wanted to make it clear that until YARN-668 is addressed we're going to continue to break backwards compatibility and thus rolling upgrades with seemingly simple changes like this. Some minor comments on the patch, none of which are must-fix: - is it necessary to call testNMToken in testContainerManagerWithEpoch? That test is already covered by testContainerManager. - the timeout on the test seems way too large - not sure what the point is of having the test catch an exception just to print a stack trace and re-throw it. Won't the stack trace also be printed when the test fails due to the thrown exception? Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-2558.1.patch, YARN-2558.2.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.1.patch Thanks for your comment, Steve. Attaching a first patch to change the format to container_1410901177871_0001_01_05_epoch_17. ContainerId@toString() is unreadable for epoch >0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch >0) after YARN-2182. For example, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
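A hypothetical rendering of the proposed format; the zero-padding widths below mirror the example string above and are assumptions, not the committed code:
{code}
// Keep the classic container id string when epoch is 0, and append an
// explicit epoch marker otherwise, so post-restart ids stay readable.
public static String toContainerIdString(long clusterTs, int appId,
    int attempt, long containerSeq, int epoch) {
  StringBuilder sb = new StringBuilder("container_");
  sb.append(clusterTs).append('_');
  sb.append(String.format("%04d", appId)).append('_');
  sb.append(String.format("%02d", attempt)).append('_');
  sb.append(String.format("%02d", containerSeq));
  if (epoch > 0) {
    sb.append("_epoch_").append(epoch); // e.g. ..._05_epoch_17
  }
  return sb.toString();
}
{code}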
[jira] [Updated] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2558: - Attachment: YARN-2558.3.patch Thanks for your review, Jian and Jason! Updated: * Removed testNMToken in the new test case. * Made the timeout short. * Removed the needless catch and re-throw of the exception. Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-2558.1.patch, YARN-2558.2.patch, YARN-2558.3.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
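For readers following the discussion, a hypothetical sketch of the serialization change this JIRA describes; the surrounding fields and remaining token fields are assumptions:
{code}
// Serialize the full 64-bit container id, whose upper bits carry the RM
// epoch after YARN-2182, instead of the truncated 32-bit getId() value.
@Override
public void write(DataOutput out) throws IOException {
  ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();
  ApplicationId appId = attemptId.getApplicationId();
  out.writeLong(appId.getClusterTimestamp());
  out.writeInt(appId.getId());
  out.writeInt(attemptId.getAttemptId());
  out.writeLong(containerId.getContainerId()); // was out.writeInt(getId())
  // ... remaining token fields are written unchanged ...
}
{code}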
[jira] [Commented] (YARN-1453) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/YARN-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137849#comment-14137849 ] Hadoop QA commented on YARN-1453: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669467/YARN-1453-02.patch against trunk revision ea4e2e8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4997//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4997//console This message is automatically generated. [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: YARN-1453 URL: https://issues.apache.org/jira/browse/YARN-1453 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 1453-branch-2.patch, 1453-branch-2.patch, 1453-trunk.patch, 1453-trunk.patch, YARN-1453-02.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137854#comment-14137854 ] Hadoop QA commented on YARN-2562: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669482/YARN-2562.1.patch against trunk revision ea4e2e8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4998//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4998//console This message is automatically generated. ContainerId@toString() is unreadable for epoch >0 after YARN-2182 - Key: YARN-2562 URL: https://issues.apache.org/jira/browse/YARN-2562 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-2562.1.patch ContainerID string format is unreadable for RMs that restarted at least once (epoch >0) after YARN-2182. For example, container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137863#comment-14137863 ] Junping Du commented on YARN-2561: -- This test failure looks like a random failure and should be unrelated. Kicking off the test again manually. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job. Restart the only NM and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137881#comment-14137881 ] Vinod Kumar Vavilapalli commented on YARN-2080: --- This looks good, +1. Let's commit it to the branch. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end-to-end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137895#comment-14137895 ] Chris Trezzo commented on YARN-1492: [~kasha], [~vinodkv] and I had a conversation around the main things needed before committing to trunk: 1. Complete the refactor that removes SCMContext and ensures implementation details from the in-memory store are not leaked through the SCMStore interface. 2. Add a configuration parameter at the yarn level that allows operators to disallow uploading resources to the shared cache if they are not PUBLIC (currently resources are allowed if they are PUBLIC or owned by the user requesting the upload). 3. Ability to run SCM optionally as part of the RM. A few things that are important, but can be added post merge: 1. A levelDB store implementation. 2. Security. 3. ZK-based store implementation. Also, the consensus was that it seemed OK to let store implementations handle eviction policy logic. Having eviction policy logic span store implementations might be difficult and could cause store implementation details to leak through into the policies. For example, the in-memory store has to consider when it started up during cache eviction, where persistent stores may not need to. truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
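A hypothetical sketch of the interface split described above, with assumed names rather than the merged code; eviction policy stays behind the store so each implementation can apply its own rules:
{code}
import org.apache.hadoop.fs.FileStatus;

public interface SCMStore {
  /**
   * May consult store-specific state, e.g. an in-memory store can factor
   * the time since SCM startup into staleness decisions, while a
   * persistent store need not.
   */
  boolean isResourceEvictable(String key, FileStatus resource);

  /** Remove the resource's metadata so the cleaner can delete the file. */
  void removeResource(String key);
}
{code}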
[jira] [Updated] (YARN-2559) ResourceManager sometimes becomes unresponsive due to NPE in SystemMetricsPublisher
[ https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2559: -- Attachment: YARN-2559.2.patch bq. To be consistent with the FinalApplicationStatus exposed on the RM web UI and CLI, we may publish the UNDEFINED state as well in case finalStatus is unavailable? Nice catch! Fixed the issue in the new patch. ResourceManager sometimes becomes unresponsive due to NPE in SystemMetricsPublisher -- Key: YARN-2559 URL: https://issues.apache.org/jira/browse/YARN-2559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Generic History Service is enabled in the Timeline Server with yarn.resourcemanager.system-metrics-publisher.enabled=true so that the ResourceManager uses the Timeline Store for recording application history information Reporter: Karam Singh Assignee: Zhijie Shen Attachments: YARN-2559.1.patch, YARN-2559.2.patch ResourceManager sometimes becomes unresponsive due to NPE in SystemMetricsPublisher -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137903#comment-14137903 ] Varun Vasudev commented on YARN-2190: - [~chuanliu] thanks for the patch! Some questions and comments - 1. What is the behaviour of a process that tries to exceed the allocated memory? Will it start swapping or will it be killed? 2. Your code assumes a 1-1 mapping of physical cores to vcores. This assumption is/will be problematic, especially in heterogeneous clusters. You're better off using the ratio of (container-vcores/node-vcores) to determine cpu limits (see the sketch after this message). 3.
{noformat}
Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
===================================================================
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java (revision 1618292)
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java (working copy)
@@ -38,6 +38,7 @@
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsPermission;
 import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerDiagnosticsUpdateEvent;
@@ -257,6 +258,11 @@
       readLock.unlock();
     }
   }
+
+  protected String[] getRunCommand(String command, String groupId,
+      Configuration conf) {
+    return getRunCommand(command, groupId, conf, null);
+  }
Index: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
===================================================================
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java (revision 1618292)
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java (working copy)
@@ -185,7 +185,7 @@
     // Setup command to run
     String[] command = getRunCommand(sb.getWrapperScriptPath().toString(),
-        containerIdStr, this.getConf());
+        containerIdStr, this.getConf(), container.getResource());

     LOG.info("launchContainer: " + Arrays.toString(command));
{noformat}
Can you explain why you are modifying DefaultContainerExecutor? You've added a method for the old signature in ContainerExecutor. 4. Can you modify the comments/usage to specify the units of memory (bytes, MB, GB)? Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now.
The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects, thus providing resource enforcement at the OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
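Illustrative arithmetic for comment 2 above; container and conf are assumed to be in scope, and this is a sketch of the suggestion, not the actual patch:
{code}
// Size the Job Object CPU rate from the container's share of the node's
// vcores rather than assuming one physical core per vcore.
int containerVcores = container.getResource().getVirtualCores();
int nodeVcores = conf.getInt(YarnConfiguration.NM_VCORES,
    YarnConfiguration.DEFAULT_NM_VCORES);
// e.g. 2 container vcores on an 8-vcore node => 25% of total node CPU
int cpuRatePercent = (int) Math.min(100,
    Math.ceil(100.0 * containerVcores / nodeVcores));
{code}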
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137902#comment-14137902 ] Hadoop QA commented on YARN-2558: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669487/YARN-2558.3.patch against trunk revision ea4e2e8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4999//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4999//console This message is automatically generated. Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-2558.1.patch, YARN-2558.2.patch, YARN-2558.3.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137905#comment-14137905 ] Hadoop QA commented on YARN-2190: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662538/YARN-2190.5.patch against trunk revision e3803d0. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5001//console This message is automatically generated. Provide a Windows container executor that can limit memory and CPU -- Key: YARN-2190 URL: https://issues.apache.org/jira/browse/YARN-2190 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Chuan Liu Assignee: Chuan Liu Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch Yarn default container executor on Windows does not set the resource limit on the containers currently. The memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Object right now. The latest Windows (8 or later) API allows CPU and memory limits on the job objects. We want to create a Windows container executor that sets the limits on job objects thus provides resource enforcement at OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-668) TokenIdentifier serialization should consider Unknown fields
[ https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-668: - Priority: Blocker (was: Major) Target Version/s: 2.6.0 TokenIdentifier serialization should consider Unknown fields Key: YARN-668 URL: https://issues.apache.org/jira/browse/YARN-668 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Priority: Blocker This would allow changing of the TokenIdentifier between versions. The current serialization is Writable. A simple way to achieve this would be to have a Proto object as the payload for TokenIdentifiers, instead of individual fields. TokenIdentifier continues to implement Writable to work with the RPC layer - but the payload itself is serialized using PB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137907#comment-14137907 ] Vinod Kumar Vavilapalli commented on YARN-2558: --- bq. Yes, this should be fine for the short term. I just wanted to make it clear that until YARN-668 is addressed we're going to continue to break backwards compatibility and thus rolling upgrades with seemingly simple changes like this. *Sigh* yes. I just marked YARN-668 as a blocker for 2.6. Thanks [~jianhe] for pointing out problems with work-preserving restart without the patch. Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-2558.1.patch, YARN-2558.2.patch, YARN-2558.3.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137908#comment-14137908 ] Junping Du commented on YARN-2561: -- Also, tried this patch in a real cluster; it works fine as expected. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job, restart the only NM, and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields
[ https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137912#comment-14137912 ] Vinod Kumar Vavilapalli commented on YARN-668: -- I think [~sseth]'s solution in the description is a much simpler way to address this bq. The current serialization is Writable. A simple way to achieve this would be to have a Proto object as the payload for TokenIdentifiers, instead of individual fields. TokenIdentifier serialization should consider Unknown fields Key: YARN-668 URL: https://issues.apache.org/jira/browse/YARN-668 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Priority: Blocker This would allow changing of the TokenIdentifier between versions. The current serialization is Writable. A simple way to achieve this would be to have a Proto object as the payload for TokenIdentifiers, instead of individual fields. TokenIdentifier continues to implement Writable to work with the RPC layer - but the payload itself is serialized using PB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
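To make the proposal concrete, here is a hedged sketch of a TokenIdentifier that keeps the Writable envelope for the RPC layer but serializes its fields as a single protobuf payload. The class and proto names below are hypothetical, not from any attached patch; the point is that protobuf parsing preserves fields it does not understand, so old and new identifiers can co-exist across versions.
{code}
public class ExampleTokenIdentifier extends TokenIdentifier {
  private ExampleTokenIdentifierProto proto;  // hypothetical generated PB class

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] bytes = proto.toByteArray();
    out.writeInt(bytes.length);  // length-prefix the PB payload
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    proto = ExampleTokenIdentifierProto.parseFrom(bytes);  // unknown fields survive
  }

  @Override
  public Text getKind() { return new Text("EXAMPLE_TOKEN"); }

  @Override
  public UserGroupInformation getUser() { return null; }
}
{code}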
[jira] [Updated] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2001: -- Attachment: YARN-2001.5.patch Tests pass locally; re-submitting the same patch. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
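For illustration, the two thresholds described above could be combined roughly as follows. This is a hedged sketch, not the attached patch; the method and accessor names are made up for the example.
{code}
// Block scheduling after failover until enough NMs have re-registered
// or a timeout expires, whichever comes first.
void waitUntilSafeToSchedule(int minRegisteredNodes, long timeoutMs)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (getNumRegisteredNodes() < minRegisteredNodes
      && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);  // poll until the node threshold or the timeout is hit
  }
  // From here on, the RM accepts new container requests; NMs that register
  // later can be treated as new nodes and told to kill their containers.
}
{code}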
[jira] [Commented] (YARN-2308) NPE happened when RM restarted after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137937#comment-14137937 ] chang li commented on YARN-2308: Thanks for the collective thoughts about this and all the suggestions. I will improve my solution. NPE happened when RM restarted after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And the RM will fail to restart. This is caused by the queue configuration being changed: I removed some queues and added new queues. So when the RM restarts, it tries to recover history applications, and when any queue of these applications has been removed, an NPE will be raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
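For readers following along, the NPE above comes from dereferencing a queue that no longer exists after the configuration change. A hedged sketch of the kind of guard a fix needs in CapacityScheduler#addApplicationAttempt (simplified; the rejection path in the eventual patch may differ):
{code}
CSQueue queue = queues.get(application.getQueue());
if (queue == null) {
  // The queue was removed while the RM was down: reject the recovered
  // application cleanly instead of letting an NPE kill the dispatcher.
  LOG.error("Queue " + application.getQueue()
      + " no longer exists; cannot recover application attempt "
      + applicationAttemptId);
  return;
}
{code}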
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137944#comment-14137944 ] Chris Trezzo commented on YARN-2179: [~kasha] a couple of comments: bq. 5. sharedcache-pom: my understanding of maven is pretty sparse, so please correct me if I am wrong. Looks like sharedcache depends on the RM. If we were to embed the sharedcache in the RM, wouldn't that lead to a circular dependency? How do we plan to solve it? One approach would be to move the shared cache project back into the RM project. This would not affect the ability to run the shared cache manager as a separate service, but would be more of a code-organization thing. Thoughts? bq. 6. RemoteAppChecker: Just thinking out loud - in a non-embedded case, what happens if we upgrade other daemons/clients but not the SCM and add a new completed state? There might not be a solution here though, the worst case appears to be that we wouldn't clear the cache when apps end up in that state. One alternative is to query the RM for the active states or for an app being active. I am open to adding these APIs (Private for now) to the RM. I took a look at the ApplicationReport interface again. Would it make more sense to leverage getFinalApplicationStatus() instead of getYarnApplicationState()? That way we can just say: if the FinalApplicationStatus is UNDEFINED, don't clean it up; otherwise we are safe to delete the appId. I will work on the changes for the other comments and post an updated patch. Initial cache manager structure and context --- Key: YARN-2179 URL: https://issues.apache.org/jira/browse/YARN-2179 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, YARN-2179-trunk-v5.patch Implement the initial shared cache manager structure and context. The SCMContext will be used by a number of manager services (i.e. the backing store and the cleaner service). The AppChecker is used to gather the currently running applications on SCM startup (necessary for an SCM that is backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
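The getFinalApplicationStatus() idea in the comment above would look roughly like this (a sketch assuming the cleaner holds a YARN client; the helper name is hypothetical). FinalApplicationStatus stays UNDEFINED until an application actually finishes, so UNDEFINED can be read as "still active":
{code}
ApplicationReport report = yarnClient.getApplicationReport(appId);
if (report.getFinalApplicationStatus() != FinalApplicationStatus.UNDEFINED) {
  // The app has finished (succeeded, failed, or killed): safe to clean up.
  deleteCacheEntriesFor(appId);  // hypothetical cleaner helper
}
{code}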
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.5.1.patch Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.patch Currently, when an application is finished, the NM will start to do the log aggregation. But for long-running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields
[ https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137952#comment-14137952 ] Junping Du commented on YARN-668: - bq. I think Siddharth Seth's solution in the description is a much simpler way to address this. Agreed. I am starting to work on this. [~vinodkv], can I take it over if you haven't started working on it? Thanks! TokenIdentifier serialization should consider Unknown fields Key: YARN-668 URL: https://issues.apache.org/jira/browse/YARN-668 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Priority: Blocker This would allow changing of the TokenIdentifier between versions. The current serialization is Writable. A simple way to achieve this would be to have a Proto object as the payload for TokenIdentifiers, instead of individual fields. TokenIdentifier continues to implement Writable to work with the RPC layer - but the payload itself is serialized using PB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-003.patch Assuming this patch builds (it does appear to build locally), this patch # is in sync with trunk, including the new curator import of HADOOP-10982 # adds security # has tests that bring up a kerberized ZK cluster to verify clients can work with it. # has the RM in charge of setting up paths and cleaning up afterwards. I don't think security is perfect yet ... I need to lock down the ACLs, and get the design docs from Google Drive into the docs as .md files. Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, yarnregistry.pdf, yarnregistry.tla In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137973#comment-14137973 ] Hadoop QA commented on YARN-2561: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669455/YARN-2561-v4.patch against trunk revision ea4e2e8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5000//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5000//console This message is automatically generated. MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561.patch Work-preserving NM restart is disabled. Submit a job, restart the only NM, and the job will hang with connect retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1779) Handle AMRMTokens across RM failover
[ https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1779: -- Attachment: YARN-1779.6.patch Thanks Vinod for reviewing. Reverted unnecessary changes. Also changed TestUnmanagedAMLauncher to use new YarnConfiguration instead of Configuration so that YarnConfiguration can be reloaded. Tested on a real HA cluster with work-preserving restart enabled. Without the patch, the AM will get a token exception if it fails over from rm1 to rm2 and back to rm1. With the patch, the AM can fail over properly. Handle AMRMTokens across RM failover Key: YARN-1779 URL: https://issues.apache.org/jira/browse/YARN-1779 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Jian He Priority: Blocker Labels: ha Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch, YARN-1779.6.patch Verify if AMRMTokens continue to work against RM failover. If not, we will have to do something along the lines of YARN-986. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137997#comment-14137997 ] Hadoop QA commented on YARN-913: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669504/YARN-913-003.patch against trunk revision f24ac42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5004//console This message is automatically generated. Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, yarnregistry.pdf, yarnregistry.tla In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields
[ https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137999#comment-14137999 ] Tsuyoshi OZAWA commented on YARN-668: - +1 (non-binding) for making TokenIdentifier serialization protobuf-based. With the change, we can version TokenIdentifier and let old and new TokenIdentifiers co-exist. TokenIdentifier serialization should consider Unknown fields Key: YARN-668 URL: https://issues.apache.org/jira/browse/YARN-668 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Priority: Blocker This would allow changing of the TokenIdentifier between versions. The current serialization is Writable. A simple way to achieve this would be to have a Proto object as the payload for TokenIdentifiers, instead of individual fields. TokenIdentifier continues to implement Writable to work with the RPC layer - but the payload itself is serialized using PB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2559) ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher
[ https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138003#comment-14138003 ] Hadoop QA commented on YARN-2559: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669496/YARN-2559.2.patch against trunk revision e3803d0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5002//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5002//console This message is automatically generated. ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher -- Key: YARN-2559 URL: https://issues.apache.org/jira/browse/YARN-2559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Generic History Service is enabled in the Timeline Server with yarn.resourcemanager.system-metrics-publisher.enabled=true, so that the ResourceManager uses the Timeline Store for recording application history information Reporter: Karam Singh Assignee: Zhijie Shen Attachments: YARN-2559.1.patch, YARN-2559.2.patch ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138021#comment-14138021 ] Karthik Kambatla commented on YARN-2179: bq. One approach would be to move the shared cache project back into the RM project. That should work. I am okay with leaving the patch as is for now and moving modules when we embed the SCM in the RM. bq. Would it make more sense to leverage getFinalApplicationStatus() instead of getYarnApplicationState()? Sounds reasonable. Initial cache manager structure and context --- Key: YARN-2179 URL: https://issues.apache.org/jira/browse/YARN-2179 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, YARN-2179-trunk-v5.patch Implement the initial shared cache manager structure and context. The SCMContext will be used by a number of manager services (i.e. the backing store and the cleaner service). The AppChecker is used to gather the currently running applications on SCM startup (necessary for an SCM that is backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.5.1.patch Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.patch Currently, when an application is finished, the NM will start to do the log aggregation. But for long-running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138060#comment-14138060 ] Hadoop QA commented on YARN-2001: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669499/YARN-2001.5.patch against trunk revision f24ac42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5005//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5005//console This message is automatically generated. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover
[ https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138072#comment-14138072 ] Hadoop QA commented on YARN-1779: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669507/YARN-1779.6.patch against trunk revision f24ac42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5006//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5006//console This message is automatically generated. Handle AMRMTokens across RM failover Key: YARN-1779 URL: https://issues.apache.org/jira/browse/YARN-1779 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Jian He Priority: Blocker Labels: ha Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch, YARN-1779.6.patch Verify if AMRMTokens continue to work against RM failover. If not, we will have to do something along the lines of YARN-986. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138083#comment-14138083 ] Jian He commented on YARN-1372: --- We can probably do this: - Transfer both justFinishedContainers and finishedContainersSentToAM to the new attempt irrespective of whether work-preserving AM restart is enabled, so that the second attempt can continuously ack previously finished containers. - In pullJustFinishedContainers, we can check if work-preserving AM restart is enabled. If it is, we return all the attempts’ finished containers. If it is not enabled, we only return the current attempt’s containers. Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AMs about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
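A hedged sketch of the pullJustFinishedContainers behavior proposed above (the field and helper names are illustrative, not the committed change):
{code}
private List<ContainerStatus> pullJustFinishedContainers() {
  if (workPreservingAMRestartEnabled) {
    // The AM may outlive individual attempts: ack finished containers
    // accumulated across all attempts.
    return drainFinishedContainersOfAllAttempts();
  }
  // Otherwise only the current attempt's finished containers matter.
  return drainFinishedContainersOfCurrentAttempt();
}
{code}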
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138093#comment-14138093 ] Jian He commented on YARN-2558: --- Committing this, thanks [~jlowe], [~vinodkv] for the comments. Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Attachments: YARN-2558.1.patch, YARN-2558.2.patch, YARN-2558.3.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie
[ https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138107#comment-14138107 ] Zhijie Shen commented on YARN-2563: --- When submitting an app in secure mode, YarnClient will automatically obtain a timeline DT from the timeline server. This communication needs to pass Kerberos authentication. It works at the client side, which has Kerberos set up. In a container (either the AM or a specific task), the process doesn't do a Kerberos login, so it is not able to pass Kerberos authentication to get the timeline DT. In this scenario, Oozie is starting an MR job inside the MR mapper container, so it fails to pass the Kerberos authentication enforced by the timeline server. However, the expected behavior is that YarnClient only grabs a timeline DT when one is not found at app submission; the DT is put into the credentials of the ContainerLaunchContext and passed to the AM and the remaining MR tasks' containers. Hence when Oozie wants to launch an MR job from there, it should already have the DT and doesn't need to invoke the getTimelineDelegationToken method. It seems that YarnClientImpl.addTimelineDelegationToken has a bug: regardless of whether the DT is already in the credentials, YarnClientImpl will always grab one, but only put it into the credentials when the DT is not there. The right behavior should be: when the DT is already in the credentials, we shouldn't even invoke getTimelineDelegationToken. I'll create a patch to fix the bug. On secure clusters call to timeline server fails with authentication errors when running a job via oozie Key: YARN-2563 URL: https://issues.apache.org/jira/browse/YARN-2563 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Zhijie Shen Priority: Blocker During our nightlies on a secure cluster we have seen Oozie jobs fail with an authentication error to the timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
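The fix described above amounts to an early return before the DT fetch. A simplified sketch of the corrected flow (the real YarnClientImpl method has more plumbing; the helpers shown here are illustrative):
{code}
private void addTimelineDelegationToken(ContainerLaunchContext clc)
    throws YarnException, IOException {
  Credentials credentials = parseCredentials(clc);  // illustrative helper
  for (Token<? extends TokenIdentifier> token : credentials.getAllTokens()) {
    if (TimelineDelegationTokenIdentifier.KIND_NAME.equals(token.getKind())) {
      return;  // DT already present (e.g. handed down to an Oozie launcher):
               // never touch the Kerberos-only timeline endpoint.
    }
  }
  // Only now contact the timeline server, which requires Kerberos.
  credentials.addToken(timelineService, getTimelineDelegationToken());
  setCredentials(clc, credentials);  // illustrative helper
}
{code}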
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138127#comment-14138127 ] Hadoop QA commented on YARN-2468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669521/YARN-2468.5.1.patch against trunk revision f230248. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5007//console This message is automatically generated. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.patch Currently, when an application is finished, the NM will start to do the log aggregation. But for long-running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2001: -- Attachment: YARN-2001.5.patch Trying the same patch again; no failures were actually found in the Jenkins console log. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch, YARN-2001.5.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2559) ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher
[ https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138162#comment-14138162 ] Jian He commented on YARN-2559: --- Looks good overall. We may just call RMApp#getFinalApplicationStatus here? {code} (appAttempt.getFinalApplicationStatus() == null ? RMServerUtils.createFinalApplicationStatus(appState) : appAttempt.getFinalApplicationStatus()) {code} ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher -- Key: YARN-2559 URL: https://issues.apache.org/jira/browse/YARN-2559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Generic History Service is enabled in the Timeline Server with yarn.resourcemanager.system-metrics-publisher.enabled=true, so that the ResourceManager uses the Timeline Store for recording application history information Reporter: Karam Singh Assignee: Zhijie Shen Attachments: YARN-2559.1.patch, YARN-2559.2.patch ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie
[ https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2563: -- Attachment: YARN-2563.1.patch Created a patch to fix the aforementioned bug. On secure clusters call to timeline server fails with authentication errors when running a job via oozie Key: YARN-2563 URL: https://issues.apache.org/jira/browse/YARN-2563 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-2563.1.patch During our nightlies on a secure cluster we have seen Oozie jobs fail with an authentication error to the timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2565) ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138184#comment-14138184 ] Zhijie Shen commented on YARN-2565: --- [~karams], I think you've neglected to mention the config: yarn.timeline-service.generic-application-history.enabled. It should be true, such that FileSystemApplicationHistoryStore is picked by RMApplicationHistoryWriter, which cannot access HDFS correctly in secure mode. After YARN-2033, when you enable the generic history service, you should by default pick the new storage stack based on TimelineStore. The problem seems to be that the configurations which determine what store is chosen by ApplicationHistoryServer and RMApplicationHistoryWriter are not consistent. On the RMApplicationHistoryWriter side, we should also use FileSystemApplicationHistoryStore only when users have explicitly put it in the config file. ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn - Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that RM can send Application history to Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Observed that RM fails to start in Secure mode when GenericHistoryService is enabled and ResourceManager is set to use Timeline Store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
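A hedged sketch of the store-selection behavior being proposed (the wiring is illustrative, not an attached patch): the RM-side writer falls back to a no-op store unless a store class was explicitly configured.
{code}
String storeClassName = conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE);
if (storeClassName == null || storeClassName.trim().isEmpty()) {
  // Nothing explicitly configured: stay on a no-op writer so the new
  // TimelineStore-based stack handles generic history instead.
  historyStore = new NullApplicationHistoryStore();
} else {
  historyStore = ReflectionUtils.newInstance(
      conf.getClass(YarnConfiguration.APPLICATION_HISTORY_STORE,
          NullApplicationHistoryStore.class, ApplicationHistoryStore.class),
      conf);
}
{code}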
[jira] [Updated] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2565: -- Summary: RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore (was: ResourceManager fails to start when GenericHistoryService is enabled in secure mode without doing manual kinit as yarn) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore --- Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that RM can send Application history to Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Observed that RM fails to start in Secure mode when GenericHistoryService is enabled and ResourceManager is set to use Timeline Store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2559) ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher
[ https://issues.apache.org/jira/browse/YARN-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2559: -- Attachment: YARN-2559.3.patch Updated the patch accordingly. ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher -- Key: YARN-2559 URL: https://issues.apache.org/jira/browse/YARN-2559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Generic History Service is enabled in the Timeline Server with yarn.resourcemanager.system-metrics-publisher.enabled=true, so that the ResourceManager uses the Timeline Store for recording application history information Reporter: Karam Singh Assignee: Zhijie Shen Attachments: YARN-2559.1.patch, YARN-2559.2.patch, YARN-2559.3.patch ResourceManager sometimes becomes un-responsive due to NPE in SystemMetricsPublisher -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138233#comment-14138233 ] Hadoop QA commented on YARN-2001: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669564/YARN-2565.1.patch against trunk revision 123f20d. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5011//console This message is automatically generated. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch, YARN-2001.5.patch After failover, the RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. the RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs that join after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138232#comment-14138232 ] Hadoop QA commented on YARN-2565: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669564/YARN-2565.1.patch against trunk revision 123f20d. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5010//console This message is automatically generated. RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore --- Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that RM can send Application history to Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Attachments: YARN-2565.1.patch Observed that RM fails to start in Secure mode when GenericHistoryService is enabled and ResourceManager is set to use Timeline Store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138234#comment-14138234 ] Karthik Kambatla commented on YARN-2080: Looks mostly good. Nits: There are some unused imports and javadoc errors in the files. Also, a couple of class javadocs have empty lines at the end. Comments: # It would be nice to not have default values for configs for ReservationSystem and PlanFollower. We could pick these defaults based on the scheduler. # I am not convinced using UTCClock is the best way, particularly when client time is not UTC. But, I guess we can go ahead with this for now and revisit it when we run into problems. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) Add support for disk IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2139: -- Attachment: Disk_IO_Scheduling_Design_2.pdf Uploaded a new design doc that includes spindle-locality information. Comments are very welcome. I'll create the sub-tasks to upload preliminary code for review soon. Add support for disk IO isolation/scheduling for containers --- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie
[ https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138262#comment-14138262 ] Hadoop QA commented on YARN-2563: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669557/YARN-2563.1.patch against trunk revision 123f20d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5009//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5009//console This message is automatically generated. On secure clusters call to timeline server fails with authentication errors when running a job via oozie Key: YARN-2563 URL: https://issues.apache.org/jira/browse/YARN-2563 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-2563.1.patch During our nightlies on a secure cluster we have seen Oozie jobs fail with an authentication error to the timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2558) Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId
[ https://issues.apache.org/jira/browse/YARN-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138272#comment-14138272 ] Tsuyoshi OZAWA commented on YARN-2558: -- Thanks Jason, Vinod, and Jian for the comments and review. Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId -- Key: YARN-2558 URL: https://issues.apache.org/jira/browse/YARN-2558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2558.1.patch, YARN-2558.2.patch, YARN-2558.3.patch We should update ContainerTokenIdentifier#read/write to use {{getContainerId}} instead of {{getId}} to pass all container information correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)