[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278804#comment-14278804 ] Hudson commented on YARN-1492: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278805#comment-14278805 ] Hudson commented on YARN-2807: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as described. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled, even with --forceactive. The option that actually works is {{--forcemanual}}, but the usage message does not describe it anywhere. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
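For reference, the invocation that does work today, per the transcript above, uses the {{--forcemanual}} flag (flag placement mirrors the transcript; verify against your Hadoop version):
{code}
yarn rmadmin -transitionToActive rm2 --forcemanual
{code}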
[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278811#comment-14278811 ] Hudson commented on YARN-3005: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/]) YARN-3005. [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java (Contributed by Kengo Seki) (aajisaka: rev 533e551eb42af188535aeb0ab35f8ebf150a0da1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/zk/RegistrySecurity.java * hadoop-yarn-project/CHANGES.txt [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java Key: YARN-3005 URL: https://issues.apache.org/jira/browse/YARN-3005 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-3005.001.patch, YARN-3005.002.patch Since we have moved to JDK7, we can refactor the below if-else statement for String. {code} // TODO JDK7 SWITCH if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { access = AccessPolicy.sasl; } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { access = AccessPolicy.digest; } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { access = AccessPolicy.anon; } else { throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM + "\"" + auth + "\""); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
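A minimal sketch of the proposed JDK7 rewrite (assuming the REGISTRY_CLIENT_AUTH_* fields are compile-time String constants, which is required for them to appear as case labels):
{code:java}
switch (auth) {
case REGISTRY_CLIENT_AUTH_KERBEROS:
  access = AccessPolicy.sasl;
  break;
case REGISTRY_CLIENT_AUTH_DIGEST:
  access = AccessPolicy.digest;
  break;
case REGISTRY_CLIENT_AUTH_ANONYMOUS:
  access = AccessPolicy.anon;
  break;
default:
  // same error path as the if-else version
  throw new ServiceStateException(
      E_UNKNOWN_AUTHENTICATION_MECHANISM + "\"" + auth + "\"");
}
{code}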
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278807#comment-14278807 ] Hudson commented on YARN-2217: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2025 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2025/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
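A hedged sketch of how a client might use the new API, based on the class names in this commit; the exact method names and signatures (createSharedCacheClient, getFileChecksum, use, release) are assumptions, not verified against the committed code:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.SharedCacheClient;

public class SharedCacheClientSketch {
  public static void main(String[] args) throws Exception {
    SharedCacheClient client = SharedCacheClient.createSharedCacheClient();
    client.init(new Configuration());
    client.start();
    try {
      ApplicationId appId =
          ApplicationId.newInstance(System.currentTimeMillis(), 1);
      // The checksum of the local jar acts as the key into the shared cache.
      String key = client.getFileChecksum(new Path("file:///tmp/job.jar"));
      Path cached = client.use(appId, key);   // null would mean a cache miss
      if (cached != null) {
        System.out.println("Reuse cached copy at " + cached);
      } else {
        System.out.println("Miss: fall back to the normal upload path");
      }
      client.release(appId, key);
    } finally {
      client.stop();
    }
  }
}
{code}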
[jira] [Updated] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3064: -- Attachment: YARN-3064.2.patch Thanks Junping! Updated the patch. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch, YARN-3064.2.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
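For context, the failing assertion comes from a poll-and-timeout helper; a generic sketch of that pattern (hypothetical code in the style of MockRM.waitForState, not the actual test source) shows why a short timeout races with the scheduler's asynchronous allocation:
{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;
import org.junit.Assert;

// Hypothetical helper: poll until the attempt reaches the expected state or
// the deadline passes, then assert. Allocation only happens on a node
// heartbeat, so too small a timeout makes such tests flaky.
final class WaitForState {
  static void waitForState(RMAppAttempt attempt, RMAppAttemptState expected,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (attempt.getAppAttemptState() != expected
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    Assert.assertEquals("Attempt state is not correct (timedout)",
        expected, attempt.getAppAttemptState());
  }
}
{code}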
[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279045#comment-14279045 ] Vrushali C commented on YARN-3031: -- Hi Varun, I'd like to take ownership of this JIRA, and hope you're OK with that. Do let me know. thanks Vrushali create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
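To make the abstraction concrete, here is a placeholder sketch of what such a write interface could look like; every name in it is invented for illustration and is not taken from any eventual patch:
{code:java}
import java.io.IOException;
import org.apache.hadoop.service.Service;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntities;

// Invented illustration of a backing-storage write interface: each
// implementation (file system, HBase, etc.) would translate entities
// into its native schema behind this boundary.
public interface TimelineBackingStorageWriter extends Service {
  void write(String clusterId, String userId, String flowName,
      long flowRunId, String appId, TimelineEntities entities)
      throws IOException;

  void flush() throws IOException;  // force buffered writes out
}
{code}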
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279056#comment-14279056 ] Mayank Bansal commented on YARN-2933: - Thanks [~jianhe] and [~wangda] for the review. bq. looks good overall, we should use priority.AMCONTAINER here ? It was confusing by name, so I changed the names and updated accordingly. bq. it's better to use enum type instead of int in mockContainer, which can avoid call getValue() from enum. Priority is overridden differently in multiple tests, so I didn't want to change the signature of the functions; moreover, it is the same. Uploading the updated patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting to support preemption respecting node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, it is just a short-term enhancement; YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279062#comment-14279062 ] Hadoop QA commented on YARN-3064: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692450/YARN-3064.1.patch against trunk revision ce29074. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6341//console This message is automatically generated. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279116#comment-14279116 ] Varun Saxena commented on YARN-2928: As the branch has been created, we can now decide the order of tasks and assignees as [~vinodkv] suggested. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3037) create HBase cluster backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279046#comment-14279046 ] Vrushali C commented on YARN-3037: -- Hi Zhijie I'd like to take ownership of this JIRA, and hope you're OK with that. Do let me know. thanks Vrushali create HBase cluster backing storage implementation for ATS writes -- Key: YARN-3037 URL: https://issues.apache.org/jira/browse/YARN-3037 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Per design in YARN-2928, create a backing storage implementation for ATS writes based on a full HBase cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3062. --- Resolution: Invalid Thanks for your confirmation, [~pramachandran]! Closing the JIRA. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When the otherinfo field gets updated, in some cases the data returned for an entity depends on the filter usage. For example, in the attached files, for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, the otherinfo.numTasks field got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009. For the otherinfo.status field, which also gets updated, both queries show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-9.patch Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting to support preemption respecting node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, it is just a short-term enhancement; YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279109#comment-14279109 ] Anubhav Dhoot commented on YARN-3021: - I don't see any security holes. This token is only for the application's own use. The validation and renewal that you are turning off via the new parameter should not impact security of YARN or other applications. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278767#comment-14278767 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278774#comment-14278774 ] Hudson commented on YARN-3005: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/]) YARN-3005. [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java (Contributed by Kengo Seki) (aajisaka: rev 533e551eb42af188535aeb0ab35f8ebf150a0da1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/zk/RegistrySecurity.java [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java Key: YARN-3005 URL: https://issues.apache.org/jira/browse/YARN-3005 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-3005.001.patch, YARN-3005.002.patch Since we have moved to JDK7, we can refactor the below if-else statement for String. {code} // TODO JDK7 SWITCH if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { access = AccessPolicy.sasl; } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { access = AccessPolicy.digest; } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { access = AccessPolicy.anon; } else { throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM + "\"" + auth + "\""); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278914#comment-14278914 ] Naganarasimha G R commented on YARN-3009: - Hi [~cwensel] Hope the approach proposed as part of the patch is fine with you? [~zjshen], Any comments on the earlier work around patch ? TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch, YARN-3009.20150111-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
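A hypothetical illustration of the failure mode (a hand-rolled lenient parser, not the actual TimelineWebServices code): a reader that accepts a leading numeric token turns a value like 7CCA... into the Number 7, which can never equal the stored String:
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Invented repro of the reported behavior: parse a leading number if one
// exists, otherwise keep the raw String.
public class LenientFilterParse {
  static Object parse(String raw) {
    Matcher m = Pattern.compile("^-?\\d+").matcher(raw);
    return m.find() ? (Object) Long.valueOf(m.group()) : (Object) raw;
  }

  public static void main(String[] args) {
    Object parsed = parse("7CCA12AB");              // yields 7, not "7CCA12AB"
    System.out.println(parsed);                     // 7
    System.out.println(parsed.equals("7CCA12AB"));  // false -> filter misses
  }
}
{code}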
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278878#comment-14278878 ] Prakash Ramachandran commented on YARN-3062: [~zjshen], looks like that was the issue. You can close this JIRA now. Thanks timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When the otherinfo field gets updated, in some cases the data returned for an entity depends on the filter usage. For example, in the attached files, for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, the otherinfo.numTasks field got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009. For the otherinfo.status field, which also gets updated, both queries show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278768#comment-14278768 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as described. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled, even with --forceactive. The option that actually works is {{--forcemanual}}, but the usage message does not describe it anywhere. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278770#comment-14278770 ] Hudson commented on YARN-2217: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #75 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/75/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279130#comment-14279130 ] Varun Saxena commented on YARN-3031: Sure, go ahead. create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3015: --- Attachment: YARN-3015.004.patch yarn classpath command should support same options as hadoop classpath. --- Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Assignee: Varun Saxena Priority: Minor Attachments: YARN-3015.001.patch, YARN-3015.002.patch, YARN-3015.003.patch, YARN-3015.004.patch HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279182#comment-14279182 ] Robert Kanter commented on YARN-2984: - +1 Metrics for container's actual memory usage --- Key: YARN-2984 URL: https://issues.apache.org/jira/browse/YARN-2984 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, yarn-2984-prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279205#comment-14279205 ] Robert Kanter commented on YARN-2984: - One minor thing: the patch has this: {code:java}new HashMap<>();{code} which is a Java 7 feature. Did we officially decide to drop Java 6 in the Hadoop 2.7.0 release? Metrics for container's actual memory usage --- Key: YARN-2984 URL: https://issues.apache.org/jira/browse/YARN-2984 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, yarn-2984-prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
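For readers following along, the Java 7 feature in question is the diamond operator, which infers the type arguments that Java 6 required you to repeat (variable names here are illustrative, not from the patch):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class DiamondExample {
  public static void main(String[] args) {
    // Java 6 form: type arguments repeated on the right-hand side.
    Map<String, Long> usageJava6 = new HashMap<String, Long>();
    // Java 7 diamond form, as used in the patch: arguments are inferred.
    Map<String, Long> usage = new HashMap<>();
    usage.put("containerMemoryMB", 512L);
    System.out.println(usage + " " + usageJava6);
  }
}
{code}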
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279135#comment-14279135 ] Hudson commented on YARN-2861: -- FAILURE: Integrated in Hadoop-trunk-Commit #6869 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6869/]) YARN-2861. Fixed Timeline DT secret manager to not reuse RM's configs. Contributed by Zhijie Shen (jianhe: rev 9e33116d1d8944a393937337b3963e192b9c74d1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-2861.1.patch, YARN-2861.2.patch This is the configs for RM DT secret manager. We should create separate ones for timeline DT only. {code} @Override protected void serviceInit(Configuration conf) throws Exception { long secretKeyInterval = conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY, YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT); long tokenMaxLifetime = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY, YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT); long tokenRenewInterval = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY, YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT); secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval, tokenMaxLifetime, tokenRenewInterval, 360); secretManager.startThreads(); serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig()); super.init(conf); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
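A hedged sketch of the fix's direction: read timeline-specific keys with their own defaults instead of reusing the RM's. The key names below are assumptions for illustration; check YarnConfiguration in the committed patch for the real ones:
{code:java}
// Assumed key names, for illustration only.
long secretKeyInterval = conf.getLong(
    "yarn.timeline-service.delegation.key.update-interval",
    24 * 60 * 60 * 1000L);
long tokenMaxLifetime = conf.getLong(
    "yarn.timeline-service.delegation.token.max-lifetime",
    7 * 24 * 60 * 60 * 1000L);
long tokenRenewInterval = conf.getLong(
    "yarn.timeline-service.delegation.token.renew-interval",
    24 * 60 * 60 * 1000L);
{code}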
[jira] [Updated] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-2932: - Attachment: YARN-2932.v4.txt Thank you [~leftnoteasy] for your review and comments. {quote} Re 2: You're partially correct, queue finally calls setupQueueConfig when reinitialize is invoked. The CapacityScheduler reinitialization is creating a new set of queues, and copy new parameters to your old queues via {code} setupQueueConfigs( clusterResource, newlyParsedLeafQueue.capacity, ... {code} So you need put the parameter you wants to update to setupQueueConfig as well. Without that, queue will not be refreshed. I didn't find any changes to parameter of setupQueueConfig, so I guess so, it's better to add a test to verify it. {quote} I made the changes to the API for {{AbstractCSQueue#setupQueueConfigs}} to take the additional preemptable parameter. When it is called from {{[Leaf|Parent]Queue#setupQueueConfigs}}, it calls {{AbstractCSQueue#isQueuePathHierarchyPreemptable}} to get the preemptability of the queue. I tested the fixes in both version 3 and version 4 of this patch on a one-node cluster and on a 10-node cluster. In both versions, I was able to change the {{disable_preemption}} properties, refresh the queues using {{yarn rmadmin -refreshQueues}}, and see the updates on the Scheduler UI page. However, I think I see that if the new list of queues is different from the old list of queues, it would not pick up the parameters for the new queues without this change. {quote} Re 3: You can take a look at how AbstractCSQueue initialize labels, I think they have similar logic – For node label is trying to get value from configuration, if not set, inherit from parent. With this, you can make getPreemptable interface without defaultVal in CapacitySchedulerConfiguration. {quote} I did change {{CapacitySchedulerConfiguration#getQueuePreemptable}} to not take a default value, but in order to pass back the {{null}} information, it has to return a {{String}} and then the caller has to convert the {{String}} to a Boolean, which I think is a little awkward. {quote} Since YARN-2056 is also planned in 2.7 (I thought it's already included in 2.6), do you think is it better to make configuration option name to queue-patch.preemptable for consistency? {quote} Well, that would be ideal, I think, but it isn't that simple on our side. We have already started using the code in YARN-2056 and are using the {{disable_preemption}} property. An argument could be made that {{disable_preemption}} is better because it indicates that it is turning off the {{...monitor.capacity.preemption...}} property. If {{disable_preemption}} were changed to {{preemptable}}, someone may look at that property and think that the queue should have that property without considering the overall system property {{...monitor.capacity.preemption...}}. How important is it to you that the {{disable_preemption}} property be changed to {{preemptable}}? Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging --- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
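For readers following along, the per-queue switch under discussion lives in capacity-scheduler.xml; the key shape below follows the {{disable_preemption}} naming from YARN-2056, but the exact prefix and queue path are assumptions, so verify against your release:
{code:xml}
<property>
  <!-- Assumed key shape: yarn.scheduler.capacity.<queue-path>.disable_preemption -->
  <name>yarn.scheduler.capacity.root.queueA.disable_preemption</name>
  <value>true</value>
</property>
{code}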
[jira] [Commented] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279193#comment-14279193 ] Hadoop QA commented on YARN-2932: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692570/YARN-2932.v4.txt against trunk revision 9e33116. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6344//console This message is automatically generated. Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging --- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue level. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279129#comment-14279129 ] Varun Saxena commented on YARN-3031: Sure, go ahead. create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279202#comment-14279202 ] Chris Nauroth commented on YARN-3066: - I'm not familiar with {{ssid}} on FreeBSD. Does it have the same usage as Linux {{setsid}}? If so, then perhaps an appropriate workaround is to copy that binary to {{setsid}} and make sure it's available on the {{PATH}}. This might not require any YARN code changes. bq. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found. This would likely have to be considered backwards-incompatible, because applications would fail to start on existing systems that don't have {{setsid}}. I suppose the new behavior could be hidden behind an opt-in configuration property. Also, we need to keep in mind that {{Shell.isSetsidAvailable}} is always {{false}} on Windows. (On Windows, we handle the issue of orphaned processes by using Windows API job objects instead of {{setsid}}.) Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning a user task, the node manager checks for the setsid(1) utility and spawns the task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable? "exec setsid" : "exec"; FreeBSD, unlike Linux, does not have a setsid(1) utility. So plain exec is used to spawn the user task. If that task spawns other external programs (a common case when the task program is a shell script) and the user kills the job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn the task process via exec? This guarantees orphaned processes when a job is prematurely killed. 2) FreeBSD has a third-party replacement program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during the configure stage and put a @SETSID@ macro into the java file to use the correct name. I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is not found: at least we will know about the problem at start rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
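A hypothetical sketch of the opt-in strict check discussed above; the configuration key is invented for illustration (Shell.isSetsidAvailable and Shell.WINDOWS are the real flags):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Shell;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

// Hypothetical opt-in startup check: fail fast when no session-leader
// utility exists, instead of silently risking orphaned processes.
// Shell.isSetsidAvailable is always false on Windows by design, so the
// check must be skipped there.
public final class SetsidCheck {
  static void checkSetsid(Configuration conf) {
    boolean required =
        conf.getBoolean("yarn.nodemanager.require-setsid", false); // invented key
    if (required && !Shell.WINDOWS && !Shell.isSetsidAvailable) {
      throw new YarnRuntimeException("setsid (or an equivalent such as "
          + "FreeBSD's ssid renamed onto the PATH) was not found; killed "
          + "jobs may leave orphaned processes");
    }
  }
}
{code}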
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279243#comment-14279243 ] Hadoop QA commented on YARN-3064: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692561/YARN-3064.2.patch against trunk revision ce29074. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6343//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6343//console This message is automatically generated. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch, YARN-3064.2.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279587#comment-14279587 ] Hadoop QA commented on YARN-3064: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692561/YARN-3064.2.patch against trunk revision 780a6bf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6349//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6349//console This message is automatically generated. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch, YARN-3064.2.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279383#comment-14279383 ] Sangjin Lee commented on YARN-2928: --- And sorry for going back on the name. :) I realized that the term aggregating is now quite overloaded. When we said aggregation, we tried to stick with the definition of adding up metrics to the next parent. In that sense, I'm not sure the timeline aggregator would be the best name, as it would not do that type of aggregation. Aggregation up to the app level would be done by the AM, and the flow run level aggregation is done by the backing storage. How about Timeline writer? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279377#comment-14279377 ] Sangjin Lee commented on YARN-2928: --- bq. We will have to carve out some capacity for the per-node companions. I see some sort of static allocation like 1GB similar to NodeManager. The required memory for the per-node aggregator might be larger than anticipated. One reference point may be the memory footprint of a MR AM. The bulk of the job-related pinned-down memory would be needed on the aggregator. And that can easily be in the several hundreds of MB. Also, buffering multiples of such data for writes would require more room. On top of that, one would need to multiply by the number of apps it needs to support (x2 or x3 at most). All in all, my gut feeling is that 1 GB might be rather tight. I think we'll know more as we start testing it with realistic-size apps and backing storage. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279401#comment-14279401 ] Sangjin Lee commented on YARN-2928: --- Hi [~varun_saxena], give us just a little more time. I think it might make more sense for some of us who got involved in the design process earlier to start working out the initial key pieces. As those pieces fall into place, we'll be in much better shape to go more parallel. If you haven't had a chance to do so yet, could you also go over the attached design doc and let us know if you have any questions/feedback/suggestions? That would certainly be useful in getting up to speed. Thanks again for your interest! Much appreciated. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2932) Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging
[ https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-2932: - Attachment: YARN-2932.v5.txt Uploading patch v5. It should now apply to both branch-2 and trunk. Add entry for preemptable status to scheduler web UI and queue initialize/refresh logging --- Key: YARN-2932 URL: https://issues.apache.org/jira/browse/YARN-2932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-2932.v1.txt, YARN-2932.v2.txt, YARN-2932.v3.txt, YARN-2932.v4.txt, YARN-2932.v5.txt YARN-2056 enables the ability to turn preemption on or off on a per-queue basis. This JIRA will provide the preemption status for each queue in the {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue refresh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279510#comment-14279510 ] Sangjin Lee commented on YARN-3030: --- Jotting down my initial thoughts (I wish one could create subtasks of subtasks): We need to satisfy 3 use cases with this: *the current per-node aggregator, RM's aggregator, and the (future) per-app aggregator*. The baseline idea is to create a *logical* per-app aggregator as a service (CompositeService). The per-node aggregator can then be thought of as a thin container/manager/router of per-app aggregators which will come and go with applications. This will give us ease of development and maximal isolation between apps. Furthermore, it would help us support the per-app aggregator more easily. However, since we want to serve RM's aggregator as well, it makes sense to create an (abstract?) base aggregator service that is common between the RM (not app-specific) and per-app aggregators. The RM one could be a very thin extension of the base. The per-app aggregator would add app-based logic (mostly lifecycle management). These are the pieces for this JIRA: - set up this class hierarchy - work out the timeline client API (both sync and async) - implement the lifecycle of the base aggregator service - implement the timeline client RPC server end (can be no-op for now) - work out some batching-related logic We would still need to work out the backing storage interface and serving reads, etc., but those are captured in other tickets. Thoughts? Feedback? set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
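To make the proposed hierarchy concrete, here is a minimal sketch under the assumptions above; all class names are hypothetical and illustrative only, not the eventual YARN-3030 classes:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Common base for the RM aggregator and per-app aggregators. The shared
// lifecycle and the (no-op for now) RPC server end would live here.
abstract class BaseAggregatorService extends CompositeService {
  BaseAggregatorService(String name) {
    super(name);
  }
}

// RM's aggregator: not app-specific, so a very thin extension of the base.
class RMAggregatorService extends BaseAggregatorService {
  RMAggregatorService() {
    super(RMAggregatorService.class.getName());
  }
}

// Logical per-app aggregator: adds app-based logic (mostly lifecycle).
class AppLevelAggregatorService extends BaseAggregatorService {
  AppLevelAggregatorService(ApplicationId appId) {
    super(AppLevelAggregatorService.class.getName() + "_" + appId);
  }
}

// Per-node aggregator: a thin container/manager/router of per-app
// aggregators that come and go with applications.
class PerNodeAggregatorService extends CompositeService {
  private final Map<ApplicationId, AppLevelAggregatorService> perApp =
      new ConcurrentHashMap<>();

  PerNodeAggregatorService() {
    super(PerNodeAggregatorService.class.getName());
  }

  void addApplication(ApplicationId appId) {
    AppLevelAggregatorService aggregator = new AppLevelAggregatorService(appId);
    // A real version would also init/start the child if this service is
    // already started; omitted here for brevity.
    addService(aggregator);
    perApp.put(appId, aggregator);
  }

  void removeApplication(ApplicationId appId) {
    AppLevelAggregatorService aggregator = perApp.remove(appId);
    if (aggregator != null) {
      removeService(aggregator);
      aggregator.stop();
    }
  }
}
{code}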
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279506#comment-14279506 ] Yongjun Zhang commented on YARN-3021: - Hi [~adhoot], Thanks for clarifying. That sounds good. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279822#comment-14279822 ] Karthik Kambatla commented on YARN-2984: Yes. 2.6.x is the last Java 6 release. 2.7.x is all Java 7, dropping Java 6. Metrics for container's actual memory usage --- Key: YARN-2984 URL: https://issues.apache.org/jira/browse/YARN-2984 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, yarn-2984-prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279890#comment-14279890 ] Varun Saxena commented on YARN-3003: [~leftnoteasy], the approach sounds good. Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set of labels associated with the node. Client (such as Slider) may be interested in label to node mapping - given label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279916#comment-14279916 ] Zhijie Shen commented on YARN-2928: --- bq. I suppose the reason the client-side API resides in yarn-api and yarn-common rather than yarn-client is to accommodate RM's use of ATS? Right, this is because we want to prevent a cyclic dependency (RM - ATS - server-tests - RM). Another issue is that TimelineDelegationToken#renewer inside the common module uses the timeline client too. YARN-2506 is investigating the solution to correct the packaging. bq. but we need to make a decision on where we will put the client and common pieces. IMHO, common code goes to hadoop-yarn-common (or hadoop-yarn-api if it's API related). If we can prevent the cyclic dependency, the client code is best placed in hadoop-yarn-client. bq. My suggestion would be to use The package naming looks good. However, there are already some conventions. For example, all client libs are under {{org.apache.hadoop.yarn.client.api}}. It may be better to stick to them. As to the common code, I saw the style in hadoop-yarn-common is {{org.apache.hadoop.yarn.\[feature name\]}}. Finally, the server code doesn't have server in the package name. It may be organized like {{org.apache.hadoop.yarn.timelineservice.aggregator.*}} Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279933#comment-14279933 ] Junping Du commented on YARN-3064: -- v2 patch LGTM. +1. Will commit it shortly. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch, YARN-3064.2.patch Noticed consistent test failures, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in my local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279210#comment-14279210 ] Hadoop QA commented on YARN-2933: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692557/YARN-2933-9.patch against trunk revision ce29074. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6342//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6342//console This message is automatically generated. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, yet the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279655#comment-14279655 ] Sangjin Lee commented on YARN-2928: --- One observation on the code organization. The existing ATS code is actually spread out in several places: - entities, etc. API: {{org.apache.hadoop.yarn.api.records.timeline.\*}} at hadoop-yarn-api - TimelineClient API: {{org.apache.hadoop.yarn.client.api.\*}} at hadoop-yarn-common - server: {{org.apache.hadoop.yarn.server.timeline.\*}} at hadoop-yarn-server-applicationhistoryservice I suppose the reason the client-side API resides in yarn-api and yarn-common rather than yarn-client is to accommodate RM's use of ATS? How should we organize new code? We settled the question on the server piece (hadoop-yarn-server-timelineservice), but we need to make a decision on where we will put the client and common pieces. Also, we may want to organize the package names to be coherent. My suggestion would be to use {noformat} org.apache.hadoop.yarn.[common|client|server].timelineservice.detailed_subfeature {noformat} For example, the timeline aggregator would go to {{org.apache.hadoop.yarn.server.timelineservice.aggregator.\*}}. The timeline client API would go to {{org.apache.hadoop.yarn.client.timelineservice.api.\*}}. What is the best practice in terms of package naming? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
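Putting the existing layout and the two naming suggestions above side by side, one illustrative arrangement could be the following; this is an assumption for discussion, not a decision reached in the thread:
{noformat}
org.apache.hadoop.yarn.api.records.timelineservice        entity/object model (hadoop-yarn-api)
org.apache.hadoop.yarn.client.api                         timeline client, per the existing convention (hadoop-yarn-common or hadoop-yarn-client)
org.apache.hadoop.yarn.timelineservice                    common pieces (hadoop-yarn-common)
org.apache.hadoop.yarn.server.timelineservice.aggregator  aggregator/writer (hadoop-yarn-server-timelineservice)
{noformat}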
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279702#comment-14279702 ] Sangjin Lee commented on YARN-3030: --- You're right. We need the object model to be able to hash out the client API fully. Here I was suggesting putting a skeleton in (mostly empty classes for the object model). We'll have to come back to it as we work on the object model. I'm OK with going with REST for now. We'll need to get quick consensus on the code/package organization however (see https://issues.apache.org/jira/browse/YARN-2928?focusedCommentId=14279655page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14279655). set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1418) Add Tracing to YARN
[ https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reassigned YARN-1418: Assignee: Yi Liu Add Tracing to YARN --- Key: YARN-1418 URL: https://issues.apache.org/jira/browse/YARN-1418 Project: Hadoop YARN Issue Type: Improvement Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Yi Liu Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274. Most of the changes needed for the basics, such as RPC, seem to be almost ready in HDFS-5274. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2467) Add SpanReceiverHost to YARN daemons
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reassigned YARN-2467: Assignee: Yi Liu Add SpanReceiverHost to YARN daemons - Key: YARN-2467 URL: https://issues.apache.org/jira/browse/YARN-2467 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Yi Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Mazzucchelli updated YARN-2664: -- Attachment: YARN-2664.8.patch The submitted patch adds two new features: - a switch to change the resource shown in the graph (memory, cpu) - a query parameter to get data for a specific queue (http://url/cluster/planner/queue_name) Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.8.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides new functionality in the RM to ask for reservations on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278538#comment-14278538 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/74/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278541#comment-14278541 ] Hudson commented on YARN-2217: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/74/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278539#comment-14278539 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/74/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm * hadoop-yarn-project/CHANGES.txt Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278549#comment-14278549 ] Hudson commented on YARN-2217: -- FAILURE: Integrated in Hadoop-Yarn-trunk #808 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/808/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278546#comment-14278546 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Yarn-trunk #808 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/808/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278547#comment-14278547 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-Yarn-trunk #808 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/808/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278461#comment-14278461 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-trunk-Commit #6864 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6864/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278462#comment-14278462 ] Hudson commented on YARN-2217: -- FAILURE: Integrated in Hadoop-trunk-Commit #6864 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6864/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278436#comment-14278436 ] Junping Du commented on YARN-3064: -- Patch looks good to me. However, for the failures in TestAMRestart, I saw that we set YarnConfiguration.RECOVERY_ENABLED in some test cases. Maybe we should apply the same change there? {code} conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true); {code} TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent test failures, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in my local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278601#comment-14278601 ] Hadoop QA commented on YARN-2664: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692499/YARN-2664.8.patch against trunk revision ba5116e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 5 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6340//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6340//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6340//console This message is automatically generated. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.8.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides new functionality in the RM to ask for reservations on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278637#comment-14278637 ] Hudson commented on YARN-3005: -- FAILURE: Integrated in Hadoop-trunk-Commit #6866 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6866/]) YARN-3005. [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java (Contributed by Kengo Seki) (aajisaka: rev 533e551eb42af188535aeb0ab35f8ebf150a0da1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/zk/RegistrySecurity.java [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java Key: YARN-3005 URL: https://issues.apache.org/jira/browse/YARN-3005 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-3005.001.patch, YARN-3005.002.patch Since we have moved to JDK7, we can refactor the below if-else statement for String. {code} // TODO JDK7 SWITCH if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { access = AccessPolicy.sasl; } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { access = AccessPolicy.digest; } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { access = AccessPolicy.anon; } else { throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM + "\"" + auth + "\""); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278632#comment-14278632 ] Akira AJISAKA commented on YARN-3005: - Can anyone assign [~sekikn] to this issue? I don't currently have the permission to do this. [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java Key: YARN-3005 URL: https://issues.apache.org/jira/browse/YARN-3005 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-3005.001.patch, YARN-3005.002.patch Since we have moved to JDK7, we can refactor the below if-else statement for String. {code} // TODO JDK7 SWITCH if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { access = AccessPolicy.sasl; } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { access = AccessPolicy.digest; } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { access = AccessPolicy.anon; } else { throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM + "\"" + auth + "\""); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278627#comment-14278627 ] Akira AJISAKA commented on YARN-3005: - LGTM, +1. The patch only refactors the code, so new tests are not needed. [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java Key: YARN-3005 URL: https://issues.apache.org/jira/browse/YARN-3005 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Akira AJISAKA Priority: Trivial Labels: newbie Attachments: YARN-3005.001.patch, YARN-3005.002.patch Since we have moved to JDK7, we can refactor the below if-else statement for String. {code} // TODO JDK7 SWITCH if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { access = AccessPolicy.sasl; } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { access = AccessPolicy.digest; } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { access = AccessPolicy.anon; } else { throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM + "\"" + auth + "\""); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
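For reference, a minimal sketch of what the switch-based refactor could look like; this is illustrative, the committed patch may differ in details, and note that unlike the if-else chain a String switch throws NullPointerException when auth is null:
{code}
switch (auth) {
case REGISTRY_CLIENT_AUTH_KERBEROS:
  access = AccessPolicy.sasl;
  break;
case REGISTRY_CLIENT_AUTH_DIGEST:
  access = AccessPolicy.digest;
  break;
case REGISTRY_CLIENT_AUTH_ANONYMOUS:
  access = AccessPolicy.anon;
  break;
default:
  // same failure mode as the original else branch
  throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
      + "\"" + auth + "\"");
}
{code}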
[jira] [Commented] (YARN-2419) RM applications page doesn't sort application id properly
[ https://issues.apache.org/jira/browse/YARN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278696#comment-14278696 ] Andrew Johnson commented on YARN-2419: -- I am encountering this same problem. Is there a fix in the works? RM applications page doesn't sort application id properly - Key: YARN-2419 URL: https://issues.apache.org/jira/browse/YARN-2419 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Thomas Graves The ResourceManager apps page doesn't sort the application ids properly when the app id rolls over from to 1. When it rolls over the 1+ application ids end up being many pages down by the 0XXX numbers. I assume we just sort alphabetically so we would need a special sorter that knows about application ids. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
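A sorter that knows about application ids could, for illustration, parse the numeric fields of application_<clusterTimestamp>_<sequence> and compare them numerically instead of lexicographically. A hypothetical sketch, not the eventual web UI fix:
{code}
import java.util.Comparator;

// Compares ids like application_1389724922440_0001 by cluster timestamp,
// then by sequence number, so 10000 sorts after 9999 rather than before it.
public class ApplicationIdComparator implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    String[] pa = a.split("_");
    String[] pb = b.split("_");
    int byTimestamp = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
    return byTimestamp != 0
        ? byTimestamp
        : Long.compare(Long.parseLong(pa[2]), Long.parseLong(pb[2]));
  }
}
{code}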
[jira] [Created] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
Dmitry Sivachenko created YARN-3066: --- Summary: Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning a user task, the node manager checks for the setsid(1) utility and spawns the task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable? "exec setsid" : "exec"; FreeBSD, unlike Linux, does not have a setsid(1) utility, so plain exec is used to spawn the user task. If that task spawns other external programs (a common case when the task program is a shell script) and the user kills the job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn the task process via plain exec? This guarantees orphaned processes when a job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during the configure stage and put a @SETSID@ macro into the java file so the correct name is used. I propose to make the Shell.isSetsidAvailable test more strict and fail to start if the utility is not found: at least we will know about the problem at startup rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
[ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279287#comment-14279287 ] Dmitry Sivachenko commented on YARN-3066: - The Windows case is tested separately, see private static boolean isSetsidSupported() in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java for instance: if (Shell.WINDOWS) { return false; } In any UNIX-like case, I suppose it will leave orphaned processes, because when isSetsidSupported() == false it uses kill(pid) to kill the task instead of kill(pgid) to kill the whole process group. ssid(1) in FreeBSD is the analog of setsid(1) on Linux: a userland wrapper for the setsid() system call. Renaming does not sound like a sane idea, because it is hard to convince everyone to rename installed binaries by hand. I propose to treat it as a system-dependent option and act accordingly. (I suppose other OSes like Solaris also lack the setsid(1) utility, so they could also benefit.) For the ssid source see http://tools.suckless.org/ssid/ As for backwards compatibility, we can change this in 3.0; it is not fatal. Failure to start without setsid will just remind users to install setsid(1) or ssid(1) and proceed further, sure that there will be no side effects like orphaned tasks eating CPU. Hadoop leaves orphaned tasks running after job is killed Key: YARN-3066 URL: https://issues.apache.org/jira/browse/YARN-3066 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1 Reporter: Dmitry Sivachenko When spawning a user task, the node manager checks for the setsid(1) utility and spawns the task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance: String exec = Shell.isSetsidAvailable? "exec setsid" : "exec"; FreeBSD, unlike Linux, does not have a setsid(1) utility, so plain exec is used to spawn the user task. If that task spawns other external programs (a common case when the task program is a shell script) and the user kills the job via mapred job -kill Job, these child processes remain running. 1) Why do you silently ignore the absence of setsid(1) and spawn the task process via plain exec? This guarantees orphaned processes when a job is prematurely killed. 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during the configure stage and put a @SETSID@ macro into the java file so the correct name is used. I propose to make the Shell.isSetsidAvailable test more strict and fail to start if the utility is not found: at least we will know about the problem at startup rather than guess why there are orphaned tasks running forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
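A startup probe along the lines proposed here might look like the following hypothetical sketch; the helper name and the fallback list are assumptions, not Hadoop's actual Shell code:
{code}
import java.io.IOException;

public final class SetsidProbe {
  private SetsidProbe() {
  }

  // Try candidate session-leader wrappers in order: setsid(1) on Linux,
  // ssid(1) on FreeBSD. Fail fast if neither is present, so the problem
  // surfaces at startup instead of as orphaned tasks later.
  public static String findSessionWrapper() throws IOException {
    for (String candidate : new String[] {"setsid", "ssid"}) {
      try {
        Process p = new ProcessBuilder(candidate, "true").start();
        if (p.waitFor() == 0) {
          return candidate;
        }
      } catch (IOException | InterruptedException ignored) {
        // binary missing or probe interrupted; try the next candidate
      }
    }
    throw new IOException(
        "No setsid-like utility found; killed jobs would leave orphaned tasks");
  }
}
{code}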
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278720#comment-14278720 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2006 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2006/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278731#comment-14278731 ] Hudson commented on YARN-2217: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #71 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/71/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278722#comment-14278722 ] Hudson commented on YARN-2217: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2006 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2006/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278719#comment-14278719 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2006 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2006/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278729#comment-14278729 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #71 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/71/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/CHANGES.txt Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But {{--forceactive}} does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transition to active with --forceactive when automatic failover is enabled. The option that does work is {{--forcemanual}}, yet nothing in the usage describes it. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
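[editorial note] For reference, the invocation that actually succeeds when automatic failover is enabled follows directly from the error message quoted above, which asks for the forcemanual flag:
{code}
yarn rmadmin -transitionToActive rm2 --forcemanual
{code}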
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278728#comment-14278728 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #71 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/71/]) YARN-2217. [YARN-1492] Shared cache client side changes. (Chris Trezzo via kasha) (kasha: rev ba5116ec8e0c075096c6f84a8c8a1c6ce8297cf2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestSharedCacheClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/SharedCacheClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/SharedCacheClientImpl.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279611#comment-14279611 ] Sangjin Lee commented on YARN-3030: --- Also, the current ATS timeline client API is based on REST. Do we want to use REST similarly, or do we want to consider using RPC? The standard pros and cons apply here: REST would be a bit better for arbitrary off-cluster clients, would minimize code coupling between client and server, and would make reads and writes more symmetric. On the other hand, RPC would provide more flexibility in terms of operations. Thoughts? set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279664#comment-14279664 ] Zhijie Shen commented on YARN-3009: --- [~Naganarasimha], thanks for the patch. I think your workaround is going to mitigate the problem. However, my concern is whether we should apply this workaround at all, rather than working out how to do it correctly. While I understand it's counter-intuitive to use (double) quotes to force the value to be treated as a string, I'm afraid the *atoi*/*atof* behavior of the jackson parser is probably doing the right thing. A string that starts with a numeric char but contains non-numeric chars could still be a valid number, for example {{123456D}} or {{123.45E+6}}. On the other hand, we could also consider such values strings, e.g., ones representing an ID. TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch, YARN-3009.20150111-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7', causing the filter to fail the search. It should be noted that the actual value, as stored via a PUT operation, is properly parsed and stored as a String. This manifests as a very hard-to-identify issue with DAGClient in Apache Tez when naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
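[editorial note] The ambiguity described above is easy to demonstrate with plain Java's numeric parsing; this small demo uses java.lang parsing rather than the jackson code path, so it illustrates the ambiguity itself, not the timeline server's actual parser:
{code}
public class NumericAmbiguity {
  public static void main(String[] args) {
    // Both examples from the comment are valid doubles even though they
    // start with digits and contain letters:
    System.out.println(Double.parseDouble("123456D"));   // 123456.0 (trailing D suffix is legal)
    System.out.println(Double.parseDouble("123.45E+6")); // 1.2345E8 (scientific notation)
    // An alphanumeric ID prefix like the one in the bug report is not:
    try {
      Double.parseDouble("7CCA");
    } catch (NumberFormatException e) {
      System.out.println("not a number: 7CCA");
    }
  }
}
{code}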
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279680#comment-14279680 ] Zhijie Shen commented on YARN-3030: --- bq. work out the timeline client API (both sync and async) I'm wondering if we have to finalize the data model first, such as entities, events and metrics, because the APIs are going to operate on these things, right? bq. Do we want to use REST similarly, or do we want to consider using RPC? I suggest going with REST for now, as we can easily reuse the existing REST communication stack, but we should isolate the client/server interface from the underlying communication layer. Then, in the future, if we want to take advantage of the operational flexibility of RPC, we can implement the interface with protos and replace the REST one. set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
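[editorial note] The isolation suggested above amounts to programming callers against a transport-neutral interface so the REST implementation can later be swapped for an RPC one. A sketch of that shape, with all names hypothetical (none of this is from the ATS codebase):
{code}
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Transport-neutral contract that application code depends on.
interface TimelineWriteClient {
  void putEntities(String entityJson) throws IOException;
}

// First implementation over REST; a proto/RPC-backed implementation could
// later replace it without touching any caller.
class RestTimelineWriteClient implements TimelineWriteClient {
  private final String baseUrl;

  RestTimelineWriteClient(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  @Override
  public void putEntities(String entityJson) throws IOException {
    // PUT the serialized entities to a hypothetical writer endpoint.
    HttpURLConnection conn =
        (HttpURLConnection) new URL(baseUrl + "/entities").openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(entityJson.getBytes(StandardCharsets.UTF_8));
    }
    if (conn.getResponseCode() >= 300) {
      throw new IOException("put failed: HTTP " + conn.getResponseCode());
    }
  }
}
{code}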