[jira] [Updated] (YARN-3764) CapacityScheduler forbid of moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3764: - Description: Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. was: Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should handle this case better. CapacityScheduler forbid of moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571500#comment-14571500 ] Sunil G commented on YARN-3751: --- Thank you [~zjshen] for committing the patch! TestAHSWebServices fails after YARN-3467 Key: YARN-3751 URL: https://issues.apache.org/jira/browse/YARN-3751 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3751.patch YARN-3467 changed AppInfo and assumed that used resource is not null. It's not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571607#comment-14571607 ] Vrushali C commented on YARN-3411: -- After evaluating both approaches of backend storage implementations in terms of their performance, scalability, usability, maintenance as given by YARN-3134 (Phoenix based HBase schema) and YARN-3411 (hybrid HBase schema - vanilla HBase tables in the direct write path and phoenix based tables for reporting), conclusion is to use vanilla hbase tables in the direct write path. Attached to YARN-2928 is a write-up that describes how we ended up choosing the approach of writing to vanilla HBase tables (YARN-3411) in the direct write path. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Fix For: YARN-2928 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571606#comment-14571606 ] Vrushali C commented on YARN-3134: -- After evaluating both approaches of backend storage implementations in terms of their performance, scalability, usability, maintenance as given by YARN-3134 (Phoenix based HBase schema) and YARN-3411 (hybrid HBase schema - vanilla HBase tables in the direct write path and phoenix based tables for reporting), conclusion is to use vanilla hbase tables in the direct write path. Attached to YARN-2928 is a write-up that describes how we ended up choosing the approach of writing to vanilla HBase tables (YARN-3411) in the direct write path. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Fix For: YARN-2928 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134-YARN-2928.007.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
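For readers unfamiliar with the Phoenix model quoted above, the following is a minimal sketch of the client-embedded JDBC access pattern it describes. The ZooKeeper quorum in the URL and the table/column names are placeholders, not the ATSv2 Phoenix schema from YARN-3134.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical example of querying HBase data through Phoenix's JDBC driver;
// the connection URL and table are placeholders, not the ATSv2 Phoenix schema.
public class PhoenixJdbcSketch {
  public static void main(String[] args) throws Exception {
    // Phoenix compiles the SQL below into HBase scans and returns a JDBC result set.
    try (Connection conn =
             DriverManager.getConnection("jdbc:phoenix:zk-quorum-host:2181");
         Statement stmt = conn.createStatement();
         ResultSet rs =
             stmt.executeQuery("SELECT entity_id, created_time FROM entity LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + " " + rs.getLong(2));
      }
    }
  }
}
{code}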
[jira] [Updated] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric
[ https://issues.apache.org/jira/browse/YARN-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3765: Description: There is one warning about reversing the return value of comparisons in YARN-2928 branch. This is a valid warning. Quoting the findbugs warning message: RV_NEGATING_RESULT_OF_COMPARETO: Negating the result of compareTo()/compare() This code negatives the return value of a compareTo or compare method. This is a questionable or bad programming practice, since if the return value is Integer.MIN_VALUE, negating the return value won't negate the sign of the result. You can achieve the same intended result by reversing the order of the operands rather than by negating the results. was:There is one warning about reversing the return value of comparisons in YARN-2928 branch. I believe this is a false alarm since we intentionally said the comparator is a reversed comparator. Fix findbugs the warning in YARN-2928 branch, TimelineMetric Key: YARN-3765 URL: https://issues.apache.org/jira/browse/YARN-3765 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3765-YARN-2928.001.patch There is one warning about reversing the return value of comparisons in YARN-2928 branch. This is a valid warning. Quoting the findbugs warning message: RV_NEGATING_RESULT_OF_COMPARETO: Negating the result of compareTo()/compare() This code negatives the return value of a compareTo or compare method. This is a questionable or bad programming practice, since if the return value is Integer.MIN_VALUE, negating the return value won't negate the sign of the result. You can achieve the same intended result by reversing the order of the operands rather than by negating the results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571480#comment-14571480 ] Hadoop QA commented on YARN-3453: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 14s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 10s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737319/YARN-3453.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8179/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8179/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8179/console | This message was automatically generated. Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode. Which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is = fair/minshare. 2. 
{code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the needs of a resource type, then we should exit that preemption round, since the containers we just preempted should bring the dominant resource usage up to the min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
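To make the isStarved point above concrete, here is a hedged sketch (not the attached YARN-3453 patch) of a starvation check that uses a DRF-aware calculator instead of DefaultResourceCalculator; the class and method names are illustrative.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative sketch only: compare usage against the fair/min share using the
// dominant resource, so a queue whose dominant share already meets its share is
// not flagged as starved (DefaultResourceCalculator would look at memory only).
public class DrfStarvationSketch {
  private static final ResourceCalculator DRF = new DominantResourceCalculator();

  static boolean isStarved(Resource clusterResource, Resource usage, Resource share) {
    return Resources.lessThan(DRF, clusterResource, usage, share);
  }

  public static void main(String[] args) {
    Resource cluster = Resource.newInstance(100 * 1024, 100);
    Resource usage = Resource.newInstance(10 * 1024, 60);  // vcores are dominant: 60%
    Resource share = Resource.newInstance(50 * 1024, 50);  // fair share: 50%
    // Memory-only logic would call this queue starved (10G < 50G) and trigger
    // preemption; under DRF it is not starved, so no extra containers are taken.
    System.out.println(isStarved(cluster, usage, share));  // false
  }
}
{code}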
[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3044: Attachment: YARN-3044-YARN-2928.010.patch Hi [~zjshen], Please find the attached rebased patch to incorporate YARN-1462 [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571610#comment-14571610 ] Li Lu commented on YARN-3276: - The -1 appears to be irrelevant to the fix in this patch. I can confirm we actually have one findbugs warning in TimelineMetric, reverse comparator. Will open a separate JIRA to do a quick fix. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings Key: YARN-3276 URL: https://issues.apache.org/jira/browse/YARN-3276 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5-fix-checkstyle.patch, YARN-3276-YARN-2928.v5.patch, YARN-3276-YARN-2928.v6.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch Per discussion in YARN-3087, we need to refactor some similar logic to cast map to hashmap and get rid of NPE issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric
[ https://issues.apache.org/jira/browse/YARN-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3765: Attachment: YARN-3765-YARN-2928.001.patch I looked into the warning message, and now I believe it's not a false alarm. Previously we're directly negating the comparison, but this may potentially hit integer overflow. A simple fix is to reverse the direction of the comparison. Fix findbugs the warning in YARN-2928 branch, TimelineMetric Key: YARN-3765 URL: https://issues.apache.org/jira/browse/YARN-3765 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3765-YARN-2928.001.patch There is one warning about reversing the return value of comparisons in YARN-2928 branch. I believe this is a false alarm since we intentionally said the comparator is a reversed comparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
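For context on why the negation is flagged, a small self-contained demonstration follows; it is a generic illustration of RV_NEGATING_RESULT_OF_COMPARETO, not the TimelineMetric comparator itself.
{code}
// Generic illustration of the findbugs warning, not the TimelineMetric code.
public class NegatedCompareDemo {
  // A compare() that is sign-correct but may legally return Integer.MIN_VALUE.
  static int compareWidths(int a, int b) {
    return a < b ? Integer.MIN_VALUE : (a == b ? 0 : 1);
  }

  public static void main(String[] args) {
    // Broken reversal: -Integer.MIN_VALUE overflows back to Integer.MIN_VALUE,
    // so the sign does not flip and the "reversed" ordering is wrong.
    int negated = -compareWidths(1, 2);
    // Safe reversal: swap the operands instead of negating the result.
    int swapped = compareWidths(2, 1);
    System.out.println(negated);  // -2147483648 (still negative)
    System.out.println(swapped);  // 1
  }
}
{code}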
[jira] [Updated] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3276: -- Attachment: YARN-3276-YARN-2928.v6.patch +1 for the last patch. Rebase it against the latest branch. Will commit it after jenkins comment. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings Key: YARN-3276 URL: https://issues.apache.org/jira/browse/YARN-3276 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5-fix-checkstyle.patch, YARN-3276-YARN-2928.v5.patch, YARN-3276-YARN-2928.v6.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch Per discussion in YARN-3087, we need to refactor some similar logic to cast map to hashmap and get rid of NPE issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-2928: - Attachment: TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf We decided to evaluate two approaches of backend storage implementations in terms of their performance, scalability, usability, maintenance: YARN-3134 (Phoenix based HBase schema) and YARN-3411 (hybrid HBase schema - vanilla HBase tables in the direct write path and phoenix based tables for reporting). Attaching a write-up that describes how we ended up choosing the approach of writing to vanilla HBase tables (YARN-3411) in the direct write path. YARN Timeline Service: Next generation -- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric
Li Lu created YARN-3765: --- Summary: Fix findbugs the warning in YARN-2928 branch, TimelineMetric Key: YARN-3765 URL: https://issues.apache.org/jira/browse/YARN-3765 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu There is one warning about reversing the return value of comparisons in YARN-2928 branch. I believe this is a false alarm since we intentionally said the comparator is a reversed comparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571503#comment-14571503 ] Hudson commented on YARN-3751: -- FAILURE: Integrated in Hadoop-trunk-Commit #7952 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7952/]) YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/CHANGES.txt TestAHSWebServices fails after YARN-3467 Key: YARN-3751 URL: https://issues.apache.org/jira/browse/YARN-3751 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3751.patch YARN-3467 changed AppInfo and assumed that used resource is not null. It's not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571502#comment-14571502 ] Karthik Kambatla commented on YARN-3762: Thanks for the review, Arun. Good points. bq. what happens if the collection is modified in between.. The two possible modifications are adding/removing a child queue. Adding a child queue to the end of the list doesn't affect container assignment. Removing a child queue affects container assignment, but that is a good thing. We should probably add a comment to that effect so we don't forget this in the future. bq. instead of using a List and sorting it everytime, we could use a Sorted Bag (MultiSet) ? One issue with using a sorted list is the sorting happens on addition/removal. FSQueues already in the list also change affecting the order. May be, we could remove and re-insert the queue if anything changes, but that is a much bigger change and needs to be carefully evaluated for performance. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3762-1.patch, yarn-3762-1.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
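As a rough illustration of the race being discussed (a sketch only, not the committed YARN-3762 change; the class and field names are made up): iterating a plain ArrayList of child queues fails fast when a config reload adds or removes a queue concurrently, whereas iterating a snapshot, for example via CopyOnWriteArrayList, cannot throw the ConcurrentModificationException.
{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch only; not the actual FSParentQueue fix. A CopyOnWriteArrayList gives
// readers such as getQueueUserAclInfo an immutable snapshot to iterate, so a
// concurrent add/remove of a child queue cannot trigger a CME mid-iteration.
public class ChildQueueIterationSketch {
  private final List<String> childQueues = new CopyOnWriteArrayList<>();

  void addChildQueue(String name) {          // e.g. driven by a queue config reload
    childQueues.add(name);
  }

  void collectQueueAcls(List<String> out) {  // e.g. serving a client ACL request
    for (String child : childQueues) {       // iterates a consistent snapshot
      out.add("acls-for-" + child);
    }
  }
}
{code}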
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571623#comment-14571623 ] Zhijie Shen commented on YARN-1942: --- It seems that we have more than ConverterUtils that has been referenced by external projects. For example, in YARN-1462, we just encountered the issue that newInstance is marked as \@Private, but it's actually referenced by Tez. We need to check the public methods that are annotated as \@Private in api/common module. If they are useful to or reasonably referenced by the downstream projects, we should mark them \@Public. Sid has suggested to take MR as the example. If there're some such methods used by MR, it's very likely to be used by others too. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
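The use case in the description, written out as a compilable sketch (this only restates the existing ConverterUtils calls; nothing here is new API):
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

// The pattern from the description: an ApplicationMaster recovers its attempt
// id from the CONTAINER_ID environment variable set by the NodeManager.
public class AttemptIdFromEnv {
  public static ApplicationAttemptId currentAttemptId() {
    String containerIdStr = System.getenv(Environment.CONTAINER_ID.name());
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}
{code}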
[jira] [Updated] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3764: - Attachment: YARN-3764.1.patch Attached initial patch for review. CapacityScheduler should forbid moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3764.1.patch Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
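The attached patch is not reproduced here; as a rough, hypothetical sketch of the kind of reinitialize-time guard the issue asks for (the helper and the map shapes are invented for illustration):
{code}
import java.io.IOException;
import java.util.Map;

// Hypothetical sketch, not the YARN-3764 patch: reject a new configuration that
// re-parents an existing queue. Maps go from queue name to full path, e.g.
// "y" -> "root.a.y" before the reload and "root.y" after it.
public class QueueMoveGuard {
  static void checkNoQueueMoved(Map<String, String> oldPaths,
      Map<String, String> newPaths) throws IOException {
    for (Map.Entry<String, String> e : oldPaths.entrySet()) {
      String newPath = newPaths.get(e.getKey());
      if (newPath != null && !newPath.equals(e.getValue())) {
        throw new IOException("Moving queue " + e.getKey() + " from "
            + e.getValue() + " to " + newPath + " is not supported");
      }
    }
  }
}
{code}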
[jira] [Commented] (YARN-3764) CapacityScheduler should properly handle moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571479#comment-14571479 ] Vinod Kumar Vavilapalli commented on YARN-3764: --- bq. A short term fix is don't allow remove queue under parentQueue. We never supported removing queues. So this is not just a short-term fix, this is the right fix for now. CapacityScheduler should properly handle moving LeafQueue from one parent to another Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should handle this case better. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571486#comment-14571486 ] Wangda Tan commented on YARN-3764: -- [~vinodkv], agree. Update the title/desc and will search/file separated ticket for moving/removing queue. CapacityScheduler should forbid moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3764: - Summary: CapacityScheduler should forbid moving LeafQueue from one parent to another (was: CapacityScheduler forbid of moving LeafQueue from one parent to another) CapacityScheduler should forbid moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571590#comment-14571590 ] Hadoop QA commented on YARN-3276: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 40s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 11m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 41s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 15s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 50s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings, and fixes 1 pre-existing warnings. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 18s | Tests passed in hadoop-yarn-common. | | | | 56m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737351/YARN-3276-YARN-2928.v6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2e12480 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8183/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-api.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8183/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8183/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8183/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8183/console | This message was automatically generated. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings Key: YARN-3276 URL: https://issues.apache.org/jira/browse/YARN-3276 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-YARN-2928.v4.patch, YARN-3276-YARN-2928.v5-fix-checkstyle.patch, YARN-3276-YARN-2928.v5.patch, YARN-3276-YARN-2928.v6.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch Per discussion in YARN-3087, we need to refactor some similar logic to cast map to hashmap and get rid of NPE issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571611#comment-14571611 ] Hadoop QA commented on YARN-3762: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 7s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 87m 55s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737349/yarn-3762-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / dbc4f64 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8182/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8182/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8182/console | This message was automatically generated. 
FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reopened YARN-1462: --- Per discussion offline with Sid, my proposal is: 1.Revert the current commit, create and commit a new patch with compatible newInstance change. 2. Do not change the annotation from Private to Public as it's separate issue. File another jira or link to the existing jira to track the problem of downstream projects' reference to private methods in api/common module. AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571618#comment-14571618 ] Karthik Kambatla commented on YARN-3762: Thanks Arun. Checking this in. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3762: --- Attachment: yarn-3762-2.patch Updated the patch to add more comments. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571626#comment-14571626 ] Hudson commented on YARN-3762: -- FAILURE: Integrated in Hadoop-trunk-Commit #7955 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7955/]) YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) (kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * hadoop-yarn-project/CHANGES.txt FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Fix For: 2.8.0 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3764) CapacityScheduler forbid of moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3764: - Summary: CapacityScheduler forbid of moving LeafQueue from one parent to another (was: CapacityScheduler should properly handle moving LeafQueue from one parent to another) CapacityScheduler forbid of moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should handle this case better. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571512#comment-14571512 ] Arun Suresh commented on YARN-3762: --- Makes sense +1, LGTM FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571614#comment-14571614 ] Hadoop QA commented on YARN-3749: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 1 new checkstyle issues (total was 212, now 213). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 11s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 60m 24s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 52s | Tests passed in hadoop-yarn-server-tests. | | | | 120m 57s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737337/YARN-3749.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8181/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8181/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8181/console | This message was automatically generated. 
We should make a copy of configuration when init MiniYARNCluster with multiple RMs -- Key: YARN-3749 URL: https://issues.apache.org/jira/browse/YARN-3749 Project: Hadoop YARN Issue Type: Bug Reporter: Chun Chen Assignee: Chun Chen Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, YARN-3749.patch When I was trying to write a test case for YARN-2674, I found DS client trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 when RM failover. But I initially set yarn.resourcemanager.address.rm1=0.0.0.0:18032, yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is in ClientRMService where the value of yarn.resourcemanager.address.rm2 changed to 0.0.0.0:18032. See the following code in ClientRMService: {code} clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS,
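A hedged sketch of the idea in the summary (not the attached YARN-3749 patch): give each RM in the MiniYARNCluster its own Configuration copy, so that ClientRMService's updateConnectAddr call made while starting one RM cannot overwrite the address configured for the other.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: a per-RM copy of the configuration. Mutations made while starting
// rm1 (e.g. updateConnectAddr rewriting yarn.resourcemanager.address.rm1) then
// cannot clobber the address configured for rm2.
public class PerRmConfCopy {
  static Configuration confForRm(Configuration base, String rmId) {
    Configuration copy = new YarnConfiguration(base); // copies all key/value pairs
    copy.set(YarnConfiguration.RM_HA_ID, rmId);       // e.g. "rm1" or "rm2"
    return copy;
  }
}
{code}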
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571481#comment-14571481 ] Zhijie Shen commented on YARN-3751: --- +1 LGTM. No more test cases are required, as the existing one already covers the code. Will commit the patch. TestAHSWebServices fails after YARN-3467 Key: YARN-3751 URL: https://issues.apache.org/jira/browse/YARN-3751 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Sunil G Attachments: 0001-YARN-3751.patch YARN-3467 changed AppInfo and assumed that used resource is not null. It's not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should properly handle moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571477#comment-14571477 ] Wangda Tan commented on YARN-3764: -- The following test case can verify this issue:
{code}
@Test
public void testQueueParsingWithMoveQueue() throws IOException {
  YarnConfiguration conf = new YarnConfiguration();
  CapacitySchedulerConfiguration csConf =
      new CapacitySchedulerConfiguration(conf);
  csConf.setQueues("root", new String[] { "a" });
  csConf.setQueues("root.a", new String[] { "x", "y" });
  csConf.setCapacity("root.a", 100);
  csConf.setCapacity("root.a.x", 50);
  csConf.setCapacity("root.a.y", 50);

  CapacityScheduler capacityScheduler = new CapacityScheduler();
  RMContextImpl rmContext =
      new RMContextImpl(null, null, null, null, null, null,
          new RMContainerTokenSecretManager(csConf),
          new NMTokenSecretManagerInRM(csConf),
          new ClientToAMTokenSecretManagerInRM(), null);
  rmContext.setNodeLabelManager(nodeLabelManager);
  capacityScheduler.setConf(csConf);
  capacityScheduler.setRMContext(rmContext);
  capacityScheduler.init(csConf);
  capacityScheduler.start();

  csConf.setQueues("root", new String[] { "a", "x" });
  csConf.setQueues("root.a", new String[] { "y" });
  csConf.setCapacity("root.x", 50);
  csConf.setCapacity("root.a", 50);
  csConf.setCapacity("root.a.y", 100);
  capacityScheduler.reinitialize(csConf, rmContext);

  Assert.assertEquals(1, ((ParentQueue) capacityScheduler.getQueue("a"))
      .getChildQueues().size());
}
{code}
CapacityScheduler should properly handle moving LeafQueue from one parent to another Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should handle this case better. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571633#comment-14571633 ] Xuan Gong commented on YARN-1462: - I am ok with this plan. bq. 1.Revert the current commit, create and commit a new patch with compatible newInstance change. Looks like that we have to revert two commits {code} commit 0b5cfacde638bc25cc010cd9236369237b4e51a8 Author: Xuan xg...@apache.org Date: Mon Jun 1 11:39:00 2015 -0700 YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in CHANGES.txt {code} And {code} commit 4a9ec1a8243e2394ff7221b1c20dfaa80e9f5111 Author: Zhijie Shen zjs...@apache.org Date: Sat May 30 09:35:59 2015 -0700 YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong. {code} bq. 2. Do not change the annotation from Private to Public as it's separate issue. File another jira or link to the existing jira to track the problem of downstream projects' reference to private methods in api/common module. Link https://issues.apache.org/jira/browse/YARN-1942 AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-19) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)
[ https://issues.apache.org/jira/browse/YARN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571632#comment-14571632 ] Abin Shahab commented on YARN-19: - Hi [~jdu] Can this be merged in 2.8? We at Altiscale are encountering issues with split NM and DN. If YARN does not know how to schedule a container based on topology, locality suffers. [~raviprakash] and [~aw], what do you guys think? 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN) - Key: YARN-19 URL: https://issues.apache.org/jira/browse/YARN-19 Project: Hadoop YARN Issue Type: New Feature Reporter: Junping Du Assignee: Junping Du Attachments: HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch, MAPREDUCE-4310-v1.patch, MAPREDUCE-4310.patch, YARN-19-v2.patch, YARN-19-v3-alpha.patch, YARN-19-v4.patch, YARN-19.patch There are several classes in YARN’s container assignment and task scheduling algorithms that related to data locality which were updated to give preference to running a container on the same nodegroup. This section summarized the changes in the patch that provides a new implementation to support a four-layer hierarchy. When the ApplicationMaster makes a resource allocation request to the scheduler of ResourceManager, it will add the node group to the list of attributes in the ResourceRequest. The parameters of the resource request will change from priority, (host, rack, *), memory, #containers to priority, (host, nodegroup, rack, *), memory, #containers. After receiving the ResoureRequest the RM scheduler will assign containers for requests in the sequence of data-local, nodegroup-local, rack-local and off-switch.Then, ApplicationMaster schedules tasks on allocated containers in sequence of data- local, nodegroup-local, rack-local and off-switch. In terms of code changes made to YARN task scheduling, we updated the class ContainerRequestEvent so that applications can requests for containers can include anodegroup. In RM schedulers, FifoScheduler and CapacityScheduler were updated. For the FifoScheduler, the changes were in the method assignContainers. For the Capacity Scheduler the method assignContainersOnNode in the class of LeafQueue was updated. In both changes a new method, assignNodeGroupLocalContainers() was added in between the assignment data-local and rack-local. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
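To illustrate the four-level preference order described in the summary above (the names below are illustrative, not classes from the YARN-19 patch):
{code}
// Illustrative only; not code from the YARN-19 patch. The scheduler prefers the
// tightest locality level that matches: node, then nodegroup, then rack, then any.
public enum NodeGroupLocality {
  DATA_LOCAL, NODEGROUP_LOCAL, RACK_LOCAL, OFF_SWITCH;

  static NodeGroupLocality bestFor(boolean sameNode, boolean sameNodeGroup,
      boolean sameRack) {
    if (sameNode) {
      return DATA_LOCAL;
    }
    if (sameNodeGroup) {
      return NODEGROUP_LOCAL;
    }
    if (sameRack) {
      return RACK_LOCAL;
    }
    return OFF_SWITCH;
  }
}
{code}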
[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571542#comment-14571542 ] Hudson commented on YARN-3585: -- FAILURE: Integrated in Hadoop-trunk-Commit #7953 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7953/]) YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled. Contributed by Rohith Sharmaks (jlowe: rev e13b671aa510f553f4a6a232b4694b6a4cce88ae) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled -- Key: YARN-3585 URL: https://issues.apache.org/jira/browse/YARN-3585 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Rohith Priority: Critical Fix For: 2.7.1 Attachments: 0001-YARN-3585.patch, YARN-3585.patch With NM recovery enabled, after decommission, nodemanager log show stop but process cannot end. non daemon thread: {noformat} DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on condition [0x] leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable [0x] VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 nid=0x29ed runnable Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 nid=0x29ee runnable Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 nid=0x29ef runnable Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 nid=0x29f0 runnable Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 nid=0x29f1 runnable Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 nid=0x29f2 runnable Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 nid=0x29f3 runnable Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 nid=0x29f4 runnable Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 runnable Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 nid=0x29f5 runnable Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 nid=0x29f6 runnable VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting on condition {noformat} and jni leveldb thread stack {noformat} Thread 12 (Thread 0x7f33dd842700 (LWP 10903)): #0 0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8 #2 0x003d83407851 in start_thread () from /lib64/libpthread.so.0 #3 0x003d830e811d in clone () from /lib64/libc.so.6 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571575#comment-14571575 ] Wangda Tan commented on YARN-3510: -- [~sunilg], bq. and in general if the new approach gives a more fair preemption, then we can move to that. The approach mentioned by [~cwelch] at https://issues.apache.org/jira/browse/YARN-3510?focusedCommentId=14571405page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571405 is the truly fair one. You can come back to look at the patch once it is uploaded. Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness Key: YARN-3510 URL: https://issues.apache.org/jira/browse/YARN-3510 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, YARN-3510.6.patch The ProportionalCapacityPreemptionPolicy preempts as many containers from applications as it can during its preemption run. For FIFO this makes sense, as it is preempting in reverse order, therefore maintaining the primacy of the oldest. For fair ordering this does not have the desired effect - instead, it should preempt a number of containers from each application which maintains a fair balance (or close to a fair balance) between them -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571593#comment-14571593 ] Karthik Kambatla commented on YARN-3655: bq. IMHO, It is not good to add if (isValidReservation) check in FSAppAttempt#reserve because all the conditions checked in isValidReservation are already checked before we call FSAppAttempt#reserve, it will be duplicate code which will affect the performance. Is it possible to avoid the checks before the call, and do all the checks in the call? The reasoning behind this is to have all reservation-related code in as few places as possible. If this is not possible, we can leave it as the patch has it now. bq. While adding this check in FSAppAttempt#assignContainer(node) might work in practice, it somehow feels out of place. Instead of adding the check to assignContainer(node), can we add it to assignContainer(node, request, nodeType, reserved)? FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch, YARN-3655.003.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is if an application tries to call assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it will block all other applications from using the nodes it reserves. If all other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock situation can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock. {code} // Check the AM resource usage for the leaf queue if (!isAmRunning() && !getUnmanagedAM()) { List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests(); if (ask.isEmpty() || !getQueue().canRunAppAM( ask.get(0).getCapability())) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping allocation because maxAMShare limit would" + " be exceeded"); } return Resources.none(); } } {code} To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the max AM share limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
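As a rough sketch of the fix direction stated at the end of the description (unreserve the node when the maxAMShare check blocks the AM container), the following simplified, self-contained model may help; the types and method names are stand-ins, not the actual FSAppAttempt or FSSchedulerNode API:
{code}
public class AmShareGuard {

  private boolean amRunning;
  private boolean unmanagedAM;

  interface QueueView { boolean canRunAppAM(int amMemoryMb); }
  interface NodeView { boolean isReservedBy(Object app); void unreserve(); }

  /**
   * Mirrors the check quoted above, plus the proposed fix: when the maxAMShare limit
   * blocks the AM container and this application holds the reservation on the node,
   * drop the reservation so other applications can use the node (avoiding the livelock).
   */
  boolean mayAllocateAm(QueueView queue, NodeView node, int amMemoryMb) {
    if (!amRunning && !unmanagedAM && !queue.canRunAppAM(amMemoryMb)) {
      if (node.isReservedBy(this)) {
        node.unreserve();
      }
      return false; // corresponds to returning Resources.none() in the real code
    }
    return true;
  }
}
{code}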
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571732#comment-14571732 ] Zhijie Shen commented on YARN-3044: --- [~Naganarasimha], thanks for updating the patch. It looks good to me so far, but I want to hold the patch for the following issues. 1. After YARN-3276 is committed, this patch will conflict on {{return l2.compareTo(l1);}}. 2. We're reworking YARN-1462. It won't affect this patch, but there's a commit revert. Let's wait until YARN-1462 is done. 3. It's not caused by this patch, but I found a race condition when publishing the app finish event: {code} 15/06/03 14:59:56 INFO rmapp.RMAppImpl: application_1433367826630_0002 State change from FINISHING to FINISHED 15/06/03 14:59:56 INFO capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1433367826630_0002_01_01, NodeId: localhost:9105, NodeHttpAddress: localhost:8042, Resource: memory:2048, vCores:1, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:9105 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:0, vCores:0, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=memory:8192, vCores:8 15/06/03 14:59:56 INFO resourcemanager.RMAuditLogger: USER=zshen OPERATION=Application Finished - Succeeded TARGET=RMAppManager RESULT=SUCCESS APPID=application_1433367826630_0002 15/06/03 14:59:56 INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=memory:0, vCores:0 cluster=memory:8192, vCores:8 15/06/03 14:59:56 ERROR metrics.TimelineServiceV2Publisher: Error when publishing entity TimelineEntity[type='YARN_APPLICATION', id='application_1433367826630_0002'] java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:273) at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.publishApplicationFinishedEvent(TimelineServiceV2Publisher.java:133) at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:70) at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:35) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 15/06/03 14:59:56 INFO amlauncher.AMLauncher: Cleaning master appattempt_1433367826630_0002_01 {code} I think the problem is we stop the timeline collector immediately after calling appFinished, which is an async call, and the publishing operation is executed asynchronously on another thread. One option is to call stopTimelineCollector after publishing the finish event in the publisher. Can you take care of it? 
{code} app.rmContext.getSystemMetricsPublisher() .appFinished(app, finalState, app.finishTime); app.stopTimelineCollector(); {code} [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
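Regarding the race condition described in the YARN-3044 comment above, a minimal, self-contained illustration of the suggested ordering (publish the finished entity first, then stop the collector on the same asynchronous path) follows; the Collector interface and class names are simplified stand-ins, not the real RM publisher API:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FinishEventOrdering {

  interface Collector {
    void putEntity(String entity);
    void stop();
  }

  private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

  /** Publish the app-finished entity and only then stop the collector, on the same async path. */
  void appFinished(Collector collector, String appId) {
    dispatcher.execute(() -> {
      collector.putEntity("YARN_APPLICATION/" + appId); // publish first
      collector.stop();                                 // safe: the publish has completed
    });
  }
}
{code}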
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571761#comment-14571761 ] Sergey Shelukhin commented on YARN-1942: No, it's used in production code as far as I can tell. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public, or we make some of the utilities in it public, or we provide other external APIs for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance, the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is, please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
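For reference, the usage pattern quoted in the YARN-1942 description, written out as a compilable snippet; it assumes it runs inside a launched YARN container, where the CONTAINER_ID environment variable is set by the NodeManager:
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AttemptIdFromEnv {
  /** Derive the application attempt id from the CONTAINER_ID environment variable. */
  public static ApplicationAttemptId current() {
    String containerIdStr = System.getenv(Environment.CONTAINER_ID.name());
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}
{code}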
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571667#comment-14571667 ] Karthik Kambatla commented on YARN-3453: Should we add {{SchedulingPolicy#getResourceCalculator()}} and use that instead? Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode, which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
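To make the getResourceCalculator() suggestion above concrete, here is a rough, self-contained sketch of letting the scheduling policy supply the calculator so that a starvation check uses dominant-share math under DRF; the types below are simplified stand-ins, not the actual FairScheduler or SchedulingPolicy classes:
{code}
public class StarvationCheck {

  record Resource(long memory, long vcores) {}

  interface ResourceCalculator {
    /** true if usage is still below the share, given the cluster total. */
    boolean lessThan(Resource cluster, Resource usage, Resource share);
  }

  /** Memory-only comparison, in the spirit of DefaultResourceCalculator. */
  static final ResourceCalculator DEFAULT =
      (cluster, usage, share) -> usage.memory() < share.memory();

  /** Dominant-share comparison, in the spirit of DominantResourceCalculator. */
  static final ResourceCalculator DRF = (cluster, usage, share) -> {
    double usageDominant = Math.max((double) usage.memory() / cluster.memory(),
                                    (double) usage.vcores() / cluster.vcores());
    double shareDominant = Math.max((double) share.memory() / cluster.memory(),
                                    (double) share.vcores() / cluster.vcores());
    return usageDominant < shareDominant;
  };

  /** A queue is starved only if its usage is below its share under the policy's own calculator. */
  static boolean isStarved(ResourceCalculator policyCalculator,
                           Resource cluster, Resource usage, Resource share) {
    return policyCalculator.lessThan(cluster, usage, share);
  }
}
{code}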
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571668#comment-14571668 ] Vinod Kumar Vavilapalli commented on YARN-1942: --- bq. It seems that we have more than ConverterUtils that has been referenced by external projects. For example, in YARN-1462, we just encountered the issue that newInstance is marked as @Private, but it's actually referenced by Tez. Is this only in tests? Then you need YARN-2792. Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571669#comment-14571669 ] Zhijie Shen commented on YARN-1462: --- Reverted. AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571765#comment-14571765 ] Li Lu commented on YARN-2928: - Thanks [~sjlee0], [~jrottinghuis], and [~vrushalic] for hosting the benchmark session. This is very helpful! YARN Timeline Service: Next generation -- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1462: Attachment: YARN-1462.4.patch AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch, YARN-1462.4.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3766: Attachment: YARN-3766.1.patch Create a patch to fix it. No testcases needed ATS Web UI breaks because of YARN-3467 -- Key: YARN-3766 URL: https://issues.apache.org/jira/browse/YARN-3766 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Affects Versions: 2.8.0 Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch The ATS web UI breaks because of the following changes made in YARN-3467. {code} +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( .append(, 'mRender': renderHadoopDate }) .append(\n, {'sType':'numeric', bSearchable:false, 'aTargets':); if (isFairSchedulerPage) { - sb.append([11]); + sb.append([13]); } else if (isResourceManager) { - sb.append([10]); + sb.append([12]); } else { sb.append([9]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571859#comment-14571859 ] Hadoop QA commented on YARN-3453: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 7s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 7s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 86m 58s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737386/YARN-3453.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bc85959 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8187/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8187/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8187/console | This message was automatically generated. Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode. Which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is = fair/minshare. 2. 
{code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571676#comment-14571676 ] Hudson commented on YARN-1462: -- FAILURE: Integrated in Hadoop-trunk-Commit #7956 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7956/]) Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in (zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7) * hadoop-yarn-project/CHANGES.txt Revert YARN-1462. Made RM write application tags to timeline server and exposed them to users via generic history web UI and REST API. Contributed by Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571797#comment-14571797 ] Hadoop QA commented on YARN-3764: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 52s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 41s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 54m 5s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 100m 6s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737374/YARN-3764.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bc85959 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8186/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8186/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8186/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8186/console | This message was automatically generated. CapacityScheduler should forbid moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3764.1.patch Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571867#comment-14571867 ] Hadoop QA commented on YARN-3766: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 25s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 3m 12s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-server-common. | | | | 42m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737404/YARN-3766.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bc85959 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8189/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8189/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8189/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8189/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8189/console | This message was automatically generated. ATS Web UI breaks because of YARN-3467 -- Key: YARN-3766 URL: https://issues.apache.org/jira/browse/YARN-3766 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Affects Versions: 2.8.0 Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch The ATS web UI breaks because of the following changes made in YARN-3467. 
{code} +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( .append(, 'mRender': renderHadoopDate }) .append(\n, {'sType':'numeric', bSearchable:false, 'aTargets':); if (isFairSchedulerPage) { - sb.append([11]); + sb.append([13]); } else if (isResourceManager) { - sb.append([10]); + sb.append([12]); } else { sb.append([9]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571896#comment-14571896 ] Hudson commented on YARN-3749: -- FAILURE: Integrated in Hadoop-trunk-Commit #7958 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7958/]) YARN-3749. We should make a copy of configuration when init (xgong: rev 5766a04428f65bb008b5c451f6f09e61e1000300) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java We should make a copy of configuration when init MiniYARNCluster with multiple RMs -- Key: YARN-3749 URL: https://issues.apache.org/jira/browse/YARN-3749 Project: Hadoop YARN Issue Type: Bug Reporter: Chun Chen Assignee: Chun Chen Fix For: 2.8.0 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, YARN-3749.patch When I was trying to write a test case for YARN-2674, I found DS client trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 when RM failover. But I initially set yarn.resourcemanager.address.rm1=0.0.0.0:18032, yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is in ClientRMService where the value of yarn.resourcemanager.address.rm2 changed to 0.0.0.0:18032. See the following code in ClientRMService: {code} clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS, server.getListenerAddress()); {code} Since we use the same instance of configuration in rm1 and rm2 and init both RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during starting of rm1. So I think it is safe to make a copy of configuration when init both of the rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another
[ https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571872#comment-14571872 ] Jian He commented on YARN-3764: --- looks good, +1 CapacityScheduler should forbid moving LeafQueue from one parent to another --- Key: YARN-3764 URL: https://issues.apache.org/jira/browse/YARN-3764 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3764.1.patch Currently CapacityScheduler doesn't handle the case well, for example: A queue structure: {code} root | a (100) / \ x y (50) (50) {code} And reinitialize using following structure: {code} root / \ (50)a x (50) | y (100) {code} The actual queue structure after reinitialize is: {code} root /\ a (50) x (50) / \ xy (50) (100) {code} We should forbid admin doing that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571886#comment-14571886 ] Xuan Gong commented on YARN-3749: - Committed to trunk/branch-2. Thanks, Chun Chen. And thanks for the review, zhihai We should make a copy of configuration when init MiniYARNCluster with multiple RMs -- Key: YARN-3749 URL: https://issues.apache.org/jira/browse/YARN-3749 Project: Hadoop YARN Issue Type: Bug Reporter: Chun Chen Assignee: Chun Chen Fix For: 2.8.0 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, YARN-3749.patch When I was trying to write a test case for YARN-2674, I found the DS client trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 during RM failover. But I initially set yarn.resourcemanager.address.rm1=0.0.0.0:18032, yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is in ClientRMService where the value of yarn.resourcemanager.address.rm2 was changed to 0.0.0.0:18032. See the following code in ClientRMService: {code} clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS, server.getListenerAddress()); {code} Since we use the same configuration instance for rm1 and rm2 and init both RMs before we start them, yarn.resourcemanager.ha.id is changed to rm2 while initializing rm2, and it is still rm2 when rm1 starts. So I think it is safe to make a copy of the configuration when initializing each RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
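A minimal sketch of the fix direction described in YARN-3749, namely giving each RM in MiniYARNCluster its own copy of the configuration so that updateConnectAddr() in one RM cannot overwrite the other RM's yarn.resourcemanager.address.rm* values; the helper class and method are hypothetical, although the YarnConfiguration(Configuration) copy constructor and YarnConfiguration.RM_HA_ID are existing APIs:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PerRmConfig {
  /** Return an independent configuration for the RM at the given index (0-based). */
  static Configuration configForRm(Configuration base, int rmIndex) {
    Configuration copy = new YarnConfiguration(base);            // copy instead of sharing
    copy.set(YarnConfiguration.RM_HA_ID, "rm" + (rmIndex + 1));  // e.g. rm1, rm2
    return copy;
  }
}
{code}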
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571821#comment-14571821 ] Srikanth Kandula commented on YARN-3366: 1) Does this also capture the network usage due to non-containers? For example, that due to evacuation or replication or data downloads? 2) What about receive bandwidth? 3) Perhaps I missed this above, but what are the overhead microbenchmark numbers re: added latency for normal packets and extra CPU usage overall due to sending packets through tc / due to polling tc counters periodically? Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Fix For: 2.8.0 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571822#comment-14571822 ] Hitesh Shah commented on YARN-2513: --- +1 to making this available for ATS v1. Would be useful in various deployments. Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, YARN-2513.v3.patch Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571685#comment-14571685 ] Hadoop QA commented on YARN-3044: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 50s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 35s | The applied patch generated 1 new checkstyle issues (total was 242, now 242). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 48s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 50m 54s | Tests passed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 14s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 98m 2s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737355/YARN-3044-YARN-2928.010.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2e12480 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-api.html | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8184/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8184/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8184/console | This message was automatically generated. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric
[ https://issues.apache.org/jira/browse/YARN-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571709#comment-14571709 ] Hadoop QA commented on YARN-3765: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 22s | Pre-patch YARN-2928 has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings, and fixes 1 pre-existing warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | | | 39m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737370/YARN-3765-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2e12480 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8185/artifact/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-api.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8185/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8185/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8185/console | This message was automatically generated. Fix findbugs the warning in YARN-2928 branch, TimelineMetric Key: YARN-3765 URL: https://issues.apache.org/jira/browse/YARN-3765 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3765-YARN-2928.001.patch There is one warning about reversing the return value of comparisons in YARN-2928 branch. This is a valid warning. Quoting the findbugs warning message: RV_NEGATING_RESULT_OF_COMPARETO: Negating the result of compareTo()/compare() This code negatives the return value of a compareTo or compare method. This is a questionable or bad programming practice, since if the return value is Integer.MIN_VALUE, negating the return value won't negate the sign of the result. You can achieve the same intended result by reversing the order of the operands rather than by negating the results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
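As a small illustration of the findbugs rule quoted above (reverse the operand order instead of negating compareTo), not taken from the TimelineMetric patch itself:
{code}
import java.util.Comparator;

public class DescendingOrder {
  // Flagged pattern: if l1.compareTo(l2) returned Integer.MIN_VALUE, the negation would overflow.
  static final Comparator<Long> NEGATED = (l1, l2) -> -l1.compareTo(l2);

  // Preferred: same ordering, no overflow risk.
  static final Comparator<Long> REVERSED = (l1, l2) -> l2.compareTo(l1);
}
{code}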
[jira] [Updated] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Attachment: YARN-3453.2.patch Agreed.. Updating patch with your suggestion.. Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode. Which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is = fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1942) Many of ConverterUtils methods need to have public interfaces
[ https://issues.apache.org/jira/browse/YARN-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571755#comment-14571755 ] Zhijie Shen commented on YARN-1942: --- [~sershe], would you please comment? Many of ConverterUtils methods need to have public interfaces - Key: YARN-1942 URL: https://issues.apache.org/jira/browse/YARN-1942 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Thomas Graves Assignee: Wangda Tan Priority: Critical Attachments: YARN-1942.1.patch, YARN-1942.2.patch ConverterUtils has a bunch of functions that are useful to application masters. It should either be made public or we make some of the utilities in it public or we provide other external apis for application masters to use. Note that distributedshell and MR are both using these interfaces. For instance the main use case I see right now is for getting the application attempt id within the appmaster: String containerIdStr = System.getenv(Environment.CONTAINER_ID.name()); ConverterUtils.toContainerId ContainerId containerId = ConverterUtils.toContainerId(containerIdStr); ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId(); I don't see any other way for the application master to get this information. If there is please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3766: Priority: Blocker (was: Major) ATS Web UI breaks because of YARN-3467 -- Key: YARN-3766 URL: https://issues.apache.org/jira/browse/YARN-3766 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Affects Versions: 2.8.0 Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker The ATS web UI breaks because of the following changes made in YARN-3467. {code} +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( .append(, 'mRender': renderHadoopDate }) .append(\n, {'sType':'numeric', bSearchable:false, 'aTargets':); if (isFairSchedulerPage) { - sb.append([11]); + sb.append([13]); } else if (isResourceManager) { - sb.append([10]); + sb.append([12]); } else { sb.append([9]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3766: Affects Version/s: 2.8.0 ATS Web UI breaks because of YARN-3467 -- Key: YARN-3766 URL: https://issues.apache.org/jira/browse/YARN-3766 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Affects Versions: 2.8.0 Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker The ATS web UI breaks because of the following changes made in YARN-3467. {code} +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( .append(, 'mRender': renderHadoopDate }) .append(\n, {'sType':'numeric', bSearchable:false, 'aTargets':); if (isFairSchedulerPage) { - sb.append([11]); + sb.append([13]); } else if (isResourceManager) { - sb.append([10]); + sb.append([12]); } else { sb.append([9]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3766) ATS Web UI breaks because of YARN-3467
Xuan Gong created YARN-3766: --- Summary: ATS Web UI breaks because of YARN-3467 Key: YARN-3766 URL: https://issues.apache.org/jira/browse/YARN-3766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong The ATS web UI breaks because of the following changes made in YARN-3467.
{code}
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
@@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
       .append(", 'mRender': renderHadoopDate }")
       .append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
     if (isFairSchedulerPage) {
-      sb.append("[11]");
+      sb.append("[13]");
     } else if (isResourceManager) {
-      sb.append("[10]");
+      sb.append("[12]");
     } else {
       sb.append("[9]");
     }
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
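For context on the snippet above: the bracketed numbers are column indexes in the DataTables legacy API, where 'aTargets' lists the zero-based positions of the columns that a definition (sort type, searchability, the renderHadoopDate renderer) applies to. YARN-3467 added columns to the RM and FairScheduler app tables, so their targets moved from 10 and 11 to 12 and 13 while the remaining branch stayed at 9; whenever a page's column layout changes, these indexes have to move with it. A purely illustrative sketch of that pattern (only the index values come from the diff; the helper itself is hypothetical):
{code:title=ColumnDefsSketch.java|borderStyle=solid}
public class ColumnDefsSketch {
  /**
   * 'aTargets' must name the real zero-based position of the date column,
   * which differs per page (e.g. 9, 12 or 13) and shifts whenever columns
   * are added to the table.
   */
  static String dateColumnDef(int dateColumnIndex) {
    return new StringBuilder()
        .append("{'sType':'numeric', bSearchable:false, 'aTargets': [")
        .append(dateColumnIndex)
        .append("], 'mRender': renderHadoopDate }")
        .toString();
  }
}
{code}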
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571925#comment-14571925 ] Naganarasimha G R commented on YARN-3044: - Hi [~zjshen], bq. It not caused by this patch, but I found a race condition of publishing app finish event i got stuck big time with YARN-3045 for similar issue in NM side, and wanted to propose the same but was not sure whether the approach was fine. Will take care of this in RM side as you mentioned but shall i adopt the similar approach in NM side ? [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571658#comment-14571658 ] Wangda Tan commented on YARN-3733: -- Patch LGTM generally, will commit the patch once [~sunilg] +1. DominantRC#compare() does not work as expected if cluster resource is empty --- Key: YARN-3733 URL: https://issues.apache.org/jira/browse/YARN-3733 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 , 2 NM , 2 RM one NM - 3 GB 6 v core Reporter: Bibin A Chundatt Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, YARN-3733.patch Steps to reproduce = 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured) 4. Submit 30 concurrent task 5. Switch RM Actual = For 12 Jobs AM gets allocated and all 12 starts running No other Yarn child is initiated , *all 12 Jobs in Running state for ever* Expected === Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571660#comment-14571660 ] Sangjin Lee commented on YARN-2928: --- Thanks [~vrushalic] for the summary! YARN Timeline Service: Next generation -- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571811#comment-14571811 ] Xuan Gong commented on YARN-1462: - Create a new patch with compatible newInstance change. AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch, YARN-1462.4.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3766) ATS Web UI breaks because of YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3766: Attachment: ATSWebPageBreaks.png Uploaded a screen shot ATS Web UI breaks because of YARN-3467 -- Key: YARN-3766 URL: https://issues.apache.org/jira/browse/YARN-3766 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Affects Versions: 2.8.0 Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: ATSWebPageBreaks.png The ATS web UI breaks because of the following changes made in YARN-3467. {code} +++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs( .append(, 'mRender': renderHadoopDate }) .append(\n, {'sType':'numeric', bSearchable:false, 'aTargets':); if (isFairSchedulerPage) { - sb.append([11]); + sb.append([13]); } else if (isResourceManager) { - sb.append([10]); + sb.append([12]); } else { sb.append([9]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571879#comment-14571879 ] Xuan Gong commented on YARN-3749: - +1. LGTM. Will commit We should make a copy of configuration when init MiniYARNCluster with multiple RMs -- Key: YARN-3749 URL: https://issues.apache.org/jira/browse/YARN-3749 Project: Hadoop YARN Issue Type: Bug Reporter: Chun Chen Assignee: Chun Chen Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, YARN-3749.patch When I was trying to write a test case for YARN-2674, I found DS client trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 when RM failover. But I initially set yarn.resourcemanager.address.rm1=0.0.0.0:18032, yarn.resourcemanager.address.rm2=0.0.0.0:28032 After digging, I found it is in ClientRMService where the value of yarn.resourcemanager.address.rm2 changed to 0.0.0.0:18032. See the following code in ClientRMService: {code} clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS, server.getListenerAddress()); {code} Since we use the same instance of configuration in rm1 and rm2 and init both RM before we start both RM, we will change yarn.resourcemanager.ha.id to rm2 during init of rm2 and yarn.resourcemanager.ha.id will become rm2 during starting of rm1. So I think it is safe to make a copy of configuration when init both of the rm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
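A minimal sketch of the fix described above (the helper is illustrative of what MiniYARNCluster's per-RM init needs to do, not the exact patch): each RM gets its own copy of the caller's configuration, so a later change of yarn.resourcemanager.ha.id or an updateConnectAddr(...) call in one RM cannot rewrite the other RM's addresses.
{code:title=PerRmConfCopySketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PerRmConfCopySketch {
  /**
   * The copy constructor clones all properties, so mutations made while
   * initializing/starting rm2 (e.g. setting the HA id, or ClientRMService
   * calling updateConnectAddr) stay local to rm2's copy.
   */
  static Configuration confForRm(Configuration base, String rmId) {
    Configuration copy = new YarnConfiguration(base);
    copy.set(YarnConfiguration.RM_HA_ID, rmId);
    return copy;
  }
}
{code}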
[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3745: -- Attachment: YARN-3745.1.patch Added test. With previous implementation the test was failing with NoSuchMethodException {code} testDeserializeWithDefaultConstructor(org.apache.hadoop.yarn.api.records.impl.pb.TestSerializedExceptionPBImpl) Time elapsed: 0.129 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NoSuchMethodException: java.nio.channels.ClosedChannelException.init(java.lang.String) at java.lang.Class.getConstructor0(Class.java:2892) at java.lang.Class.getConstructor(Class.java:1723) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:181) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.yarn.api.records.impl.pb.TestSerializedExceptionPBImpl.testDeserializeWithDefaultConstructor(TestSerializedExceptionPBImpl.java:72) {code} SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.1.patch, YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
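A condensed sketch of the fallback the patch adds (simplified from SerializedExceptionPBImpl; the helper name is illustrative): try the (String) constructor first, and when the wrapped class only declares a default constructor, such as ClosedChannelException, instantiate it without a message so the cause can still be propagated.
{code:title=InstantiateExceptionSketch.java|borderStyle=solid}
import java.lang.reflect.Constructor;

public class InstantiateExceptionSketch {
  static Throwable instantiate(Class<? extends Throwable> cls,
      String message, Throwable cause) throws Exception {
    Throwable t;
    try {
      // Preferred path: most exception classes expose a (String) constructor.
      Constructor<? extends Throwable> cn = cls.getConstructor(String.class);
      cn.setAccessible(true);
      t = cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // Fallback for classes like java.nio.channels.ClosedChannelException
      // that only declare a default constructor.
      Constructor<? extends Throwable> cn = cls.getConstructor();
      cn.setAccessible(true);
      t = cn.newInstance();
    }
    t.initCause(cause);
    return t;
  }
}
{code}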
[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570664#comment-14570664 ] Sunil G commented on YARN-3510: --- Hi [~cwelch] Thanks for taking this optimization up. I have few doubts here. Here an evenly distributed preempting policy across applications are tried. But each application internally has containers from different priorities, and least priority container is selected first from an application for preemption. Now consider a scenario where we have 2 applications (assuming map reduce). {noformat} App1 has containers 10 containers:Priority 10, 5 containers:Priority 20 Old timestamp App2 has containers 10 containers:Priority 10, 2 containers:Priority 20 New timestamp {noformat} As per new implementation, after 2 rounds, some containers of priority 10(maps) may get preempted if I am not wrong. Is this intentional, because killing maps is costlier. I feel, we can group containers based on priority among all applications, and then can do this preemption at each container priority level. It may be more better but we may have more buckets of priorities. Please share your thoughts. Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness Key: YARN-3510 URL: https://issues.apache.org/jira/browse/YARN-3510 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, YARN-3510.6.patch The ProportionalCapacityPreemptionPolicy preempts as many containers from applications as it can during it's preemption run. For fifo this makes sense, as it is prempting in reverse order therefore maintaining the primacy of the oldest. For fair ordering this does not have the desired effect - instead, it should preempt a number of containers from each application which maintains a fair balance /close to a fair balance between them -- This message was sent by Atlassian JIRA (v6.3.4#6332)
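A rough sketch of the grouping suggested in the comment above (illustrative only, not from any attached patch): bucket the preemption candidates of all applications by container priority, then run the fair per-application selection inside one bucket at a time, so containers at a more important priority are only considered once the less important buckets are exhausted.
{code:title=PriorityBucketedPreemptionSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

public class PriorityBucketedPreemptionSketch {
  /**
   * Group candidates from all applications by container priority; the
   * preemption policy can then walk the buckets in order and apply its
   * fair, per-application selection within each bucket.
   */
  static Map<Priority, List<RMContainer>> bucketByPriority(
      List<RMContainer> candidatesFromAllApps) {
    Map<Priority, List<RMContainer>> buckets =
        new TreeMap<Priority, List<RMContainer>>();
    for (RMContainer c : candidatesFromAllApps) {
      Priority p = c.getContainer().getPriority();
      List<RMContainer> bucket = buckets.get(p);
      if (bucket == null) {
        bucket = new ArrayList<RMContainer>();
        buckets.put(p, bucket);
      }
      bucket.add(c);
    }
    return buckets;
  }
}
{code}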
[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS
[ https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570680#comment-14570680 ] Brahma Reddy Battula commented on YARN-3432: Kindly review the attached patch!!! Cluster metrics have wrong Total Memory when there is reserved memory on CS --- Key: YARN-3432 URL: https://issues.apache.org/jira/browse/YARN-3432 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Brahma Reddy Battula Attachments: YARN-3432-002.patch, YARN-3432.patch I noticed that when reservations happen when using the Capacity Scheduler, the UI and web services report the wrong total memory. For example. I have a 300GB of total memory in my cluster. I allocate 50 and I reserve 10. The cluster metrics for total memory get reported as 290GB. This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps there is a difference between fair scheduler and capacity scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
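To make the arithmetic in the report explicit, here is a hedged sketch of the accounting the metrics should satisfy (the field names mirror the web UI's cluster metrics, but the exact fix is an assumption here): with 300GB total, 50 allocated and 10 reserved, available is 240, and summing only available + allocated yields the wrong 290.
{code:title=ClusterMemoryAccountingSketch.java|borderStyle=solid}
public class ClusterMemoryAccountingSketch {
  /**
   * With reservations in play, memory sits in one of three buckets; the
   * reported total should be their sum, not just available + allocated.
   */
  static long totalMB(long availableMB, long allocatedMB, long reservedMB) {
    return availableMB + allocatedMB + reservedMB; // 240 + 50 + 10 = 300
  }
}
{code}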
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570652#comment-14570652 ] Lavkesh Lahngir commented on YARN-3591: --- Thanks [~sunilg] and [~zxu] for the comments and review. I did it slightly differently: I added newRepairedDirs and newErrorDirs into DirectoryCollection. In this version checkLocalizedResources(dirsTocheck) takes the list of good dirs.
{code:title=DirectoryCollection.java|borderStyle=solid}
+  private List<String> newErrorDirs;
+  private List<String> newRepariedDirs;
   private int numFailures;
@@ -159,6 +161,8 @@ public DirectoryCollection(String[] dirs,
     localDirs = new CopyOnWriteArrayList<String>(dirs);
     errorDirs = new CopyOnWriteArrayList<String>();
     fullDirs = new CopyOnWriteArrayList<String>();
+    newErrorDirs = new CopyOnWriteArrayList<String>();
+    newRepariedDirs = new CopyOnWriteArrayList<String>();
@@ -213,6 +217,20 @@ synchronized int getNumFailures() {
   }
   /**
+   * @return Recently discovered error dirs
+   */
+  synchronized List<String> getNewErrorDirs() {
+    return newErrorDirs;
+  }
+
+  /**
+   * @return Recently discovered repaired dirs
+   */
+  synchronized List<String> getNewRepairedDirs() {
+    return newRepariedDirs;
+  }
+
@@ -259,6 +277,8 @@ synchronized boolean checkDirs() {
     localDirs.clear();
     errorDirs.clear();
     fullDirs.clear();
+    newRepariedDirs.clear();
+    newErrorDirs.clear();
     for (Map.Entry<String, DiskErrorInformation> entry : dirsFailedCheck
         .entrySet()) {
@@ -292,6 +312,11 @@ synchronized boolean checkDirs() {
     }
     Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);
     Set<String> postCheckOtherDirs = new HashSet<String>(errorDirs);
+    for (String dir : preCheckGoodDirs) {
+      if (postCheckOtherDirs.contains(dir)) {
+        newErrorDirs.add(dir);
+      }
+    }
     for (String dir : preCheckFullDirs) {
       if (postCheckOtherDirs.contains(dir)) {
         LOG.warn("Directory " + dir + " error "
@@ -304,6 +329,9 @@ synchronized boolean checkDirs() {
         LOG.warn("Directory " + dir + " error "
             + dirsFailedCheck.get(dir).message);
       }
+      if (localDirs.contains(dir) || postCheckFullDirs.contains(dir)) {
+        newRepariedDirs.add(dir);
+      }
     }
{code}
{code:title=LocalDirsHandlerService.java|borderStyle=solid}
+  /**
+   * @return Recently added error dirs
+   */
+  public List<String> getDiskNewErrorDirs() {
+    return localDirs.getNewErrorDirs();
+  }
+
+  /**
+   * @return Recently added repaired dirs
+   */
+  public List<String> getDiskNewRepairedDirs() {
+    return localDirs.getNewRepairedDirs();
+  }
{code}
{code:title=ResourceLocalizationService.java|borderStyle=solid}
       @Override
       public void onDirsChanged() {
         checkAndInitializeLocalDirs();
+        List<String> dirsTocheck =
+            new ArrayList<String>(dirsHandler.getLocalDirs());
+        dirsTocheck.addAll(dirsHandler.getDiskFullLocalDirs());
+        // checks if resources are present in the dirsTocheck
+        publicRsrc.checkLocalizedResources(dirsTocheck);
         for (LocalResourcesTracker tracker : privateRsrc.values()) {
+          tracker.checkLocalizedResources(dirsTocheck);
+        }
+        List<String> newRepairedDirs = dirsHandler.getDiskNewRepairedDirs();
+        // Delete any resources found in the newly repaired Dirs.
+        for (String dir : newRepairedDirs) {
+          cleanUpLocalDir(lfs, delService, dir);
         }
+        // Add code here to add errordirs to statestore.
       }
     };
{code}
{code:title=DirectoryCollection.java|borderStyle=solid}
  synchronized List<String> getErrorDirs() {
    return Collections.unmodifiableList(errorDirs);
  }
{code}
We can use getErrorDirs() and keep it in the NM state store as suggested, and upon start we can do a cleanUpLocalDir on the error dirs.
Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch It happens when a resource is localised on the disk, after localising that disk has gone bad. NM keeps paths for localised resources in memory. At the time of resource request isResourcePresent(rsrc) will be called which calls file.exists() on the localised path. In some cases when disk has gone bad, inodes are stilled cached and file.exists() returns true. But at the time of reading, file will not open. Note: file.exists() actually calls stat64 natively which returns true because it was
[jira] [Updated] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3745: -- Attachment: YARN-3745.patch SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570706#comment-14570706 ] Hadoop QA commented on YARN-3745: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 54s | The applied patch generated 1 new checkstyle issues (total was 8, now 9). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 40m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737155/YARN-3745.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8171/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8171/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8171/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8171/console | This message was automatically generated. SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS
[ https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3432: --- Attachment: YARN-3432-002.patch Cluster metrics have wrong Total Memory when there is reserved memory on CS --- Key: YARN-3432 URL: https://issues.apache.org/jira/browse/YARN-3432 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Brahma Reddy Battula Attachments: YARN-3432-002.patch, YARN-3432.patch I noticed that when reservations happen when using the Capacity Scheduler, the UI and web services report the wrong total memory. For example. I have a 300GB of total memory in my cluster. I allocate 50 and I reserve 10. The cluster metrics for total memory get reported as 290GB. This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps there is a difference between fair scheduler and capacity scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570869#comment-14570869 ] Hadoop QA commented on YARN-3745: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 8s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 1 new checkstyle issues (total was 8, now 9). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 40m 7s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737170/YARN-3745.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8173/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8173/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8173/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8173/console | This message was automatically generated. SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.1.patch, YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570921#comment-14570921 ] Chang Li commented on YARN-2556: [~zjshen], [~djp] could you please help review the latest patch? Thanks! Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570950#comment-14570950 ] Hadoop QA commented on YARN-3733: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 40s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 0s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 50m 12s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 93m 53s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737171/0002-YARN-3733.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8174/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8174/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8174/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8174/console | This message was automatically generated. DominantRC#compare() does not work as expected if cluster resource is empty --- Key: YARN-3733 URL: https://issues.apache.org/jira/browse/YARN-3733 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 , 2 NM , 2 RM one NM - 3 GB 6 v core Reporter: Bibin A Chundatt Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, YARN-3733.patch Steps to reproduce = 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured) 4. Submit 30 concurrent task 5. Switch RM Actual = For 12 Jobs AM gets allocated and all 12 starts running No other Yarn child is initiated , *all 12 Jobs in Running state for ever* Expected === Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570937#comment-14570937 ] Sunil G commented on YARN-3733: --- Thank you [~rohithsharma] for the detailed information and patch. 1. Could we add a test case where only memory or vcores are more in TestCapacityScheduler. {code} Resource amResource2 = Resource.newInstance(amResourceLimit.getMemory() + 1, amResourceLimit.getVirtualCores()); {code} 2. In TestCapacityScheduler#verifyAMLimitForLeafQueue, while submitting second app, you could change the app name to app-2. DominantRC#compare() does not work as expected if cluster resource is empty --- Key: YARN-3733 URL: https://issues.apache.org/jira/browse/YARN-3733 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 , 2 NM , 2 RM one NM - 3 GB 6 v core Reporter: Bibin A Chundatt Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, YARN-3733.patch Steps to reproduce = 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured) 4. Submit 30 concurrent task 5. Switch RM Actual = For 12 Jobs AM gets allocated and all 12 starts running No other Yarn child is initiated , *all 12 Jobs in Running state for ever* Expected === Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570941#comment-14570941 ] Varun Saxena commented on YARN-3051: bq. I noticed there is a readerLimit for read operations, which works for ATS v1. I'm wondering if it's fine to use -1 to indicate there's no such limit? Not sure if this feature is already there. You mean limit to limit the number of records ? bq. The fromId parameter, we may need to be careful on the concept of id. In timeline v2 we need context information to identify each entity, such as cluster, user, flow, run. When querying with fromId, what kind of assumptions should we make on the id here? {{fromId}} is primarily there to be backward compatible with ATS v1. It is used in context of entity ID only. This will be documented in the javadoc. I have not changed names of the query params (if these parameters are supported in ATS v1). Whether we need to support same REST endpoints as ATS v1 for the sake of backward compatibility or whether we can break the backward compatibility(in case of no use case) is something which I wanted to discuss. Commented on YARN-3411 as well regarding one such param. bq. In some APIs, we're requiring clusterID and appID, but not having flow/run informationMaybe we can have flow and run information as optional parameters so that we can avoid full table scans when the caller does have flow and run information? Agree with your suggestion. Even I was thinking about including them in the next patch as query params. This will make the parameter list even longer :) bq. The current APIs require a pretty long list of parameters. For most of the use cases, I think we can abstract something much simpler. These parameters are directly fetched from query params coming in REST API and are directly passed down to storage layer(after minor verification). Yes, we can decide on few of the key parameters(which correspond to row key/primary key) and have different methods for that. And have different reader API methods for them as well. [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
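To make the API discussion concrete, a rough sketch of a reader method with flow/run as optional context (purely illustrative; the real interface, parameter set and REST mapping are exactly what is being negotiated in this thread): cluster/user/app identify the scope, flowId/flowRunId are nullable hints that let the storage avoid a full table scan, and a negative limit could mean "no limit" as proposed above.
{code:title=TimelineReaderSketch.java|borderStyle=solid}
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

/** Illustrative signature only; not the committed YARN-3051 interface. */
public interface TimelineReaderSketch {
  /**
   * @param clusterId  required context
   * @param userId     required context
   * @param appId      required context
   * @param flowId     optional; null lets the storage fall back to a scan
   * @param flowRunId  optional; null lets the storage fall back to a scan
   * @param entityType entity type to list
   * @param limit      maximum number of entities to return; -1 means no limit
   */
  Set<TimelineEntity> getEntities(String clusterId, String userId,
      String appId, String flowId, Long flowRunId, String entityType,
      long limit) throws IOException;
}
{code}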
[jira] [Updated] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3733: - Attachment: 0002-YARN-3733.patch Thanks [~sunilg] and [~leftnoteasy] for sharing your thoughts. I modified the logic a bit and the order of the if checks so that it handles all the possible combinations of inputs in the table below. The problem was with the 5th and 7th inputs: the validation returned 1 where 0 was expected for the 5th combination, i.e. the flow never reached the 2nd check since the 1st step ORs memory vs cpu.
||Sl.no||cr||lhs||rhs||Output||
|1|0,0|1,1|1,1|0|
|2|0,0|1,1|0,0|1|
|3|0,0|0,0|1,1|-1|
|4|0,0|0,1|1,0|0|
|5|0,0|1,0|0,1|0|
|6|0,0|1,1|1,0|1|
|7|0,0|1,0|1,1|-1|
The updated patch has the following changes:
# Changed the logic for comparing lhs and rhs resources when clusterResource is empty, as suggested.
# Added a test for AM limit usage.
# Added tests for all of the above combinations of inputs.
Kindly review the patch. DominantRC#compare() does not work as expected if cluster resource is empty --- Key: YARN-3733 URL: https://issues.apache.org/jira/browse/YARN-3733 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 , 2 NM , 2 RM one NM - 3 GB 6 v core Reporter: Bibin A Chundatt Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, YARN-3733.patch Steps to reproduce = 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured) 4. Submit 30 concurrent task 5. Switch RM Actual = For 12 Jobs AM gets allocated and all 12 starts running No other Yarn child is initiated , *all 12 Jobs in Running state for ever* Expected === Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
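A hedged sketch of the corner case the patch addresses (simplified; the real change lives in DominantResourceCalculator#compare): when the cluster resource is empty, dominant shares cannot be computed, so the comparison falls back to comparing the two resources component by component, which is what produces the outputs in rows 4-7 of the table above instead of a one-sided early answer.
{code:title=EmptyClusterCompareSketch.java|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.Resource;

public class EmptyClusterCompareSketch {
  /**
   * Fallback for an empty cluster resource: mixed results (one component
   * larger, the other smaller) are treated as equal, matching rows 4 and 5
   * of the table above.
   */
  static int compareWhenClusterEmpty(Resource lhs, Resource rhs) {
    boolean memLess = lhs.getMemory() < rhs.getMemory();
    boolean memMore = lhs.getMemory() > rhs.getMemory();
    boolean cpuLess = lhs.getVirtualCores() < rhs.getVirtualCores();
    boolean cpuMore = lhs.getVirtualCores() > rhs.getVirtualCores();

    if ((memLess && cpuMore) || (memMore && cpuLess)) {
      return 0;            // rows 4 and 5
    } else if (memMore || cpuMore) {
      return 1;            // rows 2 and 6
    } else if (memLess || cpuLess) {
      return -1;           // rows 3 and 7
    }
    return 0;              // row 1: equal on both components
  }
}
{code}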
[jira] [Updated] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched
[ https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3754: -- Attachment: NM.log Race condition when the NodeManager is shutting down and container is launched -- Key: YARN-3754 URL: https://issues.apache.org/jira/browse/YARN-3754 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Sunil G Priority: Critical Attachments: NM.log Container is launched and returned to ContainerImpl NodeManager closed the DB connection which resulting in {{org.iq80.leveldb.DBException: Closed}}. *Attaching the exception trace* {code} 2015-05-30 02:11:49,122 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update state store diagnostics for container_e310_1432817693365_3338_01_02 java.io.IOException: org.iq80.leveldb.DBException: Closed at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.iq80.leveldb.DBException: Closed at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123) at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106) at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259) ... 15 more {code} we can add a check whether DB is closed while we move container from ACQUIRED state. As per the discussion in YARN-3585 have add the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched
[ https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570790#comment-14570790 ] Sunil G commented on YARN-3754: --- I have got the logs from [~bibinchundatt] offline. {noformat} 2015-05-30 01:11:16,179 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_e313_1432908361253_4506_01_01 and exit code: 0 java.io.IOException: java.lang.InterruptedException ... ... 2015-05-30 01:11:16,179 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update diagnostics in state store for container_e313_1432908361253_4506_01_01 java.io.IOException: org.iq80.leveldb.DBException: Closed at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostic {noformat} When NM is shutting down, ContainerLaunch is also interrupted. During this interrupted exception handling, NM tries to update container diagnostics. But from main thread statestore is down ,hence caused the DB Close exception. This scenario is handled in YARN-3641 already by [~djp] . [~bibinchundatt] could you please update this patch and check this and we can close this ticket as duplicate. Attaching NM logs too. Race condition when the NodeManager is shutting down and container is launched -- Key: YARN-3754 URL: https://issues.apache.org/jira/browse/YARN-3754 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Sunil G Priority: Critical Container is launched and returned to ContainerImpl NodeManager closed the DB connection which resulting in {{org.iq80.leveldb.DBException: Closed}}. *Attaching the exception trace* {code} 2015-05-30 02:11:49,122 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Unable to update state store diagnostics for container_e310_1432817693365_3338_01_02 java.io.IOException: org.iq80.leveldb.DBException: Closed at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.iq80.leveldb.DBException: Closed at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123) at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106) at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259) ... 15 more {code} we can add a check whether DB is closed while we move container from ACQUIRED state. As per the discussion in YARN-3585 have add the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
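The guard suggested in the description could look roughly like this (a sketch only; YARN-3641 covers the same shutdown ordering, and the field and method below are illustrative rather than the actual NMLeveldbStateStoreService code):
{code:title=ClosedStoreGuardSketch.java|borderStyle=solid}
import java.io.IOException;

public class ClosedStoreGuardSketch {
  // Illustrative stand-in for the NM state store's LevelDB handle state.
  private volatile boolean closed = false;

  public void close() {
    closed = true;
    // ... close the LevelDB handle ...
  }

  /**
   * Skip the write instead of surfacing "DBException: Closed" when a
   * container-launch thread races with NM shutdown.
   */
  public void storeContainerDiagnostics(String containerId, String diagnostics)
      throws IOException {
    if (closed) {
      return; // NM is shutting down; dropping this diagnostics update is benign
    }
    // ... db.put(...) as before ...
  }
}
{code}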
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570970#comment-14570970 ] Sunil G commented on YARN-3745: --- HI [~lavkesh] Thanks for working on this patch. In initExceptionWithConstructor, I feel *IllegalArgumentException* also has to be thrown. Its missing now. SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.1.patch, YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570989#comment-14570989 ] Lavkesh Lahngir commented on YARN-3745: --- [~sunilg] : Uh.. IllegalArgumentException is not a checked Exception. It is not needed to be declared thrown. SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.1.patch, YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor
[ https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571235#comment-14571235 ] Sunil G commented on YARN-3745: --- Yes. Missed it :) Thanks! SerializedException should also try to instantiate internal exception with the default constructor -- Key: YARN-3745 URL: https://issues.apache.org/jira/browse/YARN-3745 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: YARN-3745.1.patch, YARN-3745.patch While deserialising a SerializedException it tries to create internal exception in instantiateException() with cn = cls.getConstructor(String.class). if cls does not has a constructor with String parameter it throws Nosuchmethodexception for example ClosedChannelException class. We should also try to instantiate exception with default constructor so that inner exception can to propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571275#comment-14571275 ] Hadoop QA commented on YARN-41: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 30s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. | | {color:green}+1{color} | javac | 9m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 20s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 30s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 42s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 6s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 35s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 39s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 7m 23s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 48m 0s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-server-tests. 
| | | | 115m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/8175/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8175/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8175/console | This message was automatically generated. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, YARN-41-8.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571318#comment-14571318 ] Hadoop QA commented on YARN-3534: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 42s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 7m 34s | The applied patch generated 1 additional warning messages. | | {color:red}-1{color} | javadoc | 9m 39s | The applied patch generated 3 additional warning messages. | | {color:red}-1{color} | release audit | 0m 18s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 24s | The applied patch generated 9 new checkstyle issues (total was 212, now 220). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 4s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 50m 53s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737047/YARN-3534-10.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c59e745 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/diffJavacWarnings.txt | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/diffJavadocWarnings.txt | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8176/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8176/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8176/console | This message was automatically generated. 
Collect memory/cpu usage on the node Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-10.patch, YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the collection of memory/cpu usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
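For illustration only: a minimal sketch of sampling node-level memory/CPU usage with the JDK's com.sun.management.OperatingSystemMXBean. This is not the collection mechanism implemented by the attached patches (which wire utilization into the NodeManager); everything beyond the standard JDK API here is hypothetical.
{code:title=NodeUtilizationSamplerSketch.java|borderStyle=solid}
import java.lang.management.ManagementFactory;

// Hedged sketch: samples node memory/CPU via the JDK's
// com.sun.management.OperatingSystemMXBean; illustrative only.
public class NodeUtilizationSamplerSketch {

  public static void main(String[] args) {
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();

    long totalMem = os.getTotalPhysicalMemorySize();   // bytes
    long freeMem = os.getFreePhysicalMemorySize();     // bytes
    double cpuLoad = os.getSystemCpuLoad();            // 0.0 - 1.0, or negative if unavailable

    long usedMemMB = (totalMem - freeMem) / (1024 * 1024);
    System.out.println("Used physical memory: " + usedMemMB + " MB");
    System.out.println("System CPU load: "
        + (cpuLoad < 0 ? "unavailable" : String.valueOf(cpuLoad)));
  }
}
{code}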
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571335#comment-14571335 ] Sunil G commented on YARN-3751: --- Hi TestAHSWebServices fails after YARN-3467 Key: YARN-3751 URL: https://issues.apache.org/jira/browse/YARN-3751 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Sunil G Attachments: 0001-YARN-3751.patch YARN-3467 changed AppInfo and assumed that used resource is not null. It's not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571272#comment-14571272 ] Karthik Kambatla commented on YARN-3762: The patch just adds read-write locks to address any races. Haven't added any tests since it is hard to test race conditions. The code changed is accessed directly or indirectly by existing tests. Running jcarder should catch any deadlocks introduced by this change. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo --- Key: YARN-3762 URL: https://issues.apache.org/jira/browse/YARN-3762 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3762-1.patch, yarn-3762-1.patch In our testing, we ran into the following ConcurrentModificationException: {noformat} halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, queueName=root.testyarnpool3, queueCurrentCapacity=0.0, queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
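For readers unfamiliar with the approach, below is a minimal sketch of the read-write-lock pattern the comment describes: iteration over the child-queue list takes the read lock, mutation takes the write lock, so a concurrent add can no longer trigger a ConcurrentModificationException. The types are simplified stand-ins, not the actual FSParentQueue code.
{code:title=ReadWriteLockSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for a parent queue whose child list is read by
// getQueueUserAclInfo() while other threads add or remove children.
public class ReadWriteLockSketch {
  private final List<String> childQueues = new ArrayList<>();
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

  public void addChildQueue(String name) {
    rwLock.writeLock().lock();
    try {
      childQueues.add(name);          // mutation happens under the write lock
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  public List<String> snapshotChildQueues() {
    rwLock.readLock().lock();
    try {
      // iteration happens under the read lock, so it cannot race with add()
      return new ArrayList<>(childQueues);
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
{code}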
[jira] [Assigned] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-3453: - Assignee: Arun Suresh Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan updated YARN-3017: --- Attachment: YARN-3017_2.patch Fixed the review comment. ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID -- Key: YARN-3017 URL: https://issues.apache.org/jira/browse/YARN-3017 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.8.0 Reporter: MUFEED USMAN Priority: Minor Labels: PatchAvailable Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch Not sure if this should be filed as a bug or not. In the ResourceManager log in the events surrounding the creation of a new application attempt, ... ... 2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02 ... ... The application attempt has the ID format _1412150883650_0001_02. Whereas the associated ContainerID goes by _1412150883650_0001_02_. ... ... 2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, vCores:1, disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02 ... ... Curious to know if this is kept like that for a reason. If not, then while using filtering tools to, say, grep events surrounding a specific attempt by the numeric ID part, information may slip out during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
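To make the formatting difference concrete, here is a small illustrative snippet (not part of any attached patch) that prints both IDs using the public YARN records API; the attempt number is zero-padded differently in the two strings, which is exactly what makes numeric grep filters awkward. The expected output lines are an assumption based on the log excerpts above.
{code:title=IdFormatSketch.java|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Illustrative only: shows how the attempt number appears with different
// zero-padding in the appattempt_* and container_* string forms.
public class IdFormatSketch {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1412150883650L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 2);
    ContainerId containerId = ContainerId.newContainerId(attemptId, 1);

    System.out.println(attemptId);    // e.g. appattempt_1412150883650_0001_000002
    System.out.println(containerId);  // e.g. container_1412150883650_0001_02_000001
  }
}
{code}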
[jira] [Updated] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3453: -- Attachment: YARN-3453.1.patch [~peng.zhang], [~ashwinshankar77], Thank you for reporting this and for the associated discussion. I vote that we: # fix the {{isStarved()}} method to use the correct Calculator # fix the {{resToPreempt()}} method to use componentWiseMin for the target... but defer using the {{targetRatio}}, since it is probably an optimization and can be addressed in a future JIRA. I have attached a preliminary patch that does this. Will upload one with test cases shortly. Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue, since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
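To illustrate the two proposed fixes in plain terms, a simplified sketch follows: the starvation check compares dominant shares rather than memory alone, and the preemption target is capped by the component-wise minimum with the queue's actual usage. All types here are simplified stand-ins, not the FSLeafQueue/FairScheduler classes or the attached patch.
{code:title=DrfPreemptionSketch.java|borderStyle=solid}
// Simplified stand-in types; not the actual FSLeafQueue/FairScheduler code.
public class DrfPreemptionSketch {

  static final class Res {
    final long memory;
    final long vcores;
    Res(long memory, long vcores) { this.memory = memory; this.vcores = vcores; }
  }

  // Dominant share of "used" relative to the cluster (DRF-style).
  // Assumes a non-empty cluster resource.
  static double dominantShare(Res used, Res cluster) {
    return Math.max((double) used.memory / cluster.memory,
                    (double) used.vcores / cluster.vcores);
  }

  // Fix 1: a queue is starved only if its dominant share is below the share
  // implied by its fair/min share, instead of comparing memory alone.
  static boolean isStarved(Res usage, Res fairShare, Res cluster) {
    return dominantShare(usage, cluster) < dominantShare(fairShare, cluster);
  }

  // Fix 2: cap the preemption target component-wise by what the queue
  // actually uses, so no single resource type gets over-preempted.
  static Res componentWiseMin(Res a, Res b) {
    return new Res(Math.min(a.memory, b.memory), Math.min(a.vcores, b.vcores));
  }
}
{code}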
[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467
[ https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571338#comment-14571338 ] Sunil G commented on YARN-3751: --- Hi [~zjshen], I checked the patch and the tests are passing now. Please check if this is fine. TestAHSWebServices fails after YARN-3467 Key: YARN-3751 URL: https://issues.apache.org/jira/browse/YARN-3751 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Sunil G Attachments: 0001-YARN-3751.patch YARN-3467 changed AppInfo and assumed that used resource is not null. It's not true as this information is not published to timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1462: -- Target Version/s: 2.8.0 (was: 2.7.1) AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch, YARN-1462.4.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs
[ https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571957#comment-14571957 ] Chun Chen commented on YARN-3749: - Thanks for reviewing and committing the patch, [~xgong]. We should make a copy of configuration when init MiniYARNCluster with multiple RMs -- Key: YARN-3749 URL: https://issues.apache.org/jira/browse/YARN-3749 Project: Hadoop YARN Issue Type: Bug Reporter: Chun Chen Assignee: Chun Chen Fix For: 2.8.0 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, YARN-3749.patch When I was trying to write a test case for YARN-2674, I found the DS client trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 during RM failover, even though I had initially set yarn.resourcemanager.address.rm1=0.0.0.0:18032 and yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is in ClientRMService where the value of yarn.resourcemanager.address.rm2 is changed to 0.0.0.0:18032. See the following code in ClientRMService: {code} clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST, YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS, server.getListenerAddress()); {code} Since we use the same instance of configuration for rm1 and rm2, and init both RMs before we start them, we change yarn.resourcemanager.ha.id to rm2 during the init of rm2, so yarn.resourcemanager.ha.id will already be rm2 when rm1 starts. So I think it is safe to make a copy of the configuration when initializing each of the RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
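As a small sketch of the idea (not the MiniYARNCluster patch itself), giving each RM its own copy of the configuration keeps updateConnectAddr() and the ha.id setting in one RM from rewriting the values the other RM reads:
{code:title=PerRmConfCopySketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: each RM gets its own Configuration copy so that setting
// yarn.resourcemanager.ha.id or calling updateConnectAddr() on one RM
// cannot leak into the other RM's view of the config.
public class PerRmConfCopySketch {
  public static void main(String[] args) {
    Configuration base = new YarnConfiguration();
    base.set("yarn.resourcemanager.address.rm1", "0.0.0.0:18032");
    base.set("yarn.resourcemanager.address.rm2", "0.0.0.0:28032");

    Configuration rm1Conf = new YarnConfiguration(base); // copy, not shared
    rm1Conf.set(YarnConfiguration.RM_HA_ID, "rm1");

    Configuration rm2Conf = new YarnConfiguration(base); // copy, not shared
    rm2Conf.set(YarnConfiguration.RM_HA_ID, "rm2");

    // rm1Conf and rm2Conf can now diverge independently during init/start.
    System.out.println(rm1Conf.get("yarn.resourcemanager.address.rm1"));
    System.out.println(rm2Conf.get("yarn.resourcemanager.address.rm2"));
  }
}
{code}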
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572000#comment-14572000 ] Joep Rottinghuis commented on YARN-3706: It turns out that TimelineWriterUtils.join has a bug where it returns an extra byte at the end of the return value if a null argument is passed. In attempting to fix this I realized we're having a hard time distinguishing nulls from spaces. As I was discussing the fix with [~sjlee0] I realized that we currently have a mix of replace, cleanse, etc. Sometimes we replace, sometimes we strip. That is a bit of a mess. He wondered if we could simply URL encode all columns. Rather than doing that, I'm now taking the approach of URL encoding only the separators that are needed, and of ensuring that we set a limit when splitting the separators out again. The only downside is that we still cannot differentiate between null values and empty strings, but in most cases when we need to encode qualifiers in columns this will not happen (entity IDs are never null). The other disadvantage is that if an identifier (rowkey, related entity key, etc.) contains URL-encoded strings, we might end up decoding them. I think that is an acceptable approach. A new patch with these fixes is coming up. Generalize native HBase writer for additional tables Key: YARN-3706 URL: https://issues.apache.org/jira/browse/YARN-3706 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Joep Rottinghuis Assignee: Joep Rottinghuis Priority: Minor Attachments: YARN-3706-YARN-2928.001.patch, YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, YARN-3726-YARN-2928.004.patch When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
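A small illustration of the approach described in the comment above, with a hypothetical "!" separator: occurrences of the separator inside a value are percent-encoded before joining, and parsing splits with an explicit limit so the number of fields is bounded to what was written. This is a sketch under those assumptions, not the TimelineWriterUtils code.
{code:title=SeparatorEncodingSketch.java|borderStyle=solid}
// Sketch only: hypothetical "!" separator, not the actual TimelineWriterUtils
// implementation. Values have the separator percent-encoded before joining;
// split uses an explicit limit so the field count stays bounded.
public class SeparatorEncodingSketch {
  private static final String SEP = "!";
  private static final String ENCODED_SEP = "%21";

  static String encode(String value) {
    // Note: null and "" still collapse to the same thing, as the comment above
    // concedes; entity IDs are never null so this is acceptable here.
    return value == null ? "" : value.replace(SEP, ENCODED_SEP);
  }

  static String decode(String value) {
    return value.replace(ENCODED_SEP, SEP);
  }

  static String join(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) sb.append(SEP);
      sb.append(encode(parts[i]));
    }
    return sb.toString();
  }

  static String[] split(String joined, int expectedFields) {
    // Assumes expectedFields matches the number of fields that were joined.
    String[] raw = joined.split(SEP, expectedFields);
    for (int i = 0; i < raw.length; i++) {
      raw[i] = decode(raw[i]);
    }
    return raw;
  }
}
{code}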
[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572010#comment-14572010 ] Sunil G commented on YARN-3510: --- Thank you [~leftnoteasy] for the pointer. I have mostly understood the idea overall. I will also take a look when [~cwelch] shares the patch. Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness Key: YARN-3510 URL: https://issues.apache.org/jira/browse/YARN-3510 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, YARN-3510.6.patch The ProportionalCapacityPreemptionPolicy preempts as many containers from applications as it can during its preemption run. For fifo this makes sense, as it is preempting in reverse order, therefore maintaining the primacy of the oldest. For fair ordering this does not have the desired effect - instead, it should preempt a number of containers from each application which maintains a fair balance (or close to a fair balance) between them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
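As a rough illustration of "preempt a number of containers from each application" rather than draining applications in reverse order, here is a round-robin sketch with stand-in types; the names and selection rule are illustrative assumptions, not the policy code in the attached patches.
{code:title=FairPreemptionSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch only: picks containers round-robin across applications so the
// preemption burden is spread (close to) evenly, instead of draining the
// newest application first as the FIFO-ordered policy does.
public class FairPreemptionSketch {

  static List<String> selectContainersToPreempt(
      Map<String, Deque<String>> containersByApp, int containersNeeded) {
    List<String> selected = new ArrayList<>();
    boolean progress = true;
    while (selected.size() < containersNeeded && progress) {
      progress = false;
      for (Deque<String> containers : containersByApp.values()) {
        if (selected.size() >= containersNeeded) {
          break;
        }
        String c = containers.pollLast();   // take one container per app per round
        if (c != null) {
          selected.add(c);
          progress = true;
        }
      }
    }
    return selected;
  }
}
{code}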
[jira] [Resolved] (YARN-3752) TestRMFailover fails due to intermittent UnknownHostException
[ https://issues.apache.org/jira/browse/YARN-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki resolved YARN-3752. Resolution: Duplicate Fix Version/s: 2.8.0 I cannot reproduce the issue now that YARN-3749 has been committed. I'm closing this issue as a duplicate of YARN-3749. TestRMFailover fails due to intermittent UnknownHostException - Key: YARN-3752 URL: https://issues.apache.org/jira/browse/YARN-3752 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.8.0 The client fails to create a connection due to UnknownHostException while it retries connecting to the next RM after failover in the unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572116#comment-14572116 ] Zhijie Shen commented on YARN-1462: --- +1 for the last patch. The change in ApplicationReport should be backward compatible. [~sershe], would you please take a look? AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch, YARN-1462.4.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails
[ https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572143#comment-14572143 ] Masatake Iwasaki commented on YARN-2578: Hi [~wilfreds], do you have any update on this? I saw the same issue in our cluster and the attached patch worked. I would like the fix to come in the next release. If you do not have enough time, I would like to take over. Otherwise we can commit the current patch and fix hadoop-common later. It still applies to trunk and branch-2. NM does not failover timely if RM node network connection fails --- Key: YARN-2578 URL: https://issues.apache.org/jira/browse/YARN-2578 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Attachments: YARN-2578.patch The NM does not fail over correctly when the network cable of the RM is unplugged or the failure is simulated by a service network stop or a firewall that drops all traffic on the node. The RM fails over to the standby node when the failure is detected, as expected. The NM should then re-register with the new active RM. This re-register takes a long time (15 minutes or more). Until then the cluster has no nodes for processing and applications are stuck. Reproduction test case which can be used in any environment: - create a cluster with 3 nodes node 1: ZK, NN, JN, ZKFC, DN, RM, NM node 2: ZK, NN, JN, ZKFC, DN, RM, NM node 3: ZK, JN, DN, NM - start all services and make sure they are in good health - kill the network connection of the RM that is active using one of the network kills from above - observe the NN and RM failover - the DNs fail over to the new active NN - the NM does not recover for a long time - the logs show a long delay and traces show no change at all The stack traces of the NM all show the same set of threads. The main thread which should be used in the re-register is the Node Status Updater. This thread is stuck in: {code} Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in Object.wait() [0x7f5a51fc1000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call) at java.lang.Object.wait(Object.java:503) at org.apache.hadoop.ipc.Client.call(Client.java:1395) - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call) at org.apache.hadoop.ipc.Client.call(Client.java:1362) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80) {code} The client connection which goes through the proxy can be traced back to the ResourceTrackerPBClientImpl. The generated proxy does not time out, and we should be using a version which takes the RPC timeout (from the configuration) as a parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
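The change hinted at in the last sentence would use one of the RPC.getProxy overloads that accepts an rpcTimeout. The sketch below is an assumption about how such a proxy could be wired up (the timeout value is hard-coded here; a real fix would read it from configuration), not the attached patch.
{code:title=TimedResourceTrackerProxySketch.java|borderStyle=solid}
import java.net.InetSocketAddress;
import javax.net.SocketFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.server.api.ResourceTrackerPB;

// Sketch only (assumed wiring, not the attached patch): create the
// ResourceTracker proxy with an explicit rpcTimeout so nodeHeartbeat() fails
// after the timeout instead of blocking when the RM host drops off the network.
public class TimedResourceTrackerProxySketch {

  public static ResourceTrackerPB createProxy(Configuration conf,
      InetSocketAddress rmAddress) throws Exception {
    RPC.setProtocolEngine(conf, ResourceTrackerPB.class, ProtobufRpcEngine.class);
    int rpcTimeoutMs = 60 * 1000;   // hard-coded here; a real fix reads it from configuration
    SocketFactory factory = NetUtils.getDefaultSocketFactory(conf);
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    return RPC.getProxy(ResourceTrackerPB.class,
        RPC.getProtocolVersion(ResourceTrackerPB.class),
        rmAddress, ugi, conf, factory, rpcTimeoutMs);
  }
}
{code}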
[jira] [Commented] (YARN-19) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)
[ https://issues.apache.org/jira/browse/YARN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572210#comment-14572210 ] Junping Du commented on YARN-19: Hi [~ashahab], thanks for your feedback on this! I remember that a long time ago the community decided to go the hierarchical way instead of the pluggable way, so the patch here may not be suitable to go forward (please check the YARN-18 design doc for details). I haven't had the bandwidth to follow up the new design with a new implementation, given other priorities. However, if you are interested, please feel free to take over YARN-18 and 19 and move them forward (better to conform with the new design), and I will try to help with the review. 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN) - Key: YARN-19 URL: https://issues.apache.org/jira/browse/YARN-19 Project: Hadoop YARN Issue Type: New Feature Reporter: Junping Du Assignee: Junping Du Attachments: HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch, MAPREDUCE-4310-v1.patch, MAPREDUCE-4310.patch, YARN-19-v2.patch, YARN-19-v3-alpha.patch, YARN-19-v4.patch, YARN-19.patch There are several classes in YARN’s container assignment and task scheduling algorithms related to data locality that were updated to give preference to running a container on the same nodegroup. This section summarizes the changes in the patch that provides a new implementation to support a four-layer hierarchy. When the ApplicationMaster makes a resource allocation request to the scheduler of the ResourceManager, it will add the node group to the list of attributes in the ResourceRequest. The parameters of the resource request will change from priority, (host, rack, *), memory, #containers to priority, (host, nodegroup, rack, *), memory, #containers. After receiving the ResourceRequest, the RM scheduler will assign containers for requests in the sequence of data-local, nodegroup-local, rack-local and off-switch. Then, the ApplicationMaster schedules tasks on allocated containers in the sequence of data-local, nodegroup-local, rack-local and off-switch. In terms of code changes made to YARN task scheduling, we updated the class ContainerRequestEvent so that applications' requests for containers can include a nodegroup. In RM schedulers, FifoScheduler and CapacityScheduler were updated. For the FifoScheduler, the changes were in the method assignContainers. For the CapacityScheduler, the method assignContainersOnNode in the class LeafQueue was updated. In both changes, a new method, assignNodeGroupLocalContainers(), was added in between the data-local and rack-local assignments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
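For readers skimming the change summary above, here is a stand-in sketch of the four-level assignment order (data-local, nodegroup-local, rack-local, off-switch); the type and method names are illustrative assumptions, not the patch code in FifoScheduler or LeafQueue.
{code:title=NodeGroupLocalitySketch.java|borderStyle=solid}
// Illustrative stand-ins only; the real changes live in FifoScheduler,
// LeafQueue#assignContainersOnNode, and the AM-side task scheduling.
public class NodeGroupLocalitySketch {

  interface Node {
    String host();
    String nodeGroup();
    String rack();
  }

  interface Request {
    boolean wantsHost(String host);
    boolean wantsNodeGroup(String nodeGroup);
    boolean wantsRack(String rack);
    boolean wantsAnyNode();
  }

  // Assignment preference order with the extra nodegroup level inserted
  // between data-local and rack-local.
  static String pickLocality(Request req, Node node) {
    if (req.wantsHost(node.host())) {
      return "DATA_LOCAL";
    }
    if (req.wantsNodeGroup(node.nodeGroup())) {
      return "NODEGROUP_LOCAL";
    }
    if (req.wantsRack(node.rack())) {
      return "RACK_LOCAL";
    }
    if (req.wantsAnyNode()) {
      return "OFF_SWITCH";
    }
    return "NO_ASSIGNMENT";
  }
}
{code}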
[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty
[ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572085#comment-14572085 ] Rohith commented on YARN-3733: -- bq. only memory or vcores are more in TestCapacityScheduler. All the combinations of inputs are verified in TestResourceCalculator. And in TestCapacityScheduler, app submission happens only for memory in {{MockRM.submitApp}}, so the default vcore minimum allocation of 1 will be taken. So just changing memory to {{amResourceLimit.getMemory() + 2}} should be enough. bq. TestCapacityScheduler#verifyAMLimitForLeafQueue, while submitting second app, you could change the app name to app-2. Agree. I will upload a patch soon. DominantRC#compare() does not work as expected if cluster resource is empty --- Key: YARN-3733 URL: https://issues.apache.org/jira/browse/YARN-3733 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 , 2 NM , 2 RM one NM - 3 GB 6 v core Reporter: Bibin A Chundatt Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, YARN-3733.patch Steps to reproduce = 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster) 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured) 4. Submit 30 concurrent tasks 5. Switch RM Actual = For 12 jobs the AM gets allocated and all 12 start running. No other YARN child is initiated, *all 12 jobs stay in Running state forever* Expected === Only 6 should be running at a time since max AM allocated is .5 (3072 MB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
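For context, the scenario in this issue reduces to comparing resources against an empty cluster resource (as seen right after RM switch); per the report, the comparison then fails to order clearly unequal resources as expected. A minimal reproduction sketch using the public records API, not the attached patches, might look like this:
{code:title=EmptyClusterCompareSketch.java|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;

// Minimal sketch of the reported scenario: comparing resources against an
// empty (0 memory, 0 vcores) cluster resource.
public class EmptyClusterCompareSketch {
  public static void main(String[] args) {
    Resource emptyCluster = Resource.newInstance(0, 0);
    Resource lhs = Resource.newInstance(1024, 1);
    Resource rhs = Resource.newInstance(2048, 2);

    int cmp = new DominantResourceCalculator().compare(emptyCluster, lhs, rhs);
    // Per this JIRA, cmp does not reflect that rhs is strictly larger than lhs
    // when the cluster resource is empty.
    System.out.println("compare(empty, 1024MB/1vc, 2048MB/2vc) = " + cmp);
  }
}
{code}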