[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720988#comment-14720988 ] Varun Saxena commented on YARN-4075: Yup it can progress. > [reader REST API] implement support for querying for flows and flow runs > > > Key: YARN-4075 > URL: https://issues.apache.org/jira/browse/YARN-4075 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > > We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
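The flow/flow-run queries discussed above can be pictured as simple REST URL shapes. The sketch below is a hedged illustration only: the host, port, and path layout (`/ws/v2/timeline/...`) are assumptions for this sub-task, not the final YARN-4075 API.

```python
from urllib.parse import quote

# Assumed reader base URL; the real endpoint layout is decided by YARN-4075.
BASE = "http://timelinereader:8188/ws/v2/timeline"

def flows_url(cluster):
    # List flows recently active on a cluster (illustrative path).
    return f"{BASE}/clusters/{quote(cluster)}/flows"

def flow_runs_url(cluster, user, flow):
    # List the runs of one flow for a given user (illustrative path).
    return (f"{BASE}/clusters/{quote(cluster)}"
            f"/users/{quote(user)}/flows/{quote(flow)}/runs")
```

A client would issue plain GETs against such URLs; the point is only that flows and flow runs become addressable resources.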
[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720963#comment-14720963 ] Shiwei Guo commented on YARN-3933: -- Thanks for adding me to the contributor list, so exciting! I have noticed Jenkins' complaints and submitted a new patch in [YARN-4089|https://issues.apache.org/jira/browse/YARN-4089]. Unfortunately it still does not conform to the QA standard. I'm working on adding a unit test for this issue, and will submit it here soon. > Race condition when calling AbstractYarnScheduler.completedContainer. > - > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Shiwei Guo > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservations of containers for an application that are no > longer needed, and then calls queue.completedContainer(), which causes > resources to go negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess container assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
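The negative-resource symptom described in YARN-3933 comes down to crediting a container's resources back more than once. A language-neutral sketch of the fix idea (not the actual Hadoop patch): track released containers under the same lock, so a duplicate completedContainer call is a no-op.

```python
import threading

class ContainerAccounting:
    """Sketch of the race discussed above: if completed_container() can
    run twice for the same container (e.g. an excess reservation released
    concurrently with normal completion), available memory is credited
    twice and eventually goes negative. A per-container 'released' set
    checked under the lock makes the release idempotent."""

    def __init__(self, total_mb):
        self._lock = threading.Lock()
        self.available_mb = total_mb
        self._released = set()

    def allocate(self, container_id, mb):
        with self._lock:
            self.available_mb -= mb

    def completed_container(self, container_id, mb):
        with self._lock:
            if container_id in self._released:
                return False  # duplicate completion: ignore
            self._released.add(container_id)
            self.available_mb += mb
            return True
```

With the guard, a second completion for the same container id returns False instead of double-crediting the node's resources.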
[jira] [Commented] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720962#comment-14720962 ] Hadoop QA commented on YARN-4093: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 13s | The applied patch generated 2 new checkstyle issues (total was 24, now 26). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 27s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 52s | Tests passed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 1m 56s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 52m 47s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 110m 39s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-common | | Failed unit tests | hadoop.yarn.api.TestPBImplRecords | | Failed build | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753110/YARN-4093.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e2c9b28 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8940/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8940/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8940/console | This message was automatically generated. 
> Encapsulate additional group information in the AM to RM heartbeat > -- > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > Labels: patch > Attachments: AllocateRequest_extension.docx, YARN-4093.patch, > YARN-4093.v2.patch > > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
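To make the YARN-4093 proposal concrete, here is a hedged sketch of what a heartbeat payload carrying per-group task information could look like. All field names below are hypothetical illustrations, not the actual protobuf extension in the patch.

```python
from dataclasses import dataclass, field

@dataclass
class TaskGroupInfo:
    # Hypothetical per-group summary the AM could send to the RM.
    group_id: str
    num_tasks: int
    memory_mb_per_task: int
    vcores_per_task: int

@dataclass
class AllocateRequest:
    # Simplified stand-in for the AM-RM heartbeat request.
    application_id: str
    ask: list = field(default_factory=list)          # usual resource asks
    task_groups: list = field(default_factory=list)  # proposed extension

    def add_group(self, group):
        self.task_groups.append(group)
```

The idea is simply that the scheduler, seeing whole groups rather than individual asks, can make gang- or group-aware placement decisions.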
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720961#comment-14720961 ] Hadoop QA commented on YARN-3528: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753117/YARN-3528-branch2.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / e2c9b28 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8941/console | This message was automatically generated. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, > YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
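The dynamic-port-allocation fix requested above follows a standard pattern: bind to port 0 and let the kernel choose a free port, instead of hard-coding 12345. A minimal sketch (the Java equivalent is `new ServerSocket(0)`):

```python
import socket

def free_port():
    """Ask the OS for a currently free ephemeral port. Binding to port 0
    lets the kernel pick one; the test then starts its service on the
    returned port rather than on a hard-coded 12345."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Note the small residual race: the port is released before the test service rebinds it, so another process could grab it in between; binding the service socket directly (and passing it in) avoids that entirely where the test harness allows it.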
[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3528: --- Attachment: YARN-3528-branch2.patch > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, > YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720958#comment-14720958 ] Brahma Reddy Battula commented on YARN-3528: [~rkanter] uploaded the branch-2 patch.. thanks.. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, > YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720937#comment-14720937 ] Hadoop QA commented on YARN-3920: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 55s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 53m 25s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 96m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753105/YARN-3920.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e2c9b28 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8939/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8939/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8939/console | This message was automatically generated. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, > yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
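The YARN-3920 change gates node reservation on request size. A hedged sketch of the check (the threshold name and value below are made up for illustration; the real configuration key is defined by the patch):

```python
# Assumed example knob: reserve a node only when the pending request is
# at least this fraction of the node's capacity.
RESERVABLE_FRACTION = 0.5

def should_reserve(request_mb, node_capacity_mb,
                   fraction=RESERVABLE_FRACTION):
    # Reserve only for containers big enough that they might otherwise
    # starve behind a stream of small allocations; small requests skip
    # reservation so they don't block the node for everyone else.
    return request_mb >= fraction * node_capacity_mb
```

With this gate, a 1 GB request on an 8 GB node is simply retried elsewhere, while a 6 GB request may still reserve the node as originally intended.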
[jira] [Updated] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl updated YARN-4093: Attachment: YARN-4093.v2.patch > Encapsulate additional group information in the AM to RM heartbeat > -- > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > Labels: patch > Attachments: AllocateRequest_extension.docx, YARN-4093.patch, > YARN-4093.v2.patch > > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.005.patch Fixed whitespace > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, > yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720877#comment-14720877 ] Hadoop QA commented on YARN-4093: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 8s | The applied patch generated 1 new checkstyle issues (total was 24, now 25). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 45s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 57s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 58m 37s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 117m 48s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-common | | Failed unit tests | hadoop.yarn.client.TestYarnApiClasses | | | hadoop.yarn.api.TestPBImplRecords | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753093/YARN-4093.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e2c9b28 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8938/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8938/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8938/console | This message was automatically generated. 
> Encapsulate additional group information in the AM to RM heartbeat > -- > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > Labels: patch > Attachments: AllocateRequest_extension.docx, YARN-4093.patch > > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720837#comment-14720837 ] Joep Rottinghuis commented on YARN-3901: Reviewed / discussed 1.patch with [~vrushalic]. Comments may sound cryptic to others, but roughly we discussed these changes to make things generic (and clearer / more reusable for the long run): No timestamp needed in the FlowActivity table. Runs can start one day and end another. Probably start without it, add later if needed. Min/Max: is the app id actually needed here? FlowScanner currentMinCell should not consider the app ID. If there is a start time for an app id, and then later another start, we should still keep the min, not the latest value. A UI based on FlowActivity can enumerate active flows for that day, plus show the number of runs and the number of distinct versions. Update the javadoc on FlowRunKey. FlowRunTable: add increment and decrement of the number of running apps (on app start and app end). MIN, MAX, SUM, SUM_FINAL should be AggOps. Aggregation dimension = metric name (stored in the column). Aggregation compaction dimension = application id. For store, make the Attributes... the last argument. An attribute is a tuple of String, byte[]. The MIN AggregationOperation should have a createAttribute method that takes an AggCompactionDimension as an argument and returns an Attribute. The assumption is that all the cells in a put are the same operation. In the general coprocessor, read the attribute (it does not have to be unique). Always add a tag with the aggregation compaction dimension. Set the compaction tag only if compaction needs to be done (i.e., if the operation is SUM_FINAL). 
> Populate flow run data in the flow_run table > > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. 
> - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
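The min_start_time / tag scheme above can be sketched in a few lines, independent of HBase. This is an illustration of the semantics only (reads return the min over all written cells; compaction collapses completed-app cells into one untagged min cell), not the actual coprocessor code:

```python
# Tag types per the description above: 0 = none, 1 = running, 2 = complete.
RUNNING, COMPLETE = 1, 2

def read_min(cells):
    # cells: list of (tag_type, app_id, value); a read returns the min
    # of all written values, regardless of which app wrote them.
    return min(v for _, _, v in cells)

def compact_min(cells):
    # On flush/compaction: collapse cells of completed apps into a single
    # cell with an empty tag holding the min; keep running-app cells as-is.
    done = [c for c in cells if c[0] == COMPLETE]
    keep = [c for c in cells if c[0] != COMPLETE]
    if done:
        keep.append((0, None, min(v for _, _, v in done)))
    return keep
```

max_end_time would be identical with `max` in place of `min`, and the m! metric columns use sum with the same "only collapse completed apps" rule.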
[jira] [Updated] (YARN-4093) Encapsulate additional group information in the AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl updated YARN-4093: Summary: Encapsulate additional group information in the AM to RM heartbeat (was: Encapsulate additional information through AM to RM heartbeat) > Encapsulate additional group information in the AM to RM heartbeat > -- > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > Labels: patch > Attachments: AllocateRequest_extension.docx, YARN-4093.patch > > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4093) Encapsulate additional information through AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl updated YARN-4093: Attachment: YARN-4093.patch > Encapsulate additional information through AM to RM heartbeat > - > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: AllocateRequest_extension.docx, YARN-4093.patch > > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4093) Encapsulate additional information through AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl updated YARN-4093: Attachment: AllocateRequest_extension.docx Added a proposed design doc > Encapsulate additional information through AM to RM heartbeat > - > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: AllocateRequest_extension.docx > > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4093) Encapsulate additional information through AM to RM heartbeat
[ https://issues.apache.org/jira/browse/YARN-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl updated YARN-4093: Issue Type: Sub-task (was: Improvement) Parent: YARN-2745 > Encapsulate additional information through AM to RM heartbeat > - > > Key: YARN-4093 > URL: https://issues.apache.org/jira/browse/YARN-4093 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, yarn >Affects Versions: 2.7.1 >Reporter: Robert Grandl >Assignee: Robert Grandl > > In this JIRA we propose to enhance the AM RM protocol with a new message > which encapsulates additional information about group of tasks. The RM > scheduler will benefit of the additional information to take better decisions > at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4093) Encapsulate additional information through AM to RM heartbeat
Robert Grandl created YARN-4093: --- Summary: Encapsulate additional information through AM to RM heartbeat Key: YARN-4093 URL: https://issues.apache.org/jira/browse/YARN-4093 Project: Hadoop YARN Issue Type: Improvement Components: api, yarn Reporter: Robert Grandl Assignee: Robert Grandl In this JIRA we propose to enhance the AM RM protocol with a new message which encapsulates additional information about group of tasks. The RM scheduler will benefit of the additional information to take better decisions at the scheduling time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720708#comment-14720708 ] Hadoop QA commented on YARN-4092: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 57s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 15s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client compilation is broken. | | {color:red}-1{color} | findbugs | 0m 30s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. | | {color:red}-1{color} | findbugs | 0m 46s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager compilation is broken. | | {color:green}+1{color} | findbugs | 0m 46s | The patch does not introduce any new Findbugs (version ) warnings. | | {color:red}-1{color} | yarn tests | 0m 15s | Tests failed in hadoop-yarn-client. 
| | {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 54m 2s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 98m 56s | | \\ \\ || Reason || Tests || | Failed build | hadoop-yarn-client | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753069/YARN-4092.2.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / cbb2495 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/whitespace.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8936/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8936/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8936/console | This message was automatically generated. > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720668#comment-14720668 ] Jason Lowe commented on YARN-3942: -- Yeah that's going to be tricky, especially if we need to move most of the code into YARN. Haven't had time to give this much thought, but the only way I can think of to keep most of the functionality in YARN is to have the timeline client be able to specify when a new "session" starts (i.e.: entity file writer should start writing to a new file and user provides some clue/hint as to what to name the file). We can then have a plugin on the entity file server side that allows apps to override the getTimelineStoreForRead functionality. If that were in place then the Tez side could start a new session (dag file) each time the dag changed. The Tez-specific plugin on the timeline server side could then translate dag/vertex/task/attempt IDs into the appropriate dag file to cache. There would still be some questions as to how the timeline store cache would be managed on the server side and how to support multiple framework-specific plugins simultaneously. > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
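The plugin idea Jason sketches above — a framework-specific plugin that translates dag/vertex/task/attempt IDs into the dag file that should be cached — could look roughly like the following. This is a hypothetical illustration only: the class name, method name, and simplified ID format are all invented for the example and are not actual Hadoop or Tez APIs.

```java
// Hypothetical sketch of a framework-specific cache-id plugin: it maps an
// entity ID (e.g. a Tez dag/vertex/task/attempt ID) to the "session" (dag)
// file whose contents should be loaded as one timeline-store cache entry.
public class TimelineCacheIdPlugin {

  // Map any dag-scoped entity ID to its dag's cache key, so vertex/task/attempt
  // lookups all resolve to the same dag file.
  public String getCacheIdForEntity(String entityId) {
    // Assume IDs of the form <type>_<cluster>_<appSeq>_<dagSeq>[_<suffix>...],
    // a simplified stand-in for real Tez ID formats.
    String[] parts = entityId.split("_");
    if (parts.length >= 4) {
      return "dag_" + parts[1] + "_" + parts[2] + "_" + parts[3];
    }
    return entityId; // not dag-scoped: fall back to the entity's own ID
  }
}
```

With such a mapping, every entity belonging to one dag resolves to one cache entry, which is what would let a long-lived Hive-on-Tez app with hundreds of dags avoid being cached as a single huge unit.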
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720667#comment-14720667 ] Joep Rottinghuis commented on YARN-3862: If we need to retrieve exactly the known columns (and, in addition, we know whether each is a metric, a config value, etc.) then we can add these to the scan (or get) directly through {code} addColumn(byte [] family, byte [] qualifier) {code} The ColumnPrefixFilter case is also clear: it restricts which columns are returned by filtering on the column qualifier. The confusion starts with org.apache.hadoop.hbase.filter.QualifierFilter. That can be used to retrieve only some columns, specifically when combined with a WhileMatchFilter. In addition, we have to consider whether we want to push these limits down to HBase (which is preferable) or just pull back everything from HBase and restrict what we serialize in the result. I think it would be cleaner to have a direct, separate API (method argument) for specifying which columns to retrieve. Whether we then add specific columns to the scan or prefix patterns to a filter is up to the implementation. > Decide which contents to retrieve and send back in response in TimelineReader > - > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3862-YARN-2928.wip.01.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. 
> As a comma separated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options: > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
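The two push-down options Joep contrasts — naming exact columns on the scan versus prefix-based filtering — can be modelled in plain Java. This is not HBase client code; in real code the exact-column path would use Scan.addColumn(family, qualifier) and the prefix path a ColumnPrefixFilter, while this sketch only illustrates the selection semantics on a row modelled as a qualifier-to-value map.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the two column-selection modes discussed above.
// Only the semantics are modelled; the actual HBase filters run server-side.
public class ColumnSelection {

  // Exact selection: keep only the named qualifiers, the way repeated
  // Scan.addColumn(family, qualifier) calls restrict a scan.
  public static Map<String, String> selectExact(Map<String, String> row, String... qualifiers) {
    Map<String, String> out = new TreeMap<>();
    for (String q : qualifiers) {
      if (row.containsKey(q)) {
        out.put(q, row.get(q));
      }
    }
    return out;
  }

  // Prefix selection: keep qualifiers starting with the prefix, the way a
  // ColumnPrefixFilter restricts which cells come back from a scan.
  public static Map<String, String> selectPrefix(Map<String, String> row, String prefix) {
    Map<String, String> out = new TreeMap<>();
    for (Map.Entry<String, String> e : row.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }
}
```

Either way, doing the restriction server-side (in HBase) avoids shipping every config and metric cell to the reader only to drop most of them at serialization time.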
[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4092: Attachment: YARN-4092.3.patch > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch, YARN-4092.2.patch, YARN-4092.3.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2928) YARN Timeline Service: Next generation
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C reassigned YARN-2928: Assignee: Vrushali C (was: Sangjin Lee) > YARN Timeline Service: Next generation > -- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, > TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720574#comment-14720574 ] Hitesh Shah commented on YARN-3942: --- [~jlowe] [~rajesh.balamohan] observed that the timeline server was running out of memory in a certain scenario. In this scenario, we are using Hive-on-Tez but Hive re-uses the application to run 100s of DAGs/queries (doAs=false with perimeter security using say Ranger or Sentry). The EntityFileStore sizes a cache based on the number of applications it can cache, but in the above scenario even a single app could be very large. Ideally, if each "dag" were in a separate file and all of its entries were treated as a single cache entity, that would probably work better, but making this generic enough may be a bit tricky. Any suggestions here? > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720571#comment-14720571 ] Robert Kanter commented on YARN-3528: - It looks like this doesn't apply cleanly to branch-2. [~brahmareddy], can you create a branch-2 version of the patch? > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
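The dynamic port allocation the issue asks for is straightforward in the JDK: binding a ServerSocket to port 0 lets the OS pick a free ephemeral port, which the test can then pass to the service under test instead of hard-coding 12345. A minimal sketch (helper name is illustrative, not from the patches):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Minimal sketch of the dynamic-port pattern: bind to port 0 so the OS picks
// a free ephemeral port, instead of hard-coding 12345 in the test.
public class FreePortFinder {
  public static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true); // let the service rebind right after we close
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new IllegalStateException("No free port available", e);
    }
  }
}
```

Note there is still a small race between closing the probe socket and the service binding the port, so tests that can accept a port of 0 directly and query the bound address afterward are even more robust.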
[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3528: Affects Version/s: (was: 3.0.0) 2.8.0 Target Version/s: 2.8.0 (was: 3.0.0) Issue Type: Improvement (was: Bug) > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4092: Attachment: YARN-4092.2.patch > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch, YARN-4092.2.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720560#comment-14720560 ] Xuan Gong commented on YARN-4092: - added a test case > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch, YARN-4092.2.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4080) Capacity planning for long running services on YARN
[ https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720538#comment-14720538 ] Subru Krishnan commented on YARN-4080: -- [~mding], your proposal looks interesting and thanks for taking a look at YARN-1051. You are right that the main use case of the reservation system is to address SLAs, but it can be used for capacity planning for long running services by specifying the start time as now and the deadline as infinity. This should provide more predictability for long running services: since YARN-1051 allows expressing time-varying capacity, you can handle the dynamic resource requirements of a service. Additionally, in combination with YARN-2877, you should be able to achieve the dynamic host-based reservation mechanics you have proposed. > Capacity planning for long running services on YARN > --- > > Key: YARN-4080 > URL: https://issues.apache.org/jira/browse/YARN-4080 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, resourcemanager >Reporter: MENG DING > > YARN-1197 addresses the functionality of container resource resize. One major > use case of this feature is for long running services managed by Slider to > dynamically flex up and down resource allocation of individual components > (e.g., HBase region server), based on application metrics/alerts obtained > through third-party monitoring and policy engine. > One key issue with increasing container resource at any point of time is that > the additional resource needed by the application component may not be > available *on the specific node*. In this case, we need to rely on preemption > logic to reclaim the required resource back from other (preemptable) > applications running on the same node. But this may not be possible today > because: > * preemption doesn't consider constraints of pending resource requests, such > as hard locality requirements, user limits, etc (being addressed in YARN-2154 > and possibly in YARN-3769?) 
> * there may not be any preemptable container available due to the fact that > no queue is over its guaranteed capacity. > What we need, ideally, is a way for YARN to support future capacity planning > of long running services. At the minimum, we need to provide a way to let > YARN know about the resource usage prediction/pattern of a long running > service. And given this knowledge, YARN should be able to preempt resources > from other applications to accommodate the resource needs of the long running > service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
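Subru's "start time as now, deadline as infinity" suggestion can be illustrated with a toy model. The real YARN-1051 API uses ReservationDefinition/ReservationRequest records submitted to the ReservationSystem; this plain-Java class is only a sketch of the shape of such a request, with all names invented for the example.

```java
// Toy model of expressing a long running service as a reservation whose
// window starts "now" and never ends. Not the actual YARN-1051 API.
public class ServiceReservation {
  final long arrivalMs;   // start of the reservation window
  final long deadlineMs;  // end of the window; "infinity" for a service
  final int containers;
  final int memoryMbPerContainer;

  ServiceReservation(long arrivalMs, long deadlineMs, int containers, int memoryMbPerContainer) {
    this.arrivalMs = arrivalMs;
    this.deadlineMs = deadlineMs;
    this.containers = containers;
    this.memoryMbPerContainer = memoryMbPerContainer;
  }

  // A long running service reserves capacity from now until forever.
  static ServiceReservation forLongRunningService(int containers, int memoryMb) {
    return new ServiceReservation(System.currentTimeMillis(), Long.MAX_VALUE, containers, memoryMb);
  }

  // The planner treats any instant inside the window as reserved capacity.
  boolean coversTime(long timestampMs) {
    return timestampMs >= arrivalMs && timestampMs <= deadlineMs;
  }
}
```

Because the window never closes, the planner must always account for the service's capacity, which is exactly the predictability the comment describes; time-varying needs would be expressed by splitting the window into stages with different container counts.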
[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720446#comment-14720446 ] Hadoop QA commented on YARN-4092: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 35s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 7s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 50m 10s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 95m 26s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestMaxRunningAppsEnforcer | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753032/YARN-4092.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / beb65c9 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8935/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8935/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8935/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8935/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8935/console | This message was automatically generated. 
> RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720406#comment-14720406 ] Li Lu commented on YARN-4075: - Hi [~varun_saxena], is this JIRA still blocked by YARN-4074, or can it progress since some of the interface discussions are reaching agreement? Thanks! > [reader REST API] implement support for querying for flows and flow runs > > > Key: YARN-4075 > URL: https://issues.apache.org/jira/browse/YARN-4075 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > > We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720392#comment-14720392 ] Vrushali C commented on YARN-3901: -- Hi [~gtCarrera9] Thanks for the first review pass! To answer your questions: bq. IIUC, we now directly write data into the flow run related tables upon application start, finish, and periodic flush, and we only perform the aggregations in our coprocessors Yes, data is written to the flow run and flow activity tables in a quick, simple write, but the correct values to be returned are determined at read time AND (TBD) at flush/compaction time. During flush/compaction, the data from various cells will be 'merged' into a smaller number of cells so that subsequent reads are faster. bq. How are those coprocessors connected? Is it through an HBase configuration externally, or are there some lines setting them up in this patch that I missed (which is quite possible)? At table creation time, we specify the coprocessor class. This can also be done later via an alter table command as desired. bq. I noticed you're performing aggregation work in the coprocessor (FlowScanner), this is slightly different from the approach in YARN-3816 (app level aggregation). My hunch is that we may need some sort of common APIs for aggregating metrics, so that we can centralize the aggregation logic? Or, why is the flow run level aggregation significantly different from app level aggregation (so that we cannot share the same aggregation logic)? There are some differences between the two aggregations, I think. Not sure if the classes can be reused without complicating development efforts. For the PoC I would like to focus on these tables independently. We could file follow-up jiras to refactor the code as we see fit when the whole picture emerges; does that sound good? Keep the questions coming, thanks! 
> Populate flow run data in the flow_run table > > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. 
> - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
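The min_start_time scheme in the schema notes above — each app writes its own tagged cell, reads return the min across all written values, and flush/compaction collapses everything into a single untagged cell — can be simulated in plain Java. The real logic lives in an HBase coprocessor (the FlowScanner); this sketch, with invented class and method names, only models the cell-level semantics.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of the tagged-cell min_start_time column described above.
public class MinStartTimeColumn {
  // Each cell carries the tag (applicationId) it was written with; an empty
  // tag marks the collapsed cell produced at flush/compaction time.
  static class Cell {
    final String tag;
    final long value;
    Cell(String tag, long value) { this.tag = tag; this.value = value; }
  }

  private List<Cell> cells = new ArrayList<>();

  // RM writer path: simply append a value tagged with the applicationId.
  void write(String appId, long startTime) {
    cells.add(new Cell(appId, startTime));
  }

  // Read-time behaviour of the coprocessor: min across every written value.
  long readMin() {
    long min = Long.MAX_VALUE;
    for (Cell c : cells) {
      min = Math.min(min, c.value);
    }
    return min;
  }

  // Flush/compaction: replace all cells with one untagged cell holding the
  // min; the other cells are discarded, as the schema notes describe.
  void compact() {
    long min = readMin();
    cells = new ArrayList<>();
    cells.add(new Cell("", min));
  }

  int cellCount() { return cells.size(); }
}
```

The max_end_time column would be the mirror image with Math.max, and the metric (m!) columns sum instead, collapsing only cells whose tag marks the app as complete.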
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720377#comment-14720377 ] Li Lu commented on YARN-3901: - Hi [~vrushalic], thanks for the work! While looking at the latest patch, I have some general questions. IIUC, we now directly write data into the flow run related tables upon application start, finish, and periodic flush, and we only perform the aggregations in our coprocessors? I remember this design and I think this looks fine, but w.r.t. the coprocessors, I'm unclear about: # How are those coprocessors connected? Is it through an HBase configuration externally, or are there some lines setting them up in this patch that I missed (which is quite possible)? # I noticed you're performing aggregation work in the coprocessor (FlowScanner), this is slightly different from the approach in YARN-3816 (app level aggregation). My hunch is that we may need some sort of common APIs for aggregating metrics, so that we can centralize the aggregation logic? Or, why is the flow run level aggregation significantly different from app level aggregation (so that we cannot share the same aggregation logic)? I'll keep looking at this patch later today; more comments may come during the weekend or next Monday. > Populate flow run data in the flow_run table > > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720320#comment-14720320 ] Xuan Gong commented on YARN-4092: - The purpose: when an RM finds that there is no active RM at that time, it will send the request back to itself after a delay. > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
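The behaviour Xuan describes — retry against yourself with a delay while no RM is active, instead of bouncing the browser between the two standby RMs forever — can be sketched as follows. This is an illustrative sketch, not the actual patch; all names are invented.

```java
// Illustrative sketch of standby-RM redirect handling: poll for an active RM
// with a delay between attempts, and stay on the current RM if none appears,
// rather than redirecting endlessly between two standby RMs.
public class StandbyRedirector {

  public interface ActiveRmProbe {
    String getActiveRmAddress(); // null while both RMs are standby
  }

  // Returns the address the UI request should be redirected to.
  public static String resolveRedirect(ActiveRmProbe probe, String selfAddress,
      int maxAttempts, long delayMs) {
    for (int i = 0; i < maxAttempts; i++) {
      String active = probe.getActiveRmAddress();
      if (active != null) {
        return active; // an RM became active: send the client there
      }
      try {
        Thread.sleep(delayMs); // delay before asking again instead of ping-ponging
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return selfAddress;
      }
    }
    return selfAddress; // still no active RM: stay put
  }
}
```

In a servlet context the "retry against itself" would be expressed as a redirect back to the same RM's URL after the delay, which is how the client keeps getting a response instead of an unreachable UI.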
[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
[ https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4092: Attachment: YARN-4092.1.patch > RM HA UI redirection needs to be fixed when both RMs are in standby mode > > > Key: YARN-4092 > URL: https://issues.apache.org/jira/browse/YARN-4092 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4092.1.patch > > > In RM HA Environment, If both RM acts as Standby RM, The RM UI will not be > accessible. It will keep redirecting between both RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720310#comment-14720310 ] Varun Vasudev commented on YARN-4082: - Thanks for the patch [~leftnoteasy]. A couple of minor fixes - 1. {code} + public void incUsedResource(String nodeLabel, Resource resourceToInc, SchedulerApplicationAttempt application) { {code} and {code} + public void decUsedResource(String nodeLabel, Resource resourceToDec, SchedulerApplicationAttempt application) { {code} need to be formatted for line length. 2. {code} +String newPartition; +if (newLabels.isEmpty()) { + newPartition = RMNodeLabelsManager.NO_LABEL; +} else { + newPartition = newLabels.iterator().next(); +} + +String oldPartition = node.getPartition(); {code} Can you add a comment explaining that only one label is allowed per node? Also, can you move this code outside the for loop? Seems unnecessary to evaluate it for every application. Rest of the patch looks good to me. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch, YARN-4082.2.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
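The reviewer's second point — derive the node's new partition once, outside the per-application loop, since only one label is allowed per node — could be factored as a small helper. This is a sketch following the snippet in the review, not the actual patch; NO_LABEL stands in for RMNodeLabelsManager.NO_LABEL.

```java
import java.util.Set;

// Sketch of the suggested refactor: compute the partition for a node's label
// set once, instead of re-evaluating it inside the per-application loop.
public class PartitionUtil {
  public static final String NO_LABEL = ""; // stand-in for RMNodeLabelsManager.NO_LABEL

  // Only one label is allowed per node, so the partition is either the single
  // label or the default (no-label) partition for an empty set.
  public static String toPartition(Set<String> newLabels) {
    return newLabels.isEmpty() ? NO_LABEL : newLabels.iterator().next();
  }
}
```

The caller would then compute newPartition (and oldPartition from node.getPartition()) once before iterating over the applications on the node.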
[jira] [Created] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode
Xuan Gong created YARN-4092: --- Summary: RM HA UI redirection needs to be fixed when both RMs are in standby mode Key: YARN-4092 URL: https://issues.apache.org/jira/browse/YARN-4092 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong In an RM HA environment, if both RMs are in standby mode, the RM UI will not be accessible; it will keep redirecting between the two RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720249#comment-14720249 ] Hudson commented on YARN-1556: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #308 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/308/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
> {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) > ... 28 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
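The guard statement the report argues for — failing fast with a descriptive message instead of letting the `ConcurrentHashMap` lookup inside `ClientRMService.getApplicationReport` throw an opaque NPE — might look like the following sketch. The class and method names are hypothetical; the actual fix lives in `ClientRMService`.

```java
// Hypothetical sketch of a null-appId guard; not the committed YARN-1556 code.
public class AppIdGuard {
  public static void checkApplicationId(Object applicationId) {
    if (applicationId == null) {
      throw new IllegalArgumentException(
          "Invalid applicationId: null. Please retry with a valid application id.");
    }
  }

  public static void main(String[] args) {
    try {
      checkApplicationId(null);
      throw new AssertionError("guard did not fire");
    } catch (IllegalArgumentException expected) {
      // Callers now see the cause directly instead of a deep NPE.
      System.out.println("caught: " + expected.getMessage());
    }
    checkApplicationId("application_1440000000000_0001"); // a valid id passes
  }
}
```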
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720246#comment-14720246 ] Li Lu commented on YARN-4074: - bq. One thing I forgot to mention is that the current POC patch is a diff against the patch for YARN-3901, to be able to isolate the changes for this JIRA. Thanks for the reminder! I'll take a look at it shortly. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4074-YARN-2928.POC.001.patch, > YARN-4074-YARN-2928.POC.002.patch > > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720243#comment-14720243 ] Sunil G commented on YARN-4091: --- Thank you [~Naganarasimha] for linking this issue. Yes, this will be a subset here. > Improvement: Introduce more debug/diagnostics information to detail out > scheduler activity > -- > > Key: YARN-4091 > URL: https://issues.apache.org/jira/browse/YARN-4091 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Improvement on debugdiagnostic information - YARN.pdf > > > As schedulers are improved with various new capabilities, more configurations > that tune the schedulers start to take actions such as limiting container > assignment to an application, or introducing a delay before allocating a > container, etc. > No clear information is passed down from the scheduler to the outside world under > these various scenarios, which makes debugging much tougher. > This ticket is an effort to introduce more defined states at various points in the > scheduler where it skips/rejects container assignment, activates an application, > etc. Such information will help users know what is happening in the scheduler. > Attaching a short proposal for initial discussion. We would like to improve > on this as we discuss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720242#comment-14720242 ] Sangjin Lee commented on YARN-4058: --- Done. Thanks. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: YARN-2928 > > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720234#comment-14720234 ] Hudson commented on YARN-1556: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2246 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2246/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
[jira] [Commented] (YARN-3970) REST api support for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720233#comment-14720233 ] Sunil G commented on YARN-3970: --- Thank you [~Naganarasimha]. Yes, we can take this improvement to avoid an extra write to the state store, and the fix for the same looks good. A few comments: 1. updateAppPriority --> updateApplicationPriority. I prefer the fully expanded name here, as it's a separate class identifying a web app object. 2. {{priority.getPriority() != targetPriority.getPriority()}} We could use {{!priority.equals(targetPriority)}} 3. {code} +AppPriority effectivePriority = new AppPriority( +app.getApplicationSubmissionContext().getPriority().getPriority()); {code} If {{app.getApplicationSubmissionContext().getPriority()}} is null, we will get an NPE here. > REST api support for Application Priority > - > > Key: YARN-3970 > URL: https://issues.apache.org/jira/browse/YARN-3970 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Naganarasimha G R > Attachments: YARN-3970.20150828-1.patch > > > REST api support for application priority. > - get/set priority of an application > - get default priority of a queue > - get cluster max priority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
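The third review comment above is about a null-safety hole: dereferencing the submission context's priority without checking it. A minimal illustration of the null-safe fallback being asked for is below; the class and method names are hypothetical, not from the YARN-3970 patch.

```java
// Illustrative helper only; the real fix belongs in the web-services code
// that builds the AppPriority response object.
public class AppPriorityUtil {
  // Fall back to a default when the submission context carries no priority,
  // instead of unconditionally calling getPriority().getPriority().
  public static int effectivePriority(Integer submittedPriority, int defaultPriority) {
    return submittedPriority == null ? defaultPriority : submittedPriority;
  }

  public static void main(String[] args) {
    if (effectivePriority(null, 0) != 0) {
      throw new AssertionError("null priority should fall back to the default");
    }
    if (effectivePriority(5, 0) != 5) {
      throw new AssertionError("an explicit priority should win");
    }
    System.out.println("ok");
  }
}
```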
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720212#comment-14720212 ] Junping Du commented on YARN-4058: -- Yes. Adding a new commit to correct hadoop-yarn/CHANGES.txt is the right way. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: YARN-2928 > > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720204#comment-14720204 ] Sangjin Lee commented on YARN-4058: --- Thanks for finding that [~djp]. I'd love to edit that commit, but then that would disrupt you all again because I need to force push. How about adding a new commit that fixes hadoop-yarn/CHANGES.txt? > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: YARN-2928 > > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3970) REST api support for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720101#comment-14720101 ] Hadoop QA commented on YARN-3970: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 8s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 5 new checkstyle issues (total was 164, now 169). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 54m 18s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 94m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12753002/YARN-3970.20150828-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / beb65c9 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8934/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8934/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8934/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8934/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8934/console | This message was automatically generated. > REST api support for Application Priority > - > > Key: YARN-3970 > URL: https://issues.apache.org/jira/browse/YARN-3970 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Naganarasimha G R > Attachments: YARN-3970.20150828-1.patch > > > REST api support for application priority. > - get/set priority of an application > - get default priority of a queue > - get cluster max priority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720075#comment-14720075 ] Hudson commented on YARN-1556: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2265 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2265/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
[jira] [Updated] (YARN-3260) NPE if AM attempts to register before RM processes launch event
[ https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3260: Assignee: (was: Naganarasimha G R) > NPE if AM attempts to register before RM processes launch event > --- > > Key: YARN-3260 > URL: https://issues.apache.org/jira/browse/YARN-3260 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe > > The RM on one of our clusters was running behind on processing > AsyncDispatcher events, and this caused AMs to fail to register due to an > NPE. The AM was launched and attempting to register before the > RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token > had not been generated yet. The NPE occurred because the > ApplicationMasterService tried to encode the missing token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720017#comment-14720017 ] Junping Du commented on YARN-4058: -- [~sjlee0], the JIRA number is not correct in your commits. Please update it to a correct one. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: YARN-2928 > > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719329#comment-14719329 ] Hudson commented on YARN-1556: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1049 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1049/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/CHANGES.txt > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719318#comment-14719318 ] Junping Du commented on YARN-3933: -- I cancelled the patch; please see Jenkins' comment: "The patch file was not named according to hadoop's naming conventions". Basically, you should rename your patch with the prefix "YARN-3933" and the suffix ".patch". > Race condition when calling AbstractYarnScheduler.completedContainer. > - > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Shiwei Guo > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719319#comment-14719319 ] Hudson commented on YARN-1556: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #321 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/321/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719313#comment-14719313 ] Junping Du commented on YARN-3933: -- This is not bad luck but the default behavior of YARN resource scheduling. A negative available resource simply marks that the committed resource (consumption + reservation) is larger than the current system resources, which means YARN supports resource over-commitment, as most modern operating systems and distributed operating systems do. I just commented on YARN-4067, which seems to be an invalid JIRA to me. > Race condition when calling AbstractYarnScheduler.completedContainer. > - > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Shiwei Guo > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
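The race this JIRA describes — two code paths (the excess-reservation cleanup and the normal completion path) both calling completedContainer() for the same container, so its resources are returned twice and availability goes negative — can be sketched by making the release idempotent. This is an illustration of the failure mode and one possible guard, not the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a container whose release must happen exactly once even when
// two scheduler paths race to complete it.
public class ContainerRelease {
    private final AtomicBoolean completed = new AtomicBoolean(false);
    private int availableMemory;

    ContainerRelease(int availableMemory) { this.availableMemory = availableMemory; }

    /** Returns true only for the first caller; later callers are no-ops. */
    boolean complete(int containerMemory) {
        if (!completed.compareAndSet(false, true)) {
            return false;                    // already released: don't return resources again
        }
        availableMemory += containerMemory;  // return resources exactly once
        return true;
    }

    int available() { return availableMemory; }

    public static void main(String[] args) {
        ContainerRelease c = new ContainerRelease(0);
        c.complete(1024);
        c.complete(1024);                    // second (racing) release is ignored
        System.out.println(c.available());
    }
}
```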
[jira] [Updated] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3933: - Assignee: Shiwei Guo (was: Lavkesh Lahngir) > Race condition when calling AbstractYarnScheduler.completedContainer. > - > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Shiwei Guo > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719298#comment-14719298 ] Junping Du commented on YARN-3933: -- Looks like you are not a YARN contributor yet; adding you to this elite group. :) > Race condition when calling AbstractYarnScheduler.completedContainer. > - > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3933: - Summary: Race condition when calling AbstractYarnScheduler.completedContainer. (was: Resources(both core and memory) are being negative) > Race condition when calling AbstractYarnScheduler.completedContainer. > - > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719292#comment-14719292 ] Junping Du commented on YARN-3933: -- Hi [~guoshiwei], we should just update the description and title of this JIRA instead of creating a new one. No worries. I will mark YARN-4089 as a duplicate of this JIRA and assign this JIRA to you, given that you would like to work on it and already have a patch to fix it. > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719279#comment-14719279 ] Junping Du commented on YARN-4067: -- I don't think we should cap a negative value to zero in any case. In some cases, YARN provides a flexible resource model that allows resources to be over-committed. Just as an OS can claim/allocate more memory than is physically available - backed by the virtual memory mechanism - YARN's resource model is also flexible here, backed by mechanisms like resource/container preemption, dynamic resource configuration (YARN-291), etc. We have never assumed that the available resource cannot be negative, and a negative value can notify YARN to rebalance resource consumption in some way. Thus, I propose to resolve this JIRA as Not A Problem or Invalid. > available resource could be set negative > > > Key: YARN-4067 > URL: https://issues.apache.org/jira/browse/YARN-4067 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4067.patch > > > as mentioned in YARN-4045 by [~leftnoteasy], available memory could be > negative due to reservation, propose to use componentwiseMax to > updateQueueStatistics in order to cap negative value to zero -- This message was sent by Atlassian JIRA (v6.3.4#6332)
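For reference, the capping that YARN-4067 proposes (and that this comment argues against) amounts to clamping each resource dimension at zero, in the spirit of a componentwise max against an all-zero resource. A minimal sketch with plain int arrays standing in for the real Resource type:

```java
// Sketch of componentwise-max capping: each dimension of "available"
// (e.g. memory MB, vcores) is clamped so it never reads below zero.
public class ResourceCap {
    static int[] componentwiseMax(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.max(a[i], b[i]);   // take the larger value per dimension
        }
        return out;
    }

    public static void main(String[] args) {
        int[] available = {-2048, -1};       // negative after over-commit/reservation
        int[] capped = componentwiseMax(available, new int[]{0, 0});
        System.out.println(capped[0] + " " + capped[1]);  // both clamped to 0
    }
}
```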
[jira] [Issue Comment Deleted] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3933: - Comment: was deleted (was: So I should better open a new issue instead?) > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3970) REST api support for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3970: Attachment: YARN-3970.20150828-1.patch Hi [~sunilg] & [~rohithsharma], attaching the first patch as per the previous discussion. There was also one issue in CapacityScheduler.updateApplicationPriority: if an application is already running with the cluster max priority and the user specifies some priority greater than MaxPriority, the RMStateStore is updated unnecessarily and the queue's TreeSet is updated with MaxPriority again. > REST api support for Application Priority > - > > Key: YARN-3970 > URL: https://issues.apache.org/jira/browse/YARN-3970 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Naganarasimha G R > Attachments: YARN-3970.20150828-1.patch > > > REST api support for application priority. > - get/set priority of an application > - get default priority of a queue > - get cluster max priority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718641#comment-14718641 ] Junping Du commented on YARN-4087: -- Patch LGTM. bq. +1, if fail-fast hasn't been in any prior release and we are not drastically altering the behavior. I believe fail-fast was only introduced recently. However, the default behavior when the RM/NM state store fails could be different from previous releases: previously it failed the NM/RM daemons, whereas now we tolerate the failure and keep running while logging some error messages. We should definitely note this in our release notes. Also, maybe we should mark this JIRA as incompatible (for behavior)? > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch, YARN-4087.2.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously
[ https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718597#comment-14718597 ] Jason Lowe commented on YARN-4088: -- bq. See the problem with slower heartbeats is that if the tasks are short-running, there will be a cluster-wide throughput drop due to the feedback delay. The nodemanager will do an out-of-band heartbeat if a container is killed, and IMHO should do the same when a container completes (not sure what's so special about killed vs. exiting wrt. scheduling). Of course you can still get storms of heartbeats even though you explicitly tuned down the heartbeat interval if the cluster is churning containers at a very fast rate. > RM should be able to process heartbeats from NM asynchronously > -- > > Key: YARN-4088 > URL: https://issues.apache.org/jira/browse/YARN-4088 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, scheduler >Reporter: Srikanth Kandula > > Today, the RM sequentially processes one heartbeat after another. > Imagine a 3000 server cluster with each server heart-beating every 3s. This > gives the RM 1ms on average to process each NM heartbeat. That is tough. > It is true that there are several underlying datastructures that will be > touched during heartbeat processing. So, it is non-trivial to parallelize the > NM heartbeat. Yet, it is quite doable... > Parallelizing the NM heartbeat would substantially improve the scalability of > the RM, allowing it to either > a) run larger clusters or > b) support faster heartbeats or dynamic scaling of heartbeats > c) take more asks from each application or > c) use cleverer/ more expensive algorithms such as node labels or better > packing or ... > Indeed the RM's scalability limit has been cited as the motivating reason for > a variety of efforts which will become less needed if this can be solved. > Ditto for slow heartbeats. See Sparrow and Mercury papers for example. 
> Can we take a shot at this? > If not, could we discuss why. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
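The proposal above — processing NM heartbeats in parallel rather than sequentially — can be sketched as handing each heartbeat to a worker pool instead of a single dispatcher thread. This is a toy illustration of the dispatch side only; it deliberately ignores the hard part the JIRA calls out, namely making the scheduler's shared data structures safe under concurrent updates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: 3000 NM heartbeats dispatched onto a small worker pool so
// processing overlaps instead of being strictly sequential.
public class AsyncHeartbeats {
    /** Processes `nodes` heartbeats on `threads` workers; returns how many completed. */
    static int processAll(int nodes, int threads) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < nodes; i++) {
            pool.submit(processed::incrementAndGet);  // placeholder for per-node scheduler work
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(3000, 4));
    }
}
```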
[jira] [Updated] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1556: - Priority: Minor (was: Trivial) > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. > {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) > ... 28 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718568#comment-14718568 ] Hudson commented on YARN-1556: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #316 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/316/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
> {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) > ... 28 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718525#comment-14718525 ] Hudson commented on YARN-1556: -- FAILURE: Integrated in Hadoop-trunk-Commit #8363 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8363/]) YARN-1556. NPE getting application report with a null appId. Contributed by Weiwei Yang. (junping_du: rev beb65c9465806114237aa271b07b31ff3c1f4404) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
> {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) > ... 28 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir
[ https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718497#comment-14718497 ] Steve Loughran commented on YARN-4085: -- +1 some YARN_CORES value which has had whatever vcore => phycore mapping the cluster has applied > Generate file with container resource limits in the container work dir > -- > > Key: YARN-4085 > URL: https://issues.apache.org/jira/browse/YARN-4085 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Minor > > Currently, a container doesn't know what resource limits are being imposed on > it. It would be helpful if the NM generated a simple file in the container > work dir with the resource limits specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
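One shape this improvement could take is the NM writing a small properties file into the container work dir so the process can read its own limits. A hypothetical sketch only: the file name and property keys here are illustrative, not from any actual patch.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.Properties;

// Hypothetical: write the container's resource limits into its work dir.
public class LimitsFileWriter {
    static void writeLimits(String workDir, long memoryMb, int vcores)
            throws IOException {
        Properties p = new Properties();
        // Key names are made up for illustration.
        p.setProperty("yarn.container.memory.mb", Long.toString(memoryMb));
        p.setProperty("yarn.container.vcores", Integer.toString(vcores));
        try (FileWriter w = new FileWriter(workDir + "/container-limits.properties")) {
            p.store(w, "resource limits imposed on this container");
        }
    }
}
```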
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718483#comment-14718483 ] Junping Du commented on YARN-1556: -- +1. Patch LGTM. Will fix whitespace issue reported by Mr Jenkins in commit. > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. > {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) > ... 28 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718352#comment-14718352 ] Hadoop QA commented on YARN-1556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 53m 33s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 92m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752960/YARN-1556.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e166c03 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8933/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8933/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8933/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8933/console | This message was automatically generated. > NPE getting application report with a null appId > > > Key: YARN-1556 > URL: https://issues.apache.org/jira/browse/YARN-1556 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Weiwei Yang >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-1556.patch > > > If you accidentally pass in a null appId to get application report, you get > an NPE back. This is arguably as intended, except that maybe a guard > statement could report this in such a way as to make it easy for callers to > track down the cause. 
> {code} > java.lang.NullPointerException: java.lang.NullPointerException > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) > at > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) > at >
[jira] [Assigned] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin reassigned YARN-4090: - Assignee: Xianyin Xin > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
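As an aside on the cost being measured: if a scheduling round only needs the single most deserving child queue, a full {{Collections.sort()}} (O(n log n) per round) can in principle be replaced by one O(n) min-scan. A toy sketch under that assumption (class and field names are hypothetical, not the actual FSParentQueue code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SchedulerSortSketch {
    record ChildQueue(String name, double usageRatio) {}

    // Full sort just to read the head: O(n log n) per scheduling round.
    static ChildQueue pickBySort(List<ChildQueue> queues) {
        List<ChildQueue> copy = new ArrayList<>(queues);
        Collections.sort(copy, Comparator.comparingDouble(ChildQueue::usageRatio));
        return copy.get(0);
    }

    // Single min-scan: O(n) per round, same winner.
    static ChildQueue pickByScan(List<ChildQueue> queues) {
        return Collections.min(queues,
                Comparator.comparingDouble(ChildQueue::usageRatio));
    }

    public static void main(String[] args) {
        List<ChildQueue> queues = List.of(
                new ChildQueue("a", 0.9),
                new ChildQueue("b", 0.2),
                new ChildQueue("c", 0.5));
        System.out.println(pickBySort(queues).name());
        System.out.println(pickByScan(queues).name());
    }
}
```

Whether this applies depends on whether the real code needs the full sorted order (e.g. to try queues in sequence) or only the top candidate.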
[jira] [Updated] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-1556: -- Component/s: client -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-1556: -- Affects Version/s: 2.7.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-1556: -- Attachment: YARN-1556.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718251#comment-14718251 ] Weiwei Yang commented on YARN-1556: --- I recently ran into this problem, so I created a patch to resolve it. Please kindly help review it. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned YARN-1556: - Assignee: Weiwei Yang (was: haosdent) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir
[ https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718231#comment-14718231 ] Steve Loughran commented on YARN-4085: -- Make it an env var, maybe one per limit (if unset for that var == not limited); this allows new resource limits to be added later (YARN_CONTAINER_LIMIT_IO ...). > Generate file with container resource limits in the container work dir > -- > > Key: YARN-4085 > URL: https://issues.apache.org/jira/browse/YARN-4085 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Minor > > Currently, a container doesn't know what resource limits are being imposed on > it. It would be helpful if the NM generated a simple file in the container > work dir with the resource limits specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
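A sketch of the env-var convention suggested in the comment above, where an unset variable means "not limited". The YARN_CONTAINER_LIMIT_* names follow the comment's suggestion but are hypothetical; no such variables exist in YARN today:

```java
import java.util.Map;
import java.util.OptionalLong;

public class ContainerLimits {
    // Returns empty when the variable is unset, i.e. that resource
    // is not limited for this container; otherwise the numeric limit.
    static OptionalLong limit(Map<String, String> env, String var) {
        String v = env.get(var);
        return (v == null) ? OptionalLong.empty()
                           : OptionalLong.of(Long.parseLong(v));
    }

    public static void main(String[] args) {
        // In a real container this would be System.getenv(); a fixed map
        // is used here so the example is self-contained.
        Map<String, String> env = Map.of("YARN_CONTAINER_LIMIT_MEMORY_MB", "2048");
        System.out.println(limit(env, "YARN_CONTAINER_LIMIT_MEMORY_MB"));
        System.out.println(limit(env, "YARN_CONTAINER_LIMIT_IO"));
    }
}
```

The unset-means-unlimited convention is what lets new limit types be added later without breaking existing containers.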
[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler addresss
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718228#comment-14718228 ] Steve Loughran commented on YARN-4083: -- +1 for some dynamicness, either the AM declares this or ZK does the heavy lifting (the YARN registry can publish the info) What's the security story here? That is: how do AM IP filters know when to bounce an HTTP Request over to the proxy? > Add a discovery mechanism for the scheduler addresss > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3337) Provide YARN chaos monkey
[ https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718157#comment-14718157 ] Robert Metzger commented on YARN-3337: -- For those looking for a very simple YARN chaos monkey that works similarly to what [~steve_l] described here, I have something: https://github.com/rmetzger/yarn-chaos-monkey It does not run within the AM; to kill the containers, I basically ssh into the remote host and kill the process. Maybe the link is helpful for somebody who immediately needs such a tool. > Provide YARN chaos monkey > - > > Key: YARN-3337 > URL: https://issues.apache.org/jira/browse/YARN-3337 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 2.7.0 >Reporter: Steve Loughran > > To test failure resilience today you either need custom scripts or implement > Chaos Monkey-like logic in your application (SLIDER-202). > Killing AMs and containers on a schedule & probability is the core activity > here, one that could be handled by a CLI App/client lib that does this. > # entry point to have a startup delay before acting > # frequency of chaos wakeup/polling > # probability of AM failure generation (0-100) > # probability of non-AM container kill > # future: other operations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
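The parameter list in the issue (startup delay, polling frequency, independent kill probabilities for the AM and for ordinary containers) can be sketched as a simple polling loop. The kill actions are stubbed with prints and all names are illustrative; this is not either tool's actual implementation:

```java
import java.util.List;
import java.util.Random;

public class ChaosMonkeySketch {
    // A probability expressed as 0-100 percent, per the issue's parameter list.
    static boolean shouldKill(Random rng, int probabilityPercent) {
        return rng.nextInt(100) < probabilityPercent;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so a dry run is repeatable
        int amKillProb = 10;         // % chance per poll to kill the AM
        int containerKillProb = 25;  // % chance per poll, per container

        List<String> containers = List.of("container_01", "container_02");
        for (int poll = 0; poll < 3; poll++) {   // three polling rounds
            if (shouldKill(rng, amKillProb)) {
                System.out.println("poll " + poll + ": kill AM");
            }
            for (String c : containers) {
                if (shouldKill(rng, containerKillProb)) {
                    System.out.println("poll " + poll + ": kill " + c);
                }
            }
        }
        System.out.println("done");
    }
}
```

A real implementation would also honor the startup delay before the first poll and sleep for the polling interval between rounds, which are omitted here to keep the dry run instant.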
[jira] [Commented] (YARN-3970) REST api support for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718143#comment-14718143 ] Sunil G commented on YARN-3970: --- Hi Naga. Yes, I could see that you are planning to use {{getClientRMService}}. That avoids the direct API invocation on AbstractYarnScheduler. This looks fine, as all validations are handled. Thank you. > REST api support for Application Priority > - > > Key: YARN-3970 > URL: https://issues.apache.org/jira/browse/YARN-3970 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Naganarasimha G R > > REST api support for application priority. > - get/set priority of an application > - get default priority of a queue > - get cluster max priority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-4065: - Assignee: Casey Brotherton > container-executor error should include effective user id > - > > Key: YARN-4065 > URL: https://issues.apache.org/jira/browse/YARN-4065 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Casey Brotherton >Assignee: Casey Brotherton >Priority: Trivial > > When container-executor fails to access its config file, the following > message is thrown: > {code} > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container executor initialization is : 24 > ExitCodeException exitCode=24: Invalid conf file provided : > /etc/hadoop/conf/container-executor.cfg > {code} > The real problem may be that container-executor is no longer running setuid > root. > From: > https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html > {quote} > The container-executor program must be owned by root and have the permission > set ---sr-s---. > {quote} > The error message could be improved by printing the effective user id > with the error message, and possibly the executable trying to access the > config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3970) REST api support for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718133#comment-14718133 ] Naganarasimha G R commented on YARN-3970: - Thanks for replying, [~sunilg]. bq. And it's also good to verify whether app is in accepted state or running state before invoking scheduler api to change priority On calling {{rm.getClientRMService().updateApplicationPriority()}}, the above check is taken care of inside it. All ACL-related checks will be handled as well. > REST api support for Application Priority > - > > Key: YARN-3970 > URL: https://issues.apache.org/jira/browse/YARN-3970 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Naganarasimha G R > > REST api support for application priority. > - get/set priority of an application > - get default priority of a queue > - get cluster max priority -- This message was sent by Atlassian JIRA (v6.3.4#6332)