[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers
[ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429629#comment-16429629 ] Junping Du commented on YARN-8118: -- Thanks for contributing your idea and code, [~Karthik Palaniappan]! As Jason mentioned above, our main goal here is to remove decommissioning nodes from service as soon as possible, at the least cost of interrupting the progress applications have already made (their existing running containers). In my opinion, in most cases there is no significant difference between containers scheduled by existing applications and containers scheduled by new applications. If there is any, the right solution should be the priority/preemption mechanism between applications. In other words, in our typical decommissioning cases we make no assumption about priority differences between existing and new applications. However, in a pure cloud environment (like EMR, etc.), the scenario could be different. What I can imagine (please correct me if I am wrong) is: a user (also an admin from the YARN perspective) submits most workloads to a dedicated YARN cluster and wishes the cluster to shrink to some minimal size later when the applications finish. If this is the case that the current design and code target, then we should take Jason's suggestion above to add a new cluster configuration or a new parameter for the graceful decommission CLI. We need to be careful here: the previous decommissioning-nodes operation is idempotent, so we need to figure out what it means when new applications are submitted between multiple operations and how to track them - I don't think the current code provides a way. 
> Better utilize gracefully decommissioning node managers > --- > > Key: YARN-8118 > URL: https://issues.apache.org/jira/browse/YARN-8118 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.8.2 > Environment: * Google Compute Engine (Dataproc) > * Java 8 > * Hadoop 2.8.2 using client-mode graceful decommissioning >Reporter: Karthik Palaniappan >Priority: Major > Attachments: YARN-8118-branch-2.001.patch > > > Proposal design doc with background + details (please comment directly on > doc): > [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7] > tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications > to complete before shutting down, but they cannot run new containers from > those in-progress applications. This is wasteful, particularly in > environments where you are billed by resource usage (e.g. EC2). > Proposal: YARN should schedule containers from in-progress applications on > DECOMMISSIONING nodes, but should still avoid scheduling containers from new > applications. That will make in-progress applications complete faster and let > nodes decommission faster. Overall, this should be cheaper. > I have a working patch without unit tests that's surprisingly just a few real > lines of code (patch 001). If folks are happy with the proposal, I'll write > unit tests and also write a patch targeted at trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
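The proposal in this thread boils down to a small scheduler-side check; a minimal sketch follows, with hypothetical class, method, and state names (this is not the actual YARN-8118 patch):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the YARN-8118 proposal: a DECOMMISSIONING node may
// still receive containers, but only for applications that were already
// in progress when graceful decommissioning started.
class DecommissioningAllocationCheck {
  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  // Applications that were in progress when graceful decommission began.
  private final Set<String> inProgressApps;

  DecommissioningAllocationCheck(Set<String> inProgressApps) {
    this.inProgressApps = inProgressApps;
  }

  /** True if a container for appId may be placed on a node in this state. */
  boolean mayAllocate(NodeState state, String appId) {
    if (state == NodeState.DECOMMISSIONED) {
      return false;                            // node already out of service
    }
    if (state == NodeState.DECOMMISSIONING) {
      return inProgressApps.contains(appId);   // existing apps only
    }
    return true;                               // healthy nodes accept all apps
  }
}
```

This captures the trade-off discussed above: existing applications finish faster on decommissioning capacity, while new applications never start work that would delay the node's shutdown.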
[jira] [Comment Edited] (YARN-7142) Support placement policy in yarn native services
[ https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429610#comment-16429610 ] Weiwei Yang edited comment on YARN-7142 at 4/8/18 3:51 AM: --- Hi [~gsaha]/[~leftnoteasy] Thanks for back-porting this to branch-3.1. Not related to this task, but I have a question about the format of the placement policy in the yaml file. It looks like an interpretation of how we specify placement constraints using the Java API. I think we should be able to support a simple PC language, by specifying something like: {code:java} notin,node,foo {code} see more in [this doc|https://issues.apache.org/jira/secure/attachment/12911872/Placement%20Constraint%20Expression%20Syntax%20Specification.pdf] and YARN-7921. I know this is only used in distributed shell as a demo, but if we find this easier to write, maybe we can use such an expression here too? Just want to know your opinion. Thanks was (Author: cheersyang): Hi [~gsaha]/[~leftnoteasy] Thanks for backing port this to branch-3.1. Not related to this task, I have a question about the format of placement policy in yaml file. It looks like it is more like an interpretation of how we specify placement constraints using Java API. I think we should be able to support a simple PC language, by specifying something like: {code} notin,node,foo {code} see more in [^Placement Constraint Expression Syntax Specification.pdf], YARN-7921. I know this is only used distributed shell as a demo, but I think if we find this more easier to write, maybe we can use such expression here too? Just want to know your opinion. 
Thanks > Support placement policy in yarn native services > > > Key: YARN-7142 > URL: https://issues.apache.org/jira/browse/YARN-7142 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7142-branch-3.1.004.patch, YARN-7142.001.patch, > YARN-7142.002.patch, YARN-7142.003.patch, YARN-7142.004.patch > > > Placement policy exists in the API but is not implemented yet. > I have filed YARN-8074 to move the composite constraints implementation out > of this phase-1 implementation of placement policy.
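The simple expression format suggested above ("notin,node,foo") could be parsed with only a few lines. A sketch, with illustrative names (this is not the YARN-7921 API):

```java
import java.util.Arrays;

// Illustrative parser for an "op,scope,tag[,tag...]" placement-constraint
// expression such as "notin,node,foo". Not the actual YARN implementation.
class SimplePlacementExpr {
  final String op;      // e.g. "in" or "notin"
  final String scope;   // e.g. "node" or "rack"
  final String[] tags;  // allocation tags the constraint targets

  private SimplePlacementExpr(String op, String scope, String[] tags) {
    this.op = op;
    this.scope = scope;
    this.tags = tags;
  }

  static SimplePlacementExpr parse(String expr) {
    String[] parts = expr.trim().split(",");
    if (parts.length < 3) {
      throw new IllegalArgumentException(
          "expected op,scope,tag[,tag...] but got: " + expr);
    }
    // Everything after the first two tokens is a target tag.
    return new SimplePlacementExpr(parts[0], parts[1],
        Arrays.copyOfRange(parts, 2, parts.length));
  }
}
```

The appeal of such a mini-language in a yaml spec is exactly what the comment suggests: it is terser to write by hand than a structured rendering of the Java constraint API.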
[jira] [Commented] (YARN-8095) Allow disable non-exclusive allocation
[ https://issues.apache.org/jira/browse/YARN-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429595#comment-16429595 ] Weiwei Yang commented on YARN-8095: --- Hi [~kyungwan nam] Here is your config: ||Queue||Priority||Capacity||Accessible-Partition|| |root.longlived|0|50~100|default| |root.batch|1|50~100|default| |root.label1|0|0|label1| It looks like the resource used by {{root.batch}} and {{root.longlived}} is bigger than the total resource of the {{default}} partition, so they start to use free resource from {{root.label1}}. When an app is submitted to {{root.label1}}, it needs to kick off preemption to reclaim that resource. But P(root.batch) > P(root.longlived) && U(root.longlived) > 50%, which is why a container from {{root.longlived}} was selected for preemption, right? In this case, why don't you set the {{root.longlived}} priority higher if you "don’t want long-lived apps to be killed abruptly"? Thanks > Allow disable non-exclusive allocation > -- > > Key: YARN-8095 > URL: https://issues.apache.org/jira/browse/YARN-8095 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.8.3 >Reporter: kyungwan nam >Priority: Major > > We have a 'longlived' queue, which is used for long-lived apps. > In situations where default partition resources are not enough, containers for > a long-lived app can be allocated to a sharable partition. > Since then, containers for the long-lived app can be easily preempted. > We don’t want long-lived apps to be killed abruptly. > Currently, non-exclusive allocation can happen regardless of whether the > queue is accessible to the sharable partition. > It would be good if non-exclusive allocation could be disabled at the queue level.
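If raising the queue priority is the route taken, that is a capacity-scheduler.xml change along these lines. This is a sketch only: the values are illustrative, and the `yarn.scheduler.capacity.<queue-path>.priority` property requires a release that supports CapacityScheduler queue priorities:

```xml
<!-- Sketch: give root.longlived a higher priority than root.batch so its
     containers are less likely to be selected as preemption victims. -->
<property>
  <name>yarn.scheduler.capacity.root.longlived.priority</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.batch.priority</name>
  <value>1</value>
</property>
```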
[jira] [Commented] (YARN-8110) AMRMProxy recover should catch for all throwable to avoid premature exit
[ https://issues.apache.org/jira/browse/YARN-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429594#comment-16429594 ] Botong Huang commented on YARN-8110: Thanks [~subru]! > AMRMProxy recover should catch for all throwable to avoid premature exit > > > Key: YARN-8110 > URL: https://issues.apache.org/jira/browse/YARN-8110 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Fix For: 2.10.0, 3.1.1, 2.9.2 > > Attachments: YARN-8110.v1.patch > > > In NM work-preserving restart, when AMRMProxy recovers applications one by > one, the current catch only handles IOException. If one app recovery throws > something else (e.g. a RuntimeException), it will fail the entire AMRMProxy > recovery.
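The fix this issue describes amounts to widening the per-application catch so one bad application cannot abort the whole recovery loop. A minimal sketch, with hypothetical names (not the actual AMRMProxy code):

```java
import java.util.List;

// Sketch: recover applications one by one and catch Throwable per app, so a
// RuntimeException from a single app cannot fail the entire recovery.
class RecoveryLoop {
  interface RecoverableApp {
    String id();
    void recover() throws Exception;
  }

  /** Recovers each app independently; returns how many succeeded. */
  static int recoverAll(List<RecoverableApp> apps) {
    int recovered = 0;
    for (RecoverableApp app : apps) {
      try {
        app.recover();
        recovered++;
      } catch (Throwable t) {  // wider than IOException: log and keep going
        System.err.println("Failed to recover " + app.id() + ": " + t);
      }
    }
    return recovered;
  }
}
```

The design choice is the usual one for restart paths: an isolated per-item failure is logged and skipped, because aborting work-preserving restart midway loses far more state than dropping one application.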
[jira] [Commented] (YARN-8060) Create default readiness check for service components
[ https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429567#comment-16429567 ] genericqa commented on YARN-8060: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 109 unchanged - 3 fixed = 109 total (was 112) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 2s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 87m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8060 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment
[jira] [Updated] (YARN-8060) Create default readiness check for service components
[ https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8060: - Attachment: YARN-8060.3.patch > Create default readiness check for service components > - > > Key: YARN-8060 > URL: https://issues.apache.org/jira/browse/YARN-8060 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8060.1.patch, YARN-8060.2.patch, YARN-8060.3.patch > > > It is currently possible for a component instance to have READY status before > the AM retrieves an IP for the container. We should make sure the IP has been > retrieved before marking the instance as READY. > This default probe could also have an option to check for a DNS entry for the > instance's hostname if a DNS address is provided.
[jira] [Commented] (YARN-8060) Create default readiness check for service components
[ https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429480#comment-16429480 ] genericqa commented on YARN-8060: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 109 unchanged - 3 fixed = 111 total (was 112) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 54s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 12s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 18s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 88m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project
[jira] [Updated] (YARN-8018) Yarn Service Upgrade: Add support for initiating service upgrade
[ https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8018: - Fix Version/s: 3.2.0 > Yarn Service Upgrade: Add support for initiating service upgrade > > > Key: YARN-8018 > URL: https://issues.apache.org/jira/browse/YARN-8018 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8018.001.patch, YARN-8018.002.patch, > YARN-8018.003.patch, YARN-8018.004.patch, YARN-8018.005.patch, > YARN-8018.006.patch, YARN-8018.007.patch > > > Add support for initiating service upgrade which includes the following main > changes: > # Service API to initiate upgrade > # Persist service version on hdfs > # Start the upgraded version of service
[jira] [Commented] (YARN-3401) [Security] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429456#comment-16429456 ] Haibo Chen commented on YARN-3401: -- The goal/scope of this Jira is to ensure that AMs cannot forge or tamper with data generated by YARN itself, which we refer to as system entities. Here are some thoughts to get the discussion going. An entity can be categorized as either 1) a system entity generated by YARN, including FLOW_ACTIVITY, FLOW_RUN, YARN_APPLICATION, YARN_APPLICATION_ATTEMPT, YARN_CONTAINER (and YARN_QUEUE and YARN_USER in the future), that can only be posted/modified by YARN, or 2) an entity generated by an AM, which can be either a SubAppEntity of any custom type or an application entity of any custom type. The proposal is: a) Since YARN does not write any SubAppEntity, AMs are free to do whatever they like with SubAppEntity. Upon receiving any SubAppEntity, no check is done; it is simply stored into the SubAppEntityTable. b) For entities in scopes other than application, FLOW_ACTIVITY or FLOW_RUN is populated within HBaseTimelineWriter upon application creation/finish events, so we shall ignore any entity of such a type sent by anyone. c) Both YARN and AMs can generate application-scoped entities. We can reserve YARN_APPLICATION, YARN_APPLICATION_ATTEMPT and YARN_CONTAINER; AMs are free to create application entities of any custom type excluding those that are reserved. 
In terms of where each type of application entity is stored in HBase, |Source|Type|Destination| |yarn|YARN_APPLICATION|ApplicationTable| |yarn|YARN_APP_ATTEMPT|EntityTable| |yarn|YARN_CONTAINER|EntityTable| |am|any unreserved type|EntityTable| *A prerequisite to achieve this is that we can identify the source of the entity, or from whom an entity is sent.* The user indicated in the REST request should be a good indicator of the source of an entity (if AMs are running as the yarn user, that means the admin doesn't care about security at all, and AMs can do all sorts of crazy things in addition to overriding/forging system entities). In terms of pseudo-code, the above proposal is
{code:java}
HBaseTimelineWriterImpl.writeEntity(TimelineCollectorContext context,
    TimelineEntity data, UserGroupInformation callerUgi) {
  if (data instanceof SubAppEntity) {
    // store data in SubAppEntityTable
  } else if (data.type == YARN_FLOW_ACTIVITY || data.type == YARN_FLOW_RUN) {
    // ignore
  } else if (data.type == YARN_APPLICATION) {
    // verify callerUgi is yarn && store data in ApplicationTable
    // update flow run upon application creation/finish events
  } else if (data.type == YARN_CONTAINER || data.type == YARN_APPLICATION_ATTEMPT) {
    // verify callerUgi is yarn && store data in EntityTable
  } else {
    // this is a custom application entity
    // store data in EntityTable
  }
}
{code}
The complication, IMO, is around the aggregated application-level metrics posted by the TimelineCollector. There are metrics such as MEMORY and CPU that are rolled up from container metrics; in that sense, the MEMORY and CPU posted by the TimelineCollector are system entities. And there are other aggregates rolled up from AM custom metrics. TimelineCollectors today run as yarn inside NMs, and they write application-level aggregated metrics as YARN_APPLICATION entities (system entities) to the ApplicationTable. (It is safe for the yarn user to take app-custom metrics and write them as system entities, but not for AM users to write system entities.) 
But if we were to run the TimelineCollector along with the AM as the AM user (I am not sure how far away we are, but that was indicated as the ultimate mode in YARN-2928), it should not be trusted any more. An implication is that we cannot trust the MEMORY and CPU sent from the TimelineCollector either. I am not sure what the best strategy is to address the intrinsic conflict between the fact that aggregated app-level MEMORY and CPU usage should be generated by YARN and the possibility of the TimelineCollector not running as yarn. > [Security] users should not be able to create a generic TimelineEntity and > associate arbitrary type > --- > > Key: YARN-3401 > URL: https://issues.apache.org/jira/browse/YARN-3401 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Haibo Chen >Priority: Major > Labels: YARN-5355 > > IIUC it is possible for users to create a generic TimelineEntity and set an > arbitrary entity type. For example, for a YARN app, the right entity API is > ApplicationEntity. However, today nothing stops users from instantiating a > base TimelineEntity class and set the applic
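The routing pseudo-code in this comment can be rendered as a small runnable function. Names below are hypothetical; the real writer deals in TimelineEntity and UserGroupInformation objects rather than strings:

```java
// Runnable rendering of the proposal: route an entity to a destination table
// based on its type and on who posted it; reserved system types require the
// yarn user, and flow entities are never accepted from a REST caller.
class EntityRouter {
  enum Dest { SUB_APP_TABLE, APPLICATION_TABLE, ENTITY_TABLE, IGNORED, REJECTED }

  static Dest route(boolean isSubAppEntity, String type, String callerUser) {
    if (isSubAppEntity) {
      return Dest.SUB_APP_TABLE;               // no checks on sub-app entities
    }
    if (type.equals("YARN_FLOW_ACTIVITY") || type.equals("YARN_FLOW_RUN")) {
      return Dest.IGNORED;                     // writer populates flows itself
    }
    boolean fromYarn = "yarn".equals(callerUser);
    if (type.equals("YARN_APPLICATION")) {
      return fromYarn ? Dest.APPLICATION_TABLE : Dest.REJECTED;
    }
    if (type.equals("YARN_APPLICATION_ATTEMPT") || type.equals("YARN_CONTAINER")) {
      return fromYarn ? Dest.ENTITY_TABLE : Dest.REJECTED;
    }
    return Dest.ENTITY_TABLE;                  // unreserved custom app entity
  }
}
```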
[jira] [Commented] (YARN-8060) Create default readiness check for service components
[ https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429436#comment-16429436 ] Billie Rinaldi commented on YARN-8060: -- Thanks for the review, [~shaneku...@gmail.com]! Great suggestions. I think I have addressed them all in patch 2. While I was adding the configuration property to disable the default probe, I noticed some inconsistencies in how other config property values are retrieved (hardcoded defaults and the like). I couldn't resist cleaning these up and improving the descriptions in the site docs. Now there is only one property that is not retrieved through the recommended YarnServiceConf helper methods, and that is docker.network. In retrospect, perhaps we should have renamed this to have the yarn.service prefix and converted it to use the YarnServiceConf methods as well. The advantage of the YarnServiceConf methods is that they allow the default values for properties to be read from the YARN Configuration, if the properties are specified there. > Create default readiness check for service components > - > > Key: YARN-8060 > URL: https://issues.apache.org/jira/browse/YARN-8060 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8060.1.patch, YARN-8060.2.patch > > > It is currently possible for a component instance to have READY status before > the AM retrieves an IP for the container. We should make sure the IP has been > retrieved before marking the instance as READY. > This default probe could also have an option to check for a DNS entry for the > instance's hostname if a DNS address is provided.
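The lookup order described in this comment (service-level property, then the cluster-level YARN Configuration, then the hardcoded default) can be sketched as follows; this is a hypothetical helper, not the actual YarnServiceConf signature:

```java
import java.util.Map;

// Sketch of the config-resolution order: prefer the service's own property,
// fall back to the cluster YARN configuration, then the hardcoded default.
class ConfLookup {
  static long getLong(String name, long hardDefault,
                      Map<String, String> serviceConf,
                      Map<String, String> yarnConf) {
    String v = serviceConf.get(name);
    if (v == null) {
      v = yarnConf.get(name);   // cluster-wide default, if the admin set one
    }
    return v == null ? hardDefault : Long.parseLong(v);
  }
}
```

The point of this ordering is the one made above: admins can change the effective default for every service by setting the property once in the cluster configuration, without touching individual service specs.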
[jira] [Updated] (YARN-8060) Create default readiness check for service components
[ https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8060: - Attachment: YARN-8060.2.patch > Create default readiness check for service components > - > > Key: YARN-8060 > URL: https://issues.apache.org/jira/browse/YARN-8060 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8060.1.patch, YARN-8060.2.patch > > > It is currently possible for a component instance to have READY status before > the AM retrieves an IP for the container. We should make sure the IP has been > retrieved before marking the instance as READY. > This default probe could also have an option to check for a DNS entry for the > instance's hostname if a DNS address is provided. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429411#comment-16429411 ] Shane Kumpf commented on YARN-7667: --- Thanks for the patch, [~ebadger]. Overall this looks good. I tested a few different values and can see the correct grace period being set in the cmd file and in the stop call in the docker daemon debug logs: {code:java} Apr 07 15:05:44 host dockerd[8987]: time="2018-04-07T15:05:44.577115114Z" level=debug msg="Calling POST /v1.37/containers/container_e01_1523113380632_0001_01_02/stop?t=2" Apr 07 15:05:44 host dockerd[8987]: time="2018-04-07T15:05:44.577396387Z" level=debug msg="Sending kill signal 15 to container 3dd78436322153be5ee0437296e40bfd818fbc17b7015bc37db3d97d977fc0a1" Apr 07 15:05:46 host dockerd[8987]: time="2018-04-07T15:05:46.596738936Z" level=info msg="Container 3dd78436322153be5ee0437296e40bfd818fbc17b7015bc37db3d97d977fc0a1 failed to exit within 2 seconds of signal 15 - using the force"{code} Couple minor comments on the patch: # Is there a reason not to set the value for {{yarn.nodemanager.runtime.linux.docker.stop.grace-period}} in {{yarn-default.xml}} to 10? # I don't think the new {{DockerStopCommand}} constructor is necessary, {{new DockerStopCommand(containerId).setGracePeriod(dockerStopGracePeriod)}} would achieve the same. > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. 
So, the stop uses the 10-second default grace period from Docker.
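The chained-setter alternative suggested in review comment 2 above can be sketched like this. It is an illustrative mock, not the real DockerStopCommand source; the field, default, and rendered command string are assumptions for the example:

```java
public class StopCommandSketch {
  // Mock of a stop command builder: a chained setter makes the extra
  // constructor unnecessary, because construction and configuration
  // compose into one expression.
  static class DockerStopCommand {
    private final String containerId;
    private int gracePeriod = 10; // Docker's own default is 10 seconds

    DockerStopCommand(String containerId) {
      this.containerId = containerId;
    }

    DockerStopCommand setGracePeriod(int seconds) {
      this.gracePeriod = seconds;
      return this; // returning 'this' is what enables chaining
    }

    @Override
    public String toString() {
      return "docker stop --time=" + gracePeriod + " " + containerId;
    }
  }

  public static void main(String[] args) {
    // One expression instead of a DockerStopCommand(id, grace) constructor:
    DockerStopCommand cmd =
        new DockerStopCommand("container_e01_01").setGracePeriod(2);
    System.out.println(cmd); // prints "docker stop --time=2 container_e01_01"
  }
}
```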
[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm
[ https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429366#comment-16429366 ] Rohith Sharma K S commented on YARN-8126: - Copy pasting [~gsaha] comments from YARN-8048. 1. SystemServiceManagerImpl.java {code:java} if (!services.add(service)) { int count = ignoredUserServices.containsKey(userName) ? ignoredUserServices.get(userName) : 0; ignoredUserServices.put(userName, count + 1); LOG.warn( "Ignoring service {} for the user {} as it is already present," + " filename = {}", service.getName(), userName, filename); } LOG.info("Added service {} for the user {}, filename = {}", service.getName(), userName, filename); {code} I think the info log will get printed here every time the warn log is also printed inside the if block. Should the info log go in the else block? 2. Should we rename TestSystemServiceImpl.java to TestSystemServiceManagerImpl.java? 3. TestSystemServiceImpl {code:java} Assert.assertTrue( "Service name doesn't exist in expected " + "userService " + serviceNames, serviceNames.contains(next.getName())); {code} Should we combine these 2 strings into 1 -> "Service name doesn't exist in expected " + "userService " 4. Do we need to add documentation about "yarn.service.system-service.dir” anywhere? 5. Under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/resources/ should we rename the dir “users" to "system-services" to better reflect what these files/tests are for (and change resourcePath in TestSystemServiceImpl.java accordingly) 6. For an additional test to see if a dir is skipped, do you want to add a directory named “bad" (probably needs a dummy file under it otherwise github will not allow you to commit an empty dir) under the path hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/resources//users/sync/user1/? 
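The fix suggested in comment 1 — moving the info log into an else branch so a skipped duplicate is not also reported as added — can be sketched as follows. This is a simplified stand-in, not the actual SystemServiceManagerImpl; logging is replaced with println, and the counter update uses Map.merge in place of the containsKey/get/put sequence:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DuplicateServiceSketch {
  static final Map<String, Integer> ignoredUserServices = new HashMap<>();

  // Corrected control flow: the "added" message only fires when the
  // add actually succeeded; duplicates only bump the per-user counter.
  static boolean addService(Set<String> services, String service,
      String userName) {
    if (!services.add(service)) {
      ignoredUserServices.merge(userName, 1, Integer::sum);
      System.out.println("WARN ignoring duplicate " + service
          + " for " + userName);
      return false;
    } else {
      System.out.println("INFO added " + service + " for " + userName);
      return true;
    }
  }

  public static void main(String[] args) {
    Set<String> services = new HashSet<>();
    addService(services, "hbase", "user1"); // INFO added
    addService(services, "hbase", "user1"); // WARN ignored
    System.out.println(ignoredUserServices.get("user1")); // prints "1"
  }
}
```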
> [Follow up] Support auto-spawning of admin configured services during > bootstrap of rm > - > > Key: YARN-8126 > URL: https://issues.apache.org/jira/browse/YARN-8126 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Priority: Major > > YARN-8048 adds support for auto-spawning of admin configured services during > bootstrap of the RM. > This JIRA is to follow up on some of the comments discussed in YARN-8048.
[jira] [Created] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm
Rohith Sharma K S created YARN-8126: --- Summary: [Follow up] Support auto-spawning of admin configured services during bootstrap of rm Key: YARN-8126 URL: https://issues.apache.org/jira/browse/YARN-8126 Project: Hadoop YARN Issue Type: Sub-task Reporter: Rohith Sharma K S YARN-8048 adds support for auto-spawning of admin configured services during bootstrap of the RM. This JIRA is to follow up on some of the comments discussed in YARN-8048.
[jira] [Commented] (YARN-8048) Support auto-spawning of admin configured services during bootstrap of rm/apiserver
[ https://issues.apache.org/jira/browse/YARN-8048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429361#comment-16429361 ] Rohith Sharma K S commented on YARN-8048: - Thanks [~gsaha] for the detailed review. I will create a follow-up JIRA to handle these comments rather than handling them in an addendum patch. > Support auto-spawning of admin configured services during bootstrap of > rm/apiserver > --- > > Key: YARN-8048 > URL: https://issues.apache.org/jira/browse/YARN-8048 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8048.001.patch, YARN-8048.002.patch, > YARN-8048.003.patch, YARN-8048.004.patch, YARN-8048.005.patch, > YARN-8048.006.patch > > > Goal is to support auto-spawning of admin-configured services during > bootstrap of the resourcemanager/apiserver. > *Requirement:* Some services might be required by YARN itself, e.g. HBase for ATSv2. Instead of depending on a user-installed HBase (or when the user does not want to install HBase at all), running an HBase app on YARN can serve ATSv2. > Before the YARN cluster is started, the admin configures these service specs and places them in a common location in HDFS. At RM/apiserver bootstrap time, these services will be submitted.
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429360#comment-16429360 ] Shane Kumpf commented on YARN-7221: --- Thanks for the patch, [~eyang]. Sorry I'm just getting a chance to review. I tested out this feature and didn't find any issues. {quote}do we agree on the last change to check submitting user for sudo privileges instead of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user {quote} I agree with that approach. This is in line with the existing privileged container ACL checks in the runtime. Two minor comments on the current patch: # In the privileged case, I think we can omit adding the {{--group-add}} to the cmd file # The checkstyle issue is valid and should be addressed I'll note that I would have preferred if we did not set the user in the cmd file in this case, as having the cmd file represent the actual docker command that will be executed was a useful feature for troubleshooting purposes. This is breaking down the more we conditionally remove in c-e. However, let's move forward with the current approach of passing the user via the cmd file, to limit additional change here.
> Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch > > > When a Docker container runs with privileges, the majority of use cases involve starting a program as root and then dropping privileges to another user, e.g. httpd starting privileged to bind to port 80, then dropping privileges to the www user. > # We should add a security check for submitting users, to verify they have "sudo" access to run privileged containers. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With this parameter combination, the user will not have access to become root. All docker exec commands will drop to the uid:gid user instead of granting privileges. A user can gain root privileges if the container file system contains files that give the user extra power, but this type of image is considered dangerous. A non-privileged user can launch a container with special bits to acquire the same level of root power. Hence, we lose control of which images should be run with --privileged, and who has sudo rights to use privileged container images. As a result, we should check for sudo access and then decide to parameterize --privileged=true OR --user=uid:gid. This will avoid leading developers down the wrong path.
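The check-sudo-then-parameterize rule described in the issue can be sketched as follows. This is a simplified stand-in for the container-executor logic, not the actual implementation; the method name, the in-memory sudoers list, and the argument strings are all hypothetical:

```java
import java.util.Arrays;
import java.util.List;

public class PrivilegedFlagSketch {
  // Sketch of the decision described above: a privileged request is only
  // honored for users with sudo access, and then --user is omitted;
  // everything else runs as the submitting uid:gid.
  static List<String> dockerRunArgs(String user, boolean requestedPrivileged,
      List<String> sudoUsers, String uidGid) {
    if (requestedPrivileged) {
      if (!sudoUsers.contains(user)) {
        throw new SecurityException(
            user + " lacks sudo access; privileged container denied");
      }
      // Per the issue: no --user override for privileged containers.
      return Arrays.asList("--privileged=true");
    }
    return Arrays.asList("--user=" + uidGid);
  }

  public static void main(String[] args) {
    List<String> sudoers = Arrays.asList("admin");
    System.out.println(
        dockerRunArgs("admin", true, sudoers, "1000:1000")); // [--privileged=true]
    System.out.println(
        dockerRunArgs("alice", false, sudoers, "1001:1001")); // [--user=1001:1001]
  }
}
```

The two branches are mutually exclusive on purpose: combining --privileged=true with --user=uid:gid is exactly the confusing middle ground the issue description argues against.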
[jira] [Commented] (YARN-8048) Support auto-spawning of admin configured services during bootstrap of rm/apiserver
[ https://issues.apache.org/jira/browse/YARN-8048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429280#comment-16429280 ] Gour Saha commented on YARN-8048: - [~rohithsharma] thank you for the patch. Sorry for the late review. This is a very interesting and helpful feature and the patch looks great. A few minor comments which can be taken care of in an addendum patch or maybe another sub-task under the parent (if they make sense to you) - 1. SystemServiceManagerImpl.java {code:java} if (!services.add(service)) { int count = ignoredUserServices.containsKey(userName) ? ignoredUserServices.get(userName) : 0; ignoredUserServices.put(userName, count + 1); LOG.warn( "Ignoring service {} for the user {} as it is already present," + " filename = {}", service.getName(), userName, filename); } LOG.info("Added service {} for the user {}, filename = {}", service.getName(), userName, filename); {code} I think the info log will get printed here every time the warn log is also printed inside the if block. Should the info log go in the else block? 2. Should we rename TestSystemServiceImpl.java to TestSystemServiceManagerImpl.java? 3. TestSystemServiceImpl {code:java} Assert.assertTrue( "Service name doesn't exist in expected " + "userService " + serviceNames, serviceNames.contains(next.getName())); {code} Should we combine these 2 strings into 1 -> "Service name doesn't exist in expected " + "userService " 4. Do we need to add documentation about "yarn.service.system-service.dir" anywhere? 5. Under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/resources/ should we rename the dir "users" to "system-services" to better reflect what these files/tests are for (and change resourcePath in TestSystemServiceImpl.java accordingly) 6.
For an additional test to see if a dir is skipped, do you want to add a directory named "bad" (probably needs a dummy file under it, otherwise GitHub will not allow you to commit an empty dir) under the path hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/resources//users/sync/user1/? > Support auto-spawning of admin configured services during bootstrap of > rm/apiserver > --- > > Key: YARN-8048 > URL: https://issues.apache.org/jira/browse/YARN-8048 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8048.001.patch, YARN-8048.002.patch, > YARN-8048.003.patch, YARN-8048.004.patch, YARN-8048.005.patch, > YARN-8048.006.patch > > > Goal is to support auto-spawning of admin-configured services during > bootstrap of the resourcemanager/apiserver. > *Requirement:* Some services might be required by YARN itself, e.g. HBase for ATSv2. Instead of depending on a user-installed HBase (or when the user does not want to install HBase at all), running an HBase app on YARN can serve ATSv2. > Before the YARN cluster is started, the admin configures these service specs and places them in a common location in HDFS. At RM/apiserver bootstrap time, these services will be submitted.
[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429269#comment-16429269 ] Sunil G commented on YARN-4781: --- Thanks [~eepayne]. {quote}I'm not seeing the {{FairOrderingPolicy}} consider pending + used {quote} I think the code below also calls getMagnitude: {code:java} protected class FairComparator implements Comparator<SchedulableEntity> { @Override public int compare(final SchedulableEntity r1, final SchedulableEntity r2) { int res = (int) Math.signum( getMagnitude(r1) - getMagnitude(r2) ); return res; } } private double getMagnitude(SchedulableEntity r) { double mag = r.getSchedulingResourceUsage().getCachedUsed( CommonNodeLabelsManager.ANY).getMemorySize(); if (sizeBasedWeight) { double weight = Math.log1p(r.getSchedulingResourceUsage().getCachedDemand( CommonNodeLabelsManager.ANY).getMemorySize()) / Math.log(2); mag = mag / weight; } return mag; }{code} Also in {{AbstractComparatorOrderingPolicy}} {code:java} public static void updateSchedulingResourceUsage(ResourceUsage ru) { ru.setCachedUsed(CommonNodeLabelsManager.ANY, ru.getAllUsed()); ru.setCachedPending(CommonNodeLabelsManager.ANY, ru.getAllPending()); } {code} Hence, in case we use sizeBasedWeight, we are considering pending as well. So I had this doubt. bq.I don't think these comparators need to be {{Serializable}}, do they? Yes, you are correct. I think we don't need this. > Support intra-queue preemption for fairness ordering policy. > > > Key: YARN-4781 > URL: https://issues.apache.org/jira/browse/YARN-4781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-4781.001.patch, YARN-4781.002.patch, > YARN-4781.003.patch > > > We introduced the fairness queue policy in YARN-3319, which lets large > applications make progress without starving small applications.
However, if a > large application takes the queue's resources, and containers of the large > app have a long lifespan, small applications could still wait for resources for a > long time and SLAs cannot be guaranteed. > Instead of waiting for applications to release resources on their own, we need to > preempt resources of queues with the fairness policy enabled.
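The effect of sizeBasedWeight discussed in the comment above can be checked with a standalone reproduction of the getMagnitude arithmetic. The memory figures are made up for illustration; only the formula is taken from the quoted code:

```java
public class MagnitudeSketch {
  // Same arithmetic as the quoted getMagnitude: cached used memory,
  // optionally divided by log2(1 + cached demand). With sizeBasedWeight
  // on, pending demand lowers the comparison key, so pending is
  // considered by the ordering.
  static double magnitude(long usedMB, long demandMB, boolean sizeBasedWeight) {
    double mag = usedMB;
    if (sizeBasedWeight) {
      double weight = Math.log1p(demandMB) / Math.log(2); // log2(1 + demand)
      mag = mag / weight;
    }
    return mag;
  }

  public static void main(String[] args) {
    // Without the weight, magnitude is just the used memory:
    System.out.println(magnitude(4096, 1024, false)); // prints "4096.0"
    // Same usage, different pending demand: the app with more demand gets
    // the smaller magnitude, i.e. sorts earlier under FairComparator.
    System.out.println(magnitude(4096, 1024, true) >
                       magnitude(4096, 8192, true)); // prints "true"
  }
}
```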