[jira] [Commented] (YARN-2327) YARN should warn about nodes with poor clock synchronization
    [ https://issues.apache.org/jira/browse/YARN-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120225#comment-14120225 ]

Alejandro Abdelnur commented on YARN-2327:
------------------------------------------

I'm not sure we should get into this. I would rely on the assumption that NTP is properly configured, including authentication to avoid attacks there.

> YARN should warn about nodes with poor clock synchronization
> ------------------------------------------------------------
>
>                 Key: YARN-2327
>                 URL: https://issues.apache.org/jira/browse/YARN-2327
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>
> YARN should warn about nodes with poor clock synchronization. YARN relies on approximate clock synchronization to report certain elapsed time statistics (see YARN-2251), but we currently don't warn if this assumption is violated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
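A minimal sketch of the kind of check being discussed, assuming the master compares its own clock against a time reported by the node; the class name, method, and tolerance are illustrative, not an existing YARN API:

```java
// Hypothetical skew check: warn when a node's reported clock drifts
// beyond a configured tolerance from the master's clock.
public class ClockSkewCheck {
    static final long MAX_SKEW_MS = 30_000; // assumed tolerance, not a real YARN default

    /** Returns true when the node's reported time drifts past the tolerance. */
    static boolean shouldWarn(long masterTimeMs, long nodeReportedTimeMs) {
        return Math.abs(masterTimeMs - nodeReportedTimeMs) > MAX_SKEW_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        if (shouldWarn(now, now - 60_000)) {
            System.out.println("WARN: node clock out of sync by more than "
                + MAX_SKEW_MS + " ms");
        }
    }
}
```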
[jira] [Commented] (YARN-2327) YARN should warn about nodes with poor clock synchronization
    [ https://issues.apache.org/jira/browse/YARN-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120423#comment-14120423 ]

Alejandro Abdelnur commented on YARN-2327:
------------------------------------------

Fair enough, warning is good. BTW, why is this a YARN thing? Shouldn't this apply to HDFS as well (i.e., DTs)?
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104179#comment-14104179 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

I disagree on YARN-1253 being a breakage. Personally, I would never recommend using this in production. Given that, I'm OK with the patch if:

* the NM logs print a WARN at startup stating the setting.
* the container stdout/stderr also prints a WARN to alert the user of the setting.

> LCE should support non-cgroups, non-secure mode
> -----------------------------------------------
>
>                 Key: YARN-2424
>                 URL: https://issues.apache.org/jira/browse/YARN-2424
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>            Reporter: Allen Wittenauer
>            Priority: Blocker
>         Attachments: YARN-2424.patch
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Comment Edited] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104257#comment-14104257 ]

Alejandro Abdelnur edited comment on YARN-2424 at 8/20/14 6:10 PM:
-------------------------------------------------------------------

I disagree on me being rude (or very rude) just for disagreeing with something. IMO, security fixes trump backwards compatibility.

Anyway, I'm -0 with the patch if the WARNs are printed in the RM at startup as Owen suggests. I insist that the WARN should be in the stderr/stdout of every container. Otherwise this will go completely unnoticed by users running apps. It should be obvious to them that they are exposed.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104314#comment-14104314 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

If you don't have to kinit, it is obvious security is OFF, no?
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104993#comment-14104993 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

Sure, fine, enough cycles spent on this, thx.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102357#comment-14102357 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

[~raviprak], allowing sudo to more than one user in non-secure mode doesn't give you any extra security. Actually, it may give you a false sense of security. On using groups in the LCE blacklist/whitelist, I'll comment in YARN-2429.
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
    [ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102367#comment-14102367 ]

Alejandro Abdelnur commented on YARN-2429:
------------------------------------------

Unless I'm mistaken, the blacklisting is done in the C code. Hadoop currently uses the {{Groups}} class to fetch group info, and there are multiple plugins for it (shell, LDAP, JNI, ...). This means you'd have to either get all the groups of the user before calling the LCE and pass them as params, or the LCE would have to connect to the same group source as the Java side of things.

> LCE should blacklist based upon group
> -------------------------------------
>
>                 Key: YARN-2429
>                 URL: https://issues.apache.org/jira/browse/YARN-2429
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Allen Wittenauer
>
> It should be possible to list a group to ban, not just individual users.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
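The first option mentioned in the comment (resolving the user's groups on the Java side, where the {{Groups}} plugins live, and passing them to the native executor) could be sketched like this; the argv layout and class name are invented for illustration, not the real container-executor interface:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: groups are resolved in Java and handed to the native
// LCE binary as one comma-separated argument, so the C code can blacklist by
// group without needing its own group-lookup plugin.
public class LceArgs {

    /** Builds an argv for the native executor, with groups already resolved. */
    static List<String> buildArgv(String user, List<String> groups) {
        List<String> argv = new ArrayList<>();
        argv.add("container-executor");      // the native LCE binary
        argv.add(user);
        argv.add(String.join(",", groups));  // C side would split on ','
        return argv;
    }

    public static void main(String[] args) {
        System.out.println(buildArgv("alice", Arrays.asList("hadoop", "staff")));
    }
}
```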
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102450#comment-14102450 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

If security is OFF, I can submit a job as ANY user by simply doing -Duser.name=ANY. User ANY will be the one used by YARN and HDFS (I'll leave it up to the reader to see how to do this). I really don't like what this JIRA is proposing, and I've indicated what would have to be done for me not to -1.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102538#comment-14102538 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

[~aw], I think you are missing the point. I know that in a non-secure cluster you can fake the user name to interact with HDFS or YARN from anywhere at any time. YARN-1253 is not about protecting HDFS or YARN; it is about protecting the node at the OS level by enforcing the use of a least-privileged user.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102629#comment-14102629 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

Ravi, all the config in container-executor.cfg is EXCLUSIVELY for enforcing constraints on the process to be launched; it does not restrict a launched JVM process from doing a {{System.setProperty("user.name", "ANY")}} to gain access to HDFS as user ANY (if Kerberos is ON, setting the 'user.name' property has no effect). BTW, I'm not OK with making this a valid configuration; it is not.
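As a plain-Java illustration of the point above (no Hadoop classes are involved; whether this actually grants HDFS access as that user depends on simple authentication being in effect, as the comment notes):

```java
// Any code running inside the JVM can rewrite the user.name system property;
// container-executor.cfg constraints apply to the OS process, not to this.
public class UserNameSpoof {

    /** The identity the JVM would report via the user.name property. */
    static String effectiveUser() {
        return System.getProperty("user.name");
    }

    public static void main(String[] args) {
        System.setProperty("user.name", "ANY"); // what an untrusted task could do
        System.out.println("acting as: " + effectiveUser());
    }
}
```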
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102794#comment-14102794 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

Repeating myself from a previous comment: YARN-1253 is not about protecting HDFS or YARN, it is about protecting the node at the OS level by enforcing the use of a least-privileged user.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102941#comment-14102941 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

Having more than one 'least privileged' user does not bring you any benefit, as they can always step on each other by faking the username at job submission.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103063#comment-14103063 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

You are saying this is proactive troubleshooting then, not meant for production? If so, then, as I said before:

* the property has 'use-only-for-troubleshooting' in its name.
* the NM logs print a WARN at startup and on every started container stating the flag and its non-secure nature.
* the container stdout/stderr also print a WARN to alert the user of the cluster setup.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101533#comment-14101533 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

I really don't like it. It is not my business how you run your clusters, but this is dangerous, especially in a multi-tenancy scenario. From Allen's comment (the one I highlighted) it is not clear to me that this is meant only for setup/troubleshooting usage. I would not -1 this JIRA if...

* the property has 'use-only-for-troubleshooting' in its name.
* the NM logs print a WARN at startup and on every started container stating the flag and its non-secure nature.
* the container stdout/stderr also print a WARN to alert the user of the cluster setup.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100134#comment-14100134 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

Please refer to the YARN-1253 comments; it was stated there that the old behavior had security issues.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100158#comment-14100158 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

Please go over Todd's comment on the security issues of sudoing as a user without secure auth. You definitely don't want to do that in a multi-tenant cluster. BTW, fixing a security bug is not a regression.
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
    [ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100302#comment-14100302 ]

Alejandro Abdelnur commented on YARN-2424:
------------------------------------------

I think I did. If I'm reading correctly, you are stating that it is better for troubleshooting, especially in multi-tenant scenarios:

bq. It's also worth pointing out that one of the key benefits of running tasks as the user who submitted them is that it makes troubleshooting much easier. When one hops on a node, it is evident as to which user's tasks one is looking at, even if those tasks aren't validated as that user. This is especially important in heavy multi-tenant scenarios.
[jira] [Commented] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
    [ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073328#comment-14073328 ]

Alejandro Abdelnur commented on YARN-2348:
------------------------------------------

Allen's suggestion of making it selectable from the browser makes sense. In Oozie, we are doing this. Because JavaScript does not have built-in libraries for TZ handling, what we did is:

* have a request parameter that specifies the desired TZ for datetime values; the default value is UTC.
* TZ conversion happens on the server side when producing the JSON output, using the TZ request parameter.
* have a REST call that returns the list of available TZs.
* have a dropdown in the UI that shows the available TZs (uses the REST call from the previous bullet).
* use a cookie to remember the user-selected TZ.
* if the cookie is present, set the TZ request parameter with it.

> ResourceManager web UI should display locale time instead of UTC time
> ---------------------------------------------------------------------
>
>                 Key: YARN-2348
>                 URL: https://issues.apache.org/jira/browse/YARN-2348
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.1
>            Reporter: Leitao Guo
>         Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch
>
> The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this will confuse users who do not use UTC time. The web UI should display users' local time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
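The server-side conversion described in the bullets above can be sketched as follows; the "tz" request-parameter name is hypothetical, and only standard java.util/java.text classes are used:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch: format a timestamp in the time zone named by a (hypothetical) "tz"
// request parameter, falling back to UTC when the parameter is absent.
// Note TimeZone.getTimeZone silently falls back to GMT for unknown IDs.
public class TzFormatter {

    static String format(long epochMs, String tzParam) {
        TimeZone tz = TimeZone.getTimeZone(tzParam == null ? "UTC" : tzParam);
        SimpleDateFormat fmt =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss zzz", Locale.US);
        fmt.setTimeZone(tz);
        return fmt.format(new Date(epochMs));
    }

    public static void main(String[] args) {
        // Same instant rendered for two different TZ parameter values.
        System.out.println(format(0L, null));        // UTC default
        System.out.println(format(0L, "GMT-05:00")); // user-selected offset
    }
}
```

A dropdown populated from {{TimeZone.getAvailableIDs()}} would cover the "list of available TZs" bullet.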
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
    [ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068704#comment-14068704 ]

Alejandro Abdelnur commented on YARN-796:
-----------------------------------------

Wangda, I'd previously missed the new doc explaining label predicates. Thanks for pointing it out. How about first shooting for the following?

* RM has a list of valid labels (hot reloadable).
* NMs have a list of labels (hot reloadable).
* NMs report labels at registration time, and on heartbeats when they change.
* label-expressions support AND only.
* an app is able to specify a label-expression when making a resource request.
* queues AND-augment the label expression with the queue label-expression.

And later we can add (in a backwards-compatible way):

* support for OR and NOT in label-expressions
* label ACLs
* centralized per-NM configuration, a REST API for it, etc.

Thoughts?

> Allow for (admin) labels on nodes and resource-requests
> -------------------------------------------------------
>
>                 Key: YARN-796
>                 URL: https://issues.apache.org/jira/browse/YARN-796
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun C Murthy
>            Assignee: Wangda Tan
>         Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch
>
> It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
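The AND-only matching proposed above could be as simple as the following sketch; the expression syntax ("x AND y") and class name are assumptions for illustration, not the eventual YARN API:

```java
import java.util.Set;

// Sketch of AND-only label-expression matching: a node satisfies the
// expression iff its label set contains every label in the expression.
public class LabelExpression {

    /** True when the node's labels satisfy an AND-only expression. */
    static boolean matches(String expression, Set<String> nodeLabels) {
        if (expression == null || expression.trim().isEmpty()) {
            return true; // no constraint requested
        }
        for (String label : expression.split("\\s+AND\\s+")) {
            if (!nodeLabels.contains(label.trim())) {
                return false;
            }
        }
        return true;
    }
}
```

Adding OR and NOT later would mean replacing the split with a small expression parser, which is why starting AND-only keeps the first cut simple.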
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
    [ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068048#comment-14068048 ]

Alejandro Abdelnur commented on YARN-796:
-----------------------------------------

I agree with Sandy and Allen. That said, we currently don't do anything centralized on a per-NodeManager basis; if we want to do that, we should think about solving it in a more general way than just labels. And I would suggest doing that (if we decide to) in a different JIRA.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
    [ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068120#comment-14068120 ]

Alejandro Abdelnur commented on YARN-796:
-----------------------------------------

Wangda, your use case is throwing overboard the work of the scheduler regarding matching nodes with data locality. You can solve it in a much better way using scheduler queue configuration, which can be dynamically adjusted.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
    [ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068144#comment-14068144 ]

Alejandro Abdelnur commented on YARN-796:
-----------------------------------------

Wangda, I'm afraid I'm lost with your last comment. I thought labels were to express desired node affinity based on a label, not to fence off nodes. I don't understand how you would achieve fencing off a node with a label unless you have a more complex annotation mechanism than just a label (i.e., book this node only if label X is present). Also, you would have to add ACLs to labels to avoid anybody simply asking for a label. Am I missing something?
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
    [ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045097#comment-14045097 ]

Alejandro Abdelnur commented on YARN-2194:
------------------------------------------

Do we need to have a special container executor? Or would a specialized {{LCEResourcesHandler}} do the trick?

> Add Cgroup support for RedHat 7
> -------------------------------
>
>                 Key: YARN-2194
>                 URL: https://issues.apache.org/jira/browse/YARN-2194
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>
> In previous versions of RedHat, we could build custom cgroup hierarchies with the cgconfig command from the libcgroup package. As of RedHat 7, the libcgroup package is deprecated and its use is not recommended, since it can easily create conflicts with the default cgroup hierarchy. systemd is provided and recommended for cgroup management. We need to add support for this.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
    [ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045105#comment-14045105 ]

Alejandro Abdelnur commented on YARN-2194:
------------------------------------------

I didn't mean autodetection; I meant that a new resource handler may be enough to deal with cgroups in RedHat 7, without having to write a new LCE.
[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers
    [ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027379#comment-14027379 ]

Alejandro Abdelnur commented on YARN-2139:
------------------------------------------

As Sandy says, short-circuit read kicks in for local HDFS blocks, as it does for any other local FS access, so the block I/O controller would kick in. For remote HDFS reads, the network I/O controller would kick in. So effectively we can control the resources the container uses.

> Add support for disk IO isolation/scheduling for containers
> -----------------------------------------------------------
>
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>            Assignee: Wei Yan

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler
    [ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998164#comment-13998164 ]

Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:08 AM:
-------------------------------------------------------------------

[~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead of hijacking the JIRA, the correct way would have been proposing the changes to the assignee/author of the original patch and offering to contribute/break down tasks. Please do so next time.

> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>
>                 Key: YARN-1368
>                 URL: https://issues.apache.org/jira/browse/YARN-1368
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch
>
> YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:09 AM: --- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead of hijacking the JIRA, the correct way should have been proposing [to the assignee/author of the original patch] the changes and offering to contribute/break down tasks. Please do so next time. was (Author: tucu00): [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur commented on YARN-1368: -- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead of hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/break down tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-373) Allow an AM to reuse the resources allocated to container for a new container
[ https://issues.apache.org/jira/browse/YARN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-373. - Resolution: Won't Fix [doing self-clean up of JIRAs] Allow an AM to reuse the resources allocated to container for a new container - Key: YARN-373 URL: https://issues.apache.org/jira/browse/YARN-373 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur When a container completes, instead of the corresponding resources being freed up, it should be possible for the AM to reuse the assigned resources for a new container. As part of the reallocation, the AM would notify the RM about partial resources being freed up and the RM would make the necessary corrections in the corresponding node. With this functionality, an AM can ensure it gets a container in the same node where previous containers ran. This will allow getting rid of the ShuffleHandler as a service in the NMs and running it as a regular container task of the corresponding AM. In this case, the reallocation would reduce the CPU/MEM obtained for the original container to what is needed for serving the shuffle. Note that in this example the MR AM would only do this reallocation for one of the many tasks that may have run in a particular node (as a single shuffle task could serve all the map outputs from all map tasks run in that node). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997024#comment-13997024 ] Alejandro Abdelnur commented on YARN-1368: -- [~jianhe], [~vinodkv], Unless I'm missing something, Anubhav was working on this JIRA. It is great that Jian did the refactoring to have common code for the schedulers and some testcases for it, but most of the work has been done by Anubhav and he was working actively on it. We should reassign the JIRA back to Anubhav and let him drive it to completion, agree? Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur moved HADOOP-10505 to YARN-1943: --- Component/s: (was: security) nodemanager Fix Version/s: (was: 2.3.0) 2.3.0 Affects Version/s: (was: 2.3.0) 2.3.0 Key: YARN-1943 (was: HADOOP-10505) Project: Hadoop YARN (was: Hadoop Common) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode. - Key: YARN-1943 URL: https://issues.apache.org/jira/browse/YARN-1943 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: jay vyas Priority: Critical Labels: linux Fix For: 2.3.0 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled: {noformat} return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser; {noformat} However, the only way to enable security, is to NOT use SIMPLE authentication mode: {noformat} public static boolean isSecurityEnabled() { return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE); } {noformat} Thus, the framework ENFORCES that SIMPLE login security -- nonSecureuser for submission of LinuxExecutorContainer. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID. My proposed solution is that we should be able to leverage LinuxContainerExector regardless of hadoop's view of the security settings on the cluster, i.e. decouple LinuxContainerExecutor logic from the isSecurityEnabled return value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969746#comment-13969746 ] Alejandro Abdelnur commented on YARN-1943: -- [~jayunit100], please refer to YARN-1253 for details on why is this way. IMO this is not a bug. Multitenant LinuxContainerExecutor is incompatible with Simple Security mode. - Key: YARN-1943 URL: https://issues.apache.org/jira/browse/YARN-1943 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: jay vyas Priority: Critical Labels: linux Fix For: 2.3.0 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled: {noformat} return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser; {noformat} However, the only way to enable security, is to NOT use SIMPLE authentication mode: {noformat} public static boolean isSecurityEnabled() { return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE); } {noformat} Thus, the framework ENFORCES that SIMPLE login security -- nonSecureuser for submission of LinuxExecutorContainer. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID. My proposed solution is that we should be able to leverage LinuxContainerExector regardless of hadoop's view of the security settings on the cluster, i.e. decouple LinuxContainerExecutor logic from the isSecurityEnabled return value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969821#comment-13969821 ] Alejandro Abdelnur commented on YARN-1943: -- On the yarn-site.xml of the NMs:
{code}
<property>
  <description>The UNIX user that containers will run as when Linux-container-executor is used in nonsecure mode (a use case for this is using cgroups).</description>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
  <value>nobody</value>
</property>
{code}
Multitenant LinuxContainerExecutor is incompatible with Simple Security mode. - Key: YARN-1943 URL: https://issues.apache.org/jira/browse/YARN-1943 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: jay vyas Priority: Critical Labels: linux Fix For: 2.3.0 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled: {noformat} return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser; {noformat} However, the only way to enable security, is to NOT use SIMPLE authentication mode: {noformat} public static boolean isSecurityEnabled() { return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE); } {noformat} Thus, the framework ENFORCES that SIMPLE login security -- nonSecureuser for submission of LinuxExecutorContainer. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID. My proposed solution is that we should be able to leverage LinuxContainerExector regardless of hadoop's view of the security settings on the cluster, i.e. decouple LinuxContainerExecutor logic from the isSecurityEnabled return value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969973#comment-13969973 ] Alejandro Abdelnur commented on YARN-1943: -- not really, refer to the {{container-executor.c}} file, you'll see how things work. Multitenant LinuxContainerExecutor is incompatible with Simple Security mode. - Key: YARN-1943 URL: https://issues.apache.org/jira/browse/YARN-1943 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: jay vyas Priority: Critical Labels: linux Fix For: 2.3.0 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled: {noformat} return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser; {noformat} However, the only way to enable security, is to NOT use SIMPLE authentication mode: {noformat} public static boolean isSecurityEnabled() { return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE); } {noformat} Thus, the framework ENFORCES that SIMPLE login security -- nonSecureuser for submission of LinuxExecutorContainer. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID. My proposed solution is that we should be able to leverage LinuxContainerExector regardless of hadoop's view of the security settings on the cluster, i.e. decouple LinuxContainerExecutor logic from the isSecurityEnabled return value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941007#comment-13941007 ] Alejandro Abdelnur commented on YARN-1849: -- +1 pending jenkins. NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters Key: YARN-1849 URL: https://issues.apache.org/jira/browse/YARN-1849 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch, yarn-1849-3.patch While running an UnmanagedAM on secure cluster, ran into an NPE on failover/restart. This is similar to YARN-1821. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935594#comment-13935594 ] Alejandro Abdelnur commented on YARN-796: - Scheduler configurations are refreshed dynamically; if the list of valid labels is there, it could be refreshed as well. I would prefer to detect and reject typos, from a user experience and troubleshooting point of view. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934293#comment-13934293 ] Alejandro Abdelnur commented on YARN-796: - Arun, doing a recap on the config, is this what you mean? The ResourceManager {{yarn-site.xml}} would specify the valid labels systemwide (you didn't suggest this, but it prevents label typos from going unnoticed):
{code}
<property>
  <name>yarn.resourcemanager.valid-labels</name>
  <value>labelA, labelB, labelX</value>
</property>
{code}
The NodeManagers' yarn-site.xml would specify the labels of the node:
{code}
<property>
  <name>yarn.nodemanager.labels</name>
  <value>labelA, labelX</value>
</property>
{code}
The scheduler configuration, in its queue configuration, would specify what labels can be used when requesting allocations in that queue:
{code}
<property>
  <name>yarn.scheduler.capacity.root.A.allowed-labels</name>
  <value>labelA</value>
</property>
{code}
Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
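The typo-rejection idea from these comments boils down to a membership check against the RM-wide valid list. A minimal sketch, assuming the hypothetical config keys proposed above (this is not shipped YARN behavior):

```shell
# Sketch: reject any node/queue label not present in the (hypothetical)
# yarn.resourcemanager.valid-labels list.
VALID_LABELS="labelA,labelB,labelX"
validate_labels() {  # $1 = comma-separated labels to validate
  local l
  for l in $(echo "$1" | tr ',' ' '); do
    case ",$VALID_LABELS," in
      *",$l,"*) ;;                             # known label, OK
      *) echo "rejected unknown label: $l"; return 1 ;;
    esac
  done
  echo "labels OK: $1"
}
validate_labels "labelA,labelX"   # e.g. the labels from yarn.nodemanager.labels
```

A NodeManager registering with a typo such as `labelZ` would be rejected at this point instead of silently never matching any resource request.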
[jira] [Commented] (YARN-1808) add ability for AM to attach simple state information to containers
[ https://issues.apache.org/jira/browse/YARN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925866#comment-13925866 ] Alejandro Abdelnur commented on YARN-1808: -- where this container be stored? RM or NMs? you would need a DT for the container to be able to push its state to the 'state store'. also, what is simple for you, a string, a long? add ability for AM to attach simple state information to containers --- Key: YARN-1808 URL: https://issues.apache.org/jira/browse/YARN-1808 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Priority: Minor AM restart could be aided if we could add some state information to the containers themselves. This allows the AM to rebuild its state by querying all of its containers for their status. The AM will of course also need to be able to write this. This isn't critical: code running in the containers can do the same thing. It just appears to be a common use case -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1808) add ability for AM to attach simple state information to containers
[ https://issues.apache.org/jira/browse/YARN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925866#comment-13925866 ] Alejandro Abdelnur edited comment on YARN-1808 at 3/10/14 4:41 PM: --- where this container state be stored? RM or NMs? you would need a DT for the container to be able to push its state to the 'state store'. also, what is simple for you, a string, a long? was (Author: tucu00): where this container be stored? RM or NMs? you would need a DT for the container to be able to push its state to the 'state store'. also, what is simple for you, a string, a long? add ability for AM to attach simple state information to containers --- Key: YARN-1808 URL: https://issues.apache.org/jira/browse/YARN-1808 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Priority: Minor AM restart could be aided if we could add some state information to the containers themselves. This allows the AM to rebuild its state by querying all of its containers for their status. The AM will of course also need to be able to write this. This isn't critical: code running in the containers can do the same thing. It just appears to be a common use case -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908479#comment-13908479 ] Alejandro Abdelnur commented on YARN-1702: -- It seems my comment on the umbrella JIRA has gone unnoticed, posting here as well. Before we start coding work on this, it would be great to see how security will be handled (authentication, ACLs, tokens, etc). I'm a bit concerned about introducing a second protocol everywhere. From a maintenance and security risk point of view, it doubles our development/support efforts. Granted, HDFS offers data over RPC and HTTP. But HTTP, when using HttpFS (which is how I recommend providing HTTP access), is a gateway that ends up doing RPC to HDFS. Thus the only protocol accessing HDFS is RPC. Have we considered a C implementation of Hadoop RPC? With the multi-platform support of protobuf, that may give us the multi-platform support we are aiming for with a single protocol interface. Thoughts? Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
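For context, the feature under discussion amounts to a state-change request against the RM web services. A hedged sketch of what such a call could look like — the endpoint path, port, JSON body, host, and application id are all illustrative assumptions, so the script prints the command instead of issuing it:

```shell
# Hypothetical kill-app call via the RM web services; RM host/port,
# application id, and the endpoint shape are assumptions for illustration.
RM="http://resourcemanager:8088"
APP="application_1234567890123_0001"
kill_app_cmd() {
  echo "curl -X PUT -H 'Content-Type: application/json'" \
       "-d '{\"state\":\"KILLED\"}' $RM/ws/v1/cluster/apps/$APP/state"
}
kill_app_cmd
```

The security questions raised in the comment (who may PUT this, and with which token or ACL) sit in front of any such endpoint.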
[jira] [Commented] (YARN-1695) Implement the rest (writable APIs) of RM web-services
[ https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898052#comment-13898052 ] Alejandro Abdelnur commented on YARN-1695: -- err, [~vinodkv], isn't this a duplicate of the JIRA you opened back in DEC, YARN-1538? Implement the rest (writable APIs) of RM web-services - Key: YARN-1695 URL: https://issues.apache.org/jira/browse/YARN-1695 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs added there were only focused on obtaining information from the cluster. We need to have the following REST APIs to finish the feature - Application submission/termination (Priority): This unblocks easy client interaction with a YARN cluster - Application Client protocol: For resource scheduling by apps written in an arbitrary language. Will have to think about throughput concerns - ContainerManagement Protocol: Again for arbitrary language apps. One important thing to note here is that we already have client libraries on all the three protocols that do some some heavy-lifting. One part of the effort is to figure out if they can be made any thinner and/or how web-services will implement the same functionality. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1695) Implement the rest (writable APIs) of RM web-services
[ https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898112#comment-13898112 ] Alejandro Abdelnur commented on YARN-1695: -- Before we start coding work on this, it would be great to see how security will be handled (authentication, ACLs, tokens, etc). I'm a bit concerned about introducing a second protocol everywhere. From a maintenance and security risk point of view, it doubles our development/support efforts. Granted, HDFS offers data over RPC and HTTP. But HTTP, when using HttpFS (which is how I recommend providing HTTP access), is a gateway that ends up doing RPC to HDFS. Thus the only protocol accessing HDFS is RPC. Have we considered a C implementation of Hadoop RPC? With the multi-platform support of protobuf, that may give us the multi-platform support we are aiming for with a single protocol interface. Thoughts? Implement the rest (writable APIs) of RM web-services - Key: YARN-1695 URL: https://issues.apache.org/jira/browse/YARN-1695 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs added there were only focused on obtaining information from the cluster. We need to have the following REST APIs to finish the feature - Application submission/termination (Priority): This unblocks easy client interaction with a YARN cluster - Application Client protocol: For resource scheduling by apps written in an arbitrary language. Will have to think about throughput concerns - ContainerManagement Protocol: Again for arbitrary language apps. One important thing to note here is that we already have client libraries on all the three protocols that do some some heavy-lifting. One part of the effort is to figure out if they can be made any thinner and/or how web-services will implement the same functionality. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1577: - Target Version/s: 2.4.0 (was: 2.3.0) moving it out of 2.3 as [~vinodkv] reverted from 2.3 the JIRAs introducing the problem. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13891853#comment-13891853 ] Alejandro Abdelnur commented on YARN-1577: -- The problem I'm seeing with YARN-1493 is when trying to register the UAM with the scheduler; the exception I'm getting follows. Reverting YARN-1493 and friends makes this problem go away. The patches to revert, in order, are: YARN-1566 YARN-1041 YARN-1166 YARN-1490 YARN-1493 {code} 2014-02-03 10:58:40,403 ERROR UserGroupInformation - PriviledgedActionException as:llama (auth:PROXY) via tucu (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] 2014-02-03 10:58:40,407 WARN LlamaAMServiceImpl - Reserve() error: com.cloudera.llama.util.LlamaException: AM_CANNOT_REGISTER - cannot register AM 'application_1391453743418_0001' for queue 'root.queue1' : org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] com.cloudera.llama.util.LlamaException: AM_CANNOT_REGISTER - cannot register AM 'application_1391453743418_0001' for queue 'root.queue1' : org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] … Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. 
Available:[TOKEN] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at $Proxy12.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) at com.cloudera.llama.am.yarn.YarnRMConnector._initYarnApp(YarnRMConnector.java:270) at com.cloudera.llama.am.yarn.YarnRMConnector.access$200(YarnRMConnector.java:80) at com.cloudera.llama.am.yarn.YarnRMConnector$3.run(YarnRMConnector.java:212) at com.cloudera.llama.am.yarn.YarnRMConnector$3.run(YarnRMConnector.java:209) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at 
com.cloudera.llama.am.yarn.YarnRMConnector.register(YarnRMConnector.java:209) ... 20 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN] at org.apache.hadoop.ipc.Client.call(Client.java:1406) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy11.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) ... 37 more {code} Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that
[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880466#comment-13880466 ] Alejandro Abdelnur commented on YARN-1629: -- +1 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer -- Key: YARN-1629 URL: https://issues.apache.org/jira/browse/YARN-1629 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1623) Include queue name in RegisterApplicationMasterResponse
[ https://issues.apache.org/jira/browse/YARN-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879160#comment-13879160 ] Alejandro Abdelnur commented on YARN-1623: -- +1 pending jenkins Include queue name in RegisterApplicationMasterResponse --- Key: YARN-1623 URL: https://issues.apache.org/jira/browse/YARN-1623 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1623-1.patch, YARN-1623.patch This provides the YARN change necessary to support MAPREDUCE-5732. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1601) 3rd party JARs are missing from hadoop-dist output
[ https://issues.apache.org/jira/browse/YARN-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872270#comment-13872270 ] Alejandro Abdelnur commented on YARN-1601: -- Thanks Steve. Thanks Sean, regarding the stax-api JAR, I've opened another JIRA to fix that, HADOOP-10235. 3rd party JARs are missing from hadoop-dist output -- Key: YARN-1601 URL: https://issues.apache.org/jira/browse/YARN-1601 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-1601.patch With the build changes of YARN-888 we are leaving out all 3rd party JArs used directly by YARN under /share/hadoop/yarn/lib/. We did not notice this when running minicluster because they all happen to be in the classpath from hadoop-common and hadoop-yarn. As 3d party JARs are not 'public' interfaces we cannot rely on them being provided to yarn by common and hdfs. (ie if common and hdfs stop using a 3rd party dependency that yarn uses this would break yarn if yarn does not pull that dependency explicitly). Also, this will break bigtop hadoop build when they move to use branch-2 as they expect to find jars in /share/hadoop/yarn/lib/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1608) LinuxContainerExecutor has a few DEBUG messages at INFO level
[ https://issues.apache.org/jira/browse/YARN-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872937#comment-13872937 ] Alejandro Abdelnur commented on YARN-1608: -- +1 LinuxContainerExecutor has a few DEBUG messages at INFO level - Key: YARN-1608 URL: https://issues.apache.org/jira/browse/YARN-1608 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: log Attachments: yarn-1608-1.patch LCE has a few INFO level log messages meant to be at debug level. In fact, they are logged both at INFO and DEBUG. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1608) LinuxContainerExecutor has a few DEBUG messages at INFO level
[ https://issues.apache.org/jira/browse/YARN-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872938#comment-13872938 ] Alejandro Abdelnur commented on YARN-1608: -- +1 LinuxContainerExecutor has a few DEBUG messages at INFO level - Key: YARN-1608 URL: https://issues.apache.org/jira/browse/YARN-1608 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: log Attachments: yarn-1608-1.patch LCE has a few INFO level log messages meant to be at debug level. In fact, they are logged both at INFO and DEBUG. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1601) 3rd party JARs are missing from hadoop-dist output
Alejandro Abdelnur created YARN-1601: Summary: 3rd party JARs are missing from hadoop-dist output Key: YARN-1601 URL: https://issues.apache.org/jira/browse/YARN-1601 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur With the build changes of YARN-888 we are leaving out all 3rd party JARs used directly by YARN under /share/hadoop/yarn/lib/. We did not notice this when running the minicluster because they all happen to be in the classpath from hadoop-common and hadoop-yarn. As 3rd party JARs are not 'public' interfaces, we cannot rely on them being provided to yarn by common and hdfs (i.e., if common and hdfs stop using a 3rd party dependency that yarn uses, this would break yarn if yarn does not pull that dependency explicitly). Also, this will break the bigtop hadoop build when they move to branch-2, as they expect to find jars in /share/hadoop/yarn/lib/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1601) 3rd party JARs are missing from hadoop-dist output
[ https://issues.apache.org/jira/browse/YARN-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1601: - Attachment: YARN-1601.patch Patch that adds the submodules as dependencies to the hadoop-yarn POM; this is required to populate the /share/hadoop/lib/ dir with 3rd party JARs used by YARN. The hadoop-yarn POM isn't a parent of any other POM, so this does not affect existing dependencies. It just makes the yarn assembly pick up YARN's 3rd party JARs when creating the tarball. 3rd party JARs are missing from hadoop-dist output -- Key: YARN-1601 URL: https://issues.apache.org/jira/browse/YARN-1601 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-1601.patch With the build changes of YARN-888 we are leaving out all 3rd party JARs used directly by YARN under /share/hadoop/yarn/lib/. We did not notice this when running the minicluster because they all happen to be in the classpath from hadoop-common and hadoop-yarn. As 3rd party JARs are not 'public' interfaces, we cannot rely on them being provided to yarn by common and hdfs (i.e., if common and hdfs stop using a 3rd party dependency that yarn uses, this would break yarn if yarn does not pull that dependency explicitly). Also, this will break the bigtop hadoop build when they move to branch-2, as they expect to find jars in /share/hadoop/yarn/lib/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
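A rough sketch of the kind of change the YARN-1601 patch describes; the exact module list and POM contents in YARN-1601.patch may differ. The aggregator hadoop-yarn POM declares its submodules as plain dependencies so the dist assembly can resolve their 3rd party JARs into share/hadoop/yarn/lib/:

```xml
<!-- hadoop-yarn/pom.xml (aggregator, packaging "pom") - illustrative only -->
<dependencies>
  <!-- Declaring the submodules as dependencies lets the tarball assembly
       resolve their transitive 3rd party JARs into share/hadoop/yarn/lib/.
       Since no module uses hadoop-yarn as its Maven parent, nothing
       inherits these dependencies, so existing builds are unaffected. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-api</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-common</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-client</artifactId>
  </dependency>
</dependencies>
```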
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869658#comment-13869658 ] Alejandro Abdelnur commented on YARN-888: - test failure (timeout) seems unrelated. [~vinodkv], [~ste...@apache.org], [~kkambatl], [~rvs], over the weekend I've updated the patch with comments in the YARN non-leaf POM stating no dependencies should be added there. No other changes. Unless I hear further comments I'm planning to commit this later today to trunk and branch-2. Thanks for your reviews. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869658#comment-13869658 ] Alejandro Abdelnur edited comment on YARN-888 at 1/13/14 5:22 PM: -- test failure (timeout) seems unrelated. [~vinodkv], [~ste...@apache.org], [~kkambatl], [~rvs], over the weekend I've updated the patch with comments in the YARN non-leaf POMs stating no dependencies should be added there. No other changes. Unless I hear further comments I'm planning to commit this later today to trunk and branch-2. Thanks for your reviews. was (Author: tucu00): test failure (timeout) seems unrelated. [~vinodkv], [~ste...@apache.org], [~kkambatl], [~rvs], over the weekend I've updated the patch with comments in the YARN non-leaf POM stating no dependencies should be added there. No other changes. Unless I hear further comments I'm planning to commit this later today to trunk and branch-2. Thanks for your reviews. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869825#comment-13869825 ] Alejandro Abdelnur commented on YARN-888: - Thanks [~vinodkv]. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.4.0 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869843#comment-13869843 ] Alejandro Abdelnur commented on YARN-888: - OK, it seems we have our first victim, MAPREDUCE-5722. I don't know why this didn't come up before. Taking care of it right now. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.4.0 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869856#comment-13869856 ] Alejandro Abdelnur commented on YARN-888: - False alarm, it seems I had stale POMs in my local Maven cache; still, we need to take care of MAPREDUCE-5362 (the equivalent of this JIRA for MR). clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.4.0 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869940#comment-13869940 ] Alejandro Abdelnur commented on YARN-1399: -- IMO, we should stick to the current API as Sandy suggests; it is a bit unfortunate that the default is ALL instead of OWN but, well, what to do? Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869959#comment-13869959 ] Alejandro Abdelnur commented on YARN-1399: -- In MR-land, when submitting a job, we can specify VIEW/MODIFY ACLs. It seems that in Yarn-land this is not possible for AMs. If I'm right about this, adding that missing functionality would naturally scope down what is returned by getApplications. And we could do that in a backwards-compatible way. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1598) HA-related rmadmin commands don't work on a secure cluster
[ https://issues.apache.org/jira/browse/YARN-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870215#comment-13870215 ] Alejandro Abdelnur commented on YARN-1598: -- LGTM, +1 after jenkins. HA-related rmadmin commands don't work on a secure cluster -- Key: YARN-1598 URL: https://issues.apache.org/jira/browse/YARN-1598 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-1598-1.patch The HA-related commands like -getServiceState -checkHealth etc. don't work in a secure cluster. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-888: Attachment: YARN-888.patch new patch adding the following comment in YARN non-leaf POMs: <!-- Do not add dependencies here, add them to the POM of the leaf module --> clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868316#comment-13868316 ] Alejandro Abdelnur commented on YARN-888: - [~vinodkv], any further concerns? Also, latest patch cleanly applies to branch-2 at the moment. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866845#comment-13866845 ] Alejandro Abdelnur commented on YARN-888: - [~vinodkv], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs the dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed, it is an IntelliJ issue; on the other hand, the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I've reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out, there is much more work we should do in this direction; this is a first, non-intrusive baby step. 
To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+- com.sun.jersey:jersey-json:jar:1.9:compile
|  +- org.codehaus.jettison:jettison:jar:1.1:compile
|  |  \- stax:stax-api:jar:1.0.1:compile
|  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
|  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
|  |     \- javax.activation:activation:jar:1.1:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed from 1.8.3)
|  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 1.8.3)
\- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}
With the patch, the required dependencies are down to:
{code}
+- commons-lang:commons-lang:jar:2.6:compile
+- com.google.guava:guava:jar:11.0.2:compile
|  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
\- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}
Does this address your concerns? 
clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
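The split tucu argues for in YARN-888 (slim non-leaf POMs, with each leaf module declaring its own dependencies) is typically paired with a dependencyManagement section that pins versions in one place. A minimal sketch; the groupId/version values are illustrative, not taken from the actual Hadoop POMs:

```xml
<!-- Parent/aggregator POM: pins versions only; declares no dependencies. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>2.5.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Leaf module POM: declares only what it actually uses;
     the version is inherited from dependencyManagement. -->
<dependencies>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
  </dependency>
</dependencies>
```

With this layout, nothing leaks into leaf modules that they did not ask for, while version consistency across modules is still enforced centrally.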
[jira] [Comment Edited] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866845#comment-13866845 ] Alejandro Abdelnur edited comment on YARN-888 at 1/9/14 5:50 PM: - [~vinodkv], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs the dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed, it is an IntelliJ issue; on the other hand, the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I've reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out, there is much more work we should do in this direction; this is a first, non-intrusive baby step. 
To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+- com.sun.jersey:jersey-json:jar:1.9:compile
|  +- org.codehaus.jettison:jettison:jar:1.1:compile
|  |  \- stax:stax-api:jar:1.0.1:compile
|  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
|  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
|  |     \- javax.activation:activation:jar:1.1:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed from 1.8.3)
|  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 1.8.3)
\- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}
With the patch, the required dependencies are down to:
{code}
+- commons-lang:commons-lang:jar:2.6:compile
+- com.google.guava:guava:jar:11.0.2:compile
|  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
\- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}
Does this address your concerns? 
was (Author: tucu00): [~vinodvk], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agree it is an IntelliJ issue, on the other hand the change does not affect the project built at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO is more important, is to clean up the dependencies modules like yarn-api and yarn-client have. Restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction, this is a first non-intrusive baby step in that direction. To give you and idea, before the this patch *hadoop-yarn-api* reports as required dependencies by itself: {code} +- org.slf4j:slf4j-api:jar:1.7.5:compile +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile | \- log4j:log4j:jar:1.2.17:compile +- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile | +- tomcat:jasper-compiler:jar:5.5.23:test +- com.google.inject.extensions:guice-servlet:jar:3.0:compile +- io.netty:netty:jar:3.6.2.Final:compile +- com.google.protobuf:protobuf-java:jar:2.5.0:compile +- commons-io:commons-io:jar:2.4:compile +- com.google.inject:guice:jar:3.0:compile | +- javax.inject:javax.inject:jar:1:compile | \- aopalliance:aopalliance:jar:1.0:compile +-
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866945#comment-13866945 ] Alejandro Abdelnur commented on YARN-888: - You got it. On the hybrid approach, it is quite cumbersome as you would have to verify that all child modules use the common dependency being added. IMO, leaving the non-leaf modules slim will be much easier to handle. Plus we solve the problem for IntelliJ IDE users. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867488#comment-13867488 ] Alejandro Abdelnur commented on YARN-888: - [~vinodkv], thanks for taking the time to play with the patch. bq. hadoop-yarn-project's pom.xml has some deps ... This was an oversight on my end: I traversed the parent POMs starting from the leaves, and the yarn modules do not have hadoop-yarn-project as parent. This means the dependencies there were not being used. I'm attaching a new patch removing the dependencies section from hadoop-yarn-project. Thanks for catching that. bq. I guess this was you meant by Correct. However, I wouldn't say the plugin is broken, but it has limitations (it cannot detect usage of classes loaded via reflection, it cannot detect use of constants for primitive types and Strings, etc.). bq. We should set up a single node cluster atleast to ensure that all is well. The produced TARBALL has the exact same set of JAR files, so I would not expect this to be an issue. However, just to be safe, I did a build with the patch, started a minicluster, and ran a couple of example jobs. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
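The dependency:analyze limitation mentioned above, that it cannot detect use of constants for primitive types and Strings, comes from constant inlining: javac copies a compile-time constant into the using class's bytecode, leaving no reference to the defining class for the plugin's bytecode analysis to find. A minimal sketch, with hypothetical class names not taken from Hadoop:

```java
// BuildInfo stands in for a class shipped in a separate dependency JAR.
class BuildInfo {
    // A static final String initialized with a constant expression is a
    // compile-time constant; javac inlines its value at every use site.
    static final String VERSION = "3.0.0-SNAPSHOT";
}

public class ConstantInlining {
    // "Hadoop " + BuildInfo.VERSION is itself a constant expression, so
    // after compilation BANNER holds the literal "Hadoop 3.0.0-SNAPSHOT"
    // and ConstantInlining's constant pool has no entry for BuildInfo.
    // A bytecode-based tool like dependency:analyze would therefore flag
    // the JAR providing BuildInfo as an "unused declared" dependency.
    static final String BANNER = "Hadoop " + BuildInfo.VERSION;

    public static void main(String[] args) {
        System.out.println(BANNER);
    }
}
```

This is why the plugin's report needs manual review before dependencies it calls unused are actually removed.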
[jira] [Updated] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-888: Attachment: YARN-888.patch new patch adding missing hadoop-common test JARs in a few modules. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866017#comment-13866017 ] Alejandro Abdelnur commented on YARN-888: - I've run the resourcemanager and nodemanager testcases locally with the patch applied and I did not get any failures. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866162#comment-13866162 ] Alejandro Abdelnur commented on YARN-888: - Karthik, thanks for the verification. Answering Steve: what you suggest makes complete sense, but it is a much bigger undertaking. This JIRA is just cleaning up/tweaking the current stuff without affecting the current end result. If you have time, would you give the latest patch a try? clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866308#comment-13866308 ] Alejandro Abdelnur edited comment on YARN-1577 at 1/9/14 4:34 AM: -- this seems a serious regression, shouldn't this be a blocker for 2.4? was (Author: tucu00): this seems a serius regression, shouldn't this be a blocker for 2.4? Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866308#comment-13866308 ] Alejandro Abdelnur commented on YARN-1577: -- this seems a serious regression, shouldn't this be a blocker for 2.4? Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864748#comment-13864748 ] Alejandro Abdelnur commented on YARN-888: - [~rvs], mind if I take this JIRA from you? (besides being a cool JIRA number to own) the current dependencies in Yarn POMs are breaking intellij integration and this is kind of driving me crazy, so I took a stab at it this morning and have a working patch. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned YARN-888: --- Assignee: Alejandro Abdelnur (was: Roman Shaposhnik) Thanks Roman, I'll be posting the patch momentarily. If you have time to review it, it would be great. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-888: Attachment: YARN-888.patch The patch moves all the dependencies to the leaf projects, declaring explicitly what each module needs (used the dependency:analyze plugin to zero in on that; commented in the POMs the dependencies not caught by the plugin as used). I've also done a DIST build and verified the JARs in the DIST are all the same (with the exception of the yarn-site JAR which is no more, the project for it is of type 'pom'). I've also verified Intellij now works fine compiling and running testcases. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1538) Support web-services for application-submission and AM protocols
[ https://issues.apache.org/jira/browse/YARN-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857030#comment-13857030 ] Alejandro Abdelnur commented on YARN-1538: -- Is the idea to fully duplicate all AM facing RPC interfaces over HTTP? Having scheduling over HTTP but not the NM interface over HTTP does not buy much as you'll need to use RPC for starting the containers. Also, how about security and token support over HTTP? Support web-services for application-submission and AM protocols Key: YARN-1538 URL: https://issues.apache.org/jira/browse/YARN-1538 Project: Hadoop YARN Issue Type: New Feature Reporter: Vinod Kumar Vavilapalli We already have read-only web-services for YARN - MAPREDUCE-2863. It'll be great to support APIs for application submission and the scheduling protocol - as alternatives to RPCs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1471: - Attachment: SLSCapacityScheduler.java [~curino], how about the attached alternate impl for SLSCapacityScheduler.java? It fully leverages the existing wrapper without duplicating code. You'll need to add a new constructor to the wrapper to take a ResourceScheduler as parameter. The SchedulerWrapper interface should be named SLSSchedulerWrapper and the implementation SLSSchedulerWrapperImpl. HTH The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Carlo Curino Priority: Minor Attachments: SLSCapacityScheduler.java, YARN-1471.patch The simulator does not run the ProportionalCapacityPreemptionPolicy monitor. This is because the policy needs to interact with a CapacityScheduler, and the wrapping done by the simulator breaks this. -- This message was sent by Atlassian JIRA (v6.1#6144)
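The delegation approach suggested in the comment above can be sketched as follows. This is an illustrative toy, not the actual SLS classes: the interface and wrapper names follow the comment's naming suggestion, and the one-method `ResourceScheduler` stand-in is an assumption made to keep the sketch self-contained.

```java
// Sketch of the suggested wrapper: take the scheduler to delegate to as a
// constructor parameter, so SLS instrumentation needs no duplicated code.
// Names (SLSSchedulerWrapperImpl etc.) are illustrative, per the comment.
public class SchedulerWrapperSketch {

    interface ResourceScheduler {
        int allocate(int ask); // stand-in for the real scheduler API
    }

    static class SLSSchedulerWrapperImpl implements ResourceScheduler {
        private final ResourceScheduler delegate;
        int calls = 0; // example of metrics the wrapper can collect

        SLSSchedulerWrapperImpl(ResourceScheduler delegate) {
            this.delegate = delegate;
        }

        @Override
        public int allocate(int ask) {
            calls++;                     // instrument...
            return delegate.allocate(ask); // ...then delegate unchanged
        }
    }

    public static void main(String[] args) {
        SLSSchedulerWrapperImpl w = new SLSSchedulerWrapperImpl(ask -> ask);
        System.out.println(w.allocate(3)); // 3
        System.out.println(w.calls);       // 1
    }
}
```

Because the wrapper implements the same interface it wraps, any `ResourceScheduler` (here a lambda) can be plugged in without subclassing it.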
[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840977#comment-13840977 ] Alejandro Abdelnur commented on YARN-1404: -- [~vinodkv], thanks for summarizing our offline chat. Regarding *ACLs and an on/off switch*: IMO they are not necessary for the following reason. You need an external system installed and running in the node to use the resources of an unmanaged container. If you have direct access into the node to start the external system, you are 'trusted'. If you don't have direct access you cannot use the resources of an unmanaged container. I think this is a very strong requirement already and it would avoid adding a new ACL and an on/off switch. Regarding *Liveliness*: In the case of managed containers we don't have a liveliness 'report' and the container process could very well be hung. In such a scenario it is the responsibility of the AM to detect the liveliness of the container process and react if it is considered hung. In the case of unmanaged containers, the AM would have the same responsibility. The only difference is that in the case of managed containers if the process exits the NM detects that, while in the case of unmanaged containers this responsibility would fall on the AM. Because of this I think we could do without a leaseRenewal/liveliness call. Regarding *NM assume a whole lot of things about containers* 3 bullet items: For my current use case none of this is needed. It could be relatively easy to enable such functionality if a use case that needs it arises. Regarding *Can such trusted application mix and match managed and unmanaged containers?*: In the way I envision how this will work, when an AM asks for a container and gets an allocation from the RM, the RM does not know if the AM will start a managed or an unmanaged container. It is only between the AM and the NM that this is known, when the ContainerLaunchContext is NULL. 
Regarding *YARN-1040 should enabled starting unmanaged containers*: If YARN-1040 were implemented, yes, it would enable unmanaged containers. However the scope of YARN-1040 is much bigger than unmanaged containers. It should also be possible to implement unmanaged containers as discussed here and later implement YARN-1040. Does this make sense? Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling - Key: YARN-1404 URL: https://issues.apache.org/jira/browse/YARN-1404 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-1404.patch Currently Hadoop Yarn expects to manage the lifecycle of the processes its applications run workload in. External frameworks/systems could benefit from sharing resources with other Yarn applications while running their workload within long-running processes owned by the external framework (in other words, running their workload outside of the context of a Yarn container process). Because Yarn provides robust and scalable resource management, it is desirable for some external systems to leverage the resource governance capabilities of Yarn (queues, capacities, scheduling, access control) while supplying their own resource enforcement. Impala is an example of such system. Impala uses Llama (http://cloudera.github.io/llama/) to request resources from Yarn. Impala runs an impalad process in every node of the cluster, when a user submits a query, the processing is broken into 'query fragments' which are run in multiple impalad processes leveraging data locality (similar to Map-Reduce Mappers processing a collocated HDFS block of input data). The execution of a 'query fragment' requires an amount of CPU and memory in the impalad. As the impalad shares the host with other services (HDFS DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications (MapReduce tasks). 
To ensure cluster utilization that follows the Yarn scheduler policies and does not overload the cluster nodes, before running a 'query fragment' in a node, Impala requests the required amount of CPU and memory from Yarn. Once the requested CPU and memory has been allocated, Impala starts running the 'query fragment' taking care that the 'query fragment' does not use more resources than the ones that have been allocated. Memory is accounted per 'query fragment' and the threads used for the processing of the 'query fragment' are placed under a cgroup to contain CPU utilization. Today, for all resources that have been asked to Yarn RM, a (container) process must be
[jira] [Comment Edited] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840977#comment-13840977 ] Alejandro Abdelnur edited comment on YARN-1404 at 12/6/13 4:47 AM: --- [~vinodkv], thanks for summarizing our offline chat. Regarding *ACLs and an on/off switch*: IMO they are not necessary for the following reason. You need an external system installed and running in the node to use the resources of an unmanaged container. If you have direct access into the node to start the external system, you are 'trusted'. If you don't have direct access you cannot use the resources of an unmanaged container. I think this is a very strong requirement already and it would avoid adding code to manage a new ACL and an on/off switch. Regarding *Liveliness*: In the case of managed containers we don't have a liveliness 'report' and the container process could very well be hung. In such a scenario it is the responsibility of the AM to detect the liveliness of the container process and react if it is considered hung. In the case of unmanaged containers, the AM would have the same responsibility. The only difference is that in the case of managed containers if the process exits the NM detects that, while in the case of unmanaged containers this responsibility would fall on the AM. Because of this I think we could do without a leaseRenewal/liveliness call. Regarding *NM assume a whole lot of things about containers* 3 bullet items: For my current use case none of this is needed. It could be relatively easy to enable such functionality if a use case that needs it arises. Regarding *Can such trusted application mix and match managed and unmanaged containers?*: In the way I envision how this would work, when an AM asks for a container and gets an allocation from the RM, the RM does not know if the AM will start a managed or an unmanaged container. 
It is only between the AM and the NM that this is known, when the ContainerLaunchContext is NULL. Regarding *YARN-1040 should enabled starting unmanaged containers*: If YARN-1040 were implemented, yes, it would enable unmanaged containers. However the scope of YARN-1040 is much bigger than unmanaged containers. It should also be possible to implement unmanaged containers as discussed here and later implement YARN-1040. Does this make sense? was (Author: tucu00): [~vinodkv], thanks for summarizing our offline chat. Regarding *ACLs and an on/off switch*: IMO they are not necessary for the following reason. You need an external system installed and running in the node to use the resources of an unmanaged container. If you have direct access into the node to start the external system, you are 'trusted'. If you don't have direct access you cannot use the resources of an unmanaged container. I think this is a very strong requirement already and it would avoid adding a new ACL and an on/off switch. Regarding *Liveliness*: In the case of managed containers we don't have a liveliness 'report' and the container process could very well be hang. In such scenario is the responsibility of the AM to detected the liveliness of the container process and react if it is considered hung. In the case of unmanaged containers, the AM would the same responsibility. The only difference is that in the case of managed containers if the process exits the NM detects that, while in the case of unmanaged containers this responsibility would fall on the AM. Because of this I think we could do without a leaseRenewal/liveliness call. Regarding *NM assume a whole lot of things about containers* 3 bullet items: For the my current use case none if this is needed. It could be relatively easy to enable such functionality if a use case that needs it arises. 
Regarding *Can such trusted application mix and match managed and unmanaged containers?*: In the way I envision how this will work, when an AM asks for a container and gets an allocation for from the RM, the RM does not know if the AM will start a managed or an unmanaged container. It is only between the AM and the NM that this is known, when the ContainerLaunchContext is NULL. Regarding *YARN-1040 should enabled starting unmanaged containers*: If YARN-1040 would be implemented, yes, it would enable unmanaged containers. However the scope of YARN-1040 is much bigger than unmanaged containers. It should be also be possible implementing unmanaged containers as being discussed and later implement YARN-1040. Does this make sense? Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling - Key: YARN-1404 URL: https://issues.apache.org/jira/browse/YARN-1404
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839215#comment-13839215 ] Alejandro Abdelnur commented on YARN-1399: -- We should HTML encode tags if presenting them on an HTML page. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839299#comment-13839299 ] Alejandro Abdelnur commented on YARN-1403: -- * AllocationConfiguration.java ** Do we need the 2 constructors? The one receiving the config is a bit misleading as it sets all default values except for the placement policy ** Most getter methods that are querying maps do 2 lookups, containsKey() then get(); we could just do a single get() and if NULL return the default value, ie:
{code}
public int getQueueMaxApps(String queue) {
  Integer max = queueMaxApps.get(queue);
  return (max != null) ? max : queueMaxAppsDefault;
}
{code}
** Most getter methods are doing an if/else block to return default values; using '(v != null) ? v : DEFAULT' would be shorter/simpler. ** getMaxResources(String queueName) should use a constant MAX_RESOURCE instead of creating one over and over ** hasAccess() 'lastPeriodIndex-1', space before/after '-' * AllocationFileLoaderService.java ** the time value constants should indicate the time unit in comments (ms I assume) ** start() Thread run() loop: what happens if somebody deletes the file before long lastModified = allocFile.lastModified(); runs? Shouldn't we check for exists() before attempting reload detection, and warn if it is not there anymore? ** move the 'ReloadListener' interface to the top of the class. Also, as it is an inner interface, it can simply be named 'Listener' and its method 'onReload()' * FairSchedulerConfiguration.java ** spurious change on line 242 * FSQueue.java ** If the FSQueue receives a QueueManager, and the QueueManager provides the AllocationConfiguration (which is updated), then the changes are much smaller and we don't have to iterate over all FSQueue instances to set an updated AllocationConfiguration. * QueueManager.java ** the 'updateQueuesWithReloadedConf()' method should receive the AllocationConfiguration as a parameter; it would be more obvious than getting it from the scheduler. 
Also, the name of the method should be 'updateConfiguration()' Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403-3.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
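The reload-detection concern raised in the review above (checking `exists()` before comparing `lastModified()`, and warning when the allocation file disappears) can be sketched as follows. This is a minimal illustration, not the actual AllocationFileLoaderService code; the method and variable names are assumptions.

```java
import java.io.File;

// Sketch of the reload check suggested in the review: verify the file still
// exists before reading lastModified(), and warn if it is gone.
public class ReloadCheckSketch {

    static boolean shouldReload(File allocFile, long lastSeenModified) {
        if (!allocFile.exists()) {
            // File.lastModified() returns 0 for a missing file, which would
            // silently suppress reloads; warn explicitly instead.
            System.err.println("WARN: allocation file " + allocFile + " no longer exists");
            return false;
        }
        return allocFile.lastModified() > lastSeenModified;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("alloc", ".xml");
        System.out.println(shouldReload(f, 0L));              // true: newer than epoch
        long seen = f.lastModified();
        System.out.println(shouldReload(f, seen));            // false: unchanged
        f.delete();
        System.out.println(shouldReload(f, seen));            // false, with a WARN
        }
}
```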
[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839531#comment-13839531 ] Alejandro Abdelnur commented on YARN-1471: -- As the ProportionalCapacityPreemptionPolicy is CS specific, why does it not live within the CS itself? Shouldn't it be that way? The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Wei Yan Priority: Minor The simulator does not run the ProportionalCapacityPreemptionPolicy monitor. This is because the policy needs to interact with a CapacityScheduler, and the wrapping done by the simulator breaks this. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839644#comment-13839644 ] Alejandro Abdelnur commented on YARN-1403: -- * FairScheduler.java ** Given that the AllocationFileLoaderService is a service, shouldn't its lifecycle (start/stop) be managed like other services'? Here reinitialize() is doing the start and there is no stop; this is a bit off, no? * FairSchedulerQueueInfo.java ** in case of reload, the maxApps var here never gets updated; shouldn't the getMaxApplications() method be as follows so the max is properly refreshed?
{code}
public int getMaxApplications() {
  return scheduler.getAllocationConfiguration().getQueueMaxApps(queueName);
}
{code}
Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403-3.patch, YARN-1403-4.patch, YARN-1403-5.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839679#comment-13839679 ] Alejandro Abdelnur commented on YARN-1403: -- +1 Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403-3.patch, YARN-1403-4.patch, YARN-1403-5.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838579#comment-13838579 ] Alejandro Abdelnur commented on YARN-1471: -- [~curino], I'm not familiar with how the ProportionalCapacityPreemptionPolicy gets hold of data from the CapacityScheduler. The SLS simply wraps the Scheduler implementation using a proxy pattern. Thus, the scheduler API is fully exposed even if wrapped. If the ProportionalCapacityPreemptionPolicy casts the Scheduler interface to the CapacityScheduler class, I would be inclined to say that this is not a SLS issue but a ProportionalCapacityPreemptionPolicy issue. Are the methods of the CapacityScheduler used by the ProportionalCapacityPreemptionPolicy general purpose enough to qualify for being in the Scheduler API? If not, how about implementing a 'safety valve', something like {{public <T> T getComponent(Class<T> klass)}} in the Scheduler API, where the contract is to return NULL if such component is not implemented by the scheduler. Would something like this work? The SLS simulator is not running the preemption policy for CapacityScheduler Key: YARN-1471 URL: https://issues.apache.org/jira/browse/YARN-1471 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino Assignee: Wei Yan Priority: Minor The simulator does not run the ProportionalCapacityPreemptionPolicy monitor. This is because the policy needs to interact with a CapacityScheduler, and the wrapping done by the simulator breaks this. -- This message was sent by Atlassian JIRA (v6.1#6144)
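The proposed "safety valve" accessor can be sketched as below. All class names here are illustrative stand-ins, not the actual YARN Scheduler API; the only part taken from the comment is the `getComponent(Class<T>)` signature and its return-null-if-unsupported contract.

```java
// Sketch of the proposed safety valve: schedulers expose optional components
// through a typed lookup instead of forcing callers to downcast.
public class SafetyValveSketch {

    interface Scheduler {
        // Contract from the comment: return null if the scheduler does not
        // implement the requested component.
        <T> T getComponent(Class<T> klass);
    }

    static class PreemptionSupport { } // hypothetical optional component

    static class CapacitySchedulerLike implements Scheduler {
        private final PreemptionSupport preemption = new PreemptionSupport();

        @Override
        @SuppressWarnings("unchecked")
        public <T> T getComponent(Class<T> klass) {
            return klass == PreemptionSupport.class ? (T) preemption : null;
        }
    }

    static class FifoSchedulerLike implements Scheduler {
        @Override
        public <T> T getComponent(Class<T> klass) {
            return null; // exposes no optional components
        }
    }

    public static void main(String[] args) {
        Scheduler cs = new CapacitySchedulerLike();
        Scheduler fifo = new FifoSchedulerLike();
        System.out.println(cs.getComponent(PreemptionSupport.class) != null);   // true
        System.out.println(fifo.getComponent(PreemptionSupport.class) != null); // false
    }
}
```

A monitor like the preemption policy would then ask the (possibly wrapped) scheduler for the component and simply disable itself on null, which keeps SLS-style proxies transparent.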
[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs
[ https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836723#comment-13836723 ] Alejandro Abdelnur commented on YARN-1390: -- Agree with Steve, we should limit the length of a tag and the number of tags. I'd suggest going hardcoded for now, i.e. 50 chars/10 tags, and going configurable later if the need arises. Provide a way to capture source of an application to be queried through REST or Java Client APIs Key: YARN-1390 URL: https://issues.apache.org/jira/browse/YARN-1390 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla In addition to other fields like application-type (added in YARN-563), it is useful to have an applicationSource field to track the source of an application. The application source can be useful in (1) fetching only those applications a user is interested in, (2) potentially adding source-specific optimizations in the future. Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop etc. -- This message was sent by Atlassian JIRA (v6.1#6144)
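The hardcoded limits suggested above (50 chars per tag, 10 tags) could be enforced with a validation like the following sketch. The class and method names are hypothetical; only the two constants come from the comment.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of tag validation with the hardcoded limits from the comment.
public class TagLimitSketch {
    static final int MAX_TAG_LENGTH = 50; // suggested per-tag limit
    static final int MAX_TAGS = 10;       // suggested per-application limit

    static void validate(List<String> tags) {
        if (tags.size() > MAX_TAGS) {
            throw new IllegalArgumentException("too many tags: " + tags.size());
        }
        for (String tag : tags) {
            if (tag.length() > MAX_TAG_LENGTH) {
                throw new IllegalArgumentException("tag too long: " + tag);
            }
        }
    }

    public static void main(String[] args) {
        validate(Arrays.asList("hive", "nightly-etl")); // within limits, passes
        try {
            validate(Collections.nCopies(11, "t"));     // 11 tags > 10
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Going configurable later would just mean reading the two constants from configuration instead.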
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836980#comment-13836980 ] Alejandro Abdelnur commented on YARN-1399: -- What is the concern with a tag being a valid Unicode string? If queried via the REST API the values would be URL-encoded, thus no harm. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837031#comment-13837031 ] Alejandro Abdelnur commented on YARN-1399: -- I would stick to exact tag match. Case insensitive seems reasonable, though I would implement it by lower-casing or upper-casing tags on arrival and when querying. Then the matching is the cheapest. Regarding symbols, what is the harm in supporting them? One thing we didn't mention before: on querying I would support only OR; the client must then do any further filtering if it wants AND. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1#6144)
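The scheme described above (normalize case once on arrival, exact matching afterwards, OR semantics on query) can be sketched like this. The class and method names are illustrative, not the actual YARN tag implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of the tag-matching scheme from the comment: lower-case on arrival
// and on query, then cheap exact set lookups; OR semantics across query tags.
public class TagMatchSketch {

    // Normalize once when the application is submitted.
    static Set<String> normalize(List<String> tags) {
        Set<String> out = new HashSet<>();
        for (String t : tags) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // OR semantics: the application matches if it carries any query tag.
    static boolean matches(Set<String> appTags, List<String> queryTags) {
        for (String q : queryTags) {
            if (appTags.contains(q.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> appTags = normalize(Arrays.asList("Hive", "nightly-ETL"));
        System.out.println(matches(appTags, Arrays.asList("hive")));        // true
        System.out.println(matches(appTags, Arrays.asList("pig", "HIVE"))); // true (OR)
        System.out.println(matches(appTags, Arrays.asList("pig")));         // false
    }
}
```

Since both sides are normalized up front, each lookup is a single hash-set `contains()`; a client wanting AND semantics would intersect the results itself, as the comment suggests.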
[jira] [Commented] (YARN-1456) IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836086#comment-13836086 ] Alejandro Abdelnur commented on YARN-1456: -- This seems a dup of YARN-888 and MAPREDUCE-5362 (same issue for MR modules) IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager - Key: YARN-1456 URL: https://issues.apache.org/jira/browse/YARN-1456 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Environment: IntelliJ IDEA 12.x 13.x beta Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1456-001.patch When IntelliJ IDEA imports the hadoop POMs into the IDE, somehow it fails to pick up all the transitive dependencies of the yarn-client, and so can't resolve commons logging, com.google.* classes and the like. While this is probably an IDEA bug, it does stop you building Hadoop from inside the IDE, making debugging significantly harder -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1442) change yarn minicluster base directory via system property
[ https://issues.apache.org/jira/browse/YARN-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832716#comment-13832716 ] Alejandro Abdelnur commented on YARN-1442: -- Instead of introducing a new system property yarn.minicluster.directory, we should have a common hadoop.test.dir used by all miniclusters and tests; if different components need their own subdir, they can create it under it. Otherwise we need to chase down all the properties to set. change yarn minicluster base directory via system property -- Key: YARN-1442 URL: https://issues.apache.org/jira/browse/YARN-1442 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: André Kelpe Priority: Minor Attachments: HADOOP-10122.patch The yarn minicluster used for testing uses the target directory by default. We use gradle for building our projects and we would like to see it using a different directory. This patch makes it possible to use a different directory by setting the yarn.minicluster.directory system property. -- This message was sent by Atlassian JIRA (v6.1#6144)
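The single-property scheme proposed above can be sketched as follows. Note `hadoop.test.dir` is the property name proposed in the comment (not an existing Hadoop key), and the default value and helper name are assumptions for illustration.

```java
import java.io.File;

// Sketch: one common base directory for all minicluster/test artifacts,
// with each component creating its own subdirectory under it.
public class TestDirSketch {

    static File componentTestDir(String component) {
        // "hadoop.test.dir" is the proposed common property; fall back to the
        // conventional Maven build directory if it is unset.
        String base = System.getProperty("hadoop.test.dir", "target/test-dir");
        return new File(base, component);
    }

    public static void main(String[] args) {
        System.setProperty("hadoop.test.dir", "/tmp/build-tests");
        // e.g. a YARN minicluster would place its state under this directory
        System.out.println(componentTestDir("yarn-minicluster"));
    }
}
```

A Gradle (or any) build then only has to set one `-Dhadoop.test.dir=...` flag instead of chasing per-component properties.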
[jira] [Resolved] (YARN-951) Add hard minimum resource capabilities for container launching
[ https://issues.apache.org/jira/browse/YARN-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-951. - Resolution: Won't Fix Add hard minimum resource capabilities for container launching -- Key: YARN-951 URL: https://issues.apache.org/jira/browse/YARN-951 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Wei Yan This is a follow up of YARN-789, which enabled FairScheduler to handle zero capabilities resource requests in one dimension (either zero CPU or zero memory). When resource enforcement is enabled (cgroups for CPU and ProcfsBasedProcessTree for memory) we cannot use zero because the underlying container processes will be killed. We need to introduce an absolute or hard minimum: * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there are enough CPU cycles to run the sleep process. This absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the shares for 1 CPU are 1024. * For Memory. Hard enforcement is currently done by ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MBs would take care of zero memory resources. And again, this absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the increment memory is in several MBs if not 1GB. There would be no default for this hard minimum; if not set, no correction will be done. If set, then the MAX(hard-minimum, container-resource-capability) will be used. Effectively there will not be any impact unless the hard minimum capabilities are explicitly set. And, even if set, unless the scheduler is configured to allow zero capabilities, the hard-minimum value will not kick in unless it is set to a value higher than the MIN capabilities for a container. 
Expected values, when set, would be 10 shares for CPU and 2 MB for memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
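The MAX(hard-minimum, requested-capability) correction described in the issue can be written out as a small sketch. The constants below are the "expected values" stated above (10 CPU shares, 2 MB); the class and method names are illustrative, not actual YARN code or configuration keys.

```java
// Sketch of the hard-minimum correction from YARN-951: floor each dimension
// at an absolute minimum so zero-capability containers survive enforcement.
public class HardMinimumSketch {
    static final int HARD_MIN_CPU_SHARES = 10; // floor for cgroup cpu.shares
    static final int HARD_MIN_MEMORY_MB = 2;   // floor for memory enforcement

    static int effectiveCpuShares(int requestedShares) {
        return Math.max(HARD_MIN_CPU_SHARES, requestedShares);
    }

    static int effectiveMemoryMb(int requestedMb) {
        return Math.max(HARD_MIN_MEMORY_MB, requestedMb);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCpuShares(0));    // 10: zero-CPU request floored
        System.out.println(effectiveCpuShares(1024)); // 1024: 1 CPU, floor never kicks in
        System.out.println(effectiveMemoryMb(0));     // 2: zero-memory request floored
    }
}
```

As the issue notes, the floor only ever changes the outcome when the scheduler allowed a zero capability in the first place; any normal request already exceeds it.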
[jira] [Updated] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1404: - Description: Currently Hadoop Yarn expects to manage the lifecycle of the processes its applications run workload in. External frameworks/systems could benefit from sharing resources with other Yarn applications while running their workload within long-running processes owned by the external framework (in other words, running their workload outside of the context of a Yarn container process). Because Yarn provides robust and scalable resource management, it is desirable for some external systems to leverage the resource governance capabilities of Yarn (queues, capacities, scheduling, access control) while supplying their own resource enforcement. Impala is an example of such system. Impala uses Llama (http://cloudera.github.io/llama/) to request resources from Yarn. Impala runs an impalad process in every node of the cluster, when a user submits a query, the processing is broken into 'query fragments' which are run in multiple impalad processes leveraging data locality (similar to Map-Reduce Mappers processing a collocated HDFS block of input data). The execution of a 'query fragment' requires an amount of CPU and memory in the impalad. As the impalad shares the host with other services (HDFS DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications (MapReduce tasks). To ensure cluster utilization that follows the Yarn scheduler policies and does not overload the cluster nodes, before running a 'query fragment' in a node, Impala requests the required amount of CPU and memory from Yarn. Once the requested CPU and memory has been allocated, Impala starts running the 'query fragment' taking care that the 'query fragment' does not use more resources than the ones that have been allocated. 
Memory is bookkept per 'query fragment', and the threads used for processing a 'query fragment' are placed under a cgroup to contain CPU utilization. Today, for all resources that have been requested from the Yarn RM, a (container) process must be started via the corresponding NodeManager; failing to do this will result in the cancellation of the container allocation, relinquishing the acquired resource capacity back to the pool of available resources. To avoid this, Impala starts a dummy container process doing 'sleep 10y'. Using a dummy container process has its drawbacks:
* The dummy container process is in a cgroup with a given number of CPU shares that are not used, and Impala is re-issuing those CPU shares to another cgroup for the threads running the 'query fragment'. The cgroup CPU enforcement works correctly because of the CPU controller implementation (but the formally specified behavior is actually undefined).
* Impala may ask for CPU and memory independently of each other. Some requests may be memory only with no CPU, or vice versa. Because a container requires a process, complete absence of memory or CPU is not possible: even if the dummy process is 'sleep', a minimal amount of memory and CPU is required for it.
Because of this, it is desirable to be able to have a container without a backing process. was: Currently a container allocation requires starting a container process on the corresponding NodeManager's node. For applications that need to use the allocated resources out of band from Yarn, this means that a dummy container process must be started. Impala/Llama is an example of such an application; it is currently starting a 'sleep 10y' (10 years) process as the container process, and the resource capabilities are used out of band by the Impala process collocated on the node. The Impala process ensures the processing associated with those resources does not exceed the capabilities of the container.
Also, if the container is lost/preempted/killed, Impala stops using the corresponding resources. In addition, in the case of Llama, the current requirement of having a container process gets complicated when hard resource enforcement (memory, via ContainersMonitor, or CPU, via cgroups) is enabled, because Impala/Llama requests resources with CPU and memory independently of each other. Some requests are CPU only and others are memory only. Unmanaged containers solve this problem: with no underlying process, a container with zero CPU or zero memory becomes possible. Summary: Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling (was: Add support for unmanaged containers) Updated the summary and the description to better describe the use case driving this JIRA. I've closed YARN-951 as won't fix, as it is a workaround for the problem this JIRA is trying to address. I don't think there is a need for an umbrella JIRA, as this is the only change we need.
[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829640#comment-13829640 ] Alejandro Abdelnur commented on YARN-1404: -- The proposal to address this JIRA is:
* Allow a NULL ContainerLaunchContext in the startContainer() call; this signals that no process is to be started for the container.
* The ContainerLaunch logic would use a latch to block when there is no associated process. The latch will be released on container completion (preemption or termination by the AM).
The changes to achieve this are minimal, and they do not alter the lifecycle of a container at all, in either the RM or the NM. As previously mentioned by Bikas, this can be seen as a special case of the functionality that YARN-1040 proposes for managing multiple processes with the same container. The scope of work in YARN-1040 is significantly larger and requires API changes, while this JIRA requires no API changes, and the two are not incompatible with each other. Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling - Key: YARN-1404 URL: https://issues.apache.org/jira/browse/YARN-1404 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-1404.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
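The latch-based proposal above could be sketched roughly as follows. This is a minimal illustrative sketch, not the actual NodeManager internals; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Sketch of the proposal: when a container is started with a null
 * ContainerLaunchContext, the launch thread parks on a latch instead of
 * exec'ing a process, and the latch is released on container completion
 * (stop by the AM, or preemption). Names are illustrative only.
 */
class ProcessLessContainerLaunch {
  private final CountDownLatch completion = new CountDownLatch(1);

  /** Called in place of launching a process; blocks until completion. */
  int call() throws InterruptedException {
    completion.await();   // park instead of exec'ing a container process
    return 0;             // report a normal exit code
  }

  /** Invoked on container completion to release the launch thread. */
  void cleanup() {
    completion.countDown();
  }
}
```

This keeps the container lifecycle unchanged in both the RM and the NM: from their point of view the container is "running" until cleanup() fires.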
[jira] [Commented] (YARN-1407) RM Web UI and REST APIs should uniformly use YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827164#comment-13827164 ] Alejandro Abdelnur commented on YARN-1407: -- LGTM, +1 RM Web UI and REST APIs should uniformly use YarnApplicationState - Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407-1.patch, YARN-1407-2.patch, YARN-1407.patch RMAppState isn't a public-facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1).
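The internal-vs-public enum issue above can be illustrated with a toy model. These are not the real YARN enums (the actual RMAppState has more states and a richer mapping); the point is that UI/REST code should always convert to the public enum before exposing a value.

```java
/** Toy stand-in for the public YarnApplicationState enum. */
enum YarnAppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

/**
 * Toy stand-in for the internal RMAppState: it carries internal-only
 * states (FINAL_SAVING, FINISHING here) that must never leak to clients.
 * The mapping below is illustrative, not YARN's actual conversion logic.
 */
enum InternalRMAppState {
  NEW, SUBMITTED, ACCEPTED, RUNNING, FINAL_SAVING, FINISHING, FINISHED, FAILED, KILLED;

  /** Collapse internal-only states into the public enum before exposing. */
  YarnAppState toPublic() {
    switch (this) {
      case FINAL_SAVING:
      case FINISHING:
      case FINISHED: return YarnAppState.FINISHED;
      case FAILED:   return YarnAppState.FAILED;
      case KILLED:   return YarnAppState.KILLED;
      default:       return YarnAppState.valueOf(name());
    }
  }
}
```

Converting at the boundary is what makes shrinking the exposed string set backwards-compatible: clients only ever see values of the public enum.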
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824268#comment-13824268 ] Alejandro Abdelnur commented on YARN-1403: --
* AllocationFileLoaderService.java:
** start()/stop() should set a volatile boolean 'running' to true/false, and the reloadThread should loop while 'running'. stop() should interrupt the thread to force a wake-up if it is sleeping.
** In the reloadThread's run(), the try block should include the reload, so that an interrupt from stop() skips the reloading when exiting.
** reloadAllocs(): we are not charged by the character, so the method should be named reloadAllocations().
** Use {} for the 'if (allocFile == null) return;' statement.
** What happens if reloadListener.queueConfigurationReloaded(info) throws an exception? In what state do things end up?
** Not sure the logic using lastReloadAttemptFailed is correct in the exception handling in the thread's run().
* QueueConfiguration.java:
** In the QueueConfiguration() constructor, shouldn't placementpolicy be the default?
* QueueManager.java:
** Shouldn't this be a composite service?
** It is starting but not stopping the AllocationFileLoaderService.
** The initialize() setting the reload listener is too hidden; this should be done next to where the AllocationFileLoaderService is created.
Wouldn't it be simpler/cleaner if the QueueManager were a service that encapsulates the reloading, queue allocations, ACLs, and queue placement, and the FS just saw its methods? Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403.patch
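The start()/stop() suggestion above can be sketched as follows. This is an illustrative sketch of the suggested pattern, not the actual AllocationFileLoaderService; the names and the reload interval are assumptions.

```java
/**
 * Sketch of the review suggestion: a volatile 'running' flag toggled by
 * start()/stop(), with stop() interrupting the reload thread to force a
 * wake-up while it sleeps between reload attempts.
 */
class AllocationReloader {
  private static final long RELOAD_INTERVAL_MS = 10_000;  // illustrative
  private volatile boolean running;
  Thread reloadThread;

  void start() {
    running = true;
    reloadThread = new Thread(() -> {
      while (running) {
        try {
          Thread.sleep(RELOAD_INTERVAL_MS);
          reloadAllocations();          // full name, per the review comment
        } catch (InterruptedException e) {
          // interrupted by stop(); the loop re-checks 'running' and exits,
          // skipping the reload as suggested above
        }
      }
    });
    reloadThread.start();
  }

  void stop() throws InterruptedException {
    running = false;
    reloadThread.interrupt();           // wake the thread if sleeping
    reloadThread.join();
  }

  void reloadAllocations() { /* parse the allocation file, notify listener */ }
}
```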
[jira] [Commented] (YARN-1392) Allow sophisticated app-to-queue placement policies in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822939#comment-13822939 ] Alejandro Abdelnur commented on YARN-1392: -- LGTM, +1. When committing, please remove the two spurious changes in FairScheduler.java at lines 696/1173 (empty lines). Allow sophisticated app-to-queue placement policies in the Fair Scheduler - Key: YARN-1392 URL: https://issues.apache.org/jira/browse/YARN-1392 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1392-1.patch, YARN-1392-1.patch, YARN-1392-2.patch, YARN-1392-3.patch, YARN-1392.patch Currently the Fair Scheduler supports app-to-queue placement by username. It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.
[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821597#comment-13821597 ] Alejandro Abdelnur commented on YARN-1040: -- [~bikassaha], if I got it right, you suggest:
* {{StartContainerRequest}} would have a new property {{boolean multipleProcesses (false)}}
* An additional API {{startProcess(ContainerId, ContainerLaunchContext)}} will be used to start multiple processes within the same container.
* In a {{StartContainerRequest}}, if the {{ContainerLaunchContext == null}} and {{multipleProcesses = true}}, the container is started with no associated process, and the container allocation will not time out as it has been claimed by the AM (because of the start container request).
If that is the case, then YARN-1404 would be a special case of this JIRA. Am I right? De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container Key: YARN-1040 URL: https://issues.apache.org/jira/browse/YARN-1040 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran The AM should be able to exec >1 process in a container, rather than have the NM automatically release the container when the single process exits. This would let an AM restart a process on the same container repeatedly, which for HBase would offer locality on a restarted region server. We may also want the ability to exec multiple processes in parallel, so that something could be run in the container while a long-lived process was already running. This can be useful in monitoring and reconfiguring the long-lived process, as well as shutting it down.
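The bullets above can be restated as Java signatures. To be clear, none of these exist in YARN; this is a hypothetical sketch of the proposed API, with a placeholder type and a trivial stub standing in for the real protocol classes.

```java
/** Placeholder for YARN's real ContainerLaunchContext; illustrative only. */
class ContainerLaunchContext {}

/**
 * Hypothetical restatement of the API proposal in the comment above:
 * a new boolean on the start request, plus a call to launch further
 * processes inside the same long-lived container.
 */
interface ProposedContainerManagement {
  // A null ContainerLaunchContext together with multipleProcesses == true
  // starts the container with no associated process; the allocation does
  // not time out because the AM has claimed it via the start request.
  void startContainer(ContainerLaunchContext ctx, boolean multipleProcesses);

  // Additional API to start more processes within the same container.
  void startProcess(String containerId, ContainerLaunchContext ctx);
}

/** Trivial in-memory stub showing the intended calling pattern. */
class RecordingContainerManager implements ProposedContainerManagement {
  boolean processlessStart;

  public void startContainer(ContainerLaunchContext ctx, boolean multipleProcesses) {
    processlessStart = (ctx == null && multipleProcesses);
  }

  public void startProcess(String containerId, ContainerLaunchContext ctx) {
    // would exec one more process inside the existing container
  }
}
```

Under this reading, YARN-1404's process-less container is exactly the (ctx == null, multipleProcesses == true) corner of the larger API.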
[jira] [Commented] (YARN-1404) Add support for unmanaged containers
[ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821657#comment-13821657 ] Alejandro Abdelnur commented on YARN-1404: -- [~ste...@apache.org] bq. 1. I'd be inclined to treat this as a special case of YARN-1040 I've just commented on YARN-1040 following Bikas' comment on this: https://issues.apache.org/jira/browse/YARN-1040?focusedCommentId=13821597&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821597 bq. It's dangerously easy to leak containers here; I know llama keeps an eye on things, but I worry about other people's code -though admittedly, any long-lived command line app yes could do the same. We can have NM configs to disable no-process or multi-process containers, but you can still work around this by having a dummy process. This is how Llama is doing things today, but it is not ideal for several reasons. IMO, from the Yarn perspective we need to allow AMs to do sophisticated things within the Yarn programming model (like you are trying to do with long-lived containers, or what I'm doing with Llama). bq. For the multi-process (and that includes processes=0), we really do need some kind of lease renewal option to stop containers being retained forever. It would become the job of the AM to do the renewal As I've mentioned above, I don't think we need a special lease for this: https://issues.apache.org/jira/browse/YARN-1404?focusedCommentId=13820200&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820200 (look for 'The reason I've taken the approach of leaving the container leases out of band is:') [~vinodkv] bq. -1 for this... I think you are jumping too fast here. bq. As I repeated on other JIRAs, please change the title with the problem statement instead of solutions. IMO that makes complete sense for bugs; for improvements/new features, a descriptive summary communicates more, as it will be the commit message.
The shortcomings the JIRA is trying to address should be captured in the description. Take for example the following JIRA summaries; would you change them to describe a problem?
* AHS should support application-acls and queue-acls
* AM's tracking URL should be a URL instead of a string
* Add support for zipping/unzipping logs while in transit for the NM logs web-service
* YARN should have a ClusterId/ServiceId
bq. I indicated offline about llama with others. I don't think you need NodeManagers either to do what you want, forget about containers. All you need is use the ResourceManager/scheduler in isolation using MockRM/LightWeightRM (YARN-1385) - your need seems to be using the scheduling logic in YARN and not use the physical resources. The whole point of Llama is to allow Impala to share resources in a real Yarn cluster running other workloads like Map-Reduce. In other words, Impala/Llama and other AMs must share cluster resources. Add support for unmanaged containers Key: YARN-1404 URL: https://issues.apache.org/jira/browse/YARN-1404 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-1404.patch Currently a container allocation requires starting a container process on the corresponding NodeManager's node. For applications that need to use the allocated resources out of band from Yarn, this means that a dummy container process must be started. Impala/Llama is an example of such an application; it is currently starting a 'sleep 10y' (10 years) process as the container process, and the resource capabilities are used out of band by the Impala process collocated on the node. The Impala process ensures the processing associated with those resources does not exceed the capabilities of the container. Also, if the container is lost/preempted/killed, Impala stops using the corresponding resources. In addition, in the case of Llama, the current requirement of having a container process gets complicated when hard resource enforcement (memory, via ContainersMonitor, or CPU, via cgroups) is enabled, because Impala/Llama requests resources with CPU and memory independently of each other. Some requests are CPU only and others are memory only. Unmanaged containers solve this problem: with no underlying process, a container with zero CPU or zero memory becomes possible.
[jira] [Commented] (YARN-1407) apps REST API filters queries by YarnApplicationState, but includes RMAppStates in response
[ https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821692#comment-13821692 ] Alejandro Abdelnur commented on YARN-1407: -- Agree with [~tgraves], +1 after the doc additions. apps REST API filters queries by YarnApplicationState, but includes RMAppStates in response --- Key: YARN-1407 URL: https://issues.apache.org/jira/browse/YARN-1407 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1407.patch RMAppState isn't a public-facing enum like YarnApplicationState, so we shouldn't return values that come from it. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1).
[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821695#comment-13821695 ] Alejandro Abdelnur commented on YARN-1241: -- The patch needs to be rebased; it does not apply. In Fair Scheduler maxRunningApps does not work for non-leaf queues -- Key: YARN-1241 URL: https://issues.apache.org/jira/browse/YARN-1241 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241-6.patch, YARN-1241.patch Setting the maxRunningApps property on a parent queue should ensure that the total number of running apps across all its subqueues can't exceed it
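The intended semantics of maxRunningApps on non-leaf queues can be sketched with a toy model (not the actual FairScheduler code; the class and field names are illustrative): an app may start only if every queue on the path from its leaf to the root is below its limit, so a parent's limit caps the sum across its subqueues.

```java
/**
 * Toy model of hierarchical maxRunningApps enforcement: counts are
 * maintained at every level, and admission checks walk up to the root.
 */
class Queue {
  final String name;
  final Queue parent;
  int maxRunningApps = Integer.MAX_VALUE;  // unlimited by default
  int runningApps;

  Queue(String name, Queue parent) { this.name = name; this.parent = parent; }

  /** True if this queue and all its ancestors have headroom for one more app. */
  boolean canRunOneMore() {
    for (Queue q = this; q != null; q = q.parent) {
      if (q.runningApps >= q.maxRunningApps) return false;
    }
    return true;
  }

  /** Record a newly running app in this leaf and every ancestor. */
  void startApp() {
    for (Queue q = this; q != null; q = q.parent) q.runningApps++;
  }
}
```

With maxRunningApps=1 on a parent, one running app in any child queue blocks admission in all its siblings, which is exactly the behavior the description asks for.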