[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142337#comment-16142337 ]

Arthur Rand commented on SPARK-16742:

Gotcha, https://issues.apache.org/jira/browse/SPARK-21842 is to track work.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
> Issue Type: New Feature
> Components: Mesos
> Reporter: Michael Gummelt
> Assignee: Arthur Rand
> Fix For: 2.3.0
>
>
> We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be
> contributing it to Apache Spark soon.
> Mesosphere design doc:
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code:
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140293#comment-16140293 ]

Marcelo Vanzin commented on SPARK-16742:

Both renewal and creating new tickets after the TTL (those are different things).
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140111#comment-16140111 ]

Arthur Rand commented on SPARK-16742:

Hello [~vanzin], I'm assuming you're talking about automatic ticket renewal, correct? I was just starting to look into that w.r.t. Mesos; I'll create a ticket.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106501#comment-16106501 ]

Arthur Rand commented on SPARK-16742:

Hello [~vanzin], I addressed the comments for the second PR (https://github.com/apache/spark/pull/18519). It is ready for final review.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072970#comment-16072970 ]

Apache Spark commented on SPARK-16742:

User 'mgummelt' has created a pull request for this issue: https://github.com/apache/spark/pull/18519
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971897#comment-15971897 ]

Apache Spark commented on SPARK-16742:

User 'mgummelt' has created a pull request for this issue: https://github.com/apache/spark/pull/17665
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969341#comment-15969341 ]

Michael Gummelt commented on SPARK-16742:

[~jerryshao] No, but you can look at our solution here: https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968545#comment-15968545 ]

Saisai Shao commented on SPARK-16742:

[~mgummelt], do you have a design doc for the Kerberos support for Spark on Mesos, so that my work on SPARK-19143 could be based on yours?
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963601#comment-15963601 ]

Michael Gummelt commented on SPARK-16742:

bq. So, assuming that Mesos is configured properly, then it should be OK for Spark code to distribute user credentials.

Right. It's just a matter of the cluster admin syncing Mesos credentials and kerberos credentials properly. In summary, it's simpler in YARN because YARN is Kerberos-aware, whereas Mesos isn't.

bq. That sounds like you might need the current code that distributes keytabs and logs in the cluster to make even client mode work in this setup.

Since client mode requires network access to the Mesos master, we generally assume that the user is on the same network as their datacenter, and can thus kinit against the KDC.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963592#comment-15963592 ]

Marcelo Vanzin commented on SPARK-16742:

bq. It authenticates the Mesos principal, and this principal is allowed to launch processes only as certain Linux users. It's up to the cluster admin to set up this mapping appropriately.

OK, that sounds similar then. Basically, you *can* set up Mesos so that it can do this securely, which was my initial question. (Being able to set things up in an insecure way is not actually that interesting; I just wanted to make sure there *is* a way to set things up securely.) So, assuming that Mesos is configured properly, then it should be OK for Spark code to distribute user credentials.

bq. I actually said a "user might not be kinit'd". They may, however, have access to the keytab.

That sounds like you might need the current code that distributes keytabs and logs in the cluster to make even client mode work in this setup.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963583#comment-15963583 ]

Michael Gummelt commented on SPARK-16742:

bq. That sounds problematic. The way YARN works is that it actually authenticates the user. Are you saying that Mesos doesn't do user authentication?

AFAICT, YARN doesn't authenticate the Linux user. The KDC authenticates the Kerberos principal, and YARN maps this principal to a Linux user via {{hadoop.security.auth_to_local}}. So if a user authenticated to the KDC via a principal "Joe", and the {{auth_to_local}} rule maps "Joe" to "root", then "Joe" can launch processes as "root", even though he never provided "root" credentials. It's up to the cluster administrator to properly set up this Kerberos -> Linux mapping.

It's a similar story with Mesos. Mesos doesn't authenticate the Linux user. It authenticates the Mesos principal, and this principal is allowed to launch processes only as certain Linux users. It's up to the cluster admin to set up this mapping appropriately.

The big difference is that, by default, YARN will map the Kerberos principal to the Linux user with the same name, so there's no problem, whereas Mesos will allow the driver to launch executors as any user that the driver's Mesos principal is allowed to launch processes as. So it's up to the admin to only provide users with consistent Mesos and Kerberos credentials.

bq. Are you saying that for YARN or Mesos? When YARN runs in Kerberos mode, Kerberos dictates the user.

I'm talking about YARN. See the above comment. If {{auth_to_local}} is used like I think it is, then that's what ultimately determines the Linux user, not just Kerberos.

bq. The use case you mention ("user starting an application in cluster mode with no kerberos credentials") sounds actually worrying

I actually said a "user might not be kinit'd". They may, however, have access to the keytab. Since they're not on the same network as the KDC, they can't authenticate directly, but they do have the creds.
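The principal-to-user mapping described in this comment can be illustrated with a small Python sketch. It is a simplified model, not Hadoop's {{hadoop.security.auth_to_local}} rule engine (which parses {{RULE:...}} expressions): it assumes a hypothetical default realm {{EXAMPLE.COM}}, an explicit override table that is checked first, and a DEFAULT-like fallback that keeps the principal's first component when the realm matches. The names {{to_local_user}} and {{parse_principal}} are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical default realm; real clusters set default_realm in krb5.conf.
DEFAULT_REALM = "EXAMPLE.COM"


def parse_principal(principal: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'primary[/instance][@REALM]' into (primary, instance, realm)."""
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return primary, (instance or None), (realm or None)


def to_local_user(principal: str, overrides: Dict[str, str]) -> Optional[str]:
    """Map a Kerberos principal to a Linux user, or None if no rule applies.

    Simplified model (not Hadoop's implementation): explicit overrides win,
    then a DEFAULT-like rule keeps the first component for the default realm.
    """
    if principal in overrides:
        return overrides[principal]
    primary, _instance, realm = parse_principal(principal)
    if realm == DEFAULT_REALM:
        return primary
    return None


# A misconfigured override: the KDC authenticated "Joe", yet he runs as root.
overrides = {"Joe@EXAMPLE.COM": "root"}
print(to_local_user("alice@EXAMPLE.COM", overrides))  # alice
print(to_local_user("Joe@EXAMPLE.COM", overrides))    # root
print(to_local_user("bob@OTHER.REALM", overrides))    # None
```

With an empty override table, every principal in the default realm runs as its same-named Linux user, which is the safe default described above; a single override entry is enough to let "Joe" launch processes as "root".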
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963559#comment-15963559 ]

Marcelo Vanzin commented on SPARK-16742:

bq. But in Spark, this isn't currently derived from the Kerberos principal. It's configured by the user.

That sounds problematic. The way YARN works is that it actually authenticates the user. Are you saying that Mesos doesn't do user authentication?

The overarching point I'm trying to make with my comments is that for kerberos support to be properly secure, the cluster manager needs to be secure. That means running applications from different users in a way that doesn't allow them to hack each other. YARN does that by doing authentication when users request applications to run, and by running the containers as the requested user. The exact way in which YARN achieves that seems kinda tangential to the actual question, which is: What is the story for Mesos?

Basically, the way in which you support Kerberos will depend on how your cluster manager does security. If Mesos behaves more like Spark Standalone than it does like YARN, then any solution that requires distributing user credentials is a non-starter, because it just becomes a security liability.

bq. It would be a vulnerability, for example, if the Linux user for the executors is simply derived from that of the driver, because two human users running as the same Linux user, but logged in via different Kerberos principals, would be able to see each others' tokens.

Are you saying that for YARN or Mesos? When YARN runs in Kerberos mode, Kerberos dictates the user. That's how the user is authenticating to YARN. There's a requirement that an OS user exists matching that particular user, but that's just a configuration detail. The security comes from the fact that the user is authenticating to the KDC.

bq. You're right that we could implement cluster mode in some form, but I'd rather keep the initial PR small. I hope that's acceptable.

The main point I'm trying to convey here is that running things in client and cluster mode should be exactly the same from the point of view of distributing tokens. The use case you mention ("user starting an application in cluster mode with no kerberos credentials") sounds actually worrying, because what's authenticating the user?
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963469#comment-15963469 ]

Michael Gummelt commented on SPARK-16742:

[~jerryshao] Great! The current RPC used in Mesos is very simple. The executor just periodically requests the latest credentials from the driver, which uses the keytab to periodically renew. We can swap in a different mechanism once that exists. I left a comment on your design doc.
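The pull model described in this comment (executors periodically asking the driver for the latest credentials, while the driver renews them with the keytab) can be sketched as an in-process simulation. Everything here is hypothetical: {{Driver}}, {{Executor}} and their methods are not Spark APIs, credentials are opaque bytes, and a simulated clock in hours stands in for real timers and Spark's RPC layer.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    """Hypothetical driver that re-obtains credentials on a fixed interval."""
    renew_every_hours: float = 24.0
    version: int = 0
    credentials: bytes = b""
    last_renewal: float = float("-inf")

    def maybe_renew(self, now: float) -> None:
        # Stand-in for a keytab login plus fetching fresh delegation tokens.
        if now - self.last_renewal >= self.renew_every_hours:
            self.version += 1
            self.credentials = f"tokens-v{self.version}".encode()
            self.last_renewal = now

    def get_latest_credentials(self) -> bytes:
        # Stand-in for the RPC endpoint that executors call.
        return self.credentials


@dataclass
class Executor:
    """Hypothetical executor that polls the driver for credentials."""
    driver: Driver
    poll_every_hours: float = 1.0
    credentials: bytes = b""
    last_poll: float = float("-inf")

    def tick(self, now: float) -> None:
        if now - self.last_poll >= self.poll_every_hours:
            self.credentials = self.driver.get_latest_credentials()
            self.last_poll = now


driver = Driver()
executor = Executor(driver)
for hour in range(50):  # simulate roughly two days, hour by hour
    driver.maybe_renew(hour)
    executor.tick(hour)
print(driver.credentials, executor.credentials)  # b'tokens-v3' b'tokens-v3'
```

One consequence of pulling is that an executor can hold stale tokens for up to one poll interval after the driver renews, so in this model the poll interval should stay well below the token lifetime.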
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963446#comment-15963446 ]

Michael Gummelt commented on SPARK-16742:

bq. The most basic feature needed for any kerberos-related work is user isolation (different users cannot mess with each others' processes). I was under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as. But in Spark, this isn't currently derived from the Kerberos principal. It's configured by the user, and the *Mesos* principal of the scheduler, along with ACLs configured in Mesos, is what determines which Linux users are allowed. That's why I was asking about {{hadoop.security.auth_to_local}}, to understand how YARN determines what Linux user to run executors as. It would be a vulnerability, for example, if the Linux user for the executors is simply derived from that of the driver, because two human users running as the same Linux user, but logged in via different Kerberos principals, would be able to see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the keytab, but that's not all. In many Mesos setups, and certainly in DC/OS, the submitting user might not already be kinit'd. They may be running from outside the datacenter entirely, without network access to the KDC.
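The split of responsibilities described in this comment (the user picks the Linux user, while the scheduler's Mesos principal plus Mesos ACLs decide which Linux users are permitted) can be modeled as two checks. This is a hypothetical Python sketch: the ACL dictionary is an illustrative stand-in, not Mesos's actual ACL schema, and the principal and user names are made up. The second function expresses the extra condition that keeps Kerberos tokens safe: executors must run as the same user whose credentials they hold.

```python
from typing import Dict, Set

# Hypothetical ACL: Mesos principal -> Linux users it may launch tasks as.
RUN_AS_ACL: Dict[str, Set[str]] = {
    "spark-alice": {"alice"},
    "spark-shared": {"alice", "bob"},
}


def may_launch_as(mesos_principal: str, linux_user: str) -> bool:
    """Mesos-side check: may this principal run tasks as linux_user?"""
    return linux_user in RUN_AS_ACL.get(mesos_principal, set())


def tokens_stay_with_owner(mesos_principal: str, kerberos_user: str,
                           requested_linux_user: str) -> bool:
    """Launch is permitted AND executors run as the same user whose
    Kerberos credentials they will hold (the admin's responsibility)."""
    return (may_launch_as(mesos_principal, requested_linux_user)
            and requested_linux_user == kerberos_user)


# Allowed by Mesos, but alice's tokens would live in processes owned by bob.
print(may_launch_as("spark-shared", "bob"))                     # True
print(tokens_stay_with_owner("spark-shared", "alice", "bob"))   # False
print(tokens_stay_with_owner("spark-alice", "alice", "alice"))  # True
```

The first check alone passes for the "spark-shared" principal launching as "bob"; only the second check catches that this would place alice's tokens in another user's processes.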
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963136#comment-15963136 ]

Marcelo Vanzin commented on SPARK-16742:

bq. The problem is then that a kerberos-authenticated user submitting their job would be unaware that their credentials are being leaked to other users.

That's the gist of it, yes. But note that it isn't restricted to files. If all the user processes are running as the same user, one can just dump the other's heap, or connect using JVMTI, and get the credentials. Same problem.

The most basic feature needed for any kerberos-related work is user isolation (different users cannot mess with each others' processes). I was under the impression that Mesos supported that.

bq. I'm assuming that hadoop.security.auth_to_local is what maps the Kerberos user to the Unix user...

I'm not exactly familiar with all the YARN settings, but yes, the result you get is that the submitting user runs YARN containers as their own user (not as some generic, shared user). Without that, you shouldn't even bother thinking about inserting Kerberos in the picture, IMO.

bq. We avoid the shared-file problem for keytabs entirely

See my first comment above; that's not enough.

bq. We're probably going to punt on cluster mode for now

You don't need to punt on cluster mode. I don't know where this notion that cluster mode requires you to distribute keytabs comes from; Spark works just fine in YARN cluster mode without distributing keytabs. All you need to distribute are delegation tokens. Keytabs aren't even necessary to log in and submit the app at all (you can use passwords with kinit, after all). The only thing distributing keytabs buys you is running applications for longer than the delegation tokens' max lifetime (normally 7 days by default).

bq. If you see any blockers

Lack of user isolation is always a blocker; without that there's no way to prevent one user from seeing another's credentials. But I've asked this in the past and the answer I got is that Mesos supports it...
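The point about max lifetime (delegation tokens alone are enough until they hit their max lifetime; only a fresh Kerberos login, typically from a keytab, yields new ones after that) can be modeled in a few lines. This Python sketch assumes times in hours, a 24-hour renew interval, and the 7-day max lifetime mentioned above; both are parameters of the model, and {{DelegationToken}} is a hypothetical class, not Hadoop's token API. Renewal pushes the expiry forward but never past issue time plus max lifetime.

```python
from dataclasses import dataclass

RENEW_INTERVAL_H = 24      # assumed renew interval, in hours
MAX_LIFETIME_H = 7 * 24    # assumed max lifetime (7 days), in hours


@dataclass
class DelegationToken:
    """Hypothetical token model: renewable, but only up to a hard max date."""
    issued_at: float
    expires_at: float

    @classmethod
    def issue(cls, now: float) -> "DelegationToken":
        return cls(issued_at=now, expires_at=now + RENEW_INTERVAL_H)

    @property
    def max_date(self) -> float:
        return self.issued_at + MAX_LIFETIME_H

    def renew(self, now: float) -> bool:
        """Extend the expiry; fail once the token expired or hit its max date."""
        if now >= self.expires_at or now >= self.max_date:
            return False
        self.expires_at = min(now + RENEW_INTERVAL_H, self.max_date)
        return True


def max_runtime_without_keytab(renew_every_h: float) -> float:
    """How long one token stays valid through renewal alone (no re-login)."""
    token = DelegationToken.issue(0.0)
    now = 0.0
    while token.renew(now):
        now += renew_every_h
    return token.expires_at


print(max_runtime_without_keytab(12))  # 168.0
print(max_runtime_without_keytab(30))  # 24.0
```

Renewing every 12 hours keeps the token alive for exactly the 168-hour max lifetime and no longer, while missing the renew interval ends it after 24 hours. Running past the max lifetime means creating a new token, and that requires a new Kerberos login, so in this model only long-running applications need access to the keytab.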
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962455#comment-15962455 ]

Saisai Shao commented on SPARK-16742:

Hi [~mgummelt], I'm working on the design of SPARK-19143. Looking at your comments, I think part of the work overlaps, especially the RPC part to propagate credentials. Here is my current WIP design (https://docs.google.com/document/d/1Y8CY3XViViTYiIQO9ySoid0t9q3H163fmroCV1K3NTk/edit?usp=sharing). In my current design I offer a standard RPC solution to support different cluster managers. It would be great if we could collaborate to meet the same goal.

My main concern is that if Mesos's implementation is quite different from YARN's, it will take more effort to align the different cluster managers. If your proposal is similar to what I proposed here, then my work can be based on yours.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962450#comment-15962450 ]

Michael Gummelt commented on SPARK-16742:

Also, note that the above Mesos implementation is not dependent on Mesos in any way. It just uses Spark's existing RPC mechanisms to transmit delegation tokens. I see that there's a related effort here to standardize this RPC mechanism: https://issues.apache.org/jira/browse/SPARK-19143. We'd be more than happy to adopt that standard once it exists. But hopefully our one-off RPC that we're currently using is acceptable in the interim.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962440#comment-15962440 ]

Michael Gummelt commented on SPARK-16742:

Hi [~vanzin], [~ganger85] and Strat.io are pulling back their Mesos Kerberos implementation for now, and we at Mesosphere are about to submit a PR to upstream our implementation. I have a few questions I'd like to run by you to make sure that PR goes smoothly.

1) I've been following your comments on this Spark Standalone Kerberos PR: https://github.com/apache/spark/pull/17530. It looks like your concern is that in *cluster mode*, the keytab is written to a file on the host running the driver, and is owned by the user of the Spark Worker, which will be the same for each job. So jobs submitted by multiple users will be able to read each other's keytabs.

In *client mode*, it looks like the delegation tokens are written to a file (HADOOP_TOKEN_FILE_LOCATION) on the host running the executor, which suffers from the same problem as the keytab in cluster mode.

The problem is then that a kerberos-authenticated user submitting their job would be unaware that their credentials are being leaked to other users. Is this an accurate description of the issue?

2) I understand that YARN writes delegation tokens via {{amContainer.setTokens()}}, which ultimately results in the delegation token being written to a file owned by the submitting user. However, since the "submitting user" is a Kerberos user, not a Unix user, I'm assuming that {{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix user who runs the ApplicationMaster and owns that file. Is that correct?

To avoid the shared-file problem for delegation tokens, our Mesos implementation currently has the Executor issue an RPC call to fetch the delegation token from the driver. There therefore isn't any need for at-rest encryption, and if in-motion encryption is in the user's threat model, they can run Spark with SSL.

We avoid the shared-file problem for keytabs entirely, because there's no need to distribute the keytab, at least in client mode. Unlike YARN, the driver and the equivalent of the "ApplicationMaster" in Mesos are one and the same. They both exist in the same process, the {{spark-submit}} process.

We're probably going to punt on cluster mode for now, just for simplicity, but we should be able to solve this in cluster mode as well, because unlike standalone, and much like YARN, Mesos controls what user the driver runs as.

What do you think of the above approach? If you see any blockers, I would very much appreciate teasing those out now rather than during the PR. Thanks!
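The shared-file concern in point 1 comes down to Unix file ownership. Below is a minimal sketch, assuming a POSIX system, of writing a token file that only its owner can read; the file name and token bytes are placeholders, and {{write_private_file}} and {{is_owner_only}} are hypothetical helpers, not Spark or Hadoop code. As the reply in this thread stresses, mode 0600 only protects against *other* Unix users: if every job runs as the same Spark Worker user, all of them can still read the file (or dump each other's heaps), so per-user process isolation is still required.

```python
import os
import stat
import tempfile


def write_private_file(path: str, data: bytes) -> None:
    """Create path exclusively with mode 0600 (owner read/write) and write data."""
    # O_EXCL fails if the file already exists, so we never reuse a file
    # that another process may have pre-created with looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


def is_owner_only(path: str) -> bool:
    """True if group and others have no permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


with tempfile.TemporaryDirectory() as tmp:
    token_path = os.path.join(tmp, "hadoop_tokens")  # hypothetical file name
    write_private_file(token_path, b"opaque-token-bytes")
    print(is_owner_only(token_path))  # True
```

The umask can only remove permission bits, so the requested 0600 never widens; what this cannot fix is two jobs sharing one Unix user.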
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867916#comment-15867916 ] Abel Rincón commented on SPARK-16742: - Hi all, we recently pushed our new implementation, and you can take a look at the code in the PR. I'm writing a short doc to explain the solution. In the meantime, here are some highlights. It supports the standard principal and keytab arguments, and also allows using a proxy user on top of the real principal with the --proxy-user argument. The driver side uses Kerberos authentication. The DAGScheduler obtains the relevant Hadoop delegation tokens using Kerberos authentication and creates each task with those tokens. The executor side uses Hadoop ticket authentication.
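The flow summarized above has three steps: the driver obtains delegation tokens using Kerberos, each task is created carrying those tokens, and executors authenticate with them. It might look like this in miniature. This is a hypothetical sketch, not the code from the PR. `TaskDescription`, `build_tasks`, `run_task`, and the `obtain_tokens` callback are invented for illustration. Token acquisition is stubbed out, and the tokens are assumed to be a mapping from service name to serialized bytes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaskDescription:
    """Hypothetical task payload that carries delegation tokens with it."""
    task_id: int
    partition: int
    tokens: Dict[str, bytes]  # assumed shape: service name -> serialized token


def build_tasks(partitions: List[int],
                obtain_tokens: Callable[[], Dict[str, bytes]]) -> List[TaskDescription]:
    # Driver side: authenticate with Kerberos once (abstracted as obtain_tokens)
    # and attach the same tokens to every task of the stage.
    tokens = obtain_tokens()
    return [TaskDescription(task_id=i, partition=p, tokens=tokens)
            for i, p in enumerate(partitions)]


def run_task(task: TaskDescription, service: str) -> str:
    # Executor side: no keytab or TGT is needed; the task brings its own token.
    if service not in task.tokens:
        raise PermissionError(f"task {task.task_id} has no token for {service}")
    return f"task {task.task_id} read partition {task.partition} from {service}"
```

One consequence of this design is that token acquisition happens once per stage on the driver rather than once per task on the executors.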
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867374#comment-15867374 ] Abel Rincón commented on SPARK-16742: - Hi all, we are working on a solution that uses Hadoop delegation tokens and no proxy users. I hope you can take a look at the new code today.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863371#comment-15863371 ] Saisai Shao commented on SPARK-16742: - The proposed solution is quite different from what exists in Spark on YARN. IIUC this solution doesn't use delegation tokens and instead wraps every HDFS operation with {{executeSecure}}. I doubt this approach, because it requires other components, like SQL and Streaming, to know about such APIs and wrap their calls as well. Also, if newly added code ignores this wrapper, it will fail. As I understand it, the approach is quite intrusive.
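The failure mode raised in this comment can be illustrated with a small, hypothetical model. Here `execute_secure` only puts credentials in effect for the duration of the wrapped call, so any secured call made outside the wrapper fails. Neither function is the PR's actual code: `execute_secure` only imitates the style of {{executeSecure}}, and `hdfs_read` is a stand-in for any secured Hadoop call. By contrast, a delegation-token design like the one used on YARN makes credentials available to the whole process, so callers need no wrapper.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Per-thread flag standing in for "Kerberos credentials are active".
_ctx = threading.local()


def execute_secure(action: Callable[[], T]) -> T:
    """Hypothetical wrapper: credentials apply only while `action` runs."""
    _ctx.authenticated = True
    try:
        return action()
    finally:
        _ctx.authenticated = False


def hdfs_read(path: str) -> str:
    # Stand-in for any secured Hadoop call made by SQL, Streaming, etc.
    if not getattr(_ctx, "authenticated", False):
        raise PermissionError(f"unauthenticated access to {path}")
    return f"contents of {path}"
```

`execute_secure(lambda: hdfs_read("/data/a"))` succeeds, while a plain `hdfs_read("/data/b")` added later by code that doesn't know about the wrapper raises `PermissionError`.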
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851539#comment-15851539 ] Apache Spark commented on SPARK-16742: -- User 'arinconstrio' has created a pull request for this issue: https://github.com/apache/spark/pull/16788
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820530#comment-15820530 ] Jorge Lopez-Malla commented on SPARK-16742: --- At Stratio we had a very busy end of the year releasing our product, and we are now resuming development. In fact, if anyone is going to Spark Summit East, Abel and I will be talking about Stratio's Kerberos integration solution for Spark.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816742#comment-15816742 ] Mohammad Kamrul Islam commented on SPARK-16742: --- Any update on this effort?
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490657#comment-15490657 ] Abel Rincón commented on SPARK-16742: - We at Stratio are working on this issue. Stratio design doc: https://docs.google.com/document/d/1h9UvLCQ5e6s8L9jqRAuPowJAom_We1f5LFSPvrimDqM/edit?usp=sharing