Re: Kerberos and Airflow

2018-08-05 Thread Dan Davydov
I look forward to reading the draft and working on it with you! Not 100% sure I can make it to SF for the hackathon (I'm in New York now), but I can participate remotely. On Sat, Aug 4, 2018 at 9:30 AM Bolke de Bruin wrote: > Hi Dan, > > Don’t misunderstand me. I think what I proposed is

Re: Kerberos and Airflow

2018-08-04 Thread Bolke de Bruin
Hi Dan, Don’t misunderstand me. I think what I proposed is complementary to the dag submit function. The only thing you mentioned that I don’t think is needed is fully serializing up front and therefore excluding callbacks etc. (although there are other serialization libraries like marshmallow that

Re: Kerberos and Airflow

2018-08-03 Thread Dan Davydov
I designed a system similar to what you are describing, which is in use at Airbnb (only DAGs on a whitelist would be allowed to be merged to the git repo if they used certain types of impersonation). It worked for simple use cases, but the problem was that doing access control becomes very difficult, e.g.

Re: Kerberos and Airflow

2018-08-02 Thread Bolke de Bruin
You mentioned you would like to make sure that the DAG (and its tasks) runs in a confined set of settings, i.e. a given set of connections at submission time, not at run time. So here we can make use of the fact that both the scheduler and the worker parse the DAG. Firstly, when scheduler

Re: Kerberos and Airflow

2018-08-02 Thread Dan Davydov
I'm very intrigued, and am curious how this would work in a bit more detail, especially for dynamically created DAGs (how would static manifests map to DAGs that are generated from rows in a MySQL table for example)? You could of course have something like regexes in your manifest file like

Re: Kerberos and Airflow

2018-08-02 Thread Bolke de Bruin
Also: using the Kubernetes executor combined with some of the things we discussed greatly enhances the security of Airflow as the environment isn’t really shared anymore. B. > On 2 Aug 2018, at 19:51, Bolke de Bruin wrote: > > Hi Dan, > > I discussed this a little bit with one of the

Re: Kerberos and Airflow

2018-08-02 Thread Bolke de Bruin
Hi Dan, I discussed this a little bit with one of the security architects here. We think that you can have a fair trade-off between security and usability by having a kind of manifest with the dag you are submitting. This manifest can then specify what the generated tasks/dags are allowed to

Re: Kerberos and Airflow

2018-07-29 Thread Dan Davydov
*Let’s say we trust the owner field of the DAGs I think we could do the following.* *Obviously, the trusting the user part is key here. It is one of the reasons I was suggesting using “airflow submit” to update / add dags in Airflow* *This is the hard part about my question.* I think in a true

Re: Kerberos and Airflow

2018-07-29 Thread Bolke de Bruin
Ah gotcha. That’s another issue actually (but related). Let’s say we trust the owner field of the DAGs I think we could do the following. We then have a table (and interface) to tell Airflow what users have access to what connections. The scheduler can then check if the task in the dag can

Re: Kerberos and Airflow

2018-07-29 Thread Dan Davydov
The concern is how to secure secrets on the scheduler such that only certain DAGs can access them, and in the case of files that create DAGs dynamically, only some set of DAGs should be able to access these secrets. e.g. if there is a secret/keytab that can be read by DAG A generated by file X,

Re: Kerberos and Airflow

2018-07-29 Thread Bolke de Bruin
I’m not sure what you mean. The example I created allows for dynamic DAGs, as the scheduler obviously knows about the tasks when they are ready to be scheduled. This isn’t any different from a static DAG or a dynamic one. For Kerberos it isn’t that special. Basically a keytab is the revocable

Re: Kerberos and Airflow

2018-07-28 Thread Dan Davydov
This makes sense, and thanks for putting this together. I might pick this up myself depending on whether we can get the rest of the multi-tenancy story nailed down, but I still think the tricky part is figuring out how to allow dynamic DAGs (e.g. DAGs created from rows in a MySQL table) to work with

Re: Kerberos and Airflow

2018-07-28 Thread Bolke de Bruin
Here: https://github.com/bolkedebruin/airflow/tree/secure_connections is a working rudimentary implementation that allows securing the connections (only LocalExecutor at the moment).
* It enforces the use of “conn_id” instead of

Re: Kerberos and Airflow

2018-07-28 Thread Bolke de Bruin
Well, I don’t think a hook (or task) should obtain it by itself. It should be supplied. At the moment you start executing the task you cannot trust it anymore (i.e. it is unmanaged / non-Airflow code). So we could change the basehook to understand supplied credentials and populate a hash

Re: Kerberos and Airflow

2018-07-28 Thread Dan Davydov
*So basically in the scheduler we parse the dag. Either from the manifest (new) or from smart parsing (probably harder, maybe some auto register?) we know what connections and keytabs are available dag wide or per task.* This is the hard part that I was curious about, for dynamically created DAGs,

Re: Kerberos and Airflow

2018-07-27 Thread Bolke de Bruin
Sure. In general I consider keytabs as a part of connection information. Connections should be secured by sending the connection information a task needs as part of the information the executor gets. A task should then not need access to the connection table in Airflow. Keytabs could then be sent

Re: Kerberos and Airflow

2018-07-27 Thread Sid Anand
+1 Kerberos is important for us (PayPal)... happy to help on this effort as well. -s On Fri, Jul 27, 2018 at 8:41 AM Hitesh Shah wrote: > Hi Taylor > > +1 on upstreaming this. It would be great if you can submit a pull request > to enhance the apache airflow docs. > > thanks > Hitesh > > > On

Re: Kerberos and Airflow

2018-07-26 Thread Alejandro Fernandez
I'm also interested in this, especially regarding how to distribute keytabs to Airflow dev boxes. +1 for creating a doc so multiple people can contribute instead of reading email threads. Thanks, Alejandro Fernandez Apache Ambari PMC On Thu, Jul 26, 2018 at 2:31 PM, Taylor Edmiston wrote: >

Re: Kerberos and Airflow

2018-07-26 Thread Taylor Edmiston
While we're on the topic, I'd love any feedback from Bolke or others who've used Kerberos with Airflow on this quick guide I put together yesterday. It's similar to what's in the Airflow docs, but all on one page and slightly expanded.

Re: Kerberos and Airflow

2018-07-26 Thread Driesprong, Fokko
Hi Ry, You should ask Bolke de Bruin. He's really experienced with Kerberos and he also did the implementation for Airflow. Besides that, he also worked on implementing Kerberos in Ambari. Just want to let you know. Cheers, Fokko On Thu, Jul 26, 2018 at 23:03, Ry Walker wrote: > Hi everyone - > >