> I mean "one team writing DAGs for multiple clients, and those tasks can't
> collide". We don't require actual security from malicious users, we just need
> some safety rails to prevent accidents.

I think "/tmp" is only one of the problems. I am not sure if you are aware, but DAG writers have a lot of power in current Airflow. They could even accidentally, for example, delete the whole metadata DB or all DAG history by issuing an ORM command. There are no protections against that (and that's by design until multi-tenancy is implemented). So worrying about accidental /tmp clashes by inexperienced users is the least of your worries, I believe. Airflow (currently) places a lot of trust in DAG writers not to do anything "crazy" (again, this is by design: the assumption is that DAG writers know what they are doing and that their code is reviewed by their peers before it is executed).

However, if you want to focus only on file access, then /tmp is also not your only problem. Depending on which executor you use, there are other possibilities for "clashing":

1) Local Executor - the tasks run as processes on the same machine as the scheduler, and ANY file (not only /tmp) can be shared/overwritten. If your teams choose some "/file/file-storage", they could also overwrite those files (there is no way to provide different access levels to tasks belonging to different teams).

2) Celery Executor - the workers are usually separated from the scheduler, but one "Worker" can still handle multiple tasks from (potentially) different teams, and the same problems can occur. You can potentially separate teams by using different queues (with each team having a separate set of workers), but this is not at all "safe", as any DAG writer can override the queue to another value - effectively any team member can run DAGs as a member of another team. No protection against that (except code review) is currently built in.

3) Kubernetes Executor - here the situation is a bit better. Each task always runs in a separate, new pod, and the only shared volumes are those you explicitly add in the pod template (but a user could still run, conceptually, `DELETE * from DA` and delete all DAGs from all teams). No protection against such cases (same as with Local/Celery) is currently possible.

So in short - there are no "good" protections. If you want to protect against "accidental" /tmp file overwrites between teams, use the K8S executor. You could also set TMP_DIR to a different path for each team, or make your teams use only DockerOperator or KubernetesPodOperator to introduce file-level separation (but this would require the teams to adopt some conventions, and trust that they are not breaking them - there is nothing in Airflow to enforce those).

You could potentially "check" some of those conventions via cluster policies (https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html), but those checks can only verify that your conventions are followed. You would not be able to detect a member of one team pretending to be a member of another team (unless you also add some separation of folders and permissions for DAG submissions and link each team to its DAG location). Even that is not foolproof (because any DAG writer could override the location dynamically when the DAG is parsed).

J.

On Fri, Jan 14, 2022 at 9:40 PM Chris Redekop <[email protected]> wrote:
>
> I mean "one team writing DAGs for multiple clients, and those tasks can't
> collide". We don't require actual security from malicious users, we just need
> some safety rails to prevent accidents.
>
> On Fri, Jan 14, 2022 at 1:31 PM Jed Cunningham <[email protected]>
> wrote:
>>
>> Hey Chris,
>>
>> I think the answer depends on what you mean by "multi-tenancy". I think you
>> mean one team writing DAGs for multiple clients and those tasks can't
>> collide. If so, the easiest way to have isolated workers is with
>> KubernetesExecutor. No shared tmp!
>>
>> If instead you mean multiple teams sharing an instance (what I consider
>> multi-tenancy), it's a totally different situation, and in most cases having
>> separate instances is the right call if you require "security".
>>
>> Remember, DAGs are arbitrary python and you can do all sorts of interesting
>> things in them. Do you need isolation for accidental collisions, or do you
>> need to protect tenant-a from possibly-bad-actor-tenant-b?
>>
>> More reading on Airflow multi-tenancy:
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-1%3A+Improve+Airflow+Security
>> https://lists.apache.org/[email protected]:lte=1y:multi-tenancy
>>
>> Jed
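
As a concrete illustration of the cluster-policy check mentioned in the reply above, here is a minimal sketch of a `task_policy` that could live in `airflow_local_settings.py`. It is only a sketch under assumptions that are not from the thread: DAG files are stored as `<DAGS_ROOT>/<team>/...`, each team's Celery queue is named after its folder, and `DAGS_ROOT` and `team_from_fileloc` are hypothetical names chosen for the example. The fallback exception class exists only so the snippet also runs without Airflow installed.

```python
from pathlib import PurePosixPath

try:
    from airflow.exceptions import AirflowClusterPolicyViolation
except ImportError:  # allows the sketch to run without Airflow installed
    class AirflowClusterPolicyViolation(Exception):
        """Stand-in for airflow.exceptions.AirflowClusterPolicyViolation."""


# Hypothetical convention (an assumption, not an Airflow default):
# every DAG file lives under <DAGS_ROOT>/<team>/...
DAGS_ROOT = PurePosixPath("/opt/airflow/dags")


def team_from_fileloc(fileloc: str) -> str:
    """Return the first folder below DAGS_ROOT, treated here as the team name."""
    try:
        relative = PurePosixPath(fileloc).relative_to(DAGS_ROOT)
    except ValueError:
        raise AirflowClusterPolicyViolation(
            f"{fileloc} is outside {DAGS_ROOT}"
        ) from None
    if len(relative.parts) < 2:
        # The file sits directly in DAGS_ROOT, so no team folder can be derived.
        raise AirflowClusterPolicyViolation(
            f"{fileloc} is not inside a team folder"
        )
    return relative.parts[0]


def task_policy(task) -> None:
    """Reject tasks whose queue does not match the team folder of their DAG file."""
    team = team_from_fileloc(task.dag.fileloc)
    if task.queue != team:
        raise AirflowClusterPolicyViolation(
            f"Task {task.task_id!r} uses queue {task.queue!r}, "
            f"but its DAG file belongs to team {team!r}"
        )
```

A check like this only catches accidental breaks of the convention. As noted in the reply, it does not stop a DAG writer who deliberately works around it, so it is a safety rail and not a security boundary.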
