@Adam Dutko<mailto:a...@runbymany.com> - Thanks for your insights — every piece of feedback is valuable. Very good points and I agree that having this feature positions Airflow as more enterprise-ready and improves its appeal to executive leadership.
@Amogh Desai<mailto:amoghdesai....@gmail.com> - great observation from the diagram. As Jarek mentioned, AIP-67 will indeed enable association with multiple tenants through custom Authorization managers. Roles weren’t depicted in the diagram, to avoid an extra layer of complexity. For instance, data engineers would be assigned multiple roles corresponding to their cross-tenant access, such as analytics-viewer, data-science-viewer, and data-engineering-user. After authentication, they would be authorized to interact with resources across all 3 tenants.

Regarding a reference implementation, Vincent is already at work on the AWS Auth Manager<https://github.com/apache/airflow/tree/main/airflow/providers/amazon/aws/auth_manager> (based on AWS Identity Center and Amazon Verified Permissions). It is of course single-tenant today, but once the AIP-67 work is done, we will upgrade it to support multi-tenant deployment. Also, as Jarek called out, we should have an implementation of a Keycloak auth manager to offer the same functionality as an open-source alternative, outside of the AIP-67 scope.

On the UI front, I agree that a top-level tenant filter is essential for managing visibility and access to resources, allowing users to switch between tenants seamlessly. I trust our UI experts in the community to lead this design effectively. We can figure out the UI specifics once the AIP is approved.

Shubham

From: Jarek Potiuk <ja...@potiuk.com>
Date: Friday, March 15, 2024 at 1:55 AM
To: "users@airflow.apache.org" <users@airflow.apache.org>
Cc: "d...@airflow.apache.org" <d...@airflow.apache.org>, "Mehta, Shubham" <shu...@amazon.com>
Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of Airflow components

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Hey Amogh,

Both cases are covered in the AIP. Both are pure UI things. No need for a --tenant flag in the UI.

This will be covered by a custom Auth Manager (following the AIP-56 implementation). Initially we will not have an implementation for it other than a Demo one, but anyone can write their own Auth Manager that implements both filtering and multi-tenant access. Eventually we might have a reference Keycloak-based implementation that could be configurable and reusable, but it was deliberately moved out of this AIP, and users who want authentication/filtering will have to implement their own Auth Manager.

This is a very deliberate decision to both delegate it all out and leave the implementation for later. Hopefully those who want it will implement it in a way that integrates nicely with their own identity system and define their own rules. This is the learning we had from FAB: building such custom roles and access rules is very complex to do in a general way, but it is rather straightforward if you have your own organisation and can quickly implement an Auth Manager that does:

    if this group of users:
        'pass it for all tenants'
    elif that group of users:
        'pass it for this tenant derived from group name'
    else:
        ...

This is what everyone who wants to do multi-tenancy will have to do initially: implement their own custom Auth Manager that makes all those decisions. It will actually speed things up, because determined users will be able to do it themselves (following the Airflow as a Platform theme), and over time someone will implement the Keycloak Auth Manager that will add similar features.
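As a rough illustration of the group-based rule above, the decision could look like the Python sketch below. It is not the AIP-56 Auth Manager interface: the `User` class, the `data-engineering` group, the `tenant-` group prefix and both function names are made up for the example, and a real Auth Manager would take groups from your own identity system.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical group naming scheme, assumed only for this sketch.
ALL_TENANTS_GROUP = "data-engineering"  # members may see every tenant
TENANT_GROUP_PREFIX = "tenant-"         # "tenant-analytics" -> tenant "analytics"


@dataclass(frozen=True)
class User:
    """Stand-in for whatever user object your identity system provides."""
    name: str
    groups: frozenset[str]


def can_access_tenant(user: User, tenant: str) -> bool:
    """Return True if the user may see resources that belong to `tenant`."""
    if ALL_TENANTS_GROUP in user.groups:
        # "this group of users": pass it for all tenants
        return True
    # "that group of users": pass it for the tenant derived from the group name
    allowed_tenants = {
        group[len(TENANT_GROUP_PREFIX):]
        for group in user.groups
        if group.startswith(TENANT_GROUP_PREFIX)
    }
    # Everyone else (the "else" branch) is denied.
    return tenant in allowed_tenants


def visible_dags(user: User, dag_tenants: dict[str, str]) -> list[str]:
    """Filter a mapping of dag_id -> tenant down to the DAGs the user may see."""
    return [
        dag_id
        for dag_id, tenant in dag_tenants.items()
        if can_access_tenant(user, tenant)
    ]
```

With this scheme, a user in `tenant-analytics` would see only the DAGs mapped to the `analytics` tenant, while a user in `data-engineering` would see DAGs from every tenant. Deriving the tenant from the group name keeps the rule table out of the code, at the cost of tying access to a naming convention.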
All the needed filtering capabilities will be there in the AIP-56 based Auth Manager to use. In AIP-67 we are NOT implementing multi-tenancy for FAB; we will only implement a hard-coded Demo Auth Manager, so that we can show how to implement your own and integrate it with your system.

The learning from FAB is that you should never, ever build your application around, and tie it to, a role system based on a complex rule-based authentication and authorisation system, because soon you will find that it is not flexible enough, and it will take years to untangle. So we opted for an Authentication + Authorisation API that can be implemented in many ways: simple for simple cases, complex for complex ones, and someone could even come up with a generic implementation. Using AIP-56 you can define your rules and roles in as flexible or as opinionated a way as you want.

This was all extremely well thought out, and we deliberately implemented AIP-56 as a 'platform API' that users can implement themselves. Multi-tenancy is also a way to battle-test the AIP-56 API, because for quite some time that API will be the only way to implement multi-tenancy. So in a way, multi-tenancy AIP-67 makes our users who want to use it deliberately implement an Auth Manager as well, and that has all been planned and built into the way we designed all those AIPs.

J.

On Fri, 15 Mar 2024 at 08:03, Amogh Desai <amoghdesai....@gmail.com> wrote:

@Mehta, Shubham<mailto:shu...@amazon.com> Thanks a lot for the case study, it helps to visualise the AIP better.

I would like to bring up a point about the User Interface changes that I could not catch in the AIP:

1. If we want to let the data engineering team view across tenants, we should have the --tenant flag run with multiple values (?) Not sure if that is in scope or if it was considered.
2. We need the user interface to provide some sort of filter on tenants and their resources (DAGs, DAG runs, connections, variables, etc.). Otherwise, seeing tons of these resources from every team would be overwhelming and impractical.

Thanks & Regards,
Amogh Desai

On Fri, Mar 15, 2024 at 5:52 AM Adam Dutko <a...@runbymany.com> wrote:

Shubham,

I typically lurk on this list. I find your example of [Rocket] and J's request for more voices a good opportunity to speak up.

As a platform/data engineer I see the utility of scoped access to multiple tenants. Not only would this help with troubleshooting and optimization like you mention, but it would be useful to reference when discussing Airflow with the CISO, CFO and CTO. There are "hacks" to fix things like Airflow dashboard scroll (TM) - a lot of DAGs everyone has access to, environment partitioning (put it in a new tenant), scoped access to certain areas of the system and particular tenant components etc.

I admittedly don't know if other concerns may arise related to performance and user interface design; only time will tell.

Thank you for all you do for the Airflow community, J. I hope the above proves helpful in advancing this conversation.

-Adam

________________________________
From: Mehta, Shubham <shu...@amazon.com.INVALID>
Sent: Thursday, March 14, 2024 3:21:12 PM
To: users@airflow.apache.org <users@airflow.apache.org>; d...@airflow.apache.org <d...@airflow.apache.org>
Subject: Re: [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of Airflow components

Hi folks,

Firstly, thanks Jarek for putting together such a thorough and well-thought-out proposal. I am very much in support of the multi-tenancy proposal.
Having discussed this with over 30 customers (AWS and non-AWS), I see a clear desire to shift focus from the complex management of multiple Airflow environments to enhancing their capabilities, such as enabling data quality checks and lineage. This proposal is a significant step towards achieving that goal.

Acknowledging that not every Airflow user has enough time to thoroughly review the AIP, I have drafted a user scenario that encapsulates what's possible with the implementation of multi-tenancy support:

---- Scenario: Multi-Tenancy in Apache Airflow at [Rocket] ----

[Rocket], a leading [mobile gaming platform], has adeptly structured its cloud operations using Apache Airflow to provide an efficient and secure multi-tenant environment for orchestrating their complex workflows. This approach caters to the diverse needs of their three main user groups: the Data Engineering team, the Data Science team, and the Data Analytics team.

All teams share basic Airflow components like the Scheduler and Webserver, providing centralized management with shared cost. Each team has its own distinct tenant cluster, offering self-sufficiency, flexibility, and isolation.

The Data Engineering team builds ETL/ELT pipelines and produces user profile, telemetry, and marketing data. The Analytics team works with marketing data and user information to build comprehensive dashboards. The Data Science team uses Kubernetes as their execution environment for heavy-duty machine learning tasks, producing a churn prediction dataset.

Members of each team can only see and work with their own workflows. However, data engineers are granted access to all tenants, enabling them to assist with DAG troubleshooting and optimization across all teams.

Upon logging in, users are presented with a tenant-specific view, displaying only the relevant DAGs and artifacts. For those with multi-tenant access, seamless navigation between different tenant views is available without the need for re-authentication.
This setup lets each team work independently with their own tools and data, while also getting help from data engineers when needed. It's secure, efficient, and user-friendly.

Image: https://imgur.com/gallery/uQNqiVc (highly recommend reviewing the image to understand the underlying setup)
-----------------------------------------------------------------------------------

I’d suggest that interested Airflow users review the scenario and share their support or concerns on this concept in this thread or in the AIP. For those interested in diving deeper into the details, the AIP is available here - https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-tenant+deployment+of+Airflow+components

Thanks
Shubham
Product Manager - Amazon MWAA

From: Jarek Potiuk <ja...@potiuk.com>
Reply-To: "users@airflow.apache.org" <users@airflow.apache.org>
Date: Monday, March 11, 2024 at 4:05 PM
To: "d...@airflow.apache.org" <d...@airflow.apache.org>, "users@airflow.apache.org" <users@airflow.apache.org>
Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of Airflow components

I have iterated and already got a LOT of comments from a LOT of people (thanks to everyone who spent time on it). I'd say the document is out of draft already; it very much describes the idea of multi-tenancy that I hope we will be voting on some time in the future.

Taking into account that ~30% of people in our survey said they want "multi-tenancy", what I am REALLY interested in is honest feedback about the proposal. Mainly: "Is this the multi-tenancy you were looking for?" Or were you looking for different droids (err, tenants) maybe? I do not want to exercise my Jedi skills to influence your opinion; that's why the document is there (and some people say it's nice, readable and pretty complete), so that you can judge for yourself and give feedback.
The document is here: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-tenant+deployment+of+Airflow+components

Feel free to comment here, or in the document. I would love to hear more voices, and I have some ideas about what to do next to validate the idea, so please engage for now, but also expect some follow-ups.

J.

On Wed, Mar 6, 2024 at 9:16 AM Jarek Potiuk <ja...@potiuk.com> wrote:

Sooo.. Seems that it's AIP time :D

I've just published a Draft of AIP-67: Multi-tenant deployment of Airflow components
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BDRAFT%5D+AIP-67+Multi-tenant+deployment+of+Airflow+components

This AIP is a bit lighter in detail than the others you could see from Jed, Nikolas and Maciej. This is really a DRAFT / high-level idea of Multi-Tenancy that could be implemented as a follow-up to the previous Multi-Tenancy steps that have been implemented (or are being implemented) right now.
Rather than describe all the details now, I decided to focus on the concept of multi-tenancy that I wanted to propose: most of all explaining the concept, comparing it to current ways of achieving some forms of multi-tenancy, and showing the benefits and drawbacks of the solution and the connected costs (i.e. what complexity we need to add to achieve it).

When thinking about multi-tenancy, I realized a few things:

* everyone might understand multi-tenancy differently
* some forms of multi-tenancy are achievable even today
* but - most of all - I started to ask myself: "Is what we can do enough for some sufficiently numerous groups of users to call it a useful feature for them?"

So before we get into more details, my aim is to make sure we are all on the same page about what we CAN do as multi-tenancy, and eventually to decide whether we SHOULD do it.

Have fun. Bring in comments and feedback.

More about all the currently active AIPs at today's Town Hall BTW. Do you think it's a surprise that 5 AIPs were announced just before the Town Hall? I think not :D

J.