And just to clarify: those "slim" images are not at all "toothless". You can actually do stuff with them :)
The 4 preinstalled providers are still there:

apache-airflow-providers-ftp    | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114              | 2.1.2
apache-airflow-providers-http   | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/             | 2.1.2
apache-airflow-providers-imap   | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501  | 2.2.3
apache-airflow-providers-sqlite | SQLite https://www.sqlite.org/                                               | 2.1.3

We could probably slim them down further, but that would limit the
extensibility a bit, and I consider 500 MB uncompressed pretty
"decent" - it's ~130-160 MB of compressed data when you pull the image.

J.

On Sun, May 1, 2022 at 5:26 PM Jarek Potiuk <[email protected]> wrote:

> Hello everyone,
>
> TL;DR: I am looking for consensus on releasing "slim" versions of PROD
> images - ones that will be way smaller and contain no providers nor
> other extras and would be database-specific.
>
> Context:
>
> Now after we are done with some infra changes that were also released
> in 2.3.0 I came back to the issue raised in
> https://github.com/apache/airflow/issues/20849 which was originally
> about "vanilla" image for Airflow, but I renamed the idea to "slim"
> image (following similar convention by various distro and Python
> providers). The issue itself explains why there is a need for such
> images.
>
> The idea is to have a very small "base" ("slim") image that users will
> be able to extend - not only a "regular" (see the relation with
> "slim" :D ?) image where we pre-install a set of providers and
> support multiple database backends.
>
> The "slim" images also have the advantage that we can use
> "no-constraints" dependencies with them - which means that in those
> images, the dependencies are "latest" that airflow supports even if
> some providers would limit the dependencies.
>
> I looked at what it would mean and really what it translates to is
> that we would have to push many more images.
>
> The bad news:
>
> We need to push a matrix of 4 * 3 = 12 new "slim" images (plus some
> aliases for "latest")
> * Python versions: 3.7, 3.8, 3.9, 3.10
> * Database: postgres, mysql, mssql
>
> Postgres images would be additionally multiplatform (AMD64/ARM64) and
> for now MySQL and MsSQL would be just AMD64 (until we add support for
> ARM for those).
> Those are plenty of images, but this is a rather normal approach if
> you look at a number of other images published by multiple
> "platform-like" products.
>
> The good news:
>
> We only need to do it at release time and we already have the right
> set of scripts and parameters to enable that. It will take a bit
> longer, but those images are much smaller and building and pushing
> them is WAY faster than for the regular image.
>
> Some comparison:
>
> Size (uncompressed): Regular (1.1G), Slim (500MB)
> Time to build single image: Regular (6m), Slim (up to 3m)
>
> Overall the release process would take some 20 mins longer if we
> release the slim images (and I already made it a separate step so it
> should not block "regular" release).
>
> The very good news:
>
> I've actually prepared a PR:
> https://github.com/apache/airflow/pull/23391 to add this feature
> (including the docs), and it's a very small change. It does not change
> any of the source code of airflow or Dockerfile, we basically need to
> extend our "dev" script to build and push images to ... build and push
> more images. I actually even .. prepared and pushed 2.3.0 images of
> airflow to my private dockerhub account so that everyone can see what
> it will look like.
>
> You can see it here:
>
> https://hub.docker.com/repository/docker/potiuk/airflow/tags?page=1&ordering=last_updated&name=2.3.0
>
> I **believe** those changes don't even need PMC votes for release, and
> this is more a procedural change than software release, so we
> **could** release the "slim" 2.3.0 images even now - so that they are
> available as of 2.3.0.
> I think that if we see that this is a welcome
> change (despite the complexity of our dockerhub images available) it
> could even be agreed to via lazy-consensus if we see consensus
> forming.
>
> J.
>
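
To make the 4 * 3 = 12 matrix from the quoted message concrete, here is a
minimal Python sketch that enumerates the combinations. The Python versions,
databases and platform split come from the message; the tag format and the
function name are purely hypothetical and are not taken from the PR or from
the release scripts.

```python
from itertools import product

# Values taken from the quoted message.
PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]
DATABASES = ["postgres", "mysql", "mssql"]

# Postgres images are multiplatform; MySQL and MsSQL are AMD64-only for now.
PLATFORMS = {
    "postgres": ["linux/amd64", "linux/arm64"],
    "mysql": ["linux/amd64"],
    "mssql": ["linux/amd64"],
}


def slim_image_matrix(airflow_version="2.3.0"):
    """Return (tag, platforms) pairs for every Python/database combination.

    The tag layout below is a made-up illustration, not the real naming scheme.
    """
    return [
        (f"slim-{airflow_version}-{db}-python{py}", PLATFORMS[db])
        for py, db in product(PYTHON_VERSIONS, DATABASES)
    ]


matrix = slim_image_matrix()
for tag, platforms in matrix:
    print(tag, ",".join(platforms))
print(len(matrix), "images in total")
```

Only the Postgres rows carry the ARM64 platform, which matches the
"multiplatform for Postgres, AMD64-only for the rest" split described above.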
