And just to clarify: those "slim" images are not at all "toothless". You
can actually do stuff with them :)

The 4 preinstalled providers are included in them:

Package                         | Description                                                                 | Version
apache-airflow-providers-ftp    | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114             | 2.1.2
apache-airflow-providers-http   | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/            | 2.1.2
apache-airflow-providers-imap   | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501 | 2.2.3
apache-airflow-providers-sqlite | SQLite https://www.sqlite.org/                                              | 2.1.3

We could probably slim them down further, but that would limit the
extensibility a bit, and I consider 500 MB uncompressed pretty "decent" -
it's ~ 130-160 MB of compressed data when you pull the image.

J.



On Sun, May 1, 2022 at 5:26 PM Jarek Potiuk <[email protected]> wrote:

> Hello everyone,
>
> TL;DR: I am looking for consensus on releasing "slim" versions of PROD
> images - ones that will be way smaller and contain no providers nor
> other extras and would be database-specific.
>
> Context:
>
> Now after we are done with some infra changes that were also released
> in 2.3.0, I came back to the issue raised in
> https://github.com/apache/airflow/issues/20849 which was originally
> about "vanilla" image for Airflow, but I renamed the idea to "slim"
> image (following similar convention by various distro and Python
> providers). The issue itself explains why there is a need for such
> images.
>
> The idea is to have a very small "base" ("slim") image that users will
> be able to extend, in addition to the "regular" (see the relation with
> "slim" :D ?) image where we pre-install a set of providers and
> support multiple database backends.
>
> The "slim" images also have the advantage that we can use
> "no-constraints" dependencies with them - which means that in those
> images, the dependencies are "latest" that airflow supports even if
> some providers would limit the dependencies.
>
> I looked at what it would mean and really what it translates to is
> that we would have to push many more images.
>
> The bad news:
>
> We need to push a matrix of 4 * 3 = 12 new "slim" images (plus some
> aliases for "latest")
> *  Python versions: 3.7, 3.8, 3.9, 3.10
> *  Database: postgres, mysql, mssql
>
> Postgres images would additionally be multiplatform (AMD64/ARM64) and
> for now MySQL and MsSQL would be just AMD64 (until we add support for
> ARM for those).
> That is plenty of images, but this is a rather normal approach if
> you look at a number of other images published by multiple
> "platform-like" products.
>
> The good news:
>
> We only need to do it at release time and we already have the right
> set of scripts and parameters to enable that. It will take a bit
> longer, but those images are much smaller, and building and pushing
> them is WAY faster than for the regular image.
>
> Some comparison:
>
> Size (uncompressed): Regular (1.1G), Slim (500MB)
> Time to build single image: Regular (6m), Slim (up to 3m)
>
> Overall the release process would take some 20 mins longer if we
> release the slim images (and I already made it a separate step so it
> should not block "regular" release).
>
> The very good news:
>
> I've actually prepared a PR:
> https://github.com/apache/airflow/pull/23391 to add this feature
> (including the docs), and it's a very small change. It does not change
> any of the source code of airflow or the Dockerfile; we basically need
> to extend our "dev" script that builds and pushes images to ... build
> and push more images. I actually even prepared and pushed 2.3.0 images
> of airflow to my private dockerhub account so that everyone can see
> what it will look like.
>
> You can see it here:
>
> https://hub.docker.com/repository/docker/potiuk/airflow/tags?page=1&ordering=last_updated&name=2.3.0
>
> I **believe** those changes don't even need PMC votes for release, and
> this is more a procedural change than software release, so we
> **could** release the "slim" 2.3.0 images even now - so that they are
> available as of 2.3.0. I think even if we see that this is a welcome
> change (despite the complexity of our dockerhub images available) it
> could even be agreed to via lazy-consensus if we see consensus
> forming.
>
> J.
>
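
For illustration, the image matrix described in the quoted message (4 Python versions x 3 database backends = 12 "slim" images) can be sketched in Python. This is only a rough sketch and not the actual release script: the `slim_image_matrix` helper and the tag format are hypothetical, and the platform strings are assumed Docker-style names for the AMD64/ARM64 platforms mentioned above.

```python
from itertools import product

# Values taken from the quoted message.
PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]
DATABASES = ["postgres", "mysql", "mssql"]

# Per the message: Postgres images are multiplatform, MySQL and MsSQL are
# AMD64 only for now. The "linux/..." spelling is an assumption.
PLATFORMS = {
    "postgres": ["linux/amd64", "linux/arm64"],
    "mysql": ["linux/amd64"],
    "mssql": ["linux/amd64"],
}


def slim_image_matrix(airflow_version):
    """Return one entry per (Python version, database) combination."""
    matrix = []
    for python_version, database in product(PYTHON_VERSIONS, DATABASES):
        # Hypothetical tag format, for illustration only.
        tag = f"slim-{airflow_version}-python{python_version}-{database}"
        matrix.append({"tag": tag, "platforms": PLATFORMS[database]})
    return matrix


if __name__ == "__main__":
    for entry in slim_image_matrix("2.3.0"):
        print(entry["tag"], ",".join(entry["platforms"]))
```

Running it prints one line per image, which makes it easy to see which combinations would also need an ARM64 build.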
