I need some basic information on the Docker images that constitute an
Airflow installation on Kubernetes.


   1. If I turn off statsd (that is, I do not need to aggregate Airflow
   stats) and postgresql (that is, I am using a separate RDS instance for
   the Airflow metastore), then I see that only one Docker image
   (apache/airflow:2.5.0) is needed for a successful deployment. Is this
   a correct observation?
   2. I am assuming that the apache/airflow:2.5.0 image is used to launch
   all three of the web server, scheduler, and triggerer pods. Please
   confirm.
   3. If I need to use an Airflow operator that is not included in the
   apache/airflow:2.5.0 Docker image, how do I go about doing this?
   Specifically, I am interested in the SparkSubmitOperator for launching
   Apache Spark jobs on Kubernetes through Airflow. Does this mean that I
   need to build another Docker image with the SparkSubmitOperator
   installed in it? (See the Dockerfile sketch below the list for my
   current guess.)
   4. Finally, how do I access the Spark binaries for launching the
   spark-submit command from within a DAG task? Does this mean that the
   Spark binaries need to be included in the Docker image as well? (A DAG
   sketch of what I have in mind follows the list.)
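
For question 3, my working assumption is that the provider package has to
be baked into a custom image built on top of the official one. A minimal
sketch of what I have in mind (the provider name
apache-airflow-providers-apache-spark is the one I found on PyPI; the base
tag matches my deployment, everything else is a guess):

    FROM apache/airflow:2.5.0
    # Install the Spark provider, which ships SparkSubmitOperator
    # (it also pulls in pyspark as a dependency).
    RUN pip install --no-cache-dir apache-airflow-providers-apache-spark

Is that the recommended route, or is there a way to make the provider
available without rebuilding the image?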
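
For question 4, my understanding is that SparkSubmitOperator shells out to
a spark-submit binary on the worker's PATH, which is why I suspect the
Spark distribution has to live in the same image. Here is a rough sketch
of the task I want to run; the connection id spark_default and the
application path are placeholders for illustration, not my actual setup:

    import pendulum
    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="spark_submit_example",
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule=None,
        catchup=False,
    ) as dag:
        # "spark_default" would be an Airflow connection pointing at the
        # Spark master (a k8s:// URL in my case); the application path
        # assumes the Spark examples are present inside the image.
        submit_job = SparkSubmitOperator(
            task_id="submit_spark_job",
            conn_id="spark_default",
            application="/opt/spark/examples/src/main/python/pi.py",
            verbose=True,
        )

Please correct me if the operator can reach spark-submit some other way.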

Many thanks.
