I need some basic information on the Docker images that constitute an Airflow installation on Kubernetes (deployed via the Airflow Helm chart).
1. If I turn off statsd (that is, I do not need to aggregate Airflow stats) and postgresql (that is, I am using a separate RDS instance for the Airflow metastore), then I see that only one Docker image (apache/airflow:2.5.0) is needed for a successful deployment. Is this observation correct? The chart overrides I am using are in the first snippet below.

2. I am assuming that the apache/airflow:2.5.0 image is used to launch all three of the webserver, scheduler, and triggerer pods. Please confirm.

3. If I need an Airflow operator that is not included in the apache/airflow:2.5.0 image, how do I go about adding it? Specifically, I am interested in the SparkSubmitOperator for launching Apache Spark jobs on Kubernetes through Airflow. Does this mean I need to build another Docker image with the SparkSubmitOperator installed in it? See the Dockerfile sketch below for what I have in mind.

4. Finally, how do I access the Spark binaries for running the spark-submit command from within a DAG task? Does this mean the Spark binaries need to be included in the Docker image? A minimal DAG showing what I am trying to do is at the end.
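For reference, these are roughly the overrides I am applying in values.yaml; I am assuming the statsd.enabled and postgresql.enabled flags and the data.metadataConnection block of the official Helm chart are the right knobs here, and the RDS endpoint below is a placeholder:

```yaml
# values.yaml overrides (assuming the official apache-airflow Helm chart)
statsd:
  enabled: false       # no StatsD deployment for metrics aggregation
postgresql:
  enabled: false       # do not deploy the bundled PostgreSQL sub-chart
data:
  metadataConnection:  # point the metastore at the external RDS instance
    user: airflow
    pass: airflow      # placeholder; real credentials would come from a secret
    protocol: postgresql
    host: my-airflow-db.example.us-east-1.rds.amazonaws.com  # hypothetical endpoint
    port: 5432
    db: airflow
```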
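For question 3, this is the kind of image extension I had in mind; I am assuming the operator ships in the apache-airflow-providers-apache-spark package on PyPI, and the version pin is only a guess on my part:

```dockerfile
# Extend the stock image rather than building one from scratch
FROM apache/airflow:2.5.0

# Install the Spark provider package, which contains SparkSubmitOperator
# (it also pulls in pyspark as a dependency).
RUN pip install --no-cache-dir "apache-airflow-providers-apache-spark==4.0.0"
```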
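And for question 4, this is the sort of task I want to run; the spark_default connection id, the application path, and the Spark image name are all placeholders:

```python
# Minimal DAG sketch; SparkSubmitOperator comes from the Spark provider package.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_pi_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,   # trigger manually while testing
    catchup=False,
) as dag:
    # My understanding is that this operator shells out to the spark-submit
    # binary on the worker, which is why I am asking whether the Spark
    # binaries themselves must be baked into the Airflow image.
    submit_job = SparkSubmitOperator(
        task_id="submit_spark_pi",
        conn_id="spark_default",  # placeholder connection pointing at the k8s master
        application="/opt/spark/examples/src/main/python/pi.py",  # hypothetical path
        conf={"spark.kubernetes.container.image": "my-spark:latest"},  # hypothetical
    )
```

Many thanks.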
