1. yes
2. yes
3. You need to build your own image:
https://airflow.apache.org/docs/docker-stack/build.html has lots of
examples and step-by-step guides, and discusses all the ways you can
approach it in detail. You likely need to add
apache-airflow-providers-apache-spark
(https://airflow.apache.org/docs/apache-airflow-providers-apache-spark/stable/index.html)
as a dependency - maybe it needs some extra dependencies, but you will
find out by trying. See the Dockerfile sketch after point 4.
4. Hard to say. You have to look at the available ways of running
Spark jobs and at the existing operators - some of them might need the
Spark binaries, I guess, and some of them might use pyspark. Just look
at the docs and try it yourself if unsure. In either case, the link
above explains how you can extend the image by adding both system
dependencies and Python dependencies, so you can follow those steps.
A bare-bones DAG sketch follows as well.
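
For illustration, a minimal Dockerfile following the "extending the
image" approach from the link above. Whether you need a Java runtime
at all, and the exact package name, are my assumptions - check what
the operator you pick actually requires:

    FROM apache/airflow:2.5.0

    # Python dependency: the Spark provider (pulls in pyspark as well).
    RUN pip install --no-cache-dir apache-airflow-providers-apache-spark

    # System dependencies: only needed if your operator shells out to
    # spark-submit. pyspark ships a spark-submit script, but it still
    # needs a Java runtime. (Assumed package name for Debian bullseye.)
    USER root
    RUN apt-get update \
        && apt-get install -y --no-install-recommends openjdk-11-jre-headless \
        && apt-get clean && rm -rf /var/lib/apt/lists/*
    USER airflow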
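
And a bare-bones DAG using SparkSubmitOperator, assuming the image
above; the application path and connection id here are placeholders,
adjust them to your setup:

    from pendulum import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import (
        SparkSubmitOperator,
    )

    with DAG(
        dag_id="spark_submit_example",
        start_date=datetime(2023, 1, 1, tz="UTC"),
        schedule=None,
        catchup=False,
    ) as dag:
        # spark-submit runs inside the task's container, which is why
        # the binaries (or pyspark) must be baked into the image.
        SparkSubmitOperator(
            task_id="submit_job",
            application="/opt/airflow/dags/jobs/my_job.py",  # placeholder path
            conn_id="spark_default",  # point its host at your master, e.g. k8s://...
        )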

J,

On Tue, Jan 24, 2023 at 8:18 PM Sahib Aulakh <[email protected]> wrote:
>
> I need some basic information on the Docker images that constitute an Airflow 
> installation on Kubernetes.
>
> 1. If I turn off statsd (that is, I do not need to aggregate Airflow stats) and 
> postgresql (that is, I am using a separate RDS instance for the Airflow 
> metastore), then I see that only one Docker image (apache/airflow:2.5.0) is 
> needed for a successful deployment. Is this a correct observation?
> 2. I am assuming that the apache/airflow:2.5.0 image is being used for 
> launching all three of the web server, scheduler, and triggerer pods. Please 
> confirm.
> 3. If I need to use an Airflow operator that is not included in the 
> apache/airflow:2.5.0 Docker image, how do I go about doing this? 
> Specifically, I am interested in the SparkSubmitOperator for launching 
> Apache Spark jobs in Kubernetes through Airflow. Does this mean that I need 
> to create another Docker image with the SparkSubmitOperator installed in it?
> 4. Finally, how do I access the Spark binaries for launching the spark-submit 
> command from within a DAG task? Does this mean that the Spark binaries need 
> to be included in the Docker image?
>
> Many thanks.
