Herewith a more fleshed-out example. First, the *build.gradle.kts* file:
plugins {
    id("java")
}

// Directory the jars will be copied into; defaults to build/sparkJars.
val sparkJarsDir = objects.directoryProperty()
    .convention(layout.buildDirectory.dir("sparkJars"))

repositories {
    mavenCentral()
}

// A dedicated, resolvable-only configuration holding the jars destined for $SPARK_HOME/jars.
val sparkJars: Configuration by configurations.creating {
    isCanBeResolved = true
    isCanBeConsumed = false
}

dependencies {
    sparkJars("com.fasterxml.jackson.core:jackson-databind:2.18.0")
}

val copySparkJars by tasks.registering(Copy::class) {
    group = "build"
    description = "Copies the appropriate jars to the configured spark jars directory"
    from(sparkJars)
    into(sparkJarsDir)
}

Now, the *Dockerfile*:

FROM spark:3.5.3-scala2.12-java17-ubuntu

USER root
COPY --chown=spark:spark build/sparkJars/* "$SPARK_HOME/jars/"
USER spark

Kind regards,
Damien

On Tue, Oct 15, 2024 at 4:19 PM Damien Hawes <marley.ha...@gmail.com> wrote:

> The simplest solution I have found was to use Gradle (or Maven, if you
> prefer) and list the dependencies that I want copied to $SPARK_HOME/jars
> as project dependencies.
>
> Summary of steps to follow:
>
> 1. Using your favourite build tool, declare a dependency on your required
> packages.
> 2. Write your Dockerfile, with or without the Spark binaries inside it.
> 3. Use your build tool to copy the dependencies to a location that the
> Docker daemon can access.
> 4. Copy the dependencies into the correct directory.
> 5. Ensure those files have the correct permissions.
>
> In my opinion, it is pretty easy to do this with Gradle.
>
> On Tue, Oct 15, 2024 at 15:28 Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am creating a base Spark image that we are using internally.
>> We need to add some packages to the base image:
>> spark:3.5.1-scala2.12-java17-python3-r-ubuntu
>>
>> Of course I do not want to start Spark with --packages "..." - as it is
>> not efficient at all - I would like to add the needed jars to the image.
>>
>> Ideally, I would add to my image something that will add the needed
>> packages - something like:
>>
>> RUN $SPARK_HOME/bin/add-packages "..."
>>
>> But AFAIK there is no such option.
>>
>> Other than running Spark to add those packages and then creating the
>> image - or always running Spark with --packages "..." - what can I do?
>> Is there a way to run just the code that is run by the --packages command
>> - without running Spark - so I can add the needed dependencies to my image?
>>
>> I am sure I am not the only one, nor the first, to encounter this...
>>
>> Thanks!
>> Nimrod
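
To tie the pieces together: a minimal sketch of the build sequence, assuming the build.gradle.kts and Dockerfile above live in the same project root (the image name my-spark is illustrative, not from the thread):

./gradlew copySparkJars
docker build -t my-spark:3.5.3 .

The Copy task populates build/sparkJars, which is the same path the Dockerfile's COPY instruction reads from, so the extra jars end up in $SPARK_HOME/jars inside the image.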
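
For anyone preferring Maven, as mentioned above, a rough equivalent of the copySparkJars task is the copy-dependencies goal of the maven-dependency-plugin; a minimal pom.xml sketch (the plugin version and the sparkJars output directory are illustrative choices, not from the thread):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>3.6.1</version>
      <executions>
        <execution>
          <id>copy-spark-jars</id>
          <phase>package</phase>
          <goals>
            <goal>copy-dependencies</goal>
          </goals>
          <configuration>
            <!-- Copy the project's runtime dependencies to target/sparkJars,
                 so a Dockerfile COPY instruction can pick them up from there. -->
            <outputDirectory>${project.build.directory}/sparkJars</outputDirectory>
            <includeScope>runtime</includeScope>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>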