Herewith a more fleshed-out example:

An example of a *build.gradle.kts* file:

plugins {
    id("java")
}

val sparkJarsDir =
    objects.directoryProperty().convention(layout.buildDirectory.dir("sparkJars"))

repositories {
    mavenCentral()
}

val sparkJars: Configuration by configurations.creating {
    isCanBeResolved = true
    isCanBeConsumed = false
}

dependencies {
    sparkJars("com.fasterxml.jackson.core:jackson-databind:2.18.0")
}

val copySparkJars by tasks.registering(Copy::class) {
    group = "build"
    description = "Copies the appropriate jars to the configured spark
jars directory"
    from(sparkJars)
    into(sparkJarsDir)
}
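
If you want the copy to happen as part of the normal build, the task can
also be hooked into the build lifecycle task, for example:

tasks.named("build") {
    dependsOn(copySparkJars)
}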

Now, the *Dockerfile*:

FROM spark:3.5.3-scala2.12-java17-ubuntu

USER root

COPY --chown=spark:spark build/sparkJars/* "$SPARK_HOME/jars/"

USER spark
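
To build the image, run ./gradlew copySparkJars first so that
build/sparkJars is populated, then run docker build from the project
root. If you prefer to drive Docker from Gradle as well, an Exec task
along these lines should do the trick (the task name and image tag
below are just placeholders):

val buildSparkImage by tasks.registering(Exec::class) {
    group = "build"
    description = "Builds the custom Spark image with the extra jars baked in"
    dependsOn(copySparkJars)
    commandLine("docker", "build", "-t", "my-spark:3.5.3", ".")
}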


Kind regards,

Damien

On Tue, Oct 15, 2024 at 4:19 PM Damien Hawes <marley.ha...@gmail.com> wrote:

> The simplest solution that I have found for this was to use Gradle
> (or Maven, if you prefer) and list the dependencies that I want copied to
> $SPARK_HOME/jars as project dependencies.
>
> Summary of steps to follow:
>
> 1. Using your favourite build tool, declare a dependency on your required
> packages.
> 2. Write your Dockerfile, with or without the Spark binaries inside it.
> 3. Use your build tool to copy the dependencies to a location that the
> Docker daemon can access.
> 4. Copy the dependencies into the correct directory.
> 5. Ensure those files have the correct permissions.
>
> In my opinion, it is pretty easy to do this with Gradle.
>
> Op di 15 okt. 2024 15:28 schreef Nimrod Ofek <ofek.nim...@gmail.com>:
>
>> Hi all,
>>
>> I am creating a base Spark image that we are using internally.
>> We need to add some packages to the base image:
>> spark:3.5.1-scala2.12-java17-python3-r-ubuntu
>>
>> Of course I do not want to start Spark with --packages "..." - as it is
>> not efficient at all - I would like to add the needed jars to the image.
>>
>> Ideally, I would add to my image something that will add the needed
>> packages - something like:
>>
>> RUN $SPARK_HOME/bin/add-packages "..."
>>
>> But AFAIK there is no such option.
>>
>> Other than running Spark to add those packages and then creating the
>> image - or running Spark always with --packages "..." - what can I do?
>> Is there a way to run just the code that is run by the --packages command
>> - without running Spark, so I can add the needed dependencies to my image?
>>
>> I am sure I am not the only one, nor the first, to encounter this...
>>
>> Thanks!
>> Nimrod
>>
>>
>>
>
