Two ideas you could try:

You can try spark.driver.extraClassPath as well. Spark loads the user's jar
in a child classloader, so Spark/YARN/Hadoop can only see your classes
reflectively. Hadoop's Configuration should use the thread context
classloader, and Spark should set that to the loader that loads your jar.
The extraClassPath option instead adds jars directly to the Java command
line that launches the driver/executor, so the class is visible everywhere.
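
Something like this, as a sketch (the paths are the placeholders from your
original command, and the jar would need to exist at that local path on
every node):

/spark/bin/spark-submit --class MyMainClass --deploy-mode cluster \
  --master yarn --conf spark.executor.instances=6 \
  --conf spark.driver.extraClassPath=/path/to/my/jar \
  --conf spark.executor.extraClassPath=/path/to/my/jar \
  /path/to/my/jar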

I can't immediately tell how your error might arise, unless there is some
timing issue with the Spark and Hadoop setup. Can you share the full
stacktrace of the ClassNotFoundException? That might tell us when Hadoop
is looking up this class.

Good luck!
- scrypso


On Tue, Dec 13, 2022, 17:05 Hariharan <hariharan...@gmail.com> wrote:

> I forgot to mention above, but just to add: the error is coming from the
> driver. I tried using *--driver-class-path /path/to/my/jar* as well, but
> no luck.
>
> Thanks!
>
> On Mon, Dec 12, 2022 at 4:21 PM Hariharan <hariharan...@gmail.com> wrote:
>
>> Hello folks,
>>
>> I have a Spark app with a custom implementation of
>> *fs.s3a.s3.client.factory.impl* packaged into the same jar.
>> Output of *jar tf* confirms the class is present:
>>
>> *2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class*
>>
>> However, when I run my Spark app with spark-submit in cluster mode, it
>> fails with the following error:
>>
>> *java.lang.RuntimeException: java.lang.RuntimeException:
>> java.lang.ClassNotFoundException: Class aws.utils.MyS3ClientFactory not
>> found*
>>
>> I tried:
>> 1. Passing the jar to the *--jars* option (with a local path)
>> 2. Passing the jar to the *spark.yarn.jars* option (with an HDFS path)
>>
>> but I still get the same error.
>>
>> Any suggestions on what I'm missing?
>>
>> Other pertinent details:
>> Spark version: 3.3.0
>> Hadoop version: 3.3.4
>>
>> Command used to run the app:
>> */spark/bin/spark-submit --class MyMainClass --deploy-mode cluster
>> --master yarn --conf spark.executor.instances=6 /path/to/my/jar*
>>
>> TIA!
>>
>
