Two ideas you could try. You can also try spark.driver.extraClassPath. Spark loads the user's jar in a child classloader, so Spark, YARN, and Hadoop can only see your classes reflectively. Hadoop's Configuration should use the thread context classloader, and Spark should set that to the loader that loads your jar. The extraClassPath option, by contrast, simply adds jars directly to the classpath of the Java command that launches the driver/executor.
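If it helps narrow things down, here is a minimal diagnostic sketch. The class name ClassVisibilityProbe and its loader labels are made up for illustration. It asks the system classloader, the current thread's context classloader, and the probe's own loader whether each can resolve a class name, without initializing the class. As far as I know, Hadoop's Configuration picks up the thread context classloader when the Configuration object is created. So running this early in MyMainClass, before S3A is first touched, may show whether that loader can actually see aws.utils.MyS3ClientFactory at that point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper: reports which classloaders can resolve a
// given class name. Not part of Spark or Hadoop.
public class ClassVisibilityProbe {

    // Tries to resolve className through one loader without running its
    // static initializers.
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps a label for each loader to whether that loader can see className.
    static Map<String, Boolean> probe(String className) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("system", canLoad(className, ClassLoader.getSystemClassLoader()));
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        // A null context loader means none is set, so report it as not visible.
        result.put("thread-context", ctx != null && canLoad(className, ctx));
        result.put("probe-class",
                canLoad(className, ClassVisibilityProbe.class.getClassLoader()));
        return result;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "aws.utils.MyS3ClientFactory";
        probe(name).forEach((loader, visible) ->
                System.out.println(loader + " loader sees " + name + ": " + visible));
        // Printing the loader itself shows which classloader Spark installed.
        System.out.println("thread context loader = "
                + Thread.currentThread().getContextClassLoader());
    }
}
```

If --driver-class-path (or spark.driver.extraClassPath) is taking effect, I would expect the "system" line to report true as well, since those entries go onto the driver JVM's own classpath rather than into Spark's child loader.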
I can't immediately tell how your error might arise, unless there is some timing issue in how Spark and Hadoop are set up. Can you share the full stack trace of the ClassNotFoundException? That might tell us when Hadoop is looking up this class.

Good luck!
- scrypso

On Tue, Dec 13, 2022, 17:05 Hariharan <hariharan...@gmail.com> wrote:

> I forgot to mention it above, but just to add: the error is coming from the
> driver. I tried using *--driver-class-path /path/to/my/jar* as well, but
> no luck.
>
> Thanks!
>
> On Mon, Dec 12, 2022 at 4:21 PM Hariharan <hariharan...@gmail.com> wrote:
>
>> Hello folks,
>>
>> I have a Spark app with a custom implementation of
>> *fs.s3a.s3.client.factory.impl*, which is packaged into the same jar.
>> Output of *jar tf*:
>>
>> *2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class*
>>
>> However, when I run my Spark app with spark-submit in cluster mode, it
>> fails with the following error:
>>
>> *java.lang.RuntimeException: java.lang.RuntimeException:
>> java.lang.ClassNotFoundException: Class aws.utils.MyS3ClientFactory not
>> found*
>>
>> I tried:
>> 1. Passing the jar to the *--jars* option (with the local path)
>> 2. Passing the jar to the *spark.yarn.jars* option with an HDFS path
>>
>> but I still get the same error.
>>
>> Any suggestions on what I'm missing?
>>
>> Other pertinent details:
>> Spark version: 3.3.0
>> Hadoop version: 3.3.4
>>
>> Command used to run the app:
>> */spark/bin/spark-submit --class MyMainClass --deploy-mode cluster
>> --master yarn --conf spark.executor.instances=6 /path/to/my/jar*
>>
>> TIA!