Hi all,

This week, I tried upgrading to Spark 3.5.0, as it contains some fixes for spark-protobuf that I need for my project. However, my application no longer runs under Spark 3.5.0.

My build.sbt file is configured as follows:

val sparkV      = "3.5.0"
val hadoopV     = "3.3.6"

libraryDependencies ++= Seq(
  "org.apache.spark"     %% "spark-core"       % sparkV  % "provided",
  "org.apache.spark"     %% "spark-sql"        % sparkV  % "provided",
  "org.apache.hadoop"    %  "hadoop-client"    % hadoopV % "provided",
  "org.apache.spark"     %% "spark-protobuf"   % sparkV,
)

I am using sbt-assembly to build a fat JAR, but I mark the Spark and Hadoop dependencies as "provided" to limit the assembled JAR size. Spark (and its dependencies) are supplied in our environment by the jars/ directory included in the Spark distribution.

However, when running my application (which uses protobuf-java's CodedOutputStream for writing delimited protobuf files) with Spark 3.5.0, I now get the following error:
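For context, the failing code path looks roughly like this (a simplified sketch, not my actual code; the Message parameter stands in for one of our generated protobuf types):

```scala
import java.io.OutputStream
import com.google.protobuf.{CodedOutputStream, Message}

// Simplified sketch: write one length-delimited protobuf message.
// The ClassNotFoundException is raised as soon as CodedOutputStream
// is first touched at runtime.
def writeDelimited(msg: Message, out: OutputStream): Unit = {
  val cos = CodedOutputStream.newInstance(out)
  cos.writeUInt32NoTag(msg.getSerializedSize) // varint length prefix
  msg.writeTo(cos)                            // message body
  cos.flush()
}
```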

...
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.CodedOutputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 22 more

When inspecting the jars/ directory of the newest Spark release (spark-3.5.0-bin-hadoop3), I noticed that the protobuf-java JAR is no longer included, while it was present in Spark 3.4.1. My code still compiles because protobuf-java remains a transitive dependency of spark-core:3.5.0, but since the JAR is no longer shipped, the class cannot be found at runtime.

Is this expected/intentional behaviour? I was able to resolve the issue by adding protobuf-java as a direct dependency of my own project and including it in the fat JAR, but it seems odd to me that it is no longer shipped with Spark as of the newest release. I also could not find any mention of this change in the release notes or elsewhere, but perhaps I missed something.
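For reference, this is the workaround I applied in build.sbt. The version number here is my assumption based on what spark-core 3.5.0 appears to pull in transitively; it should be checked against `sbt dependencyTree` rather than taken as authoritative:

```scala
// Workaround: bundle protobuf-java into the fat JAR ourselves, since
// spark-3.5.0-bin-hadoop3 no longer ships it in jars/.
// 3.23.4 is the version spark-core 3.5.0 seems to depend on — verify
// this against your own dependency tree before relying on it.
libraryDependencies += "com.google.protobuf" % "protobuf-java" % "3.23.4"
```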

Thanks in advance for any help!

Cheers,
Gijs
