krisztiansala opened a new issue, #56385:
URL: https://github.com/apache/spark/issues/56385
## Summary
When using `df.write.format("avro").save(path)` on **Dataproc Serverless
runtime 3.0** (Spark 4.0.1, Scala 2.13), every avro write fails with:
```
java.lang.NoClassDefFoundError: scala/collection/immutable/StringOps
    at org.apache.spark.sql.avro.AvroFileFormat.supportFieldName(AvroFileFormat.scala:163)
    at org.apache.spark.sql.execution.datasources.DataSourceUtils$.$anonfun$checkFieldNames$1(DataSourceUtils.scala:74)
    ...
Caused by: java.lang.ClassNotFoundException: scala.collection.immutable.StringOps
```
## Root cause
`scala.collection.immutable.StringOps` exists as a **class** in Scala 2.12
but was moved to `scala.collection.StringOps` in Scala 2.13 —
`scala.collection.immutable.StringOps` is only a type alias (no `.class` file)
in 2.13.
The `AvroFileFormat.supportFieldName` method referenced in the stack trace
is **not present in `spark-avro_2.13-4.0.0.jar`** from Maven Central (Spark 4.0
migrated spark-avro to DataSource V2). The class is instead loaded from the
Dataproc Serverless runtime 3.0's internal JAR bundle, which contains an
`AvroFileFormat` compiled against **Scala 2.12** while the runtime stdlib is
Scala 2.13.
In other words: the runtime ships a Scala 2.12-compiled V1 compatibility
shim for `AvroFileFormat` in a Scala 2.13 environment, causing class loading to
fail at the first String operation inside the shim.
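
One way to check a suspect JAR for this mismatch is to scan its class files for the Scala 2.12-only name. The sketch below is a heuristic, not part of Spark or Dataproc tooling: it relies on the fact that compiled JVM classes store the names of referenced classes as plain slash-separated strings, so a byte search for `scala/collection/immutable/StringOps` finds classes that were compiled against 2.12. The helper name `classes_referencing` is hypothetical, and a substring match can in principle produce false positives.

```python
import zipfile

# Internal (slash-separated) JVM name of the class that exists in
# Scala 2.12 but has no .class file in Scala 2.13.
NEEDLE = b"scala/collection/immutable/StringOps"


def classes_referencing(jar_path, needle=NEEDLE):
    """Return the sorted .class entries in jar_path whose bytes contain needle.

    Hypothetical helper for illustration: a plain byte search over each
    decompressed class file, not a real constant-pool parser.
    """
    hits = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Only inspect compiled classes; skip resources and manifests.
            if name.endswith(".class") and needle in jar.read(name):
                hits.append(name)
    return sorted(hits)
```

Calling `classes_referencing(path)` on each JAR on the runtime classpath and printing the non-empty results would show which bundle, if any, carries an `AvroFileFormat` class that still references the 2.12 class.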
## Reproduction
On Dataproc Serverless runtime 3.0 (Spark 4.0.1), submit any PySpark batch
that writes a DataFrame in avro format:
```python
df.write.mode("overwrite").format("avro").save("gs://my-bucket/output/")
```
The batch fails immediately with the `NoClassDefFoundError` (caused by the `ClassNotFoundException`) shown above.
- Workaround: use runtime 2.3 (Spark 3.5) instead — avro writes succeed.
- Supplying an external `spark-avro_2.13-4.0.0.jar` does not help; the
runtime's internal Scala 2.12 `AvroFileFormat` is still picked up by
`DataSourceUtils.checkFieldNames`.
## Environment
- Dataproc Serverless runtime: **3.0.13** (latest as of 2026-06)
- Spark version: **4.0.1**
- Scala runtime: **2.13** (confirmed by runtime 3.0 docs)
- External spark-avro JAR: `spark-avro_2.13-4.0.0` (does NOT contain
`AvroFileFormat` — only V2 classes in `org.apache.spark.sql.v2.avro.*`)
- Runtime 2.3 (Spark 3.5, Scala 2.13) with `spark-avro_2.13-3.5.5.jar`:
**works correctly**
## Expected behavior
Avro format writes should work on Dataproc Serverless runtime 3.0 (Spark
4.0.1 + Scala 2.13) without ClassNotFoundException.
## Suggested fix
Ensure the `AvroFileFormat` V1 compatibility shim bundled inside the Spark
4.0 / Dataproc runtime 3.0 distribution is compiled against **Scala 2.13**
(referencing `scala.collection.StringOps`, not
`scala.collection.immutable.StringOps`).