krisztiansala opened a new issue, #56385:
URL: https://github.com/apache/spark/issues/56385

   ## Summary
   
   When using `df.write.format("avro").save(path)` on **Dataproc Serverless 
runtime 3.0** (Spark 4.0.1, Scala 2.13), every avro write fails with:
   
   ```
   java.lang.NoClassDefFoundError: scala/collection/immutable/StringOps
       at org.apache.spark.sql.avro.AvroFileFormat.supportFieldName(AvroFileFormat.scala:163)
       at org.apache.spark.sql.execution.datasources.DataSourceUtils$.$anonfun$checkFieldNames$1(DataSourceUtils.scala:74)
       ...
   Caused by: java.lang.ClassNotFoundException: scala.collection.immutable.StringOps
   ```
   
   ## Root cause
   
   `scala.collection.immutable.StringOps` exists as a **class** in Scala 2.12 
but was moved to `scala.collection.StringOps` in Scala 2.13 — 
`scala.collection.immutable.StringOps` is only a type alias (no `.class` file) 
in 2.13.
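
A quick way to check the claim about the missing `.class` file is to look inside a Scala library JAR directly. The following is a minimal sketch, assuming a local copy of the JAR is available; the helper name `jar_has_class` and any path passed to it are hypothetical.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Return True if the JAR holds a .class entry for the dotted class name."""
    # A JAR is a zip archive; classes are stored as slash-separated paths.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

If the description above holds, calling this with `scala.collection.immutable.StringOps` should return `False` for a 2.13 `scala-library` JAR and `True` for a 2.12 one.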
   
   The `AvroFileFormat.supportFieldName` method referenced in the stack trace 
is **not present in `spark-avro_2.13-4.0.0.jar`** from Maven Central (Spark 4.0 
migrated spark-avro to DataSource V2). The class is therefore being loaded from 
the Dataproc Serverless runtime 3.0's internal JAR bundle, which contains an 
`AvroFileFormat` compiled against **Scala 2.12** while the runtime stdlib is 
Scala 2.13.
   
   In other words: the runtime ships a Scala 2.12-compiled V1 compatibility 
shim for `AvroFileFormat` in a Scala 2.13 environment, causing class loading to 
fail at the first String operation inside the shim.
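
One way to confirm which bundled JAR provides `AvroFileFormat`, and whether it was built against Scala 2.12, is to scan a directory of JARs and search the class bytes for the 2.12-only name. This is a rough sketch rather than a verified diagnostic: the function name, the constants, and the directory passed in are hypothetical, and a raw byte search is only a heuristic for a constant-pool reference.

```python
import zipfile
from pathlib import Path

# Illustrative defaults taken from the stack trace; adjust as needed.
AVRO_ENTRY = "org/apache/spark/sql/avro/AvroFileFormat.class"
SCALA_212_NAME = b"scala/collection/immutable/StringOps"


def find_references(jar_dir, class_entry=AVRO_ENTRY, needle=SCALA_212_NAME):
    """Map each JAR containing class_entry to whether that class mentions needle."""
    hits = {}
    for jar_path in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if class_entry in jar.namelist():
                    # Class names appear in the constant pool as plain ASCII bytes.
                    hits[str(jar_path)] = needle in jar.read(class_entry)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits
```

A JAR mapped to `True` would be a candidate for the Scala 2.12-compiled class described above. An empty result would suggest the class comes from somewhere other than the scanned directory.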
   
   ## Reproduction
   
   On Dataproc Serverless runtime 3.0 (Spark 4.0.1), submit any PySpark batch 
that writes a DataFrame in avro format:
   
   ```python
   df.write.mode("overwrite").format("avro").save("gs://my-bucket/output/")
   ```
   
   The batch fails immediately with the `NoClassDefFoundError` (caused by the 
`ClassNotFoundException`) shown above.
   
   - Workaround: use runtime 2.3 (Spark 3.5) instead — avro writes succeed.
   - Supplying an external `spark-avro_2.13-4.0.0.jar` does not help; the 
runtime's internal Scala 2.12 `AvroFileFormat` is still picked up by 
`DataSourceUtils.checkFieldNames`.
   
   ## Environment
   
   - Dataproc Serverless runtime: **3.0.13** (latest as of 2026-06)
   - Spark version: **4.0.1**
   - Scala runtime: **2.13** (confirmed by runtime 3.0 docs)
   - External spark-avro JAR: `spark-avro_2.13-4.0.0` (does NOT contain 
`AvroFileFormat` — only V2 classes in `org.apache.spark.sql.v2.avro.*`)
   - Runtime 2.3 (Spark 3.5, Scala 2.13) with `spark-avro_2.13-3.5.5.jar`: 
**works correctly**
   
   ## Expected behavior
   
   Avro format writes should work on Dataproc Serverless runtime 3.0 (Spark 
4.0.1 + Scala 2.13) without a `ClassNotFoundException`.
   
   ## Suggested fix
   
   Ensure the `AvroFileFormat` V1 compatibility shim bundled inside the Spark 
4.0 / Dataproc runtime 3.0 distribution is compiled against **Scala 2.13** 
(referencing `scala.collection.StringOps`, not 
`scala.collection.immutable.StringOps`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

