Hi, I have an application that runs on a Spark 2.4.4 cluster; it transforms two RDDs to DataFrames with `rdd.toDF()` and then writes them out to files.
To optimize slave resource usage, the application executes the jobs from multiple threads, and I found that `toDF()` is not thread-safe: the application sometimes fails with `java.lang.UnsupportedOperationException`. It happens roughly 1% of the time, and is easier to reproduce when the case class has a large number of fields. The exception message looks like:

`Schema for type A is not supported`

From the error message, the failure comes from `ScalaReflection.schemaFor()`. I looked at the code: Spark uses Scala reflection to derive the data type, and as far as I know Scala reflection has known concurrency issues. See:

SPARK-26555 <https://issues.apache.org/jira/browse/SPARK-26555>
thread-safety-in-scala-reflection-with-type-matching <https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching>

Should we fix it? I cannot find any documentation about thread safety when creating a DataFrame. I have worked around this by taking a lock while transforming an RDD to a DataFrame.

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
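For reference, here is a minimal sketch of the kind of concurrent `toDF()` calls that can trigger the exception, together with the lock workaround described above. All names here (the `Record` case class, output paths, object names) are illustrative, not the actual production code, and running it requires a Spark 2.4.x deployment:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class; the race is easier to hit when there are many fields.
case class Record(a: Int, b: String, c: Double, d: Long, e: Boolean)

object ToDFRaceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("toDF-race-sketch").getOrCreate()
    import spark.implicits._

    val rdd1: RDD[Record] = spark.sparkContext.parallelize(
      Seq(Record(1, "x", 1.0, 1L, true)))
    val rdd2: RDD[Record] = spark.sparkContext.parallelize(
      Seq(Record(2, "y", 2.0, 2L, false)))

    // Calling rdd.toDF() from two threads at once can intermittently throw
    // java.lang.UnsupportedOperationException: Schema for type ... is not supported,
    // because schema derivation goes through ScalaReflection.schemaFor().
    // Workaround: serialize only the toDF() call behind a shared lock; the
    // actual write can still run concurrently.
    val lock = new Object
    def toDFSafe(rdd: RDD[Record]): DataFrame =
      lock.synchronized { rdd.toDF() }

    val t1 = new Thread(new Runnable {
      def run(): Unit = toDFSafe(rdd1).write.parquet("/tmp/out1") // hypothetical path
    })
    val t2 = new Thread(new Runnable {
      def run(): Unit = toDFSafe(rdd2).write.parquet("/tmp/out2") // hypothetical path
    })
    t1.start(); t2.start()
    t1.join(); t2.join()
    spark.stop()
  }
}
```

The lock only covers the schema-derivation step, so the jobs themselves still run in parallel; only the brief reflection-based conversion is serialized.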