GitHub user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21376#discussion_r189459239
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala
---
@@ -66,9 +67,13 @@ private[sql] object JsonInferSchema {
s"Parse Mode: ${FailFastMode.name}.", e)
}
}
- }
- }.fold(StructType(Nil))(
- compatibleRootType(columnNameOfCorruptRecord, parseMode))
+ }.reduceOption(typeMerger).toIterator
+ }
+
+ // Here we get RDD local iterator then fold, instead of calling `RDD.fold` directly, because
+ // `RDD.fold` will run the fold function in DAGScheduler event loop thread, which may not have
+ // active SparkSession and `SQLConf.get` may point to the wrong configs.
+ val rootType = mergedTypesFromPartitions.toLocalIterator.fold(StructType(Nil))(typeMerger)
--- End diff --
Can the same problem happen in other places as well? This seems to be quite a
tricky issue that may occur in general. Can we avoid it somehow?
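
To make the concern concrete, here is a minimal sketch in plain Scala of how thread-scoped state can silently fall back to a default when a function runs on a different thread. It deliberately uses no Spark APIs: `activeConf`, `currentConf`, and `observe` are hypothetical names, and the single-thread executor only stands in for something like an event loop thread. It is an illustration of the general pattern, not of how `SQLConf.get` is actually implemented.

```scala
import java.util.concurrent.{Callable, Executors}

object ThreadLocalConfDemo {
  // Hypothetical stand-in for a session-scoped config; NOT Spark's SQLConf.
  private val activeConf = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  private val defaultConf = "default"

  // Returns the config visible to the calling thread, or the default if none is set.
  def currentConf: String = activeConf.get().getOrElse(defaultConf)

  // Sets a config on the caller thread, then reads it from both the caller
  // thread and a separate worker thread. Returns (callerView, workerView).
  def observe(): (String, String) = {
    activeConf.set(Some("session"))
    val worker = Executors.newSingleThreadExecutor()
    try {
      val workerView = worker.submit(new Callable[String] {
        override def call(): String = currentConf
      }).get()
      (currentConf, workerView)
    } finally {
      worker.shutdown()
      activeConf.remove() // do not leak the value into later callers on this thread
    }
  }

  def main(args: Array[String]): Unit = {
    val (callerView, workerView) = observe()
    println(s"caller thread sees: $callerView")
    println(s"worker thread sees: $workerView")
  }
}
```

In this sketch the caller thread sees its own value while the worker thread falls back to the default. Any code path that evaluates a user function on a thread other than the one holding the session state could, under the same assumption, hit the same mismatch.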
---