We have an application working on Spark 1.6. It uses the Databricks CSV
library (spark-csv) as the output format when writing out.

I'm attempting an upgrade to Spark 2. I get the following error both when
writing with the native DataFrameWriter#csv() method and when explicitly
specifying the "com.databricks.spark.csv" format (I suspect the underlying
format is the same, but I don't know how to verify that):

java.lang.UnsupportedOperationException: CSV data source does not support
struct<[bunch of field names and types]> data type

There are 20 fields, mostly plain strings with a couple of dates. The
source object is a Dataset[T] where T is a case class with various fields.
The write line just looks like: someDataset.write.csv(outputPath)
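Here's roughly what the setup looks like, as a minimal sketch. The case
class, field names, and paths are made-up placeholders, not the real
20-field class:

case class Record(name: String, city: String, created: java.sql.Date)

val spark = SparkSession.builder().appName("csv-write-repro").getOrCreate()
import spark.implicits._

val someDataset = Seq(
  Record("alice", "paris", java.sql.Date.valueOf("2016-05-01"))
).toDS()

// Native Spark 2 writer
someDataset.write.mode("overwrite").csv("/tmp/out-native")

// Explicitly requesting the databricks format
someDataset.write.mode("overwrite")
  .format("com.databricks.spark.csv")
  .save("/tmp/out-databricks")

Both calls fail with the same UnsupportedOperationException above.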

Googling returned this fairly recent pull request: https://mail-archives.apache.org/mod_mbox/spark-commits/201605.mbox/%3c65d35a72bd05483392857098a2635...@git.apache.org%3E

If I'm reading that correctly, the schema shows each record as having one
field of this complex struct type, and the validation decides that's
something it can't serialize. I would expect the schema to have a bunch of
separate fields matching the case class, so maybe there's something I'm
misunderstanding.
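In case it helps narrow things down, this is the kind of check I was
planning to run (variable names as in the sketch above; it just assumes a
Dataset built from a flat case class):

import org.apache.spark.sql.types.StructType

// Print the schema the writer sees. If the Dataset was built from the case
// class as expected, each field should appear as its own top-level column
// rather than being wrapped in a single struct column.
someDataset.printSchema()

// List any columns whose type is itself a struct -- those would be the ones
// the CSV source refuses to serialize.
val structCols = someDataset.schema.fields
  .filter(_.dataType.isInstanceOf[StructType])
structCols.foreach(f => println(s"struct column: ${f.name}"))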

Efe
