Ah, BTW, there is an issue, SPARK-16216, about printing dates and timestamps here, so please ignore the integer values for the dates below.
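For context, the integer is the number of days since the Unix epoch. A quick sanity check (a hypothetical snippet, not from the original thread, assuming Java 8's java.time is available):

import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Days between the Unix epoch and the date used in the repro below.
val days = ChronoUnit.DAYS.between(
  LocalDate.of(1970, 1, 1),
  LocalDate.of(1990, 12, 13))
println(days) // 7651, matching the _c1 column in the output below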
2016-08-19 9:54 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:

> Ah, sorry, I should have read this carefully. Do you mind if I ask for
> your code to test? I would like to reproduce this.
>
> I just tested this myself but couldn't reproduce it, as below (this is
> what you're doing, right?):
>
> case class ClassData(a: String, b: Date)
>
> val ds: Dataset[ClassData] = Seq(
>   ("a", Date.valueOf("1990-12-13")),
>   ("a", Date.valueOf("1990-12-13")),
>   ("a", Date.valueOf("1990-12-13"))
> ).toDF("a", "b").as[ClassData]
> ds.write.csv("/tmp/data.csv")
> spark.read.csv("/tmp/data.csv").show()
>
> which prints:
>
> +---+----+
> |_c0| _c1|
> +---+----+
> |  a|7651|
> |  a|7651|
> |  a|7651|
> +---+----+
>
>
> 2016-08-19 9:27 GMT+09:00 Efe Selcuk <efema...@gmail.com>:
>
>> Thanks for the response. The problem with that thought is that I don't
>> think I'm dealing with a complex nested type. It's just a dataset where
>> every record is a case class with only simple types as fields: strings
>> and dates. There's no nesting.
>>
>> That's what confuses me about how it's interpreting the schema. The
>> schema seems to be one complex field rather than a bunch of simple
>> fields.
>>
>> On Thu, Aug 18, 2016, 5:07 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Hi Efe,
>>>
>>> If my understanding is correct, writing/reading complex types is not
>>> supported because the CSV format can't represent nested types.
>>>
>>> I guess supporting them when writing, in the external CSV library, was
>>> rather a bug.
>>>
>>> It'd be great if we could write complex types to CSV and read them
>>> back, but I guess we can't.
>>>
>>> Thanks!
>>>
>>> On 19 Aug 2016 6:33 a.m., "Efe Selcuk" <efema...@gmail.com> wrote:
>>>
>>>> We have an application working in Spark 1.6. It uses the Databricks
>>>> CSV library for the output format when writing out.
>>>>
>>>> I'm attempting an upgrade to Spark 2. When writing with both the
>>>> native DataFrameWriter#csv() method and by first specifying the
>>>> "com.databricks.spark.csv" format (I suspect the underlying format is
>>>> the same, but I don't know how to verify that), I get the following
>>>> error:
>>>>
>>>> java.lang.UnsupportedOperationException: CSV data source does not
>>>> support struct<[bunch of field names and types]> data type
>>>>
>>>> There are 20 fields, mostly plain strings with a couple of dates. The
>>>> source object is a Dataset[T] where T is a case class with various
>>>> fields. The write itself just looks like:
>>>> someDataset.write.csv(outputPath)
>>>>
>>>> Googling returned this fairly recent pull request:
>>>> https://mail-archives.apache.org/mod_mbox/spark-commits/201605.mbox/%3C65d35a72bd05483392857098a2635cc2@git.apache.org%3E
>>>>
>>>> If I'm reading that correctly, the schema shows that each record has
>>>> one field of this complex struct type, and the validation thinks it's
>>>> something it can't serialize. I would expect the schema to have a
>>>> bunch of fields matching the case class, so maybe there's something
>>>> I'm misunderstanding.
>>>>
>>>> Efe
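For anyone hitting the same UnsupportedOperationException: one way to check whether the Dataset's schema is flat or a single struct column, plus a possible workaround of rendering date fields as strings before writing, might look like the sketch below. This is not from the thread; the class and column names are made up for illustration, and it assumes a spark-shell session where spark.implicits._ is in scope:

import java.sql.Date
import org.apache.spark.sql.functions.date_format
import spark.implicits._

// Stand-in for the real 20-field case class from the report.
case class Record(name: String, created: Date)

val ds = Seq(Record("a", Date.valueOf("1990-12-13"))).toDS()

// A flat case class should show one column per field here,
// not a single struct column.
ds.printSchema()

// Format the Date column as a plain string so the CSV writer
// only sees primitive column types.
ds.withColumn("created", date_format($"created", "yyyy-MM-dd"))
  .write.csv("/tmp/records_csv")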