I'm trying to process data with Spark and then query it with Drill.
When I create a parquet file from a Spark 1.6.1 job and then query it
in Drill 1.8.0, the date values come back garbled. All string and other
column types come through fine. I'm using the java.sql.Date class
because I get "unsupported" errors when I try to save java.util.Date
values in parquet format. If I create the parquet file with CTAS in
Drill, I don't have this problem; it's strictly a problem exchanging
data between the two products.
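
For reference, here is a minimal sketch of the "unsupported" error I
mean (run in a bare spark-shell; `bar` is just a throwaway name, and
I'm quoting the message from memory):

case class bar(dt: java.util.Date)
sc.parallelize(Seq(bar(new java.util.Date()))).toDF
// => java.lang.UnsupportedOperationException:
//    Schema for type java.util.Date is not supported

So java.sql.Date seems to be the only practical option on the Spark
side.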
For example, if I create an RDD of dates, convert that to a DF, save
that DF, and read the file back into Spark, I see the correct values:
...
case class foo(dt: java.sql.Date)

// `test` (defined earlier) is an RDD[String] of dates like "06/08/2016"
val format = new java.text.SimpleDateFormat("MM/dd/yyyy")
val dates = test.map(x => foo(new java.sql.Date(format.parse(x).getTime)))
val df = dates.toDF   // sqlContext.implicits._ is in scope in spark-shell
df.write.save("blah/test.parquet")
val df2 = sqlContext.read.parquet("blah/test.parquet")
df2.first

res10: org.apache.spark.sql.Row = [2016-06-08]
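
The Spark-side schema looks right, too; as far as I understand the
parquet spec, a date column like this should be written as int32 days
since the Unix epoch:

df2.printSchema
// root
//  |-- dt: date (nullable = true)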
However, if I query the file using Drill, I get a different result:
select * from blah limit 1;
+-------------+---------------+
|     dt      |     dir0      |
+-------------+---------------+
| 349-06-19   | test.parquet  |
+-------------+---------------+
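
To quantify how far off that is, here's a throwaway check in the Scala
REPL (using Java 8's java.time, nothing Spark-specific):

java.time.LocalDate.of(2016, 6, 8).toEpochDay
// res: Long = 16960   (days since 1970-01-01)
java.time.LocalDate.of(349, 6, 19).toEpochDay
// res: Long = roughly -592000

So Drill seems to be interpreting the stored int32 against a completely
different epoch or calendar, not just shifting it by a day or two.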
Any idea what I need to do to be able to query dates in Spark-created
parquet files with Drill?
Thanks