Re: Spark SQL, Hive Parquet data types

2015-02-23 Thread Cheng Lian
Yes, recently we improved ParquetRelation2 quite a bit. Spark SQL uses its own Parquet support to read partitioned Parquet tables declared in the Hive metastore. Only writing to partitioned tables is not covered yet. These improvements will be included in Spark 1.3.0. Just created SPARK-5948 to …
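
For concreteness, here is a minimal sketch of the read path described above, using the Spark 1.3-era HiveContext API. The table name logs and partition column dt are hypothetical; spark.sql.hive.convertMetastoreParquet is the flag that controls whether Spark SQL intercepts metastore Parquet tables (it defaults to true).

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-read"))
    val hc = new HiveContext(sc)

    // When this flag is true (the default), Spark SQL reads metastore
    // Parquet tables with its own Parquet support instead of Hive SerDes.
    hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")

    // Reading a partitioned Parquet table; `logs` and `dt` are
    // hypothetical names used for illustration.
    val df = hc.sql("SELECT * FROM logs WHERE dt = '2015-02-23'")
    df.show()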

Re: Spark SQL, Hive Parquet data types

2015-02-23 Thread Cheng Lian
Ah, sorry for not being clear enough. So now in Spark 1.3.0, we have two Parquet support implementations: the old one is tightly coupled with the Spark SQL framework, while the new one is based on the data sources API. In both versions, we try to intercept operations over Parquet tables …
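
A small sketch of how one could switch between the two implementations in Spark 1.3; spark.sql.parquet.useDataSourceApi was the config that controlled this at the time (it was removed in later releases), and hc is assumed to be an existing HiveContext:

    // true (the default in 1.3) selects the new implementation based on
    // the data sources API; false falls back to the old one that is
    // tightly coupled with the Spark SQL framework.
    hc.setConf("spark.sql.parquet.useDataSourceApi", "true")
    hc.setConf("spark.sql.parquet.useDataSourceApi", "false")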

Re: Spark SQL, Hive Parquet data types

2015-02-20 Thread Cheng Lian
For the second question, we do plan to support Hive 0.14, possibly in Spark 1.4.0. For the first question: 1. In Spark 1.2.0, the Parquet support code doesn’t support the timestamp type, so you can’t. 2. In Spark 1.3.0, timestamp support was added; also, Spark SQL uses its own Parquet support …
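
As a sketch of point 2, assuming Spark 1.3 and an existing SparkContext sc (the table and column names below are made up):

    import java.sql.Timestamp
    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)

    // A non-partitioned metastore Parquet table with a timestamp column;
    // both the read and write paths go through Spark SQL's own Parquet support.
    hc.sql("CREATE TABLE IF NOT EXISTS events (id INT, ts TIMESTAMP) STORED AS PARQUET")

    // Stage a row as a temp table, then insert it through HiveQL.
    val rows = Seq((1, new Timestamp(System.currentTimeMillis)))
    hc.createDataFrame(rows).toDF("id", "ts").registerTempTable("staging")
    hc.sql("INSERT INTO TABLE events SELECT id, ts FROM staging")

    hc.sql("SELECT id, ts FROM events").show()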

Re: Spark SQL, Hive Parquet data types

2015-02-20 Thread The Watcher
1. In Spark 1.3.0, timestamp support was added; also, Spark SQL uses its own Parquet support to handle both the read path and the write path when dealing with Parquet tables declared in the Hive metastore, as long as you’re not writing to a partitioned table. So yes, you can. Ah, I had …

Re: Spark SQL, Hive Parquet data types

2015-02-20 Thread yash datta
For the old Parquet path (available in 1.2.1), I made a few changes to be able to read from and write to a table partitioned on a timestamp-type column: https://github.com/apache/spark/pull/4469 On Fri, Feb 20, 2015 at 8:28 PM, The Watcher watche...@gmail.com wrote: 1. In Spark 1.3.0, …
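
For reference, the case that PR targets looks roughly like this (a sketch only; the table and column names, including staging_metrics, are invented, and hc is an existing HiveContext):

    // A Parquet table partitioned on a timestamp-typed column: the
    // combination the PR above makes readable/writable on the old path.
    hc.sql("""CREATE TABLE metrics (value DOUBLE)
              PARTITIONED BY (event_ts TIMESTAMP)
              STORED AS PARQUET""")
    hc.sql("""INSERT OVERWRITE TABLE metrics
              PARTITION (event_ts = '2015-02-20 00:00:00')
              SELECT value FROM staging_metrics""")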

Spark SQL, Hive Parquet data types

2015-02-19 Thread The Watcher
Still trying to get my head around Spark SQL and Hive. 1) Let's assume I *only* use Spark SQL to create and insert data into Hive tables, declared in a Hive metastore. Does it matter at all if Hive supports the data types I need with Parquet, or is all that matters what Catalyst / Spark's Parquet …
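
A sketch of the Spark-SQL-only workflow the question describes, using the Spark 1.3 DataFrame API (names are hypothetical; sc is an existing SparkContext). saveAsTable registers the table in the Hive metastore through Spark SQL's own data source support, which is the crux of the question:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)

    val people = hc.createDataFrame(Seq(("alice", 30), ("bob", 25)))
                   .toDF("name", "age")

    // Creates and populates a metastore table via Spark SQL's data
    // sources API; by default the data is stored as Parquet.
    people.saveAsTable("people_parquet")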