Yes, recently we improved ParquetRelation2 quite a bit. Spark SQL uses its
own Parquet support to read partitioned Parquet tables declared in the Hive
metastore. The only case not covered yet is writing to partitioned tables.
These improvements will be included in Spark 1.3.0.
Just created SPARK-5948 to track writing to partitioned tables.
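For illustration, here is a minimal sketch of that read path from the user
side (Spark 1.3.0-style API; the table name "events" and partition column
"dt" are made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-hive-read"))
    val hiveContext = new HiveContext(sc)

    // Query a partitioned Parquet table declared in the Hive metastore.
    // With the improved ParquetRelation2 path, the scan is handled by
    // Spark SQL's own Parquet support instead of going through Hive SerDes.
    val df = hiveContext.sql("SELECT * FROM events WHERE dt = '2015-02-20'")
    df.show()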
Ah, sorry for not being clear enough.
So now in Spark 1.3.0, we have two Parquet support implementations: the
old one is tightly coupled with the Spark SQL framework, while the new
one is based on the data sources API. In both versions, we try to intercept
operations over Parquet tables
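(If it helps while experimenting: I believe the new path can be switched on
and off with the SQLConf flag below, but please treat the exact key as an
assumption and double-check SQLConf in your build.)

    // Sketch: toggling between the two Parquet implementations in 1.3.0.
    // Assumes the hiveContext from the earlier snippet; the key name is my
    // recollection, not gospel.
    hiveContext.setConf("spark.sql.parquet.useDataSourceApi", "false") // old, tightly coupled path
    hiveContext.setConf("spark.sql.parquet.useDataSourceApi", "true")  // new data sources API path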
For the second question, we do plan to support Hive 0.14, possibly in
Spark 1.4.0.
For the first question:
1. In Spark 1.2.0, the Parquet support code doesn’t support timestamp
type, so you can’t.
2. In Spark 1.3.0, timestamp support was added; also, Spark SQL uses its
own Parquet support to handle both the read path and the write path when
dealing with Parquet tables declared in the Hive metastore, as long as you're
not writing to a partitioned table. So yes, you can (see the sketch below).
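To make point 2 concrete, a rough sketch of what should work in 1.3.0 (table
and column names are made up; this assumes the hiveContext from the earlier
snippet and an existing SparkContext `sc`):

    import java.sql.Timestamp

    // Declare a non-partitioned Parquet table with a timestamp column in the
    // Hive metastore.
    hiveContext.sql("CREATE TABLE events_ts (id INT, ts TIMESTAMP) STORED AS PARQUET")

    // Write through Spark SQL; because the table is not partitioned, both the
    // read path and the write path go through Spark SQL's own Parquet support.
    case class Event(id: Int, ts: Timestamp)
    import hiveContext.implicits._
    val df = sc.parallelize(Seq(Event(1, Timestamp.valueOf("2015-02-20 10:00:00")))).toDF()
    df.insertInto("events_ts")

    hiveContext.sql("SELECT id, ts FROM events_ts").show()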
Ah, I had
For the old Parquet path (available in 1.2.1), I made a few changes to support
reading from and writing to a table partitioned on a timestamp-type column:
https://github.com/apache/spark/pull/4469
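For anyone following along, a hypothetical example of the kind of table that
change is aimed at (names made up):

    // A Hive Parquet table partitioned on a timestamp-type column, which is
    // the case the linked PR addresses on the old Parquet path.
    hiveContext.sql(
      """CREATE TABLE events_by_ts (id INT, payload STRING)
        |PARTITIONED BY (ts TIMESTAMP)
        |STORED AS PARQUET""".stripMargin)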
On Fri, Feb 20, 2015 at 8:28 PM, The Watcher watche...@gmail.com wrote:
1. In Spark 1.3.0,
Still trying to get my head around Spark SQL and Hive.
1) Let's assume I *only* use Spark SQL to create and insert data into Hive
tables, declared in a Hive metastore.
Does it matter at all whether Hive supports the data types I need with Parquet,
or is all that matters what Catalyst / Spark's Parquet