Re: Caching parquet table (with GZIP) on Spark 1.3.1

2015-06-07 Thread Cheng Lian
Is it possible that some Parquet files of this data set have a different schema from the others, especially the ones reported in the exception messages? One way to confirm this is to use parquet-tools [1] to inspect these files:

$ parquet-schema

Cheng

[1]: https://github.com/apache/parquet
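
For reference, the same check can also be done from PySpark itself by loading each part-file separately and comparing schemas. This is only a minimal sketch under the Spark 1.3.x API; the file paths are hypothetical placeholders, not taken from this thread:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="schema-check")
sqlCtx = SQLContext(sc)

# Hypothetical paths to individual Parquet part-files of the data set
part_files = [
    "/path/to/table/part-r-00001.gz.parquet",
    "/path/to/table/part-r-00002.gz.parquet",
]

schemas = {}
for path in part_files:
    df = sqlCtx.parquetFile(path)   # Spark 1.3.x API for reading Parquet
    schemas[path] = df.schema
    df.printSchema()

# Any two entries in `schemas` that compare unequal indicate part-files
# written with different schemas.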

Caching parquet table (with GZIP) on Spark 1.3.1

2015-05-26 Thread shshann
We tried to cache a table through

hiveCtx = HiveContext(sc)
hiveCtx.cacheTable("table name")

as described in Spark 1.3.1's documentation. We're on CDH 5.3.0 with Spark 1.3.1 built with Hadoop 2.6. The following error message occurs whenever we try to cache a table stored as Parquet with GZIP, though we're not
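
A minimal sketch of the caching attempt described above, assuming a hypothetical table name that is already registered in the Hive metastore (the actual GZIP-compressed Parquet table from this report is not reproduced here):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="cache-parquet-table")
hiveCtx = HiveContext(sc)

# Mark the (hypothetical) Parquet table for in-memory caching
hiveCtx.cacheTable("my_parquet_table")

# Caching is lazy: the first action materializes the cached columns, and is
# the point where the error reported in this thread would surface.
hiveCtx.sql("SELECT COUNT(*) FROM my_parquet_table").show()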