Is it possible that some Parquet files in this data set have a different
schema from the others, especially the ones reported in the exception messages?
One way to confirm this is to use [parquet-tools] [1] to inspect those
files:

$ parquet-schema <path-to-parquet-file>
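If parquet-tools is not at hand, a quick stdlib-only sanity check (my own sketch, not something from this thread) is to verify each file's Parquet magic bytes; a file missing the "PAR1" marker at its head or tail was likely written incompletely or is not Parquet at all, which can also trigger read errors:

```python
import os

PARQUET_MAGIC = b"PAR1"  # present at both the start and end of a valid Parquet file

def looks_like_parquet(path):
    """Return True if the file carries the Parquet magic bytes at head and tail.

    This only checks file integrity at the framing level; comparing actual
    schemas across files still requires parquet-tools or a Parquet reader.
    """
    size = os.path.getsize(path)
    if size < 8:  # too small to hold two 4-byte magic markers
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Running this over every part-file in the table's directory quickly narrows down which files are worth inspecting with parquet-tools.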
Cheng
[1]: https://github.com/apache/parquet
We tried to cache a table through
hiveCtx = HiveContext(sc)
hiveCtx.cacheTable("table name")
as described in the Spark 1.3.1 documentation. We're on CDH 5.3.0 with Spark
1.3.1 built with Hadoop 2.6.
The following error message would occur whenever we tried to cache a table
stored in Parquet format with GZIP compression,
though we're not