Hi, Lian:
Thanks for the information. It works as expected in Spark with this setting.
Yong
Subject: Re: Is this a Spark issue or Hive issue that Spark cannot read the string type data in the Parquet generated by Hive
To: java8...@hotmail.com; user@spark.apache.org
From: lian.cs@gmail.com
Hi, Spark Users:
I have a problem: Spark cannot recognize the string type in the
Parquet schema generated by Hive.
Versions of all components:
Spark 1.3.1
Hive 0.12.0
Parquet 1.3.2
I generated a detailed low-level table in the Parquet format using MapReduce Java
code. This table can be read
Please set the SQL option spark.sql.parquet.binaryAsString to true
when reading Parquet files containing strings generated by Hive.
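A minimal PySpark sketch of that setting for Spark 1.3.x, assuming `sql_context` is a `pyspark.sql.SQLContext`; the helper name `read_legacy_hive_parquet` and any path passed to it are hypothetical. It issues a Spark SQL `SET` statement for the option and then reads the files with `parquetFile()`, the Spark 1.3 reader method.

```python
# Spark SQL option that makes the Parquet reader treat BINARY columns
# without a UTF8 annotation as strings.
BINARY_AS_STRING = "spark.sql.parquet.binaryAsString"


def read_legacy_hive_parquet(sql_context, path):
    """Read Parquet written by older Hive, decoding binary columns as strings.

    `sql_context` is assumed to be a Spark 1.3.x pyspark.sql.SQLContext.
    This helper is hypothetical, not part of Spark.
    """
    # The SET command changes the option for this SQLContext.
    sql_context.sql("SET %s=true" % BINARY_AS_STRING)
    # parquetFile() is the Spark 1.3 reader; Spark 1.4+ also offers
    # sql_context.read.parquet(path).
    return sql_context.parquetFile(path)
```

With a real context this would be called as, for example, `read_legacy_hive_parquet(SQLContext(sc), "/path/to/table")`. Passing the option at launch with `spark-submit --conf spark.sql.parquet.binaryAsString=true` should have the same effect, since `SQLContext` picks up `spark.sql.*` entries from the Spark configuration.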
This is actually a bug in parquet-hive. When generating the Parquet schema
for a string field, Parquet requires a "UTF8" annotation, something like:
message
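The schema example in the message above is cut off. As a purely illustrative stand-in (not the schema from the original message; the message name `hive_schema` and the field `name` are hypothetical), the Python sketch below contrasts an un-annotated binary field with a UTF8-annotated one, and models in simplified form how Spark SQL types such a column: a BINARY column becomes a string if it carries the UTF8 annotation or if `spark.sql.parquet.binaryAsString` is enabled, and stays binary otherwise. It is a toy model, not Spark's actual schema converter.

```python
import re

# Illustrative one-field schemas; "hive_schema" and "name" are hypothetical.
LEGACY_SCHEMA = "message hive_schema { optional binary name; }"
ANNOTATED_SCHEMA = "message hive_schema { optional binary name (UTF8); }"


def binary_annotation(schema, field):
    """Return the annotation (e.g. 'UTF8') on a binary field, or None."""
    match = re.search(
        r"\bbinary\s+" + re.escape(field) + r"\s*(?:\((\w+)\))?\s*;", schema
    )
    if match is None:
        raise KeyError(field)
    return match.group(1)


def spark_type(schema, field, binary_as_string=False):
    """Simplified model of how Spark SQL types a Parquet binary column."""
    if binary_annotation(schema, field) == "UTF8" or binary_as_string:
        return "StringType"
    return "BinaryType"
```

The `binary_as_string` flag plays the role of the SQL option: it only matters for the legacy, un-annotated schema, which is why the option is needed for files written by older parquet-hive.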
BTW, I just checked, and this bug should have been fixed as of Hive
0.14.0. So the SQL option I mentioned is mostly needed for reading legacy
Parquet files generated by older versions of Hive.
Cheng
On 9/25/15 2:42 PM, Cheng Lian wrote:
Please set the SQL option