What exactly do you mean by "get schema from a parquet file"?
- If you are trying to inspect Parquet files, parquet-tools can be
pretty neat: https://github.com/Parquet/parquet-mr/issues/321
- If you are trying to get the Parquet schema as a Parquet MessageType,
you can use the readFooterX() and readAllFootersX() utility methods in
ParquetFileReader.
- If you are trying to get a Spark SQL StructType schema out of a Parquet
file, then the most convenient way is to load it as a DataFrame.
Note that "loading" it as a DataFrame doesn't mean we scan the whole
file. We only do the minimum metadata discovery work, such as schema
discovery and schema merging.
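
To illustrate why metadata discovery does not need a full scan: a Parquet file keeps its schema in a footer at the end of the file, so a reader can seek straight to it. Below is a minimal sketch in plain Python, not what Spark or ParquetFileReader actually runs. The function name read_footer_bytes is made up for this example, and it assumes an unencrypted file that ends with the 4-byte little-endian footer length followed by the "PAR1" magic. It returns the raw Thrift-encoded footer and does not decode the schema itself.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of a Parquet file


def read_footer_bytes(path):
    """Return the raw (Thrift-encoded) footer metadata of a Parquet file.

    Hypothetical helper for illustration: only the last few bytes of the
    file are read, never the row groups that hold the data.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end of the file to learn its size
        size = f.tell()
        # Smallest possible layout: leading magic + footer length + trailing magic
        if size < 12:
            raise ValueError("file is too small to be a Parquet file")

        # The last 8 bytes are: 4-byte footer length, then the trailing magic
        f.seek(size - 8)
        tail = f.read(8)
        if tail[4:] != MAGIC:
            raise ValueError("trailing PAR1 magic not found")

        (footer_len,) = struct.unpack("<I", tail[:4])
        if footer_len > size - 12:
            raise ValueError("footer length is larger than the file")

        # The footer sits immediately before the length field
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)
```

Decoding those bytes into a schema still needs a Thrift reader, which is what parquet-tools, ParquetFileReader and Spark take care of.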
Cheng
On 9/1/15 7:07 PM, Hafiz Mujadid wrote:
Hi all!
Is there any way to get schema from a parquet file without loading into
dataframe?
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Schema-From-parquet-file-tp24535.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org