Hello,
To load your data as Parquet, you can either:
A. Use Spark to read the CSV
(https://docs.databricks.com/spark/latest/data-sources/read-csv.html) and
write it directly as a Parquet table
(df.write.format("parquet").saveAsTable("parquet_table")), as shown in the
sketch after this list.
B. Load it into Hive as a CSV table, then run a CREATE TABLE ... STORED AS
PARQUET ... AS SELECT over it.
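For option A, here is a minimal PySpark sketch (assuming Spark 2.x; on CDH
5.11 the built-in CSV reader needs the separate Spark 2 parcel, while the
bundled Spark 1.6 would need the spark-csv package instead). The HDFS path
and table name below are placeholders:

    from pyspark.sql import SparkSession

    # Hive support is needed so saveAsTable() registers the table in the metastore
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # header=true picks up the 618 column names,
    # inferSchema=true lets Spark guess each column's type
    df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/user/your_user/my_data.csv"))  # placeholder HDFS path

    # write the data back out as a Parquet-backed Hive table
    df.write.format("parquet").saveAsTable("parquet_table")

For option B, the usual pattern is a text-format staging table over the CSV
(CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ',')
followed by CREATE TABLE parquet_table STORED AS PARQUET AS SELECT * FROM the
staging table.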
We are using Cloudera CDH 5.11.
I have seen solutions for small xlsx files with only a handful of columns in
the header; in my case, the CSV file to be loaded into a new Hive table has
618 columns.
1. Would it be saved as Parquet by default if I upload it (after saving it to
CSV first) through HUE -> File?