Re: what's the best practice to create an external hive table based on a csv file on HDFS with 618 columns in header?

2018-07-24 Thread Furcy Pin
Hello,

To load your data as Parquet, you can either:

A. Use Spark: https://docs.databricks.com/spark/latest/data-sources/read-csv.html and write it directly as a Parquet table (df.write.format("parquet").saveAsTable("parquet_table")), as sketched below.

B. Load it as a CSV file in Hive, and perform a CREATE TABLE ... AS SELECT into a Parquet table.
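For what it's worth, option A might look roughly like the following in PySpark (a minimal sketch, assuming Spark 2.x with Hive support; the HDFS path and table name are placeholders, not anything from the thread):

    from pyspark.sql import SparkSession

    # Sketch of option A: read the CSV with Spark and write it straight out as Parquet.
    spark = (SparkSession.builder
             .appName("csv_to_parquet")
             .enableHiveSupport()   # needed so saveAsTable registers the table in the Hive metastore
             .getOrCreate())

    # header=True takes the 618 column names from the first line of the file;
    # inferSchema=True makes Spark sample the data to guess column types.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///path/to/wide_file.csv"))

    # Write the data back out as a Parquet-backed Hive table.
    df.write.format("parquet").saveAsTable("parquet_table")

With a 618-column file, inferSchema saves typing out every column type by hand, at the cost of an extra pass over the data.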

what's the best practice to create an external hive table based on a csv file on HDFS with 618 columns in header?

2018-07-23 Thread Raymond Xie
We are using Cloudera CDH 5.11. I have seen solutions for small xlsx files with only a handful of columns in the header, but in my case the csv file to be loaded into a new hive table has 618 columns.
1. Would it be saved as parquet by default if I upload it (save it to csv first) through HUE -> File
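For reference, the CREATE TABLE route from the reply above might look roughly like this (a sketch using spark.sql against the Hive metastore; the table names, HDFS directory, and the two placeholder columns are hypothetical, and the real DDL would have to list all 618 columns):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv_external_to_parquet")
             .enableHiveSupport()
             .getOrCreate())

    # External table over the CSV already on HDFS; the data files stay where they are,
    # only the table definition points at the directory.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS csv_staging (
            col_1 STRING,
            col_2 STRING
            -- ... remaining columns up to 618 ...
        )
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION 'hdfs:///path/to/csv_dir'
        TBLPROPERTIES ('skip.header.line.count' = '1')
    """)
    -- note: skip.header.line.count is honored by Hive; whether Spark respects it
    -- when querying the staging table depends on the Spark version.

    # Rewrite the staged data as a Parquet-backed table (option B from the reply).
    spark.sql("""
        CREATE TABLE parquet_table
        STORED AS PARQUET
        AS SELECT * FROM csv_staging
    """)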