# Build (or reuse) a Hive-enabled SparkSession and select a projection from a temp view.
#
# FIX: the original chain stopped at .enableHiveSupport(), which returns the
# SparkSession.Builder — self.session was never an actual SparkSession, so the
# later self.session.sql(...) call would fail. .getOrCreate() materializes the
# session and must terminate the chain.
self.session = SparkSession \
    .builder \
    .appName(self.app_name) \
    .config("spark.dynamicAllocation.enabled", "false") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("mapreduce.fileoutputcommitter.algorithm.version", "2") \
    .config("hive.load.dynamic.partitions.thread", "10") \
    .config("hive.mv.files.thread", "30") \
    .config("fs.trash.interval", "0") \
    .enableHiveSupport() \
    .getOrCreate()

# Projection list for the SELECT; the NULL aliases stand in for columns that
# have no source value (presumably to match a wider target schema — TODO confirm
# against the target table definition).
columns_with_default = (
    "col1, NULL as col2, col2, col4, NULL as col5, "
    "partition_col1, partition_col2"
)

# f-string instead of dated %-interpolation; columns_with_default is a
# code-controlled constant, not user input, so string-building the query is safe here.
source_data_df_to_write = self.session.sql(
    f"SELECT {columns_with_default} FROM TEMP_VIEW"
)
-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscribe@spark.apache.org