Hi Folks,

I am reading a Spark DataFrame from an ORC file, adding two new columns,
and then trying to save it to a Hive table using both the partitioning
and bucketing features.

I am using Spark 2.3, since both partitioning and bucketing are available
in this version.

Looking for advice.

Code Snippet:

import datetime
from pyspark.sql.functions import lit

# Note: delimiter/header/inferschema are CSV reader options and have no
# effect on the ORC source, but I had them in my code.
df_orc_data = (spark.read.format("orc")
               .option("delimiter", "|")
               .option("header", "true")
               .option("inferschema", "true")
               .load(filtered_path))
df_fil_ts_data = df_orc_data.withColumn("START_TS",
                                        lit(process_time).cast("timestamp"))
daily = datetime.datetime.utcnow().strftime('%Y-%m-%d')
df_filtered_data = df_fil_ts_data.withColumn("DAYPART",
                                             lit(daily).cast("string"))
hour = datetime.datetime.utcnow().strftime('%H')
df_filtered = df_filtered_data.withColumn("HRS", lit(hour).cast("string"))
(df_filtered.write
    .partitionBy("DAYPART")
    .bucketBy(24, "HRS")
    .sortBy("HRS")
    .mode("append")
    .orc('/user/umar/netflow_filtered')
    .saveAsTable("default.DDOS_NETFLOW_FILTERED"))

Error:
"'save' does not support bucketing right now;"
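To clarify what I am asking: my understanding is that bucketBy is only
honoured by saveAsTable, not by path-based writers such as orc()/save(),
so the error presumably comes from the .orc(path) call. A variant along
these lines is my guess at a fix (the option("path", ...) call for the
table location is an assumption on my part) — is this the right approach?

```python
# Untested sketch: let saveAsTable perform the write, and pass the
# HDFS location through option("path", ...) instead of calling .orc(path).
def write_bucketed(df, path="/user/umar/netflow_filtered"):
    (df.write
       .format("orc")
       .partitionBy("DAYPART")
       .bucketBy(24, "HRS")
       .sortBy("HRS")
       .mode("append")
       .option("path", path)  # external-table location (my assumption)
       .saveAsTable("default.DDOS_NETFLOW_FILTERED"))
```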



Thanks,
Umar



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
