Hi Folks,

I am reading a Spark DataFrame from an ORC file, adding two new columns,
and then trying to save it to a Hive table using both the partitioning
and bucketing features.

I am using Spark 2.3, since both partitioning and bucketing are available
in this version.

Looking for advice.

Code Snippet:

import datetime
from pyspark.sql.functions import lit

# Note: delimiter/header/inferschema are CSV reader options and have no
# effect on the ORC source, but I had them in my code.
df_orc_data = (spark.read.format("orc")
               .option("delimiter", "|")
               .option("header", "true")
               .option("inferschema", "true")
               .load(filtered_path))
df_fil_ts_data = df_orc_data.withColumn("START_TS",
                                        lit(process_time).cast("timestamp"))
daily = datetime.datetime.utcnow().strftime('%Y-%m-%d')
df_filtered_data = df_fil_ts_data.withColumn("DAYPART",
                                             lit(daily).cast("string"))
hour = datetime.datetime.utcnow().strftime('%H')
df_filtered = df_filtered_data.withColumn("HRS", lit(hour).cast("string"))
(df_filtered.write
    .partitionBy("DAYPART")
    .bucketBy(24, "HRS")
    .sortBy("HRS")
    .mode("append")
    .orc('/user/umar/netflow_filtered')
    .saveAsTable("default.DDOS_NETFLOW_FILTERED"))

Error:
"'save' does not support bucketing right now;"
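To clarify what I am asking: my understanding is that bucketBy is only
honoured by saveAsTable, not by path-based writers such as orc()/save(),
so the error presumably comes from the .orc(path) call. A variant along
these lines is my guess at a fix (the option("path", ...) call for the
table location is an assumption on my part) — is this the right approach?

```python
# Untested sketch: let saveAsTable perform the write, and pass the
# HDFS location through option("path", ...) instead of calling .orc(path).
def write_bucketed(df, path="/user/umar/netflow_filtered"):
    (df.write
       .format("orc")
       .partitionBy("DAYPART")
       .bucketBy(24, "HRS")
       .sortBy("HRS")
       .mode("append")
       .option("path", path)  # external-table location (my assumption)
       .saveAsTable("default.DDOS_NETFLOW_FILTERED"))
```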



Thanks,
Umar



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
