Hi folks, I am reading a Spark data frame from an ORC file, adding two new columns, and finally trying to save it to a Hive table with both the partitioning and bucketing features.
I am using Spark 2.3, since both partitioning and bucketing are available in this version. Looking for advice.

Code snippet:

    df_orc_data = (spark.read.format("orc")
                   .option("delimiter", "|")
                   .option("header", "true")
                   .option("inferschema", "true")
                   .load(filtered_path))
    df_fil_ts_data = df_orc_data.withColumn("START_TS",
                                            lit(process_time).cast("timestamp"))
    daily = datetime.datetime.utcnow().strftime('%Y-%m-%d')
    df_filtered_data = df_fil_ts_data.withColumn("DAYPART",
                                                 lit(daily).cast("string"))
    hour = datetime.datetime.utcnow().strftime('%H')
    df_filtered = df_filtered_data.withColumn("HRS", lit(hour).cast("string"))
    (df_filtered.write
     .partitionBy("DAYPART")
     .bucketBy(24, "HRS")
     .sortBy("HRS")
     .mode("append")
     .orc('/user/umar/netflow_filtered')
     .saveAsTable("default.DDOS_NETFLOW_FILTERED"))

Error:

    "'save' does not support bucketing right now;"

Thanks,
Umar
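A possible explanation and fix (a sketch only, not verified against your cluster; it assumes the same df_filtered as above and an active SparkSession with Hive support): calling .orc(path) is a shortcut for .format("orc").save(path), and save() does not support bucketing, which is why the error fires before saveAsTable() is ever reached. bucketBy() works only with saveAsTable(), so the path should instead be supplied via .option("path", ...) so that saveAsTable() creates an external table at that location:

```python
# Hedged sketch: write a partitioned + bucketed ORC table through
# saveAsTable() alone. The .option("path", ...) makes it an external
# table at the given HDFS location instead of the warehouse default.
(df_filtered.write
    .format("orc")
    .partitionBy("DAYPART")
    .bucketBy(24, "HRS")
    .sortBy("HRS")
    .mode("append")
    .option("path", "/user/umar/netflow_filtered")
    .saveAsTable("default.DDOS_NETFLOW_FILTERED"))
```

Note the removal of the .orc(...) call entirely; format and path are now both handled by saveAsTable().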