Any help?
I need urgent help. Could someone please clarify this?
-- Forwarded message --
From: Aakash Basu <aakash.spark@gmail.com>
Date: Thu, Apr 5, 2018 at 2:28 PM
Subject: [Structured Streaming] How to save entire column aggregation to a file
To: user <user@spark.apache.org>
Hi,
I want to save an aggregate to a file without using any window, watermark,
or groupBy; the aggregation is over the entire column.
df = spark.sql("select avg(col1) as aver from ds")
Now, the challenge is as follows:
1) If I use outputMode = Append, I get the error "*Append output mode not
supported when there are streaming aggregations on streaming
DataFrames/DataSets without watermark*":
# No outputMode() call here: append is the default output mode.
query2 = df \
    .writeStream \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()
2) If I use outputMode = Complete, I get the error "*Data source parquet
does not support Complete output mode;*":
query2 = df \
    .writeStream \
    .outputMode("complete") \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()
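For illustration only, here is a plain-Python sketch (not Spark code) of the
behaviour being asked for: a whole-column average has a single result row that
changes on every trigger, so there is never a finalized row to append, and the
stored result has to be replaced each time instead. The names RunningAverage,
write_complete and run, the JSON file format, and the idea of overwriting one
file per trigger are all assumptions made for this sketch; they are not part
of the Spark API or of the job above.

```python
import json
import os


class RunningAverage:
    """Holds the (sum, count) state needed for an average over a whole column."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch):
        # Fold one micro-batch of col1 values into the running state
        # and return the current average (None if nothing was seen yet).
        self.total += sum(batch)
        self.count += len(batch)
        return self.total / self.count if self.count else None


def write_complete(path, aver):
    # "Complete"-style output: replace the whole stored result on each trigger.
    # Write to a temporary file first, then swap it in with os.replace.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"aver": aver}, f)
    os.replace(tmp, path)


def run(batches, path):
    # Simulate a stream: one write per micro-batch, then read back the result.
    agg = RunningAverage()
    for batch in batches:
        write_complete(path, agg.update(batch))
    with open(path) as f:
        return json.load(f)["aver"]
```

Each call to write_complete discards the previous value, which is the
semantics Complete mode would give if the parquet sink accepted it.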
What should I do? How should I go about it?
Thanks,
Aakash.