Fwd: [Structured Streaming] How to save entire column aggregation to a file

2018-04-06 Thread Aakash Basu
Any help?

I need this urgently. Could someone please clarify?

-- Forwarded message --
From: Aakash Basu <aakash.spark@gmail.com>
Date: Thu, Apr 5, 2018 at 2:28 PM
Subject: [Structured Streaming] How to save entire column aggregation to a file
To: user <user@spark.apache.org>


Hi,

I want to save an aggregate to a file without using any window, watermark
or groupBy. In other words, my aggregation is over the entire column:

# "ds" is a temporary view registered over the streaming DataFrame
df = spark.sql("select avg(col1) as aver from ds")
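For reference, a whole-column `avg(col1)` over a stream is a stateful aggregation: Spark keeps running state (a sum and a count) across micro-batches and recomputes the single result row on every trigger. The pure-Python sketch below models that behaviour under this assumption; it is an illustration, not Spark code, and `running_average` is a hypothetical name.

```python
# Conceptual model (not Spark code): a whole-column avg over a stream keeps
# running state (sum, count) and recomputes one result row per micro-batch.
from typing import Iterable, List


def running_average(batches: Iterable[List[float]]) -> List[float]:
    """Return the value of avg(col1) after each micro-batch."""
    total = 0.0
    count = 0
    results: List[float] = []
    for batch in batches:
        total += sum(batch)
        count += len(batch)
        # The single result row is recomputed from the accumulated state;
        # with no rows seen yet the average is undefined (NaN in this model).
        results.append(total / count if count else float("nan"))
    return results


print(running_average([[1.0, 2.0], [3.0], [10.0]]))  # [1.5, 2.0, 4.0]
```

The one output row keeps changing as data arrives, so there is never a point at which it is final.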


The problem is as follows:

1) If I use outputMode = Append (the default, since the query below sets no
outputMode), I get "*Append output mode not supported when there are
streaming aggregations on streaming DataFrames/DataSets without watermark*"

# No outputMode set, so append is used: rejected for an aggregation without a watermark
query2 = df \
    .writeStream \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()
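Append mode writes a row only once, when Spark knows it can no longer change, and for aggregations it relies on a watermark to decide that. A global aggregate has no watermark and its single row changes on every trigger, so it never becomes final. The hypothetical model below (not Spark API; `sink_contents` is an illustrative name) contrasts what a sink would end up holding if append were allowed versus what complete mode keeps.

```python
# Hypothetical model (not Spark API): what a sink would hold for the single
# row of a global aggregate under append vs complete output semantics.
from typing import List


def sink_contents(per_batch_results: List[float], mode: str) -> List[float]:
    """Return the rows a sink would contain after all micro-batches."""
    if mode == "append":
        # Append: every emitted row is permanent, so each intermediate
        # average would remain as its own, already-stale row.
        # (Spark rejects this query instead of producing such output.)
        return list(per_batch_results)
    if mode == "complete":
        # Complete: the whole result table is rewritten on every trigger,
        # so only the latest average remains.
        return per_batch_results[-1:]
    raise ValueError(f"unknown output mode: {mode!r}")


# avg(col1) after three micro-batches, e.g. [1, 2], then [3], then [10]
averages = [1.5, 2.0, 4.0]
print(sink_contents(averages, "append"))    # [1.5, 2.0, 4.0]
print(sink_contents(averages, "complete"))  # [4.0]
```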



2) If I use outputMode = Complete, I get "*Data source parquet does not
support Complete output mode;*"

# Complete mode: rejected because the parquet file sink only supports append
query2 = df \
    .writeStream \
    .outputMode("complete") \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()
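Spark's file sinks, parquet included, support only append mode, which is why Complete is rejected here. One way to keep a single, always-current result on disk is to overwrite the output after every trigger. In Spark 2.4 and later, `DataStreamWriter.foreachBatch` should allow this together with `outputMode("complete")`: each batch DataFrame then holds the full aggregate, which can be written with `batch_df.write.mode("overwrite").parquet(path)`. That API did not yet exist in Spark 2.3, the release current when this thread was written. The pure-Python sketch below models only the overwrite idea; `write_latest`, `read_latest`, `process_stream` and the JSON file format are hypothetical illustrations, not Spark APIs.

```python
# Conceptual model (not Spark API): keep exactly one up-to-date copy of a
# global aggregate on disk by replacing the file after every micro-batch.
import json
import os
import tempfile
from typing import Iterable, List


def write_latest(path: str, value: float) -> None:
    """Atomically replace `path` with a JSON record holding the latest value."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then swap it into place,
    # so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"aver": value}, f)
    os.replace(tmp_path, path)


def read_latest(path: str) -> float:
    """Read back the most recently written aggregate."""
    with open(path) as f:
        return json.load(f)["aver"]


def process_stream(batches: Iterable[List[float]], path: str) -> None:
    """Update the running average per batch and overwrite the stored result."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
        if count:  # nothing to write until at least one value has arrived
            write_latest(path, total / count)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        out_file = os.path.join(out_dir, "avg.json")
        process_stream([[1.0, 2.0], [3.0], [10.0]], out_file)
        print(read_latest(out_file))        # 4.0
        print(sorted(os.listdir(out_dir)))  # ['avg.json']
```

Swapping a temporary file into place with `os.replace` means a reader sees either the previous value or the new one; overwriting a parquet directory from Spark is not necessarily atomic in the same way.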


What should I do? How can I go about this?

Thanks,
Aakash.


[Structured Streaming] How to save entire column aggregation to a file

2018-04-05 Thread Aakash Basu