[SS] Why is a streaming aggregation required for complete output mode?

Jacek Laskowski Fri, 18 Aug 2017 05:25:42 -0700

Hi,

Why does a streaming query in Complete output mode require a streaming
aggregation? What would happen if Spark allowed Complete output mode
without any aggregation? This is on the latest master.


scala> val q = ids.
     |   writeStream.
     |   format("memory").
     |   queryName("dups").
     |   outputMode(OutputMode.Complete).  // <-- memory sink supports checkpointing for Complete output mode only
     |   trigger(Trigger.ProcessingTime(30.seconds)).
     |   option("checkpointLocation", "checkpoint-dir"). // <-- use checkpointing to save state between restarts
     |   start
org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;;
Project [cast(time#10 as bigint) AS time#15L, id#6]
+- Deduplicate [id#6], true
   +- Project [cast(time#5 as timestamp) AS time#10, id#6]
      +- Project [_1#2 AS time#5, _2#3 AS id#6]
         +- StreamingExecutionRelation MemoryStream[_1#2,_2#3], [_1#2, _2#3]

  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297)
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForStreaming(UnsupportedOperationChecker.scala:115)
  at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:232)
  at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:278)
  at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:249)
  ... 57 elided

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
