Hi,

I've been wondering what the "proper" physical plan should be when
more than one withWatermark operator is used in a query (as below).
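
For reference, a minimal sketch of the kind of query I mean (my
reconstruction over the rate source, whose timestamp/value schema matches
the plan below; the actual query may differ):

val sq = spark.readStream
  .format("rate")
  .load()                                    // columns: timestamp, value
  .withWatermark("timestamp", "10 seconds")  // becomes the inner EventTimeWatermark
  .withWatermark("timestamp", "40 seconds")  // becomes the outer EventTimeWatermark
  .writeStream
  .format("console")
  .start()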

I think a SparkPlan should contain only one EventTimeWatermarkExec
physical operator in a single query, but I wonder whether I'm missing
something important (perhaps some edge case).

I also think that the last withWatermark, and hence the outermost
EventTimeWatermark, is the one in effect. Correct?
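
For illustration, the place I'd expect the effective watermark to surface
is the query progress (a sketch; lastProgress is null until the first
batch completes):

sq.lastProgress.eventTime.get("watermark")  // current event-time watermark as a string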

BTW, since joins of streaming queries are not supported, the only way two
or more EventTimeWatermarkExec operators could be used and applied
separately is when two or more streaming queries are unioned. Correct?
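
To make the union case concrete, a sketch of what I have in mind (two
rate streams, each with its own watermark):

val left  = spark.readStream.format("rate").load()
  .withWatermark("timestamp", "10 seconds")
val right = spark.readStream.format("rate").load()
  .withWatermark("timestamp", "40 seconds")
val unioned = left.union(right)  // each branch should keep its own EventTimeWatermarkExec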

scala> sq.explain(true)
== Parsed Logical Plan ==
EventTimeWatermark timestamp#773: timestamp, interval 40 seconds
+- EventTimeWatermark timestamp#773: timestamp, interval 10 seconds
   +- LogicalRDD [timestamp#773, value#774L]

== Analyzed Logical Plan ==
timestamp: timestamp, value: bigint
EventTimeWatermark timestamp#773: timestamp, interval 40 seconds
+- EventTimeWatermark timestamp#773: timestamp, interval 10 seconds
   +- LogicalRDD [timestamp#773, value#774L]

== Optimized Logical Plan ==
EventTimeWatermark timestamp#773: timestamp, interval 40 seconds
+- EventTimeWatermark timestamp#773: timestamp, interval 10 seconds
   +- LogicalRDD [timestamp#773, value#774L]

== Physical Plan ==
EventTimeWatermark timestamp#773: timestamp, interval 40 seconds
+- EventTimeWatermark timestamp#773: timestamp, interval 10 seconds
   +- Scan ExistingRDD[timestamp#773,value#774L]

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

