Hi, https://stackoverflow.com/q/46032001/1305344 :)
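To make the two options from the earlier message concrete, here's a minimal sketch (the rate source and column names are just for illustration): watermarking on a real event-time column, and generating a stand-in column at processing time with current_timestamp.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

val spark = SparkSession.builder().appName("watermark-demo").getOrCreate()

// Event time: the rate source emits a `timestamp` column to watermark on.
val byEventTime = spark.readStream
  .format("rate")
  .load()
  .withWatermark("timestamp", "10 minutes")

// "Processing time": pretend the input has no event-time column, so
// auto-generate one with current_timestamp and watermark on that instead.
val byProcessingTime = spark.readStream
  .format("rate")
  .load()
  .drop("timestamp")
  .withColumn("proc_time", current_timestamp())
  .withWatermark("proc_time", "10 minutes")
```

Either way the watermark machinery itself is the same; only where the column's values come from differs.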
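And a minimal sketch of the flatMapGroupsWithState route (the per-user running sum, the Update output mode, and the one-hour processing-time timeout are just illustrative choices, not the only ones): you keep and expire the state yourself, so any event-time or processing-time strategy can be hand-rolled.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(user: String, value: Long)
case class RunningSum(user: String, total: Long)

val spark = SparkSession.builder().appName("arbitrary-state-demo").getOrCreate()
import spark.implicits._

// Derive a toy keyed stream from the rate source.
val events = spark.readStream
  .format("rate")
  .load()
  .select(($"value" % 10).cast("string").as("user"), $"value")
  .as[Event]

val sums = events
  .groupByKey(_.user)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.ProcessingTimeTimeout) {
    (user: String, batch: Iterator[Event], state: GroupState[Long]) =>
      // Custom state update: running sum per user.
      val total = state.getOption.getOrElse(0L) + batch.map(_.value).sum
      state.update(total)
      // Custom expiry strategy: drop idle users' state after an hour.
      state.setTimeoutDuration("1 hour")
      Iterator(RunningSum(user, total))
  }
```

Because the update function is arbitrary code over the group's state, this is the escape hatch when the built-in watermark/window aggregations don't fit.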
Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Mon, Sep 4, 2017 at 9:05 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> It's event time-based by default, as there's no other way to define the
> column than with the withWatermark operator.
>
> See
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@withWatermark(eventTime:String,delayThreshold:String):org.apache.spark.sql.Dataset[T]
>
> But...
>
> Given that your initial Dataset may have no event-time column, you can
> auto-generate one at processing time using current_date, current_timestamp,
> or some other means, which gives you the other option (processing time).
>
> And last but not least...
>
> In the most generic solution, using
> KeyValueGroupedDataset.flatMapGroupsWithState, you can pre-define the
> strategies or write a custom one. That's why they call it a solution
> for an "arbitrary aggregation".
>
> * http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.KeyValueGroupedDataset
>
> * https://youtu.be/JAb4FIheP28
>
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Spark Structured Streaming (Apache Spark 2.2+)
> https://bit.ly/spark-structured-streaming
> Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Sep 1, 2017 at 8:15 PM, kant kodali <kanth...@gmail.com> wrote:
>> Is a watermark always set using processing time, event time, or both?

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org