Hi,

https://stackoverflow.com/q/46032001/1305344 :)

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Sep 4, 2017 at 9:05 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> It's event time-based by default, as the only way to define the
> watermark column is the withWatermark operator (which takes an
> event-time column).
>
> See 
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@withWatermark(eventTime:String,delayThreshold:String):org.apache.spark.sql.Dataset[T]
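>
> For example, a minimal sketch of an event-time watermark (the Dataset
> name "events" and the column name "eventTime" are illustrative, and
> spark.implicits._ is assumed to be in scope):
>
> ```scala
> import org.apache.spark.sql.functions.window
>
> val counts = events                            // a streaming Dataset
>   .withWatermark("eventTime", "10 minutes")    // tolerate data up to 10 minutes late
>   .groupBy(window($"eventTime", "5 minutes"))  // 5-minute event-time windows
>   .count()
> ```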
>
> But...
>
> Given that your initial Dataset may have no event-time column, you can
> generate one at processing time, e.g. with current_timestamp (or
> current_date), which effectively gives you the other option: a
> processing-time watermark.
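>
> A sketch of that idea (the Dataset name "input" and the column name
> "procTime" are illustrative):
>
> ```scala
> import org.apache.spark.sql.functions.current_timestamp
>
> // No event-time column in the source? Attach the processing time as one.
> val withProcTime = input.withColumn("procTime", current_timestamp())
>
> // The watermark then effectively tracks processing time.
> val watermarked = withProcTime.withWatermark("procTime", "1 minute")
> ```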
>
> And last but not least...
>
> The most generic solution is
> KeyValueGroupedDataset.flatMapGroupsWithState, where you can use one of
> the pre-defined timeout strategies or write a custom one. That's why
> it's called a solution for "arbitrary stateful processing".
>
> * 
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.KeyValueGroupedDataset
>
> * https://youtu.be/JAb4FIheP28
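>
> A sketch of flatMapGroupsWithState with a processing-time timeout (the
> Event and RunningCount case classes and all names are illustrative, not
> from this thread; the usual implicit Encoders are assumed in scope):
>
> ```scala
> import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}
>
> case class Event(key: String, value: Long)
> case class RunningCount(count: Long)
>
> val counts = events                  // Dataset[Event], possibly streaming
>   .groupByKey(_.key)
>   .flatMapGroupsWithState[RunningCount, (String, Long)](
>     OutputMode.Update, GroupStateTimeout.ProcessingTimeTimeout) {
>     (key, values, state: GroupState[RunningCount]) =>
>       // Custom state update: keep a running count per key.
>       val newCount = state.getOption.map(_.count).getOrElse(0L) + values.size
>       state.update(RunningCount(newCount))
>       state.setTimeoutDuration("10 minutes")
>       Iterator((key, newCount))
>   }
> ```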
>
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Spark Structured Streaming (Apache Spark 2.2+)
> https://bit.ly/spark-structured-streaming
> Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Sep 1, 2017 at 8:15 PM, kant kodali <kanth...@gmail.com> wrote:
>> Is a watermark always set using processing time, event time, or both?

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
