Re: Determine global watermark via StreamingQueryProgress eventTime watermark String

2021-03-16 Thread Jungtaek Lim
There was a similar question (but another approach) and I've explained the
current status a bit.

https://lists.apache.org/thread.html/r89a61a10df71ccac132ce5d50b8fe405635753db7fa2aeb79f82fb77%40%3Cuser.spark.apache.org%3E

I guess this would also answer your question as well. At least for now,
Spark doesn't expose the current watermark in specific micro-batch to the
user level. It's abstracted away. I'm not sure knowing the exact global
watermark "outside" of the query would be able to affect the running query.

If there's a strong demand, we could probably consider adding some function
which provides the current watermark. I guess producing dropped events via
side-output is something we are in favor of (if that is not quite hard to
do), more than exposing the current watermark and letting users do that
instead.


On Wed, Mar 17, 2021 at 1:20 AM dwichman 
wrote:

> Hi Spark Developers,
>
> Is it possible to reliably determine the current global watermark that is
> being used in a streaming query via StreamingQueryProgress.onQueryProgress
> eventTime watermark String?
>
>
> https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/sql/streaming/StreamingQueryProgress.html
>
> The intention would be to precede the query watermark function with
> something like a map function and compare event times with the assumed
> global watermark to determine if the event will be dropped (i.e. too late).
>
> If StreamingQueryProgress.onQueryProgress eventTime watermark does not
> accurately reflect the current global watermark, is there another way to
> reliably determine it?
>
> Thanks for your help.
>
> -Derek
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Determine global watermark via StreamingQueryProgress eventTime watermark String

2021-03-16 Thread dwichman
Hi Spark Developers,

Is it possible to reliably determine the current global watermark that is
being used in a streaming query via StreamingQueryProgress.onQueryProgress
eventTime watermark String?

https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/sql/streaming/StreamingQueryProgress.html
 

The intention would be to precede the query watermark function with
something like a map function and compare event times with the assumed
global watermark to determine if the event will be dropped (i.e. too late).

If StreamingQueryProgress.onQueryProgress eventTime watermark does not
accurately reflect the current global watermark, is there another way to
reliably determine it?

Thanks for your help.

-Derek



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org