[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-05 Thread Jose Torres (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464853#comment-16464853
 ] 

Jose Torres commented on SPARK-23703:
-

Up to you. It might be worth asking if there are use cases for that kind of 
thing, but on the other hand I don't know of other systems that support it.

> Collapse sequential watermarks 
> ---
>
> Key: SPARK-23703
> URL: https://issues.apache.org/jira/browse/SPARK-23703
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>
> When there are two sequential EventTimeWatermark nodes in a query plan, the 
> topmost one overrides the column tracking metadata from its children, but 
> leaves the nodes themselves untouched. When there is no intervening stateful 
> operation to consume the watermark, we should remove the lower node entirely.
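
A minimal sketch of the collapse described above, using a toy single-child plan tree rather than Spark's actual logical plan classes. The `Node` type, the `kind` strings, and the function names are all assumptions made for illustration; a real implementation would presumably be a Catalyst rule over EventTimeWatermark nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical single-child plan node (not a Spark class)."""
    kind: str                       # "watermark", "stateful", or anything else
    child: Optional["Node"] = None
    column: Optional[str] = None    # event-time column, watermark nodes only


def _collapse(node: Optional[Node], shadowed: bool) -> Optional[Node]:
    # `shadowed` is True when a watermark node sits above this one with no
    # stateful operator in between to consume the lower watermark.
    if node is None:
        return None
    if node.kind == "watermark":
        child = _collapse(node.child, shadowed=True)
        # A shadowed watermark is dropped; its child takes its place.
        return child if shadowed else Node("watermark", child, node.column)
    if node.kind == "stateful":
        # A stateful operator consumes the watermark above it.
        return Node(node.kind, _collapse(node.child, shadowed=False), node.column)
    return Node(node.kind, _collapse(node.child, shadowed), node.column)


def collapse_watermarks(plan: Node) -> Optional[Node]:
    """Remove watermark nodes that are overridden by a later watermark."""
    return _collapse(plan, shadowed=False)


def describe(node: Optional[Node]) -> List[str]:
    """Flatten a plan to a top-down list of labels for easy inspection."""
    labels = []
    while node is not None:
        labels.append(node.kind if node.column is None else f"{node.kind}({node.column})")
        node = node.child
    return labels


# agg <- watermark(b) <- watermark(a) <- source: the lower node is removed.
stacked = Node("stateful", Node("watermark", Node("watermark", Node("source"), "a"), "b"))
collapsed = describe(collapse_watermarks(stacked))

# watermark(b) <- agg <- watermark(a) <- source: both nodes are kept.
separated = Node("watermark", Node("stateful", Node("watermark", Node("source"), "a")), "b")
kept = describe(collapse_watermarks(separated))
```

The second plan is left unchanged because the stateful operator sits between the two watermark nodes, which is the "intervening stateful operation" case the description excludes.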



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463209#comment-16463209
 ] 

Jungtaek Lim commented on SPARK-23703:
--

Agreed. Is it worth discussing this on the dev mailing list, or can we simply 
propose a patch for the fix?




[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jose Torres (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463190#comment-16463190
 ] 

Jose Torres commented on SPARK-23703:
-

No, I don't know of any actual use cases for this. I think just writing an 
analyzer rule disallowing it could be a valid resolution here.
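
One reading of "an analyzer rule disallowing it" is a check that rejects any plan where a watermark node sits above another watermark node with no stateful operator between them. The sketch below models that check on a toy single-child plan; `PlanNode`, `AnalysisError`, and the function names are hypothetical and are not Spark's analyzer API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanNode:
    """Hypothetical single-child plan node (not a Spark class)."""
    kind: str                         # "watermark", "stateful", or anything else
    child: Optional["PlanNode"] = None
    column: Optional[str] = None      # event-time column, watermark nodes only


class AnalysisError(Exception):
    """Hypothetical error raised when a plan is rejected."""


def check_no_stacked_watermarks(plan: PlanNode) -> None:
    # Column of the nearest watermark above the current node that no
    # stateful operator has consumed yet, or None if there is none.
    pending = None
    node = plan
    while node is not None:
        if node.kind == "watermark":
            if pending is not None:
                raise AnalysisError(
                    f"watermark on '{node.column}' is overridden by the "
                    f"watermark on '{pending}' before any stateful operator uses it"
                )
            pending = node.column
        elif node.kind == "stateful":
            pending = None
        node = node.child


def is_allowed(plan: PlanNode) -> bool:
    """Convenience wrapper: True if the check passes, False if it raises."""
    try:
        check_no_stacked_watermarks(plan)
        return True
    except AnalysisError:
        return False


# agg <- watermark(b) <- watermark(a) <- source: rejected.
stacked = PlanNode("stateful", PlanNode("watermark", PlanNode("watermark", PlanNode("source"), "a"), "b"))

# agg <- watermark(a) <- source: accepted.
single = PlanNode("stateful", PlanNode("watermark", PlanNode("source"), "a"))
```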




[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463189#comment-16463189
 ] 

Jungtaek Lim commented on SPARK-23703:
--

Actually, I haven't heard of multiple watermarks on the same source, which 
complicates things. What I have heard of is an event-time window over a single 
time field, with a watermark on that field. Do you have, or have you heard of, 
actual use cases for this?




[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jose Torres (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462584#comment-16462584
 ] 

Jose Torres commented on SPARK-23703:
-

I'm no longer entirely convinced that this (and the parent JIRA) are correct. 
We might not want to support these scenarios at all.

The question here is what we should do with the query:

df.withWatermark("a", …)
   .withWatermark("b", …)
   .agg(...)

What we do right now is definitely wrong. We (in MicroBatchExecution) calculate 
separate watermarks on "a" and "b", take their minimum, and then pass that as 
the watermark value to the aggregate. But the aggregate only sees "b" as a 
watermarked column, because only "b" has EventTimeWatermark.delayKey set in its 
attribute metadata at the aggregate node. EventTimeWatermark("b").output erases 
the metadata for "a" in its output.

So we need to somehow resolve this mismatch.
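
To make the mismatch concrete, here is a small toy model in plain Python. It does not use Spark; `DELAY_KEY`, `with_watermark`, `global_watermark`, and the delay and event-time values are assumptions chosen for illustration. It only mirrors the two behaviors described above: each watermark node tags its own column and strips the tag from every other column, while the watermark value is the minimum across all watermarked columns.

```python
from typing import Dict

# Hypothetical stand-in for the metadata key EventTimeWatermark.delayKey.
DELAY_KEY = "watermark_delay_ms"


def with_watermark(columns: Dict[str, dict], event_time: str, delay_ms: int) -> Dict[str, dict]:
    """Toy model of one watermark node's output: tag `event_time` with the
    delay and erase the tag from every other column."""
    out = {}
    for name, metadata in columns.items():
        cleaned = {k: v for k, v in metadata.items() if k != DELAY_KEY}
        if name == event_time:
            cleaned[DELAY_KEY] = delay_ms
        out[name] = cleaned
    return out


def global_watermark(max_event_time_ms: Dict[str, int], delays_ms: Dict[str, int]) -> int:
    """Toy model of the execution side: one watermark per column
    (max event time seen minus delay), then the minimum over all of them."""
    return min(max_event_time_ms[c] - delays_ms[c] for c in delays_ms)


# Two stacked watermarks, "a" first and then "b" on top of it.
delays_ms = {"a": 10_000, "b": 5_000}
columns = {"a": {}, "b": {}}
for column, delay in delays_ms.items():
    columns = with_watermark(columns, column, delay)

# What the aggregate sees: only "b" still carries the watermark tag.
watermarked = [name for name, meta in columns.items() if DELAY_KEY in meta]

# What the aggregate is handed: a value that is driven by column "a".
max_seen_ms = {"a": 100_000, "b": 200_000}
watermark = global_watermark(max_seen_ms, delays_ms)
```

With these made-up numbers the aggregate treats "b" as its only watermarked column, yet the watermark value it receives (90000) comes from "a" rather than from "b" (195000).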




[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462339#comment-16462339
 ] 

Jungtaek Lim commented on SPARK-23703:
--

[~joseph.torres]

Could you provide a simple code snippet or query showing this behavior? It would 
give me (and possibly other contributors) a better understanding of the 
rationale for this issue, and maybe of the relevant internals too.

Once I understand the details, I'd also like to work on this.
