[jira] [Commented] (SPARK-23703) Collapse sequential watermarks
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464853#comment-16464853 ]

Jose Torres commented on SPARK-23703:
-------------------------------------

Up to you. It might be worth asking if there are use cases for that kind of thing, but on the other hand I don't know of other systems that support it.

> Collapse sequential watermarks
> ------------------------------
>
>                 Key: SPARK-23703
>                 URL: https://issues.apache.org/jira/browse/SPARK-23703
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
>
> When there are two sequential EventTimeWatermark nodes in a query plan, the
> topmost one overrides the column tracking metadata from its children, but
> leaves the nodes themselves untouched. When there is no intervening stateful
> operation to consume the watermark, we should remove the lower node entirely.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
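The rewrite proposed in the issue description can be sketched on a toy plan. This is only an illustration, not Spark code: the `Node` class, its `kind` labels, and `collapse_watermarks` are hypothetical names, and the sketch assumes the two watermark nodes are directly nested (a real rule would also have to look through stateless operators in between).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical toy plan node; not a Spark class."""
    kind: str                        # "source", "watermark", or "stateful"
    child: Optional["Node"] = None
    column: Optional[str] = None     # event-time column, for watermark nodes only


def collapse_watermarks(node: Optional[Node]) -> Optional[Node]:
    """Remove a watermark node that sits directly below another watermark node."""
    if node is None:
        return None
    child = collapse_watermarks(node.child)
    if node.kind == "watermark" and child is not None and child.kind == "watermark":
        # Nothing stateful sits between the two nodes, so the lower
        # watermark is never consumed: skip over it.
        child = child.child
    return Node(node.kind, child, node.column)


# stateful <- watermark("b") <- watermark("a") <- source
stacked = Node("stateful", Node("watermark", Node("watermark", Node("source"), "a"), "b"))
collapsed = collapse_watermarks(stacked)

# watermark("b") <- stateful <- watermark("a") <- source: both watermarks are kept
separated = Node("watermark", Node("stateful", Node("watermark", Node("source"), "a")), "b")
unchanged = collapse_watermarks(separated)
```

In the stacked plan only the upper watermark on "b" survives. In the separated plan the stateful node consumes the lower watermark, so the toy rule leaves the plan as it is.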
[jira] [Commented] (SPARK-23703) Collapse sequential watermarks
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463209#comment-16463209 ]

Jungtaek Lim commented on SPARK-23703:
--------------------------------------

Agreed. Is it worth discussing on the dev mailing list, or can we simply propose a patch for the fix?
[jira] [Commented] (SPARK-23703) Collapse sequential watermarks
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463190#comment-16463190 ]

Jose Torres commented on SPARK-23703:
-------------------------------------

No, I don't know of any actual use cases for this. I think just writing an analyzer rule disallowing it could be a valid resolution here.
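A check of the kind suggested above might look roughly like the following. It is a hedged sketch on a toy representation, not an actual Spark analyzer rule: the plan is assumed to be a flat list of `(kind, column)` pairs ordered from the source upward, and the `STATEFUL` set and the function name are made up for illustration.

```python
# Hypothetical operator kinds that consume a watermark in this toy model.
STATEFUL = {"aggregate", "join", "dedup"}


def check_no_stacked_watermarks(ops):
    """Raise ValueError if a watermark is declared on top of an unconsumed one.

    `ops` is a list of (kind, column) pairs ordered from the source upward.
    """
    pending = None  # column of a watermark no stateful operator has consumed yet
    for kind, column in ops:
        if kind == "watermark":
            if pending is not None:
                raise ValueError(
                    f"watermark on {column!r} stacked on unconsumed watermark on {pending!r}"
                )
            pending = column
        elif kind in STATEFUL:
            pending = None  # the stateful operator consumes the watermark


# Accepted: the aggregate consumes the watermark on "a" before "b" is declared.
accepted = [("source", None), ("watermark", "a"), ("aggregate", None), ("watermark", "b")]
check_no_stacked_watermarks(accepted)

# Rejected: only a stateless projection separates the two watermarks.
rejected = [("source", None), ("watermark", "a"), ("project", None),
            ("watermark", "b"), ("aggregate", None)]
try:
    check_no_stacked_watermarks(rejected)
    error = None
except ValueError as exc:
    error = str(exc)
```

Unlike a rewrite that silently drops the lower node, a check like this surfaces the ambiguous query to the user at analysis time.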
[jira] [Commented] (SPARK-23703) Collapse sequential watermarks
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463189#comment-16463189 ]

Jungtaek Lim commented on SPARK-23703:
--------------------------------------

Actually, I haven't heard of multiple watermarks on the same source, which complicates things. What I have heard of is an event-time window over a single time field, with a watermark on that field. Do you have, or have you heard of, actual use cases for this?
[jira] [Commented] (SPARK-23703) Collapse sequential watermarks
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462584#comment-16462584 ]

Jose Torres commented on SPARK-23703:
-------------------------------------

I'm no longer entirely convinced that this issue and the parent JIRA are correct. We might not want to support these scenarios at all.

The question here is what we should do with the query:

    df.withWatermark("a", …)
      .withWatermark("b", …)
      .agg(...)

What we do right now is definitely wrong. We (in MicroBatchExecution) calculate separate watermarks on "a" and "b", take their minimum, and then pass that as the watermark value to the aggregate. But the aggregate only sees "b" as a watermarked column, because only "b" has EventTimeWatermark.delayKey set in its attribute metadata at the aggregate node. EventTimeWatermark("b").output erases the metadata for "a" in its output.

So we need to somehow resolve this mismatch.
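The mismatch described above can be reproduced in a small toy model. Nothing here is Spark code: `DELAY_KEY` stands in for EventTimeWatermark.delayKey, the two helper functions are hypothetical simplifications of what the comment describes, and the delays and event times are made-up numbers.

```python
DELAY_KEY = "delayKey"  # stand-in for EventTimeWatermark.delayKey


def with_watermark(columns, event_col, delay_ms):
    """Toy EventTimeWatermark.output: tag event_col and erase the tag elsewhere."""
    out = {}
    for name, meta in columns.items():
        meta = {k: v for k, v in meta.items() if k != DELAY_KEY}
        if name == event_col:
            meta[DELAY_KEY] = delay_ms
        out[name] = meta
    return out


def global_watermark(max_event_times, delays):
    """Toy executor: one watermark per watermark node, then the minimum of them."""
    return min(max_event_times[col] - delays[col] for col in delays)


# df.withWatermark("a", ...).withWatermark("b", ...) with made-up delays
columns = {"a": {}, "b": {}}
columns = with_watermark(columns, "a", 10_000)
columns = with_watermark(columns, "b", 5_000)

# What the aggregate sees in the attribute metadata: only "b" is tagged.
watermarked = [name for name, meta in columns.items() if DELAY_KEY in meta]

# What the executor passes down: the minimum over both nodes, driven by "a".
delays = {"a": 10_000, "b": 5_000}
max_seen = {"a": 100_000, "b": 200_000}
value = global_watermark(max_seen, delays)
b_only = max_seen["b"] - delays["b"]
```

With these numbers the aggregate believes it is watermarked on "b" alone, which would give 195000, but it receives 90000, a value determined entirely by the column "a" whose metadata was erased.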
[jira] [Commented] (SPARK-23703) Collapse sequential watermarks
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462339#comment-16462339 ]

Jungtaek Lim commented on SPARK-23703:
--------------------------------------

[~joseph.torres] Could you provide simple code or a query showing this behavior? It would help me (and possibly other contributors) better understand the rationale for this issue, and maybe the relevant internals too. Once I understand the details, I'd also like to work on this.