arunmahadevan commented on issue #23576: [SPARK-26655] [SS] Support multiple aggregates in append mode
URL: https://github.com/apache/spark/pull/23576#issuecomment-461542627
 
 
   > * I have a plan tree A: EventTimeExec -> B: StatefulOperator -> C: 
StatefulOperator. Can C use the watermark in A? If so, is it safe to do that 
when B transforms or projects away the watermarked column - if not, what are 
the rules for how watermarks must be provided with multiple aggregates?
   
   Typically C cannot, since A provides the input watermark of B, and assuming B
does some aggregation, B needs to emit a new watermark. There's a new check in
the `UnsupportedOperationChecker` that verifies each aggregate's grouping
expressions include an event-time watermark attribute, which effectively
enforces this. So one would have to explicitly specify a timestamp output
column and a second watermark, like:
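   ```scala
   import org.apache.spark.sql.functions.window
   import spark.implicits._ // assumes a SparkSession named `spark`

   input.withWatermark("ts", ...)                        // watermark for the first aggregate
         .groupBy(window($"ts", ...), $"key").count()
         .select($"window.end" as "windowts", $"count")  // expose a timestamp output column
         .withWatermark("windowts", ...)                 // second watermark for the next aggregate
         .groupBy(...)
   ```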
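To make the answer above concrete, here is a minimal, Spark-free sketch of the two-stage query, processed as a single batch. It assumes a watermark of "max event time seen minus delay" and that a window is emitted once its end is at or below the watermark. `TwoStageWatermarkSketch`, `stageB`, `stageC` and `windowEnd` are hypothetical names, not Spark APIs. The point it illustrates is that stage C never sees the original `ts`, so its watermark has to be derived from stage B's output column (`windowts`).

```scala
// Hypothetical, Spark-free model of the two-stage query above.
// Simplification: one batch only; real streaming advances the watermark per micro-batch.
object TwoStageWatermarkSketch {
  final case class Event(ts: Long, key: String)

  // End of the tumbling window [start, start + size) containing t (assumes t >= 0).
  def windowEnd(t: Long, size: Long): Long = t / size * size + size

  // Stage B: watermark on `ts` (max ts - delay), then count per (window, key).
  // Append mode: a window is emitted only once its end <= watermark.
  def stageB(events: Seq[Event], size: Long, delay: Long): Seq[(Long, String, Long)] =
    if (events.isEmpty) Seq.empty
    else {
      val watermark = events.map(_.ts).max - delay
      events
        .groupBy(e => (windowEnd(e.ts, size), e.key))
        .toSeq
        .collect { case ((end, key), es) if end <= watermark => (end, key, es.size.toLong) }
        .sortBy { case (end, key, _) => (end, key) }
    }

  // Stage C: a second watermark, computed from stage B's window-end column
  // ("windowts"), then a coarser sum of the counts.
  def stageC(rows: Seq[(Long, String, Long)], size: Long, delay: Long): Seq[(Long, Long)] =
    if (rows.isEmpty) Seq.empty
    else {
      val watermark = rows.map(_._1).max - delay
      rows
        .groupBy { case (windowts, _, _) => windowEnd(windowts, size) }
        .toSeq
        .collect { case (end, rs) if end <= watermark => (end, rs.map(_._3).sum) }
        .sortBy(_._1)
    }
}
```

Consider events at ts 1, 3, 7, 12 and 25, with a 10-unit window and a 5-unit delay. Stage B's watermark is 20, so the window ending at 30 is held back. Stage C then computes its own watermark purely from the `windowts` values it receives.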
   
   > * Do all of our optimization and execution rules respect the semantics of 
operator watermarks?
   
   I need to check whether any of them would interfere with multiple
watermarks, or whether we need new rules.
   
   > * We can currently call `withWatermark` at any point in the query plan. Is
this consistent with operator watermarks? Even if we can support the two of
them together, do we want to?
   
   I thought `withWatermark` should be called before the `groupBy` so that the
grouping attribute has a watermark; otherwise the query fails the check in
`UnsupportedOperationChecker`. With multiple aggregates, it should be called
before each aggregate.
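The rule in the last paragraph can be modelled as a small walk over the plan. This is a hypothetical sketch, not the actual `UnsupportedOperationChecker` code. It assumes two things: aliasing a column keeps its watermark, and an aggregate's output columns carry none. The second assumption is why a second `withWatermark` is needed before the next aggregate. The names `WatermarkCheckSketch`, `Op`, `Select` and `check` are illustrative only.

```scala
// Hypothetical toy model of the append-mode rule: every aggregate must group by
// at least one column that carries an event-time watermark.
object WatermarkCheckSketch {
  sealed trait Op
  // Declares a watermark on `col`, like withWatermark(col, ...).
  final case class WithWatermark(col: String) extends Op
  // Projection: output column name -> input column name it is read from.
  final case class Select(columns: Map[String, String]) extends Op
  // `grouping` names the input columns the grouping expressions reference,
  // e.g. "ts" for window($"ts", ...).
  final case class Aggregate(grouping: Seq[String]) extends Op

  /** Walks the plan from source to sink, tracking watermarked columns.
    * Returns the first violation, or the watermarked columns at the sink. */
  def check(plan: Seq[Op]): Either[String, Set[String]] =
    plan.zipWithIndex.foldLeft[Either[String, Set[String]]](Right(Set.empty)) {
      case (err @ Left(_), _) => err
      case (Right(wm), (WithWatermark(c), _)) => Right(wm + c)
      case (Right(wm), (Select(cols), _)) =>
        // Model assumption: a renamed column keeps its watermark; a dropped one loses it.
        Right(cols.toSeq.collect { case (out, in) if wm(in) => out }.toSet)
      case (Right(wm), (Aggregate(grouping), i)) =>
        // Model assumption: aggregate outputs carry no watermark, so the next
        // aggregate needs its own withWatermark.
        if (grouping.exists(wm)) Right(Set.empty)
        else Left(s"aggregate at position $i has no watermarked grouping column")
    }
}
```

Consider the two-aggregate query shape from the first answer. The plan passes as written. If the second `withWatermark` is dropped, the check rejects the second aggregate.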

----------------------------------------------------------------
This is an automated message from the Apache Git Service.