arunmahadevan opened a new pull request #23576: [SPARK-26655] [SS] Support 
multiple aggregates in append mode
URL: https://github.com/apache/spark/pull/23576
 
 
   ## What changes were proposed in this pull request?
   
   This patch proposes to add support for multiple aggregates in append mode. In
   append mode, the aggregates are emitted only after the watermark passes
   the threshold (e.g. the window boundary) and the emitted value is not
   affected by further late data. This allows to chain multiple aggregates
   in 'Append' output mode without worrying about retractions etc.
   
   However the current event time watermarks in structured streaming are
   tracked at a global level and this does not work when aggregates are
   chained. The downstream watermarks usually lags the ones before and the
   global (min or max) watermarks will not let the stages make progress
   independently.
   
   The patch tracks the watermarks at each (stateful)
   operator so that the aggregate outputs are generated when the watermark
   passes the thresholds at the corresponding stateful operator. The values
   are also saved into the commit/offset logs (similar to global watermark)
   
   Each aggregate should have a corresponding watermark defined while
   creating the query (E.g. via withWatermark) and this is used to
   track the progress of event time corresponding to the stateful operator.
   
   
   ## How was this patch tested?
   
   New and existing unit tests
   
   Please review http://spark.apache.org/contributing.html before opening a 
pull request.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to