arunmahadevan opened a new pull request #23576: [SPARK-26655] [SS] Support multiple aggregates in append mode URL: https://github.com/apache/spark/pull/23576 ## What changes were proposed in this pull request? This patch proposes to add support for multiple aggregates in append mode. In append mode, the aggregates are emitted only after the watermark passes the threshold (e.g. the window boundary) and the emitted value is not affected by further late data. This allows to chain multiple aggregates in 'Append' output mode without worrying about retractions etc. However the current event time watermarks in structured streaming are tracked at a global level and this does not work when aggregates are chained. The downstream watermarks usually lags the ones before and the global (min or max) watermarks will not let the stages make progress independently. The patch tracks the watermarks at each (stateful) operator so that the aggregate outputs are generated when the watermark passes the thresholds at the corresponding stateful operator. The values are also saved into the commit/offset logs (similar to global watermark) Each aggregate should have a corresponding watermark defined while creating the query (E.g. via withWatermark) and this is used to track the progress of event time corresponding to the stateful operator. ## How was this patch tested? New and existing unit tests Please review http://spark.apache.org/contributing.html before opening a pull request.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org