TengHu opened a new pull request #12297: URL: https://github.com/apache/flink/pull/12297
> ## What is the purpose of the change > Flink triggers all panes belonging to one window at the same time. In other words, all panes are aligned and their triggers all fire simultaneously, causing the spiking workload effect. This pull request adds WindowStagger to generate staggering offset for each window assignment at runtime, so the workloads are distributed across time. Hence each window assignment is based on window size, window offset and staggering offset (generated in runtime). > > This change only modifies TumblingProcessingTimeWindows, will send out other windows change in other PRs. > > ## Brief change log > * _Add WindowStagger for generating staggering offsets_ > * _Enable TumblingProcessingTimeWindows to generate staggering offsets if user enabled_ > > ## Verifying this change > This change is already covered by existing tests, such as _TumblingProcessingTimeWindowsTest_. > > This change added tests and can be verified as follows: > > * _Added unit tests for WindowStagger_ > * _Validated the change by running in our clusters with 3500 task managers in total on a stateful streaming program using sliding and tumbling windowing. Some dashboards are shown below_ > > ![](https://camo.githubusercontent.com/298c26fefe598c1b4605a83f84451adfde4fae50/68747470733a2f2f6973737565732e6170616368652e6f72672f6a6972612f7365637572652f6174746163686d656e742f31323937313835362f737461676765725f77696e646f775f64656c61792e706e67) > ![](https://camo.githubusercontent.com/3c4769bd8b7352c8860eaaa4bc6bbffc64104975/68747470733a2f2f6973737565732e6170616368652e6f72672f6a6972612f7365637572652f6174746163686d656e742f31323937313835372f737461676765725f77696e646f775f7468726f7567687075742e706e67) > ![](https://camo.githubusercontent.com/5fd01124c4c21d9388eab78c498c928ff6a651db/68747470733a2f2f6973737565732e6170616368652e6f72672f6a6972612f7365637572652f6174746163686d656e742f31323937313835392f737461676765725f77696e646f772e706e67) > > _some system metrics_ > > ![buffers_in_queue](https://user-images.githubusercontent.com/10646097/60139232-00f29d80-9762-11e9-84f4-3bfbde28c028.png) > ![buffer_usage](https://user-images.githubusercontent.com/10646097/60139234-03ed8e00-9762-11e9-99d1-de845d02a8c6.png) > ![output_record_rate](https://user-images.githubusercontent.com/10646097/60139237-064fe800-9762-11e9-959d-2db96c7f7bf6.png) > > ## Does this pull request potentially affect one of the following parts: > * Dependencies (does it add or upgrade a dependency): no > * The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes, TumblingProcessingTimeWindows(potentially all WindowAssigners) > * The serializers: no > * The runtime per-record code paths (performance sensitive): don't know, probably no ? > * Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no > * The S3 file system connector: no > > ## Documentation > * Does this pull request introduce a new feature? yes > * If yes, how is the feature documented? JavaDocs ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org