[GitHub] [flink-web] AHeise commented on a change in pull request #436: Add Apache Flink release 1.13.0

GitBox Mon, 26 Apr 2021 01:29:33 -0700


AHeise commented on a change in pull request #436:
URL: https://github.com/apache/flink-web/pull/436#discussion_r620080598




##########
File path: _posts/2021-04-22-release-1.13.0.md
##########
@@ -0,0 +1,374 @@
+---
+layout: post 
+title:  "Apache Flink 1.13.0 Release Announcement"
+date: 2021-04-22T08:00:00.000Z 
+categories: news 
+authors:
+- stephan:
+  name: "Stephan Ewen"
+  twitter: "StephanEwen"
+- dwysakowicz:
+  name: "Dawid Wysakowicz"
+  twitter: "dwysakowicz"
+
+excerpt: The Apache Flink community is excited to announce the release of 
Flink 1.13.0! Close to xxx contributors worked on over xxx threads to bring 
significant improvements to usability and observability as well as new features 
that improve elasticity of Flink’s Application-style deployments.
+---
+
+
+The Apache Flink community is excited to announce the release of Flink 1.13.0! 
Close to xxx
+contributors worked on over xxx threads to bring significant improvements to 
usability and
+observability as well as new features that improve elasticity of Flink’s 
Application-style
+deployments.
+
+This release brings us a big step forward in one of our major efforts: Making 
Stream Processing
+Applications as natural and as simple to manage as any other application. The 
new reactive scaling
+mode means that scaling streaming applications in and out now works like in 
any other application,
+by just changing the number of parallel processes.
+
+We also added a series of improvements that help users better understand the 
performance of
+applications. When the streams don't flow as fast as you’d hope, these can 
help you to understand
+why: Load and backpressure visualization to identify bottlenecks, CPU flame 
graphs to identify hot
+code paths in your application, and State Access Latencies to see how the 
State Backends are keeping
+up.
+
+This blog post describes all major new features and improvements, important 
changes to be aware of
+and what to expect moving forward.
+
+{% toc %}
+
+We encourage you to [download the 
release](https://flink.apache.org/downloads.html) and share your
+feedback with the community through
+the [Flink mailing 
lists](https://flink.apache.org/community.html#mailing-lists)
+or [JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## Notable Features and Improvements
+
+### Reactive mode
+
+The Reactive Mode is the latest piece in Flink's initiative for making Stream 
Processing
+Applications as natural and as simple to manage as any other application.
+
+Flink has a dual nature when it comes to resource management and deployments: 
You can deploy
+clusters onto Resource Managers like Kubernetes or Yarn in such a way that 
Flink actively manages
+the resource, and allocates and releases workers as needed. That is especially 
useful for jobs and
+applications that rapidly change their required resources, like batch 
applications and ad-hoc SQL
+queries. The application parallelism rules, the number of workers follows. We 
call this active
+scaling.
+
+For long-running streaming applications, it is often a nicer model to just deploy them like any
+other long-running application: The application doesn't really need to know that it runs on K8s,
+EKS, Yarn, etc. and doesn't try to acquire a specific number of workers; instead, it just uses the
+number of workers that is given to it. The number of workers rules, the application parallelism
+adjusts to that. We call that reactive scaling.
+
+The Application Deployment Mode started this effort, making deployments application-like (avoiding
+the two separate deployment steps of (1) starting a cluster and (2) submitting the application). The
+reactive scheduler completes this, and you no longer have to use extra tools (scripts or a K8s
+operator) to keep the number of workers and the application parallelism settings in sync.
+
+You can now put an auto-scaler around Flink applications like around other 
typical applications — as
+long as you are mindful when configuring the autoscaler that stateful 
applications still spend
+effort in moving state around when scaling.
+
+
+### Bottleneck detection, Backpressure and Idleness Monitoring
+
+One of the most important metrics to investigate when a job does not consume 
records as fast as you
+would expect is the backpressure ratio. It lets you track down bottlenecks in 
your pipelines. The
+previous mechanism had two limitations: it was heavy, because it worked by repeatedly taking stack
+trace samples of your running tasks, and it was difficult to find out which vertex was the source
+of backpressure. In Flink 1.13, we reworked
+the mechanism to include new metrics for the time tasks spend being 
backpressured, along with a
+reworked graphical representation of the job (including a percentage of time 
particular vertices are
+backpressured).
+
+
+<figure style="align-content: center">
+  <img src="{{ site.baseurl 
}}/img/blog/2021-04-xx-release-1.13.0/bottleneck.png" style="width: 900px"/>
+</figure>
+
+### Support for CPU flame graphs in Web UI
+
+It is desirable to provide better visibility into the distribution of CPU 
resources while executing
+user code. One of the most visually effective ways to do that is with Flame Graphs. They make it
+easy to answer questions like:
+Which methods are currently consuming CPU resources? How does consumption by 
one method compare to
+the others? Which series of calls on the stack led to executing a particular 
method? Flame Graphs
+are constructed by sampling stack traces a number of times. Every method call 
is represented by a
+bar, where the length of the bar is proportional to the number of times it is 
present in the
+samples. In order to prevent unintended impacts on production environments, 
Flame Graphs are
+currently available as an opt-in feature that needs to be enabled in the 
configuration. Once enabled
+they are accessible via a new component in the UI at the level of the selected 
operator:
+
+<figure style="align-content: center">
+  <img src="{{ site.baseurl }}/img/blog/2021-04-xx-release-1.13.0/7.png" 
style="display: block; margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
+
+### Access Latency Metrics for State
+
+State interactions are a crucial part of the majority of data
+pipelines. Especially when using RocksDB, they can be rather I/O intensive and therefore
+play an important role in the overall performance of the pipeline. It is thus important to be
+able to get insights into what is going on under the hood. To provide these insights, we exposed
+latency tracking metrics.
+
+The metrics are disabled by default, but you can enable them using the
+`state.backend.rocksdb.latency-track-enabled` option.
+
+### Unified binary savepoint format
+
+All available state backends now produce a common unified binary format for their
+savepoints. This means that savepoints are now mutually interchangeable. You are no longer locked
+into the first state backend you chose when starting your application for the first time. It makes
+it easier to start with the Heap Backend and switch later on to RocksDB if the JVM Heap becomes too
+full (which you usually see when the GC times start to go up too much).
+
+### Support user-specified pod templates for Active Kubernetes Deployments
+
+The native Kubernetes deployment received an important update: it now supports custom pod templates.
+Flink from now on allows users to define the JobManager and TaskManager pods via template files.
+This makes it possible to use advanced features that are not supported by Flink's Kubernetes config options
+directly. Major Observability Improvements
+
+What runs on Flink are often critical workloads with SLAs, so it is important 
to have the right
+tools to understand what is happening inside the applications.
+
+If your application does not progress as expected, the latency is higher or 
the throughput lower
+than you would expect, these features help you figure out what is going on.
+
+### Unaligned Checkpoints - Production Ready

Review comment:
       I'd not mention 1.12.3 here and focus more on features.
   
   > 
   > Unaligned checkpoints now support adaptive triggering with timeouts, which 
will only perform an unaligned checkpoint in case of backpressure and use an aligned 
checkpoint otherwise. Thus, checkpoint times become reliable in all situations 
while keeping extra state to a minimum.
   > 
   > Further, you can now rescale from unaligned checkpoints to provide more 
resources in case of sudden spikes (together with reactive mode). This feature 
makes it easier to catch up and reduce backpressure in a timely fashion.
   > 
   > Flink 1.13 brings together all features that the community initially 
envisioned for unaligned checkpoints. With all the bug fixes that went 
into 1.12, we generally encourage the use of unaligned checkpoints for all 
applications with potential backpressure.
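
To make the adaptive triggering described above more concrete, the section could carry a small configuration fragment. The following is only a sketch: the option names are written as I understand them from the Flink 1.13 configuration options and should be checked against the documentation, and the values are illustrative placeholders, not recommendations.

```yaml
# Illustrative flink-conf.yaml fragment (assumed option names, placeholder values).
execution.checkpointing.interval: 1min
execution.checkpointing.unaligned: true
# Each checkpoint starts out aligned and switches to unaligned only if
# alignment takes longer than this timeout, i.e. under backpressure.
execution.checkpointing.alignment-timeout: 30s
```

With a timeout like this, jobs without backpressure would keep taking aligned checkpoints and persist no extra in-flight data.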




