This is an automated email from the ASF dual-hosted git repository.

tzulitai pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 040c74ca45142d39b0aa206cc40937645053b89f
Author: Robert Metzger <rmetz...@apache.org>
AuthorDate: Tue Aug 13 11:58:27 2019 +0200

    Add 1.9.0 release announcement
---
 _posts/2019-08-22-release-1.9.0.md | 367 +++++++++++++++++++++++++++++++++++++
 img/blog/release-19-flip1.png      | Bin 0 -> 32435 bytes
 img/blog/release-19-stack.png      | Bin 0 -> 66386 bytes
 img/blog/release-19-web1.png       | Bin 0 -> 121643 bytes
 img/blog/release-19-web2.png       | Bin 0 -> 133292 bytes
 5 files changed, 367 insertions(+)

diff --git a/_posts/2019-08-22-release-1.9.0.md b/_posts/2019-08-22-release-1.9.0.md
new file mode 100644
index 0000000..9bab119
--- /dev/null
+++ b/_posts/2019-08-22-release-1.9.0.md
@@ -0,0 +1,367 @@
---
layout: post
title: "Apache Flink 1.9.0 Release Announcement"
date: 2019-08-22 12:10:00
categories: news
authors:
- morsapaes:
  name: "Marta Moreira"
  twitter: "morsapaes"
---


The Apache Flink community is proud to announce the release of Apache Flink
1.9.0.

The Apache Flink project's goal is to develop a stream processing system to
unify and power many forms of real-time and offline data processing
applications as well as event-driven applications. In this release, we have
taken a big step forward in that effort by integrating Flink’s stream and
batch processing capabilities under a single, unified runtime.

Significant features on this path are batch-style recovery for batch jobs and
a preview of the new Blink-based query engine for Table API and SQL queries.
We are also excited to announce the availability of the State Processor API,
one of the most frequently requested features, which enables users to
read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes
a reworked WebUI and previews of Flink’s new Python Table API and its
integration with the Apache Hive ecosystem.

This blog post describes all major new features and improvements, important
changes to be aware of, and what to expect moving forward. For more details,
check the [complete release
changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601).

The binary distribution and source artifacts for this release are now
available via the [Downloads](https://flink.apache.org/downloads.html) page of
the Flink project, along with the updated
[documentation](https://ci.apache.org/projects/flink/flink-docs-release-1.9/).
Flink 1.9 is API-compatible with previous 1.x releases for APIs annotated with
the `@Public` annotation.

Please feel encouraged to download the release and share your thoughts with
the community through the Flink [mailing
lists](https://flink.apache.org/community.html#mailing-lists) or
[JIRA](https://issues.apache.org/jira/projects/FLINK/summary). As always,
feedback is very much appreciated!


{% toc %}


## New Features and Improvements


### Fine-grained Batch Recovery (FLIP-1)

The time to recover a batch (DataSet, Table API and SQL) job from a task
failure was significantly reduced. Until Flink 1.9, task failures in batch
jobs were recovered by canceling all tasks and restarting the whole job, i.e.,
the job was started from scratch and all progress was voided. With this
release, Flink can be configured to limit the recovery to only those tasks
that are in the same **failover region**. A failover region is the set of
tasks that are connected via pipelined data exchanges. Hence, the
batch-shuffle connections of a job define the boundaries of its failover
regions. More details are available in
[FLIP-1](https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures).

![Fine-grained batch recovery]({{site.baseurl}}/img/blog/release-19-flip1.png "Fine-grained Batch Recovery")

To use this new failover strategy, you need to configure the following:

 * Make sure you have the entry `jobmanager.execution.failover-strategy: region`
   in your `flink-conf.yaml`.

**Note:** The configuration of the 1.9 distribution has that entry by default,
but when reusing a configuration file from previous setups, you have to add
it manually.

Moreover, you need to set the `ExecutionMode` of batch jobs in the
`ExecutionConfig` to `BATCH` so that data shuffles are not pipelined
and jobs have more than one failover region.
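A minimal sketch of the latter setting in a DataSet program (the job logic
itself is omitted):

```
import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchRecoveryExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Execute shuffles as blocking batch exchanges instead of pipelined
        // ones, so the job is split into more than one failover region.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);
        // ... define and execute the DataSet program as usual ...
    }
}
```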
The "Region" failover strategy also improves the recovery of "embarrassingly
parallel" streaming jobs, i.e., jobs without any shuffle such as `keyBy()` or
`rebalance()`. When such a job is recovered, only the tasks of the affected
pipeline (failover region) are restarted. For all other streaming jobs, the
recovery behavior is the same as in prior Flink versions.


### State Processor API (FLIP-43)

Up to Flink 1.9, accessing the state of a job from the outside was limited to
the (still) experimental [Queryable
State](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html).
This release introduces a new, powerful library to read, write and modify
state snapshots using the batch DataSet API. In practice, this means:

 * Flink job state can be bootstrapped by reading data from external systems,
   such as external databases, and converting it into a savepoint.
 * State in savepoints can be queried using any of Flink’s batch APIs
   (DataSet, Table, SQL), for example to analyze relevant state patterns or
   check for discrepancies in state that can support application auditing or
   troubleshooting.
 * The schema of state in savepoints can be migrated offline, compared to the
   previous approach requiring online migration on schema access.
 * Invalid data in savepoints can be identified and corrected.

The new State Processor API covers all variations of snapshots: savepoints,
full checkpoints and incremental checkpoints. More details are available in
[FLIP-43](https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+State+Processor+API).
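As a rough sketch, reading an operator's list state from an existing savepoint
could look as follows (the savepoint path, operator uid, state name, and
element type below are placeholders; `readListState` is one of several reader
methods the library offers):

```
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load an existing savepoint and expose one operator's list state as a DataSet.
ExistingSavepoint savepoint =
    Savepoint.load(env, "hdfs:///flink/savepoints/savepoint-ab12cd", new MemoryStateBackend());
DataSet<Long> listState =
    savepoint.readListState("my-operator-uid", "my-state-name", Types.LONG);
```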
### Stop-with-Savepoint (FLIP-34)

[Cancelling with a
savepoint](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#operations)
is a common operation for stopping/restarting, forking or updating Flink jobs.
However, the existing implementation did not guarantee that output was
persisted to external storage systems for exactly-once sinks. To improve the
end-to-end semantics when stopping a job, Flink 1.9 introduces a new `SUSPEND`
mode to stop a job with a savepoint that is consistent with the emitted data.
You can suspend a job with Flink’s CLI client as follows:

```
bin/flink stop -s [:targetDirectory] :jobId
```

The final job state is set to `FINISHED` on success, allowing
users to detect failures of the requested operation.

More details are available in
[FLIP-34](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212).


### Flink WebUI Rework

After a
[discussion](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html)
about modernizing the internals of Flink’s WebUI, this component was
reconstructed using the latest stable version of Angular, basically a bump
from Angular 1.x to 7.x. The redesigned version is the default in 1.9.0;
however, there is a link to switch to the old WebUI.

<div class="row">
  <div class="col-sm-6">
    <span><img class="thumbnail" src="{{site.baseurl}}/img/blog/release-19-web1.png" /></span>
  </div>
  <div class="col-sm-6">
    <span><img class="thumbnail" src="{{site.baseurl}}/img/blog/release-19-web2.png" /></span>
  </div>
</div>

**Note:** Moving forward, feature parity for the old version of the WebUI
will not be guaranteed.


### Preview of the new Blink SQL Query Processor

Following the [donation of
Blink]({{site.baseurl}}/news/2019/02/13/unified-batch-streaming-blink.html) to
Apache Flink, the community worked on integrating Blink’s query optimizer and
runtime for the Table API and SQL. As a first step, we refactored the
monolithic `flink-table` module into smaller modules
([FLIP-32](https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions)).
This resulted in a clear separation of, and well-defined interfaces between,
the Java and Scala API modules and the optimizer and runtime modules.

<span><img style="width:50%"
src="{{site.baseurl}}/img/blog/release-19-stack.png" /></span>

Next, we extended Blink’s planner to implement the new optimizer interface
such that there are now two pluggable query processors to execute Table API
and SQL statements: the pre-1.9 Flink processor and the new Blink-based query
processor. The Blink-based query processor offers better SQL coverage (full TPC-H
coverage in 1.9, TPC-DS coverage is planned for the next release) and improved
performance for batch queries as the result of more extensive query
optimization (cost-based plan selection and more optimization rules), improved
code generation, and tuned operator implementations.
The Blink-based query processor also provides a more powerful streaming runner,
with some new features (e.g., dimension table join, TopN, deduplication),
optimizations to mitigate data skew in aggregations, and more useful built-in
functions.

**Note:** The semantics and the set of supported operations of the two query
processors are mostly, but not fully, aligned.

However, the integration of Blink’s query processor is not fully completed
yet. Therefore, the pre-1.9 Flink processor is still the default processor in
Flink 1.9 and recommended for production settings. You can enable the Blink
processor by configuring it via the `EnvironmentSettings` when creating a
`TableEnvironment`. The selected processor must be on the classpath of the
executing Java process. For cluster setups, both query processors are
automatically loaded with the default configuration. When running a query from
your IDE you need to explicitly [add a planner
dependency](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/#table-program-dependencies)
to your project.
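For illustration, a minimal sketch of selecting the Blink processor for a
streaming program (assuming the Blink planner dependency is on the classpath):

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Choose the Blink-based query processor in streaming mode.
EnvironmentSettings settings = EnvironmentSettings.newInstance()
    .useBlinkPlanner()
    .inStreamingMode()
    .build();
TableEnvironment tableEnv = TableEnvironment.create(settings);
```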
#### **Other Improvements to the Table API and SQL**

Besides the exciting progress around the Blink planner, the community worked
on a whole set of other improvements to these interfaces, including:

 * **Scala-free Table API and SQL for Java users
   ([FLIP-32](https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions))**

   As part of the refactoring and splitting of the flink-table module, two
   separate API modules for Java and Scala were created. For Scala users,
   nothing really changes, but Java users can now use the Table API and/or SQL
   without pulling in a Scala dependency.

 * **Rework of the Table API Type System
   ([FLIP-37](https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System))**

   The community implemented a [new data type
   system](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/types.html#data-types)
   to detach the Table API from Flink’s
   [TypeInformation](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/types_serialization.html#flinks-typeinformation-class)
   class and improve its compliance with the SQL standard. This is still a
   work in progress and expected to be completed in the next release. In
   Flink 1.9, UDFs, among other things, have not been ported to the new type
   system yet.

 * **Multi-column and Multi-row Transformations for Table API
   ([FLIP-29](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739))**

   The functionality of the Table API was extended with a set of
   transformations that support multi-row and/or multi-column inputs and
   outputs. These transformations significantly ease the implementation of
   processing logic that would be cumbersome to implement with relational
   operators.

 * **New, Unified Catalog APIs
   ([FLIP-30](https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs))**

   We reworked the catalog APIs to store metadata and unified the handling of
   internal and external catalogs. This effort was mainly initiated as a
   prerequisite for the Hive integration (see below), but it also improves the
   overall convenience of managing catalog metadata in Flink. Besides improving
   the catalog interfaces, we also extended their functionality. Previously,
   table definitions for Table API or SQL queries were volatile. With Flink 1.9,
   the metadata of tables which are registered with a SQL DDL statement can be
   persisted in a catalog. This means you can add a table that is backed by a
   Kafka topic to a Metastore catalog and from then on query this table
   whenever your catalog is connected to Metastore.

 * **DDL Support in the SQL API
   ([FLINK-10232](https://issues.apache.org/jira/browse/FLINK-10232))**

   Up to this point, Flink SQL only supported DML statements (e.g. `SELECT`,
   `INSERT`). External tables (table sources and sinks) had to be registered
   via Java/Scala code or configuration files. For 1.9, we added support for
   SQL DDL statements to register and remove tables and views (`CREATE TABLE`,
   `DROP TABLE`), as shown in the sketch after this list. However, we have not
   yet added stream-specific syntax extensions to define timestamp extraction
   and watermark generation. Full support for streaming use cases is planned
   for the next release.
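A sketch of registering and dropping a table with the new DDL support (the
table schema and connector properties are illustrative; the exact property
keys depend on the connector descriptors available on the classpath):

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tableEnv = TableEnvironment.create(
    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

// Register a table backed by a Kafka topic via a SQL DDL statement; the
// connector properties below are placeholders for a real configuration.
tableEnv.sqlUpdate(
    "CREATE TABLE Orders (" +
    "  user_id BIGINT," +
    "  product VARCHAR," +
    "  amount INT" +
    ") WITH (" +
    "  'connector.type' = 'kafka'," +
    "  'connector.topic' = 'orders'," +
    "  'format.type' = 'json'" +
    ")");

// Remove the table definition again.
tableEnv.sqlUpdate("DROP TABLE Orders");
```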
### Preview of Full Hive Integration (FLINK-10556)

Apache Hive is widely used in Hadoop’s ecosystem to store and query large
amounts of structured data. Besides being a query processor, Hive features a
catalog called Metastore to manage and organize large datasets. A common
integration point for query processors is Hive’s Metastore, which grants
access to the data that Hive manages.

Recently, the community started implementing an external catalog for Flink’s
Table API and SQL that connects to Hive’s Metastore. In Flink 1.9, users will
be able to query and process all data that is stored in Hive. As described
earlier, you will also be able to persist metadata of Flink tables in Metastore.
Moreover, the Hive integration includes support to use Hive’s UDFs in Flink
Table API or SQL queries. More details are available in
[FLINK-10556](https://issues.apache.org/jira/browse/FLINK-10556).

While, previously, table definitions for Table API or SQL queries were always
volatile, the new catalog connector additionally allows persisting a table in
Metastore that is created with a SQL DDL statement (see above). This means
that you can connect to Metastore and register a table that is, for example,
backed by a Kafka topic. From then on, you can query that table whenever your
catalog is connected to Metastore.

Please note that the Hive support in Flink 1.9 is experimental. We are
planning to stabilize these features for the next release and are looking
forward to your feedback.
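A sketch of what connecting to a Metastore-backed catalog could look like
(assuming the Hive connector dependency is on the classpath; the catalog name,
database, configuration directory, and Hive version below are placeholders):

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

TableEnvironment tableEnv = TableEnvironment.create(
    EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

// Register a catalog backed by Hive's Metastore and make it the default.
HiveCatalog hiveCatalog = new HiveCatalog("myhive", "default", "/opt/hive-conf", "2.3.4");
tableEnv.registerCatalog("myhive", hiveCatalog);
tableEnv.useCatalog("myhive");

// Tables managed by Hive can now be queried with Flink SQL.
```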
### Preview of the new Python Table API (FLIP-38)

This release also introduces a first version of a Python Table API
([FLIP-38](https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API)).
This marks the start of our effort to bring full-fledged Python support to
Flink. The feature was designed as a slim Python API wrapper around the Table
API, basically translating Python Table API method calls into Java Table API
calls. In the initial version that ships with Flink 1.9, the Python Table API
does not support UDFs yet, only standard relational operations. Support for
UDFs implemented in Python is on the roadmap for future releases.

If you’d like to try the new Python API, you have to manually [install
PyFlink](https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#build-pyflink).
From there, you can have a look at [this
walkthrough](https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html)
or explore it on your own. The [community is currently
working](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Publish-the-PyFlink-into-PyPI-td31201.html)
on preparing a `pyflink` Python package that will be made available for
installation via `pip`.


## Important Changes

 * The Table API and SQL are now part of the default configuration of the
   Flink distribution. Before, the Table API and SQL had to be enabled by
   moving the corresponding JAR file from `./opt` to `./lib`.
 * The machine learning library (`flink-ml`) has been removed in preparation for
   [FLIP-39](https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit).
 * The old DataSet and DataStream Python APIs have been removed in favor of
   [FLIP-38](https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API).
 * Flink can be compiled and run on Java 9. Note that certain components
   interacting with external systems (connectors, filesystems, reporters) may
   not work, since the respective projects may have skipped Java 9 support.


## Release Notes

Please review the [release
notes](https://ci.apache.org/projects/flink/flink-docs-release-1.9/release-notes/flink-1.9.html)
for a more detailed list of changes and new features if you plan to upgrade
your Flink setup to Flink 1.9.0.


## List of Contributors

We would like to thank all contributors who have made this release possible:

Abdul Qadeer (abqadeer), Aitozi, Alberto Romero, Aleksey Pak, Alexander
Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrew Duffy, Andrey Zagrebin,
Ankur, Artsem Semianenka, Benchao Li, Biao Liu, Bo WANG, Bowen L, Chesnay
Schepler, Clark Yang, Congxian Qiu, Cristian, Danny Chan, David Moravek, Dawid
Wysakowicz, Dian Fu, EronWright, Fabian Hueske, Fabio Lombardelli, Fokko
Driesprong, Gao Yun, Gary Yao, Gen Luo, Gyula Fora, Hequn Cheng,
Hongtao Zhang, Huang Xingbo, HuangXingBo, Hugo Da Cruz Louro, Humberto
Rodríguez A, Hwanju Kim, Igal Shilman, Jamie Grier, Jark Wu, Jason, Jasper
Yue, Jeff Zhang, Jiangjie (Becket) Qin, Jiezhi.G, Jincheng Sun, Jing Zhang,
Jingsong Lee, Juan Gentile, Jungtaek Lim, Kailash Dayanand, Kevin
Bohinski, Konstantin Knauf, Konstantinos Papadopoulos, Kostas Kloudas, Kurt
Young, Lakshmi, Lakshmi Gururaja Rao, Leeviiii, LouisXu, Maximilian Michels,
Nico Kruber, Niels Basjes, Paul Lam, PengFei Li, Peter Huang, Pierre Zemb,
Piotr Nowojski, Piyush Narang, Richard Deurwaarder, Robert Metzger, Robert
Stoll, Romano Vacca, Rong Rong, Rui Li, Ryantaocer, Scott Mitchell, Seth
Wiesman, Shannon Carey, Shimin Yang, Stefan Richter, Stephan Ewen, Stephen
Connolly, Steven Wu, SuXingLee, TANG Wen-hui, Thomas Weise, Till Rohrmann,
Timo Walther, Tom Goong, TsReaper, Tzu-Li (Gordon) Tai, Ufuk Celebi,
Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Xintong Song, Xpray,
XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yangze Guo, Yu Li, Yun Gao, Yun
Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, Zili
Chen, aloys, arganzheng, azagrebin, bd2019us, beyond1920, biao.liub,
blueszheng, boshu Zheng, chenqi, chummyhe89, chunpinghe, dcadmin,
dianfu, godfrey he, guanghui01.rong, hehuiyuan, hello, hequn8128,
jackyyin, joongkeun.yang, klion26, lamber-ken, leesf, liguowei,
lincoln-lil, liyafan82, luoqi, mans2singh, maqingxiang, maxin, mjl, okidogi,
ozan, potseluev, qiangsi.lq, qiaoran, robbinli, shaoxuan-wang, shengqian.zhou,
shenlang.sl, shuai-xu, sunhaibotb, tianchen, tianchen92,
tison, tom_gong, vinoyang, vthinkxie, wanggeng3, wenhuitang, winifredtamg,
xl38154, xuyang1706, yangfei5, yanghua, yuzhao.cyz,
zhangxin516, zhangxinxing, zhaofaxian, zhijiang, zjuwangg, 林小铂,
黄培松, 时无两丶.

diff --git a/img/blog/release-19-flip1.png b/img/blog/release-19-flip1.png
new file mode 100755
index 0000000..dda2626
Binary files /dev/null and b/img/blog/release-19-flip1.png differ
diff --git a/img/blog/release-19-stack.png b/img/blog/release-19-stack.png
new file mode 100755
index 0000000..877b51f
Binary files /dev/null and b/img/blog/release-19-stack.png differ
diff --git a/img/blog/release-19-web1.png b/img/blog/release-19-web1.png
new file mode 100755
index 0000000..1b8c8cb
Binary files /dev/null and b/img/blog/release-19-web1.png differ
diff --git a/img/blog/release-19-web2.png b/img/blog/release-19-web2.png
new file mode 100755
index 0000000..6c29f44
Binary files /dev/null and b/img/blog/release-19-web2.png differ