Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector
Thanks Abid, Count me in, and drop a note, if I can help in any way. Thanks, Peter On Tue, Oct 11, 2022, 20:13 wrote: > Hi Martijn, > > Yes catalog integration exists and catalogs can be created using Flink > SQL. > > https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs > has more details. > We may need some discussion within Iceberg community but based on the > current iceberg-flink code structure we are looking to externalize this as > well. > > Thanks > Abid > > > On 2022/10/11 08:24:44 Martijn Visser wrote: > > Hi Abid, > > > > Thanks for the FLIP. I have a question about Iceberg's Catalog: has that > > integration between Flink and Iceberg been created already and are you > > looking to externalize that as well? > > > > Thanks, > > > > Martijn > > > > On Tue, Oct 11, 2022 at 12:14 AM wrote: > > > > > Hi Marton, > > > > > > Yes, we are initiating this as part of the Externalize Flink Connectors > > > effort. Plan is to externalize the existing Flink connector from > Iceberg > > > repo into a separate repo under the Flink umbrella. > > > > > > Sorry about the doc permissions! I was able to create a FLIP-267: > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector > > > < > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector > > > > > > > Lets use that to discuss. > > > > > > Thanks > > > Abid > > > > > > On 2022/10/10 12:57:32 Márton Balassi wrote: > > > > Hi Abid, > > > > > > > > Just to clarify does your suggestion mean that the Iceberg community > > > would > > > > like to remove the iceberg-flink connector from the Iceberg codebase > and > > > > maintain it under Flink instead? A new separate repo under the Flink > > > > project umbrella given the current existing effort to extract > connectors > > > to > > > > their individual repos (externalize) makes sense to me. > > > > > > > > [1] https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo > > > > > > > > Best, > > > > Marton > > > > > > > > > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li wrote: > > > > > > > > > Thanks Abid for driving. > > > > > > > > > > +1 for this. > > > > > > > > > > Can you open the permissions for > > > > > > > > > > > > > > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing > > > > > ? > > > > > > > > > > Best, > > > > > Jingsong > > > > > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed > > > > > wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > I would like to start a discussion about contributing Iceberg > Flink > > > > > Connector to Flink. > > > > > > > > > > > > I created a doc < > > > > > > > > > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing > > > > > > > > > with all the details following the Flink Connector template as I > don’t > > > have > > > > > permissions to create a FLIP yet. > > > > > > High level details are captured below: > > > > > > > > > > > > Motivation: > > > > > > > > > > > > This FLIP aims to contribute the existing Apache Iceberg Flink > > > Connector > > > > > to Flink. > > > > > > > > > > > > Apache Iceberg is an open table format for huge analytic > datasets. > > > > > Iceberg adds tables to compute engines including Spark, Trino, > > > PrestoDB, > > > > > Flink, Hive and Impala using a high-performance table format that > works > > > > > just like a SQL table. > > > > > > Iceberg avoids unpleasant surprises. 
Schema evolution works and > won’t > > > > > inadvertently un-delete data. Users don’t need to know about > > > partitioning > > > > > to get fast queries. Iceberg was designed to solve correctness > > > problems in > > > > > eventually-consistent cloud object stores. > > > > > > > > > > > > Iceberg supports both Flink’s DataStream API and Table API. > Based on > > > the > > > > > guideline of the Flink community, only the latest 2 minor versions > are > > > > > actively maintained. See the Multi-Engine Support#apache-flink for > > > further > > > > > details. > > > > > > > > > > > > > > > > > > Iceberg connector supports: > > > > > > > > > > > > • Source: detailed Source design < > > > > > > > > > https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit# > > > >, > > > > > based on FLIP-27 > > > > > > • Sink: detailed Sink design and interfaces used < > > > > > > > > > https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit# > > > > > > > > > > > > • Usable in both DataStream and Table API/SQL > > > > > > • DataStream read/append/overwrite > > > > > > • SQL create/alter/drop table, select, insert into, > insert > > > > > overwrite > > > > > > • Streaming or batch read in Java API > > > > > > • Support for Flink’s Python API > > > > > > > > > > > > See Iceberg Flink < > > > https://iceberg.apache.org/docs/latest/flink/#flink>for > > > > > detailed
Re: State of the Rescale API
Thanks for the attention to the rescale API. Dynamic resource adjustment is useful for streaming jobs since the throughput can change at different times. The rescale API is a lightweight way to change a job's parallelism. This is important for some jobs, for example jobs that are part of live activities or related to money, which cannot be delayed. In our production scenario, we have supported a simple rescale API, which may not be perfect. I would like to take this chance to suggest supporting the rescale API in the adaptive scheduler for auto-scaling. Chesnay Schepler 于2022年10月11日周二 20:36写道: > The AdaptiveScheduler is not limited to reactive mode. There are no > deployment limitations for the scheduler itself. > In a nutshell, all that reactive mode does is crank the target > parallelism to infinity, when usually it is the parallelism the user has > set in the job/configuration. > > I think it would be fine if a new/revised rescale API were only > available in the Adaptive Scheduler (without reactive mode!) for starters. > We'd require way more stuff to make this useful for batch workloads. > > On 10/10/2022 16:47, Maximilian Michels wrote: > > Hey Gyula, > > > > Is the Adaptive Scheduler limited to the Reactive mode? I agree that if > we > > move forward with the Adaptive Scheduler solution it should support all > > deployment scenarios. > > > > Thanks, > > -Max > > > > On Sun, Oct 9, 2022 at 6:10 AM Gyula Fóra wrote: > > > >> Hi! > >> > >> I think we have to make sure that the Rescale API will work also without > >> the adaptive scheduler (for instance when we are running Flink with the > >> Kubernetes Native Integration or in other cases where the adaptive > >> scheduler is not supported). > >> > >> What do you think? > >> > >> Cheers > >> Gyula > >> > >> > >> > >> On Fri, Oct 7, 2022 at 3:50 PM Maximilian Michels > wrote: > >> > >>> We've been looking into ways to support programmatic rescaling of job > >>> vertices. This feature is typically required for any type of Flink > >>> autoscaler which does not merely set the default parallelism but > adjusts > >>> the parallelisms on a JobVertex level. > >>> > >>> We've had an initial discussion here: > >>> https://issues.apache.org/jira/browse/FLINK-29501 where Chesnay > suggested > >>> to use the infamous "rescaling" API: > >>> > >>> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling > >>> This API is disabled as of > >>> https://issues.apache.org/jira/browse/FLINK-12312 > >>> . > >>> > >>> Since there is the Adaptive Scheduler in Flink now, it may be feasible > to > >>> re-enable the API (at least for streaming jobs) and allow overriding > the > >>> parallelism of job vertices in addition to the default parallelism. > >>> > >>> Any thoughts? > >>> > >>> -Max > >>> > >
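For reference, a minimal sketch of what driving that endpoint could look like from a client, assuming the documented PATCH /jobs/:jobid/rescaling?parallelism=<n> shape from the REST docs linked above and that the endpoint is re-enabled (plain Java 11 HttpClient, not an official client):

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RescaleTrigger {
    public static void main(String[] args) throws Exception {
        String restAddress = "http://localhost:8081";   // JobManager REST endpoint (assumption)
        String jobId = args[0];                          // the job id to rescale
        int parallelism = Integer.parseInt(args[1]);     // new default parallelism

        // PATCH /jobs/:jobid/rescaling?parallelism=<n> triggers an asynchronous rescale operation.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(restAddress + "/jobs/" + jobId + "/rescaling?parallelism=" + parallelism))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

        // A successful trigger returns a trigger id that can be polled for completion;
        // with the API disabled (FLINK-12312) this currently returns an error instead.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```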
[jira] [Created] (FLINK-29590) Fix constant fold issue for HiveDialect
luoyuxia created FLINK-29590: Summary: Fix constant fold issue for HiveDialect Key: FLINK-29590 URL: https://issues.apache.org/jira/browse/FLINK-29590 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.16.0 Reporter: luoyuxia -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny! Best regards, Yuxia - 原始邮件 - 发件人: "Xingbo Huang" 收件人: "dev" 发送时间: 星期三, 2022年 10 月 12日 上午 9:44:22 主题: Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer Congratulations Danny! Best, Xingbo Sergey Nuyanzin 于2022年10月12日周三 01:26写道: > Congratulations, Danny > > On Tue, Oct 11, 2022, 15:18 Lincoln Lee wrote: > > > Congratulations Danny! > > > > Best, > > Lincoln Lee > > > > > > Congxian Qiu 于2022年10月11日周二 19:42写道: > > > > > Congratulations Danny! > > > > > > Best, > > > Congxian > > > > > > > > > Leonard Xu 于2022年10月11日周二 18:03写道: > > > > > > > Congratulations Danny! > > > > > > > > > > > > Best, > > > > Leonard > > > > > > > > > > > > > >
Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny! Best, Xingbo Sergey Nuyanzin 于2022年10月12日周三 01:26写道: > Congratulations, Danny > > On Tue, Oct 11, 2022, 15:18 Lincoln Lee wrote: > > > Congratulations Danny! > > > > Best, > > Lincoln Lee > > > > > > Congxian Qiu 于2022年10月11日周二 19:42写道: > > > > > Congratulations Danny! > > > > > > Best, > > > Congxian > > > > > > > > > Leonard Xu 于2022年10月11日周二 18:03写道: > > > > > > > Congratulations Danny! > > > > > > > > > > > > Best, > > > > Leonard > > > > > > > > > > > > > >
Re: [DISCUSS] Updating Flink HBase Connectors
Hi Martijn, Thank you for your comment. About HBase 2.x, correct, that is my thought process, but it has to be tested and verified. +1 from my side about merging these updates with the connector externalization. Best, F --- Original Message --- On Tuesday, October 11th, 2022 at 16:30, Martijn Visser wrote: > > > Hi Ferenc, > > Thanks for opening the discussion on this topic! > > +1 for dropping HBase 1.x. > > Regarding HBase 2.x, if I understand correctly it should be possible to > connect to any 2.x cluster if you're using the 2.x client. Wouldn't it make > more sense to always support the latest available version, so basically 2.5 > at the moment? We could always include a test to check that implementation > against an older HBase version. > > I also have a follow-up question already: if there's an agreement on this > topic, does it make sense to directly build a new HBase connector in its > own external connector repo, since I believe the current connector uses the > old source/sink interfaces. We could then directly drop the ones in the > Flink repo and replace it with new implementations? > > Best regards, > > Martijn > > Op ma 10 okt. 2022 om 16:24 schreef Ferenc Csaky > > Hi everyone, > > > > Now that the connector externalization effort ig going on, I think it is > > definitely work to revisit the currently supported HBase versions for the > > Flink connector. Currently, ther is an HBase 1.4 and HBase 2.2 connector > > versions, although both of those versions are kind of outdated. > > > > From the HBase point of view the following can be considered [1]: > > > > - HBase 1.x is dead, so on the way forward it should be safe to drop it. > > - HBase 2.2 is EoL, but still used actively, we are also supporting it ins > > some of our still active releases as Cloudera. > > - HBase 2.4 is the main thing now and probably will be supported for a > > while (by us, definitely). > > - HBase 2.5 just came out, but 2.6 is expected pretty soon, so it is > > possible it won't live long. > > - HBase 3 is in alpha, but shooting for that probably would be early in > > the near future. > > > > In addition, if we are only using the standard HBase 2.x client APIs, then > > it should be possible to be compile it with any Hbase 2.x version. Also, > > any HBase 2.x cluster should be backwards compatible with all earlier HBase > > 2.x client libraries. I did not check this part thorougly but I think this > > should be true, so ideally it would be enough to have an HBase 2.4 > > connector. [2] > > > > Looking forward to your opinions about this topic. > > > > Best, > > F > > > > [1] https://hbase.apache.org/downloads.html > > [2] https://hbase.apache.org/book.html#hbase.versioning.post10 (Client > > API compatibility) > > > -- > Martijn > https://twitter.com/MartijnVisser82 > https://github.com/MartijnVisser
[jira] [Created] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery
Krzysztof Chmielewski created FLINK-29589: - Summary: Data Loss in Sink GlobalCommitter during Task Manager recovery Key: FLINK-29589 URL: https://issues.apache.org/jira/browse/FLINK-29589 Project: Flink Issue Type: Bug Affects Versions: 1.14.6, 1.14.5, 1.14.4, 1.14.3, 1.14.2, 1.14.0 Reporter: Krzysztof Chmielewski Flink's sink architecture with a global committer seems to be vulnerable to data loss during Task Manager recovery. An entire checkpoint can be lost by the GlobalCommitter, resulting in data loss for sinks. The issue was observed in the Delta Sink connector on a real 1.14.x cluster and was replicated using Flink's 1.14.6 Test Utils classes. Scenario: 1. Streaming source emitting a constant number of events per checkpoint (20 events for 5 commits). 2. Sink with parallelism > 1 with committer and GlobalCommitter elements. 3. Committers processed committables for checkpointId 2. 4. GlobalCommitter throws an exception (the desired exception) during checkpointId 2 (third commit) while processing data from checkpoint 1 (the global committer architecture is expected to lag one commit behind the rest of the pipeline). 5. Streaming source ends. 6. We are missing 20 records (one checkpoint). What is happening is that during recovery, committers perform a "retry" on committables for checkpointId 2; however, those committables, reprocessed by the "retry" task, are not emitted to the global committer. The issue can be reproduced using a JUnit test built with Flink's TestSink. The test was implemented [here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery]. I believe the problem is somewhere around the `SinkOperator::notifyCheckpointComplete` method. In there we see that the retry async task is scheduled, but its result is never emitted downstream like it is for the regular flow one line above. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Updating Flink HBase Connectors
Hi Ferenc, Thanks for opening the discussion on this topic! +1 for dropping HBase 1.x. Regarding HBase 2.x, if I understand correctly it should be possible to connect to any 2.x cluster if you're using the 2.x client. Wouldn't it make more sense to always support the latest available version, so basically 2.5 at the moment? We could always include a test to check that implementation against an older HBase version. I also have a follow-up question already: if there's an agreement on this topic, does it make sense to directly build a new HBase connector in its own external connector repo, since I believe the current connector uses the old source/sink interfaces. We could then directly drop the ones in the Flink repo and replace it with new implementations? Best regards, Martijn Op ma 10 okt. 2022 om 16:24 schreef Ferenc Csaky > Hi everyone, > > Now that the connector externalization effort ig going on, I think it is > definitely work to revisit the currently supported HBase versions for the > Flink connector. Currently, ther is an HBase 1.4 and HBase 2.2 connector > versions, although both of those versions are kind of outdated. > > From the HBase point of view the following can be considered [1]: > > - HBase 1.x is dead, so on the way forward it should be safe to drop it. > - HBase 2.2 is EoL, but still used actively, we are also supporting it ins > some of our still active releases as Cloudera. > - HBase 2.4 is the main thing now and probably will be supported for a > while (by us, definitely). > - HBase 2.5 just came out, but 2.6 is expected pretty soon, so it is > possible it won't live long. > - HBase 3 is in alpha, but shooting for that probably would be early in > the near future. > > In addition, if we are only using the standard HBase 2.x client APIs, then > it should be possible to be compile it with any Hbase 2.x version. Also, > any HBase 2.x cluster should be backwards compatible with all earlier HBase > 2.x client libraries. I did not check this part thorougly but I think this > should be true, so ideally it would be enough to have an HBase 2.4 > connector. [2] > > Looking forward to your opinions about this topic. > > Best, > F > > [1] https://hbase.apache.org/downloads.html > [2] https://hbase.apache.org/book.html#hbase.versioning.post10 (Client > API compatibility) -- Martijn https://twitter.com/MartijnVisser82 https://github.com/MartijnVisser
RE: Re: Re: [Discuss]- Donate Iceberg Flink Connector
Hi Martijn, Yes catalog integration exists and catalogs can be created using Flink SQL. https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs has more details. We may need some discussion within Iceberg community but based on the current iceberg-flink code structure we are looking to externalize this as well. Thanks Abid On 2022/10/11 08:24:44 Martijn Visser wrote: > Hi Abid, > > Thanks for the FLIP. I have a question about Iceberg's Catalog: has that > integration between Flink and Iceberg been created already and are you > looking to externalize that as well? > > Thanks, > > Martijn > > On Tue, Oct 11, 2022 at 12:14 AM wrote: > > > Hi Marton, > > > > Yes, we are initiating this as part of the Externalize Flink Connectors > > effort. Plan is to externalize the existing Flink connector from Iceberg > > repo into a separate repo under the Flink umbrella. > > > > Sorry about the doc permissions! I was able to create a FLIP-267: > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector > > < > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector > > > > > Lets use that to discuss. > > > > Thanks > > Abid > > > > On 2022/10/10 12:57:32 Márton Balassi wrote: > > > Hi Abid, > > > > > > Just to clarify does your suggestion mean that the Iceberg community > > would > > > like to remove the iceberg-flink connector from the Iceberg codebase and > > > maintain it under Flink instead? A new separate repo under the Flink > > > project umbrella given the current existing effort to extract connectors > > to > > > their individual repos (externalize) makes sense to me. > > > > > > [1] https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo > > > > > > Best, > > > Marton > > > > > > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li wrote: > > > > > > > Thanks Abid for driving. > > > > > > > > +1 for this. > > > > > > > > Can you open the permissions for > > > > > > > > > > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing > > > > ? > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > I would like to start a discussion about contributing Iceberg Flink > > > > Connector to Flink. > > > > > > > > > > I created a doc < > > > > > > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing > > > > > > > with all the details following the Flink Connector template as I don’t > > have > > > > permissions to create a FLIP yet. > > > > > High level details are captured below: > > > > > > > > > > Motivation: > > > > > > > > > > This FLIP aims to contribute the existing Apache Iceberg Flink > > Connector > > > > to Flink. > > > > > > > > > > Apache Iceberg is an open table format for huge analytic datasets. > > > > Iceberg adds tables to compute engines including Spark, Trino, > > PrestoDB, > > > > Flink, Hive and Impala using a high-performance table format that works > > > > just like a SQL table. > > > > > Iceberg avoids unpleasant surprises. Schema evolution works and won’t > > > > inadvertently un-delete data. Users don’t need to know about > > partitioning > > > > to get fast queries. Iceberg was designed to solve correctness > > problems in > > > > eventually-consistent cloud object stores. > > > > > > > > > > Iceberg supports both Flink’s DataStream API and Table API. 
Based on > > the > > > > guideline of the Flink community, only the latest 2 minor versions are > > > > actively maintained. See the Multi-Engine Support#apache-flink for > > further > > > > details. > > > > > > > > > > > > > > > Iceberg connector supports: > > > > > > > > > > • Source: detailed Source design < > > > > > > https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit# > > >, > > > > based on FLIP-27 > > > > > • Sink: detailed Sink design and interfaces used < > > > > > > https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit# > > > > > > > > > > • Usable in both DataStream and Table API/SQL > > > > > • DataStream read/append/overwrite > > > > > • SQL create/alter/drop table, select, insert into, insert > > > > overwrite > > > > > • Streaming or batch read in Java API > > > > > • Support for Flink’s Python API > > > > > > > > > > See Iceberg Flink < > > https://iceberg.apache.org/docs/latest/flink/#flink>for > > > > detailed usage instructions. > > > > > > > > > > Looking forward to the discussion! > > > > > > > > > > Thanks > > > > > Abid > > > > > > > >
[jira] [Created] (FLINK-29588) Add Flink Version and Application Version to Savepoint properties
Clara Xiong created FLINK-29588: --- Summary: Add Flink Version and Application Version to Savepoint properties Key: FLINK-29588 URL: https://issues.apache.org/jira/browse/FLINK-29588 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Affects Versions: kubernetes-operator-1.2.0 Reporter: Clara Xiong It is common that users need to upgrade long-running FlinkDeployments. An application upgrade or a major Flink upgrade might break the application due to schema incompatibilities, especially the latter ([link|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#compatibility-table]). In this case, users need to manually restore to a savepoint that is compatible with the version they want to try or re-process. Currently the Flink Operator returns a list of completed Savepoints for a FlinkDeployment. It would be helpful if the Savepoints in SavepointHistory carried properties for the Flink version and the application version, so users can easily determine which savepoint to use: * {{String flinkVersion}} * {{String appVersion}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations, Danny On Tue, Oct 11, 2022, 15:18 Lincoln Lee wrote: > Congratulations Danny! > > Best, > Lincoln Lee > > > Congxian Qiu 于2022年10月11日周二 19:42写道: > > > Congratulations Danny! > > > > Best, > > Congxian > > > > > > Leonard Xu 于2022年10月11日周二 18:03写道: > > > > > Congratulations Danny! > > > > > > > > > Best, > > > Leonard > > > > > > > > >
Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny! Best, Lincoln Lee Congxian Qiu 于2022年10月11日周二 19:42写道: > Congratulations Danny! > > Best, > Congxian > > > Leonard Xu 于2022年10月11日周二 18:03写道: > > > Congratulations Danny! > > > > > > Best, > > Leonard > > > > >
Re: State of the Rescale API
The AdaptiveScheduler is not limited to reactive mode. There are no deployment limitations for the scheduler itself. In a nutshell, all that reactive mode does is crank the target parallelism to infinity, when usually it is the parallelism the user has set in the job/configuration. I think it would be fine if a new/revised rescale API were only available in the Adaptive Scheduler (without reactive mode!) for starters. We'd require way more stuff to make this useful for batch workloads. On 10/10/2022 16:47, Maximilian Michels wrote: Hey Gyula, Is the Adaptive Scheduler limited to the Reactive mode? I agree that if we move forward with the Adaptive Scheduler solution it should support all deployment scenarios. Thanks, -Max On Sun, Oct 9, 2022 at 6:10 AM Gyula Fóra wrote: Hi! I think we have to make sure that the Rescale API will work also without the adaptive scheduler (for instance when we are running Flink with the Kubernetes Native Integration or in other cases where the adaptive scheduler is not supported). What do you think? Cheers Gyula On Fri, Oct 7, 2022 at 3:50 PM Maximilian Michels wrote: We've been looking into ways to support programmatic rescaling of job vertices. This feature is typically required for any type of Flink autoscaler which does not merely set the default parallelism but adjusts the parallelisms on a JobVertex level. We've had an initial discussion here: https://issues.apache.org/jira/browse/FLINK-29501 where Chesnay suggested to use the infamous "rescaling" API: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling This API is disabled as of https://issues.apache.org/jira/browse/FLINK-12312 . Since there is the Adaptive Scheduler in Flink now, it may be feasible to re-enable the API (at least for streaming jobs) and allow overriding the parallelism of job vertices in addition to the default parallelism. Any thoughts? -Max
[jira] [Created] (FLINK-29587) Fail to generate code for SearchOperator
luoyuxia created FLINK-29587: Summary: Fail to generate code for SearchOperator Key: FLINK-29587 URL: https://issues.apache.org/jira/browse/FLINK-29587 Project: Flink Issue Type: Bug Components: Table SQL / Planner, Table SQL / Runtime Reporter: luoyuxia Can be reproduced with the following code with Hive dialect {code:java} // hive dialect tableEnv.executeSql("create table table1 (id int, val string, val1 string, dimid int)"); tableEnv.executeSql("create table table3 (id int)"); CollectionUtil.iteratorToList( tableEnv.executeSql( "select table1.id, table1.val, table1.val1 from table1 left semi join" + " table3 on table1.dimid = table3.id and table3.id = 100 where table1.dimid = 200") .collect());{code} The plan is {code:java} LogicalSink(table=[*anonymous_collect$1*], fields=[id, val, val1]) LogicalProject(id=[$0], val=[$1], val1=[$2]) LogicalFilter(condition=[=($3, 200)]) LogicalJoin(condition=[AND(=($3, $4), =($4, 100))], joinType=[semi]) LogicalTableScan(table=[[test-catalog, default, table1]]) LogicalTableScan(table=[[test-catalog, default, table3]])BatchPhysicalSink(table=[*anonymous_collect$1*], fields=[id, val, val1]) BatchPhysicalNestedLoopJoin(joinType=[LeftSemiJoin], where=[$f1], select=[id, val, val1], build=[right]) BatchPhysicalCalc(select=[id, val, val1], where=[=(dimid, 200)]) BatchPhysicalTableSourceScan(table=[[test-catalog, default, table1]], fields=[id, val, val1, dimid]) BatchPhysicalExchange(distribution=[broadcast]) BatchPhysicalCalc(select=[SEARCH(id, Sarg[]) AS $f1]) BatchPhysicalTableSourceScan(table=[[test-catalog, default, table3]], fields=[id]) {code} But it'll throw exception when generate code for it. The exception is {code:java} java.util.NoSuchElementException at com.google.common.collect.ImmutableRangeSet.span(ImmutableRangeSet.java:203) at org.apache.calcite.util.Sarg.isComplementedPoints(Sarg.java:148) at org.apache.flink.table.planner.codegen.calls.SearchOperatorGen$.generateSearch(SearchOperatorGen.scala:87) at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:474) at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:57) at org.apache.calcite.rex.RexCall.accept(RexCall.java:174) at org.apache.flink.table.planner.codegen.ExprCodeGenerator.generateExpression(ExprCodeGenerator.scala:143) at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.$anonfun$generateProcessCode$4(CalcCodeGenerator.scala:140) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike.map(TraversableLike.scala:233) at scala.collection.TraversableLike.map$(TraversableLike.scala:226) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.produceProjectionCode$1(CalcCodeGenerator.scala:140) at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateProcessCode(CalcCodeGenerator.scala:164) at org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateCalcOperator(CalcCodeGenerator.scala:49) at org.apache.flink.table.planner.codegen.CalcCodeGenerator.generateCalcOperator(CalcCodeGenerator.scala) at org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:100) at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:158) at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:257) at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:136) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29586) Make write buffer spillable
Jingsong Lee created FLINK-29586: Summary: Make write buffer spillable Key: FLINK-29586 URL: https://issues.apache.org/jira/browse/FLINK-29586 Project: Flink Issue Type: Improvement Components: Table Store Reporter: Jingsong Lee Assignee: Jingsong Lee Fix For: table-store-0.3.0 The column format and remote DFS may greatly affect the performance of compaction. We can make the writeBuffer spillable to improve the performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29585) Migrate TableSchema to Schema for Hive connector
luoyuxia created FLINK-29585: Summary: Migrate TableSchema to Schema for Hive connector Key: FLINK-29585 URL: https://issues.apache.org/jira/browse/FLINK-29585 Project: Flink Issue Type: Improvement Components: Connectors / Hive Reporter: luoyuxia `TableSchema` is deprecated; we should migrate it to Schema. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29584) Upgrade java 11 version on the microbenchmark worker
Piotr Nowojski created FLINK-29584: -- Summary: Upgrade java 11 version on the microbenchmark worker Key: FLINK-29584 URL: https://issues.apache.org/jira/browse/FLINK-29584 Project: Flink Issue Type: Improvement Components: Benchmarks Affects Versions: 1.17.0 Reporter: Piotr Nowojski Assignee: Piotr Nowojski Fix For: 1.17.0 It looks like the older JDK 11 builds have problems with JIT in the microbenchmarks, as for example visible [here|http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2]. Locally I was able to reproduce this problem and the issue goes away after upgrading to a newer JDK 11 build. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29583) Ensure correct subtaskId and checkpointId are set during committer state migration
Fabian Paul created FLINK-29583: --- Summary: Ensure correct subtaskId and checkpointId are set during committer state migration Key: FLINK-29583 URL: https://issues.apache.org/jira/browse/FLINK-29583 Project: Flink Issue Type: Bug Components: Connectors / Common Affects Versions: 1.15.2, 1.16.0, 1.17.0 Reporter: Fabian Paul We already discovered two problems during recovery of committers when a post-commit topology is used: * https://issues.apache.org/jira/browse/FLINK-29509 * https://issues.apache.org/jira/browse/FLINK-29512 Both problems also apply when recovering Flink 1.14 unified sink committer state and migrating it to the extended unified model. As part of this ticket we should fix both issues for the migration and also increase the test coverage for the migration cases, i.e. add test cases in CommitterOperatorTest and CommittableCollectorSerializerTest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny! Best, Congxian Leonard Xu 于2022年10月11日周二 18:03写道: > Congratulations Danny! > > > Best, > Leonard > >
[jira] [Created] (FLINK-29582) SavepointWriter should be usable without any transformation
Chesnay Schepler created FLINK-29582: Summary: SavepointWriter should be usable without any transformation Key: FLINK-29582 URL: https://issues.apache.org/jira/browse/FLINK-29582 Project: Flink Issue Type: Improvement Components: API / State Processor Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.17.0 The SavepointWriter of the state processor API currently enforces at least one transformation to be defined by the user. This is an irritating limitation; it means you can't use the API to delete a state or use the new uid remapping function from FLINK-29457 without specifying some dummy transformation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29581) Emit warning events for session job reconciliation exception
Xin Hao created FLINK-29581: --- Summary: Emit warning events for session job reconciliation exception Key: FLINK-29581 URL: https://issues.apache.org/jira/browse/FLINK-29581 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Xin Hao Same as FlinkDeployment, will be useful for monitoring. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29580) pulsar.consumer.autoUpdatePartitionsIntervalSeconds doesn't work and should be removed
Yufan Sheng created FLINK-29580: --- Summary: pulsar.consumer.autoUpdatePartitionsIntervalSeconds doesn't work and should be removed Key: FLINK-29580 URL: https://issues.apache.org/jira/browse/FLINK-29580 Project: Flink Issue Type: Bug Components: Connectors / Pulsar Affects Versions: 1.15.2, 1.17.0, 1.16.1 Reporter: Yufan Sheng -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Reverting sink metric name changes made in 1.15
Currently I think that would be a mistake. Ultimately what we have here is the culmination of us never really considering how the numRecordsOut metric should behave for operators that emit data to other operators _and_ external systems. This goes beyond sinks. This even applies to numRecordsIn, for cases where functions query/write data from/to the outside, (e.g., Async IO). Having 2 separate metrics for that, 1 exclusively for internal data transfers, and 1 exclusively for external data transfers, is the only way to get a consistent metric definition in the long-run. We can jump back-and-forth now or just commit to it. I don't think we can really judge this based on FLIP-33. It was IIRC written before the two phase sinks were added, which heavily blurred the lines of what a sink even is. Because it definitely is _not_ the last operator in a chain anymore. What I would suggest is to stick with what we got (although I despise the name numRecordsSend), and alias the numRecordsOut metric for all non-TwoPhaseCommittingSink. On 11/10/2022 05:54, Qingsheng Ren wrote: Thanks for the details Chesnay! By “alias” I mean to respect the original definition made in FLIP-33 for numRecordsOut, which is the number of records written to the external system, and keep numRecordsSend as the same value as numRecordsOut for compatibility. I think keeping numRecordsOut for the output to the external system is more intuitive to end users because in most cases the metric of data flow output is more essential. I agree with you that a new metric is required, but considering compatibility and users’ intuition I prefer to keep the initial definition of numRecordsOut in FLIP-33 and name a new metric for sink writer’s output to downstream operators. This might be against consistency with metrics in other operators in Flink but maybe it’s acceptable to have the sink as a special case. Best, Qingsheng On Oct 10, 2022, 19:13 +0800, Chesnay Schepler , wrote: > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut But that's not possible. If it were that simple there would have never been a need to introduce another metric in the first place. It's a rather fundamental issue with how the new sinks work, in that they emit data to the external system (usually considered as "numRecordsOut" of sinks) while _also_ sending data to a downstream operator (usually considered as "numRecordsOut" of tasks). The original issue was that the numRecordsOut of the sink counted both (which is completely wrong). A new metric was always required; otherwise you inevitably end up breaking /some/ semantic. Adding a new metric for what the sink writes to the external system is, for better or worse, more consistent with how these metrics usually work in Flink. On 10/10/2022 12:45, Qingsheng Ren wrote: Thanks everyone for joining the discussion! > Do you have any idea what has happened in the process here? The discussion in this PR [1] shows some details and could be helpful to understand the original motivation of the renaming. We do have a test case for guarding metrics but unfortunaly the case was also modified so the defense was broken. I think the reason why both the developer and the reviewer forgot to trigger an discussion and gave a green pass on the change is that metrics are quite “trivial” to be noticed as public APIs. As mentioned by Martijn I couldn’t find a place noting that metrics are public APIs and should be treated carefully while contributing and reviewing. 
IMHO three actions could be made to prevent this kind of changes in the future: a. Add test case for metrics (which we already have in SinkMetricsITCase) b. We emphasize that any public-interface breaking changes should be proposed by a FLIP or discussed in mailing list, and should be listed in the release note. c. We remind contributors and reviewers about what should be considered as public API, and include metric names in it. For b and c these two pages [2][3] might be proper places. About the patch to revert this, it looks like we have a consensus on 1.16. As of 1.15 I think it’s worthy to trigger a minor version. I didn’t see complaints about this for now so it should be OK to save the situation asap. I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut considering there could possibly some users have already adapted their system to the new naming, and have another internal metric for reflecting number of outgoing committable batches (actually the numRecordsIn of sink committer operator should be carrying this info already). [1] https://github.com/apache/flink/pull/18825 [2] https://flink.apache.org/contributing/contribute-code.html [3] https://flink.apache.org/contributing/reviewing-prs.html Best, Qingsheng On Oct 10, 2022, 17:40 +0800, Xintong Song , wrote: +1 for reverting these changes in Flink 1.16. For 1.15.3, can we make these metrics available via both names
Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny! Best, Leonard
[jira] [Created] (FLINK-29579) Flink parquet reader cannot read fully optional elements in a repeated list
Tiansu Yu created FLINK-29579: - Summary: Flink parquet reader cannot read fully optional elements in a repeated list Key: FLINK-29579 URL: https://issues.apache.org/jira/browse/FLINK-29579 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.13.3 Reporter: Tiansu Yu While trying to read a parquet file containing the following field as part of the schema, {code:java} optional group attribute_values (LIST) { repeated group list { optional group element { optional binary attribute_key_id (STRING); optional binary attribute_value_id (STRING); optional int32 pos; } } } {code} I encountered the following problem: {code:java} Exception in thread "main" java.lang.UnsupportedOperationException: List field [optional binary attribute_key_id (STRING)] in List [attribute_values] has to be required. at org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.convertGroupElementToArrayTypeInfo(ParquetSchemaConverter.java:338) at org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.convertParquetTypeToTypeInfo(ParquetSchemaConverter.java:271) at org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.convertFields(ParquetSchemaConverter.java:81) at org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.fromParquetType(ParquetSchemaConverter.java:61) at org.apache.flink.formats.parquet.ParquetInputFormat.(ParquetInputFormat.java:120) at org.apache.flink.formats.parquet.ParquetRowInputFormat.(ParquetRowInputFormat.java:39) {code} The main code that raises the problem goes as follows: {code:java} private static ObjectArrayTypeInfo convertGroupElementToArrayTypeInfo( GroupType arrayFieldType, GroupType elementType) { for (Type type : elementType.getFields()) { if (!type.isRepetition(Type.Repetition.REQUIRED)) { throw new UnsupportedOperationException( String.format( "List field [%s] in List [%s] has to be required. ", type.toString(), arrayFieldType.getName())); } } return ObjectArrayTypeInfo.getInfoFor(convertParquetTypeToTypeInfo(elementType)); } {code} I am not very familiar with the internals of Parquet schemas, but it looks to me like the problem is that Flink is too restrictive about repetition types inside certain nested fields. I would love to hear some feedback on this (improvements, corrections / workarounds). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29578) Multiple executions of `runVertexCentricIteration` cause memory errors (flink gelly)
Shunyang Li created FLINK-29578: --- Summary: Multiple executions of `runVertexCentricIteration` cause memory errors (flink gelly) Key: FLINK-29578 URL: https://issues.apache.org/jira/browse/FLINK-29578 Project: Flink Issue Type: Bug Components: Library / Graph Processing (Gelly) Affects Versions: 1.15.1 Environment: * Flink 1.15.1 * Java 11 * Gradle 7.4.2 Reporter: Shunyang Li Attachments: example.java When the runVertexCentricIteration function is executed twice, it throws an error: java.lang.IllegalArgumentException: Too few memory segments provided. Hash Table needs at least 33 memory segments. We solved this issue by setting the following configuration: ``` VertexCentricConfiguration parameter = new VertexCentricConfiguration(); parameter.setSolutionSetUnmanagedMemory(true); ``` However, the number of executions cannot be greater than 4 (>4). After executing four times, it throws an error: java.lang.IllegalArgumentException: Too little memory provided to sorter to perform task. Required are at least 12 pages. Current page size is 32768 bytes.
[jira] [Created] (FLINK-29577) Disable rocksdb wal when restoring from full snapshot
Cai Liuyang created FLINK-29577: --- Summary: Disable rocksdb wal when restoring from full snapshot Key: FLINK-29577 URL: https://issues.apache.org/jira/browse/FLINK-29577 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Reporter: Cai Liuyang Currently, RocksDBFullRestoreOperation and RocksDBHeapTimersFullRestoreOperation don't pass a RocksDB::WriteOptions to RocksDBWriteBatchWrapper when restoring kv-data, so RocksDBWriteBatchWrapper's default WriteOptions is used (which doesn't disable the RocksDB WAL explicitly, see the code below). As a result, the WAL is enabled while restoring from a full snapshot (using more disk and hurting RocksDB write performance during the restore). {code:java} // First: RocksDBHeapTimersFullRestoreOperation::restoreKVStateData() doesn't pass WriteOptions to RocksDBWriteBatchWrapper(null as default) private void restoreKVStateData( ThrowingIterator keyGroups, Map columnFamilies, Map> restoredPQStates) throws IOException, RocksDBException, StateMigrationException { // for all key-groups in the current state handle... try (RocksDBWriteBatchWrapper writeBatchWrapper = new RocksDBWriteBatchWrapper(this.rocksHandle.getDb(), writeBatchSize)) { HeapPriorityQueueSnapshotRestoreWrapper restoredPQ = null; ColumnFamilyHandle handle = null; .. } // Second: RocksDBWriteBatchWrapper::flush function doesn't disable wal explicitly when user doesn't pass WriteOptions to RocksDBWriteBatchWrapper public void flush() throws RocksDBException { if (options != null) { db.write(options, batch); } else { // use the default WriteOptions, if wasn't provided. try (WriteOptions writeOptions = new WriteOptions()) { db.write(writeOptions, batch); } } batch.clear(); } {code} As we know, RocksDB's WAL is useless for Flink, so I think we can disable the WAL for RocksDBWriteBatchWrapper's default WriteOptions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
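For illustration, a minimal sketch of what the change could look like in RocksDBWriteBatchWrapper's fallback branch, assuming RocksDB's WriteOptions#setDisableWAL (this is not the actual patch):

{code:java}
public void flush() throws RocksDBException {
    if (options != null) {
        db.write(options, batch);
    } else {
        // No WriteOptions provided by the caller: build a default one with the WAL
        // explicitly disabled, since Flink never relies on RocksDB's WAL for recovery.
        try (WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {
            db.write(writeOptions, batch);
        }
    }
    batch.clear();
}
{code}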
Re: Re: [Discuss]- Donate Iceberg Flink Connector
Hi Abid, Thanks for the FLIP. I have a question about Iceberg's Catalog: has that integration between Flink and Iceberg been created already and are you looking to externalize that as well? Thanks, Martijn On Tue, Oct 11, 2022 at 12:14 AM wrote: > Hi Marton, > > Yes, we are initiating this as part of the Externalize Flink Connectors > effort. Plan is to externalize the existing Flink connector from Iceberg > repo into a separate repo under the Flink umbrella. > > Sorry about the doc permissions! I was able to create a FLIP-267: > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector > > > Lets use that to discuss. > > Thanks > Abid > > On 2022/10/10 12:57:32 Márton Balassi wrote: > > Hi Abid, > > > > Just to clarify does your suggestion mean that the Iceberg community > would > > like to remove the iceberg-flink connector from the Iceberg codebase and > > maintain it under Flink instead? A new separate repo under the Flink > > project umbrella given the current existing effort to extract connectors > to > > their individual repos (externalize) makes sense to me. > > > > [1] https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo > > > > Best, > > Marton > > > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li wrote: > > > > > Thanks Abid for driving. > > > > > > +1 for this. > > > > > > Can you open the permissions for > > > > > > > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing > > > ? > > > > > > Best, > > > Jingsong > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed > > > wrote: > > > > > > > > Hi, > > > > > > > > I would like to start a discussion about contributing Iceberg Flink > > > Connector to Flink. > > > > > > > > I created a doc < > > > > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing > > > > > with all the details following the Flink Connector template as I don’t > have > > > permissions to create a FLIP yet. > > > > High level details are captured below: > > > > > > > > Motivation: > > > > > > > > This FLIP aims to contribute the existing Apache Iceberg Flink > Connector > > > to Flink. > > > > > > > > Apache Iceberg is an open table format for huge analytic datasets. > > > Iceberg adds tables to compute engines including Spark, Trino, > PrestoDB, > > > Flink, Hive and Impala using a high-performance table format that works > > > just like a SQL table. > > > > Iceberg avoids unpleasant surprises. Schema evolution works and won’t > > > inadvertently un-delete data. Users don’t need to know about > partitioning > > > to get fast queries. Iceberg was designed to solve correctness > problems in > > > eventually-consistent cloud object stores. > > > > > > > > Iceberg supports both Flink’s DataStream API and Table API. Based on > the > > > guideline of the Flink community, only the latest 2 minor versions are > > > actively maintained. See the Multi-Engine Support#apache-flink for > further > > > details. 
> > > > > > > > > > > > Iceberg connector supports: > > > > > > > > • Source: detailed Source design < > > > > https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit# > >, > > > based on FLIP-27 > > > > • Sink: detailed Sink design and interfaces used < > > > > https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit# > > > > > > > > • Usable in both DataStream and Table API/SQL > > > > • DataStream read/append/overwrite > > > > • SQL create/alter/drop table, select, insert into, insert > > > overwrite > > > > • Streaming or batch read in Java API > > > > • Support for Flink’s Python API > > > > > > > > See Iceberg Flink < > https://iceberg.apache.org/docs/latest/flink/#flink>for > > > detailed usage instructions. > > > > > > > > Looking forward to the discussion! > > > > > > > > Thanks > > > > Abid > > > > >
Re: [VOTE] Apache Flink Table Store 0.2.1, release candidate #2
+1 (binding) - Checked release notes: *Action Required* * The fix version of FLINK-29554 is 0.2.1 but still open, please confirm whether this should be included or we should move it out of 0.2.1 - Checked sums and signatures: *OK* - Checked the jars in the staging repo: *OK* - Checked source distribution doesn't include binaries: *OK* - Maven clean install from source: *OK* - Checked version consistency in pom files: *OK* - Went through the quick start: *OK* * Verified with both flink 1.14.5 and 1.15.1 - Checked the website updates: *OK* * Minor: left some comments in the PR, please check Thanks for driving this release, Jingsong! Best Regards, Yu On Sat, 8 Oct 2022 at 10:25, Jingsong Li wrote: > Hi everyone, > > Please review and vote on the release candidate #2 for the version > 0.2.1 of Apache Flink Table Store, as follows: > > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Table Store canonical source distribution to be deployed to the > release repository at dist.apache.org > b) Table Store binary convenience releases to be deployed to the > release repository at dist.apache.org > c) Maven artifacts to be deployed to the Maven Central Repository > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a) and b) can be found in the corresponding dev > repository at dist.apache.org [2] > * All artifacts for c) can be found at the Apache Nexus Repository [3] > > All artifacts are signed with the key > 2C2B6A653B07086B65E4369F7C76245E0A318150 [4] > > Other links for your review: > * JIRA release notes [5] > * source code tag "release-0.2.1-rc2" [6] > * PR to update the website Downloads page to include Table Store links [7] > > **Vote Duration** > > The voting time will run for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > Best, > Jingsong Lee > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release > [2] > https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.1-rc2/ > [3] > https://repository.apache.org/content/repositories/orgapacheflink-1539/ > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > [5] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352257 > [6] https://github.com/apache/flink-table-store/tree/release-0.2.1-rc2 > [7] https://github.com/apache/flink-web/pull/571 >
[SUMMARY] Flink 1.16 release sync of 2022-10-11
I would like to give you a brief update on the Flink 1.16 release sync meeting of 2022-10-11. *rc1 is canceled because we found a metric compatibility change, which is expected to be fixed before Thursday https://issues.apache.org/jira/browse/FLINK-29567. We will prepare rc2 once this blocker issue is fixed.* *We still have some critical test stability issues[1] that need to be resolved* For more information about Flink release 1.16, you can refer to https://cwiki.apache.org/confluence/display/FLINK/1.16+Release The next Flink release sync will be on Tuesday the 18th of October at 9am CEST / 3pm China Standard Time / 7am UTC. The link can be found on the following page https://cwiki.apache.org/confluence/display/FLINK/1.16+Release#id-1.16Release-Syncmeeting . On behalf of all the release managers, best regards, Xingbo [1] https://issues.apache.org/jira/issues/?filter=12352149
[jira] [Created] (FLINK-29576) SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution failed
Matthias Pohl created FLINK-29576: - Summary: SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution failed Key: FLINK-29576 URL: https://issues.apache.org/jira/browse/FLINK-29576 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.17.0 Reporter: Matthias Pohl There's a [build failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41843=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=8523] being caused by {{SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution}} on {{master}}: {code} Oct 11 01:45:36 [ERROR] Errors: Oct 11 01:45:36 [ERROR] SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution:89 » Runtime Fail... Oct 11 01:45:36 [INFO] Oct 11 01:45:36 [ERROR] Tests run: 1931, Failures: 0, Errors: 1, Skipped: 4 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Flink falls back on to kryo serializer for GenericTypes
Hello, How can I avoid Flink's Kryo serializer for GenericTypes? Kryo is causing some performance issues. I tried the options below, but no luck. env.getConfig().disableForceKryo(); env.getConfig().enableForceAvro(); I also tried env.getConfig().disableGenericTypes(); and got: Generic types have been disabled in the ExecutionConfig and type org.apache.avro.generic.GenericRecord is treated as a generic type Regards, Sucheth Shivakumar website : https://sucheths.com mobile : +1(650)-576-8050 San Mateo, United States
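A minimal sketch of one way to keep GenericRecord off the Kryo path, assuming flink-avro is on the classpath and the Avro schema is known up front (the Event schema and the mapping step below are placeholders):

```
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroGenericRecordTypeInfoExample {
    // Placeholder schema; in practice load the real schema here.
    private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Fail fast if anything would still fall back to Kryo.
        env.getConfig().disableGenericTypes();

        env.fromElements("a", "b")
                .map(id -> {
                    GenericRecord record = new GenericData.Record(SCHEMA);
                    record.put("id", id);
                    return record;
                })
                // Declare the produced type explicitly so Flink uses the Avro serializer
                // for GenericRecord instead of treating it as a generic (Kryo) type.
                .returns(new GenericRecordAvroTypeInfo(SCHEMA))
                .print();

        env.execute("generic-record-avro-type-info");
    }
}
```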