Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-11 Thread Péter Váry
Thanks Abid,

Count me in, and drop a note, if I can help in any way.

Thanks,
Peter

On Tue, Oct 11, 2022, 20:13  wrote:

> Hi Martijn,
>
> Yes catalog integration exists and catalogs can be created using Flink
> SQL.
>
> https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs
> has more details.
> We may need some discussion within Iceberg community but based on the
> current iceberg-flink code structure we are looking to externalize this as
> well.
>
> Thanks
> Abid
>
>
> On 2022/10/11 08:24:44 Martijn Visser wrote:
> > Hi Abid,
> >
> > Thanks for the FLIP. I have a question about Iceberg's Catalog: has that
> > integration between Flink and Iceberg been created already and are you
> > looking to externalize that as well?
> >
> > Thanks,
> >
> > Martijn
> >
> > On Tue, Oct 11, 2022 at 12:14 AM  wrote:
> >
> > > Hi Marton,
> > >
> > > Yes, we are initiating this as part of the Externalize Flink Connectors
> > > effort. Plan is to externalize the existing Flink connector from
> Iceberg
> > > repo into a separate repo under the Flink umbrella.
> > >
> > > Sorry about the doc permissions! I was able to create a FLIP-267:
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > <
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > >
> > > Let's use that to discuss.
> > >
> > > Thanks
> > > Abid
> > >
> > > On 2022/10/10 12:57:32 Márton Balassi wrote:
> > > > Hi Abid,
> > > >
> > > > Just to clarify does your suggestion mean that the Iceberg community
> > > would
> > > > like to remove the iceberg-flink connector from the Iceberg codebase
> and
> > > > maintain it under Flink instead? A new separate repo under the Flink
> > > > project umbrella given the current existing effort to extract
> connectors
> > > to
> > > > their individual repos (externalize) makes sense to me.
> > > >
> > > > [1] https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> > > >
> > > > Best,
> > > > Marton
> > > >
> > > >
> > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li  wrote:
> > > >
> > > > > Thanks Abid for driving.
> > > > >
> > > > > +1 for this.
> > > > >
> > > > > Can you open the permissions for
> > > > >
> > > > >
> > >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > ?
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to start a discussion about contributing Iceberg
> Flink
> > > > > Connector to Flink.
> > > > > >
> > > > > > I created a doc <
> > > > >
> > >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > >
> > > > > with all the details following the Flink Connector template as I
> don’t
> > > have
> > > > > permissions to create a FLIP yet.
> > > > > > High level details are captured below:
> > > > > >
> > > > > > Motivation:
> > > > > >
> > > > > > This FLIP aims to contribute the existing Apache Iceberg Flink
> > > Connector
> > > > > to Flink.
> > > > > >
> > > > > > Apache Iceberg is an open table format for huge analytic
> datasets.
> > > > > Iceberg adds tables to compute engines including Spark, Trino,
> > > PrestoDB,
> > > > > Flink, Hive and Impala using a high-performance table format that
> works
> > > > > just like a SQL table.
> > > > > > Iceberg avoids unpleasant surprises. Schema evolution works and
> won’t
> > > > > inadvertently un-delete data. Users don’t need to know about
> > > partitioning
> > > > > to get fast queries. Iceberg was designed to solve correctness
> > > problems in
> > > > > eventually-consistent cloud object stores.
> > > > > >
> > > > > > Iceberg supports both Flink’s DataStream API and Table API.
> Based on
> > > the
> > > > > guideline of the Flink community, only the latest 2 minor versions
> are
> > > > > actively maintained. See the Multi-Engine Support#apache-flink for
> > > further
> > > > > details.
> > > > > >
> > > > > >
> > > > > > Iceberg connector supports:
> > > > > >
> > > > > > • Source: detailed Source design <
> > > > >
> > >
> https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#
> > > >,
> > > > > based on FLIP-27
> > > > > > • Sink: detailed Sink design and interfaces used <
> > > > >
> > >
> https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
> > > > > >
> > > > > > • Usable in both DataStream and Table API/SQL
> > > > > > • DataStream read/append/overwrite
> > > > > > • SQL create/alter/drop table, select, insert into,
> insert
> > > > > overwrite
> > > > > > • Streaming or batch read in Java API
> > > > > > • Support for Flink’s Python API
> > > > > >
> > > > > > See Iceberg Flink  <
> > > https://iceberg.apache.org/docs/latest/flink/#flink>for
> > > > > detailed usage instructions.

Re: State of the Rescale API

2022-10-11 Thread Jiangang Liu
Thanks for the attention to the rescale API. Dynamic resource adjustment is
useful for streaming jobs since the throughput can change at different
times. The rescale API is a lightweight way to change a job's parallelism.
This is important for some jobs, for example jobs tied to promotions or to
money, which cannot be delayed.
In our production scenario, we have supported a simple rescale API which may
not be perfect. Taking this chance, I suggest supporting the rescale API in
the adaptive scheduler for auto-scaling.

Chesnay Schepler  wrote on Tue, Oct 11, 2022 at 20:36:

> The AdaptiveScheduler is not limited to reactive mode. There are no
> deployment limitations for the scheduler itself.
> In a nutshell, all that reactive mode does is crank the target
> parallelism to infinity, when usually it is the parallelism the user has
> set in the job/configuration.
>
> I think it would be fine if a new/revised rescale API were only
> available in the Adaptive Scheduler (without reactive mode!) for starters.
> We'd require way more stuff to make this useful for batch workloads.
>
> On 10/10/2022 16:47, Maximilian Michels wrote:
> > Hey Gyula,
> >
> > Is the Adaptive Scheduler limited to the Reactive mode? I agree that if
> we
> > move forward with the Adaptive Scheduler solution it should support all
> > deployment scenarios.
> >
> > Thanks,
> > -Max
> >
> > On Sun, Oct 9, 2022 at 6:10 AM Gyula Fóra  wrote:
> >
> >> Hi!
> >>
> >> I think we have to make sure that the Rescale API will work also without
> >> the adaptive scheduler (for instance when we are running Flink with the
> >> Kubernetes Native Integration or in other cases where the adaptive
> >> scheduler is not supported).
> >>
> >> What do you think?
> >>
> >> Cheers
> >> Gyula
> >>
> >>
> >>
> >> On Fri, Oct 7, 2022 at 3:50 PM Maximilian Michels 
> wrote:
> >>
> >>> We've been looking into ways to support programmatic rescaling of job
> >>> vertices. This feature is typically required for any type of Flink
> >>> autoscaler which does not merely set the default parallelism but
> adjusts
> >>> the parallelisms on a JobVertex level.
> >>>
> >>> We've had an initial discussion here:
> >>> https://issues.apache.org/jira/browse/FLINK-29501 where Chesnay
> suggested
> >>> to use the infamous "rescaling" API:
> >>>
> >>>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling
> >>> This API is disabled as of
> >>> https://issues.apache.org/jira/browse/FLINK-12312
> >>> .
> >>>
> >>> Since there is the Adaptive Scheduler in Flink now, it may be feasible
> to
> >>> re-enable the API (at least for streaming jobs) and allow overriding
> the
> >>> parallelism of job vertices in addition to the default parallelism.
> >>>
> >>> Any thoughts?
> >>>
> >>> -Max
> >>>
>
>


[jira] [Created] (FLINK-29590) Fix constant fold issue for HiveDialect

2022-10-11 Thread luoyuxia (Jira)
luoyuxia created FLINK-29590:


 Summary: Fix constant fold issue for HiveDialect
 Key: FLINK-29590
 URL: https://issues.apache.org/jira/browse/FLINK-29590
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.16.0
Reporter: luoyuxia






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-11 Thread yuxia
Congratulations Danny!

Best regards,
Yuxia

- Original Message -
From: "Xingbo Huang" 
To: "dev" 
Sent: Wednesday, October 12, 2022, 9:44:22 AM
Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

Congratulations Danny!

Best,
Xingbo

Sergey Nuyanzin  wrote on Wed, Oct 12, 2022 at 01:26:

> Congratulations, Danny
>
> On Tue, Oct 11, 2022, 15:18 Lincoln Lee  wrote:
>
> > Congratulations Danny!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Congxian Qiu  wrote on Tue, Oct 11, 2022 at 19:42:
> >
> > > Congratulations Danny!
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > Leonard Xu  wrote on Tue, Oct 11, 2022 at 18:03:
> > >
> > > > Congratulations Danny!
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-11 Thread Xingbo Huang
Congratulations Danny!

Best,
Xingbo

Sergey Nuyanzin  wrote on Wed, Oct 12, 2022 at 01:26:

> Congratulations, Danny
>
> On Tue, Oct 11, 2022, 15:18 Lincoln Lee  wrote:
>
> > Congratulations Danny!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Congxian Qiu  wrote on Tue, Oct 11, 2022 at 19:42:
> >
> > > Congratulations Danny!
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > Leonard Xu  wrote on Tue, Oct 11, 2022 at 18:03:
> > >
> > > > Congratulations Danny!
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Updating Flink HBase Connectors

2022-10-11 Thread Ferenc Csaky
Hi Martijn,

Thank you for your comment. About HBase 2.x, correct, that is my thought 
process, but it has to be tested and verified.

+1 from my side about merging these updates with the connector externalization.

Best,
F


--- Original Message ---
On Tuesday, October 11th, 2022 at 16:30, Martijn Visser 
 wrote:


> 
> 
> Hi Ferenc,
> 
> Thanks for opening the discussion on this topic!
> 
> +1 for dropping HBase 1.x.
> 
> Regarding HBase 2.x, if I understand correctly it should be possible to
> connect to any 2.x cluster if you're using the 2.x client. Wouldn't it make
> more sense to always support the latest available version, so basically 2.5
> at the moment? We could always include a test to check that implementation
> against an older HBase version.
> 
> I also have a follow-up question already: if there's an agreement on this
> topic, does it make sense to directly build a new HBase connector in its
> own external connector repo, since I believe the current connector uses the
> old source/sink interfaces. We could then directly drop the ones in the
> Flink repo and replace it with new implementations?
> 
> Best regards,
> 
> Martijn
> 
> On Mon, Oct 10, 2022 at 16:24, Ferenc Csaky wrote:
> > Hi everyone,
> > 
> > Now that the connector externalization effort is going on, I think it is
> > definitely worth revisiting the currently supported HBase versions for the
> > Flink connector. Currently, there are HBase 1.4 and HBase 2.2 connector
> > versions, although both of those versions are kind of outdated.
> > 
> > From the HBase point of view the following can be considered [1]:
> > 
> > - HBase 1.x is dead, so on the way forward it should be safe to drop it.
> > - HBase 2.2 is EoL, but still actively used; we are also supporting it in
> > some of our still active releases at Cloudera.
> > - HBase 2.4 is the main thing now and probably will be supported for a
> > while (by us, definitely).
> > - HBase 2.5 just came out, but 2.6 is expected pretty soon, so it is
> > possible it won't live long.
> > - HBase 3 is in alpha, so shooting for that would probably be too early in
> > the near future.
> > 
> > In addition, if we are only using the standard HBase 2.x client APIs, then
> > it should be possible to compile it with any HBase 2.x version. Also,
> > any HBase 2.x cluster should be backwards compatible with all earlier HBase
> > 2.x client libraries. I did not check this part thoroughly but I think this
> > should be true, so ideally it would be enough to have an HBase 2.4
> > connector. [2]
> > 
> > Looking forward to your opinions about this topic.
> > 
> > Best,
> > F
> > 
> > [1] https://hbase.apache.org/downloads.html
> > [2] https://hbase.apache.org/book.html#hbase.versioning.post10 (Client
> > API compatibility)
> 
> 
> --
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser


[jira] [Created] (FLINK-29589) Data Loss in Sink GlobalCommitter during Task Manager recovery

2022-10-11 Thread Krzysztof Chmielewski (Jira)
Krzysztof Chmielewski created FLINK-29589:
-

 Summary: Data Loss in Sink GlobalCommitter during Task Manager 
recovery
 Key: FLINK-29589
 URL: https://issues.apache.org/jira/browse/FLINK-29589
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.6, 1.14.5, 1.14.4, 1.14.3, 1.14.2, 1.14.0
Reporter: Krzysztof Chmielewski


Flink's Sink architecture with a global committer seems to be vulnerable to 
data loss during Task Manager recovery. An entire checkpoint can be lost by 
the GlobalCommitter, resulting in data loss for sinks.

The issue was observed in the Delta Sink connector on a real 1.14.x cluster and 
was replicated using Flink's 1.14.6 Test Utils classes.

Scenario:
1. Streaming source emitting a constant number of events per checkpoint (20 
events for 5 commits).
2. Sink with parallelism > 1 with Committer and GlobalCommitter elements.
3. Committers processed committables for checkpointId 2.
4. GlobalCommitter throws an exception (a deliberately thrown exception) during 
checkpointId 2 (third commit) while processing data from checkpoint 1 (the 
global committer architecture is expected to lag one commit behind the rest of 
the pipeline).
5. Streaming source ends.
6. We are missing 20 records (one checkpoint).

What is happening is that during recovery the committers perform a "retry" on 
the committables for checkpointId 2; however, those committables, reprocessed 
by the "retry" task, are not emitted to the global committer.

The issue can be reproduced using a JUnit test built with Flink's TestSink.
The test was implemented 
[here|https://github.com/kristoffSC/flink/blob/Flink_1.14_DataLoss_SinkGlobalCommitter/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkITCase.java#:~:text=testGlobalCommitterMissingRecordsDuringRecovery].



I believe the problem is somewhere around the 
`SinkOperator::notifyCheckpointComplete` method. In there we see that the retry 
async task is scheduled, however its result is never emitted downstream like it 
is done for the regular flow one line above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Updating Flink HBase Connectors

2022-10-11 Thread Martijn Visser
Hi Ferenc,

Thanks for opening the discussion on this topic!

+1 for dropping HBase 1.x.

Regarding HBase 2.x, if I understand correctly it should be possible to
connect to any 2.x cluster if you're using the 2.x client. Wouldn't it make
more sense to always support the latest available version, so basically 2.5
at the moment? We could always include a test to check that implementation
against an older HBase version.

I also have a follow-up question already: if there's an agreement on this
topic, does it make sense to directly build a new HBase connector in its
own external connector repo, since I believe the current connector uses the
old source/sink interfaces. We could then directly drop the ones in the
Flink repo and replace it with new implementations?

Best regards,

Martijn

On Mon, Oct 10, 2022 at 16:24, Ferenc Csaky wrote:

> Hi everyone,
>
> Now that the connector externalization effort is going on, I think it is
> definitely worth revisiting the currently supported HBase versions for the
> Flink connector. Currently, there are HBase 1.4 and HBase 2.2 connector
> versions, although both of those versions are kind of outdated.
>
> From the HBase point of view the following can be considered [1]:
>
> - HBase 1.x is dead, so on the way forward it should be safe to drop it.
> - HBase 2.2 is EoL, but still actively used; we are also supporting it in
> some of our still active releases at Cloudera.
> - HBase 2.4 is the main thing now and probably will be supported for a
> while (by us, definitely).
> - HBase 2.5 just came out, but 2.6 is expected pretty soon, so it is
> possible it won't live long.
> - HBase 3 is in alpha, so shooting for that would probably be too early in
> the near future.
>
> In addition, if we are only using the standard HBase 2.x client APIs, then
> it should be possible to compile it with any HBase 2.x version. Also,
> any HBase 2.x cluster should be backwards compatible with all earlier HBase
> 2.x client libraries. I did not check this part thoroughly but I think this
> should be true, so ideally it would be enough to have an HBase 2.4
> connector. [2]
>
> Looking forward to your opinions about this topic.
>
> Best,
> F
>
> [1] https://hbase.apache.org/downloads.html
> [2] https://hbase.apache.org/book.html#hbase.versioning.post10 (Client
> API compatibility)

-- 
Martijn
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


RE: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-11 Thread abmo . work
Hi Martijn,

Yes catalog integration exists and catalogs can be created using Flink SQL. 
https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs
 has more details.
We may need some discussion within Iceberg community but based on the current 
iceberg-flink code structure we are looking to externalize this as well.

Thanks
Abid


On 2022/10/11 08:24:44 Martijn Visser wrote:
> Hi Abid,
> 
> Thanks for the FLIP. I have a question about Iceberg's Catalog: has that
> integration between Flink and Iceberg been created already and are you
> looking to externalize that as well?
> 
> Thanks,
> 
> Martijn
> 
> On Tue, Oct 11, 2022 at 12:14 AM  wrote:
> 
> > Hi Marton,
> >
> > Yes, we are initiating this as part of the Externalize Flink Connectors
> > effort. Plan is to externalize the existing Flink connector from Iceberg
> > repo into a separate repo under the Flink umbrella.
> >
> > Sorry about the doc permissions! I was able to create a FLIP-267:
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > <
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > >
> > Let's use that to discuss.
> >
> > Thanks
> > Abid
> >
> > On 2022/10/10 12:57:32 Márton Balassi wrote:
> > > Hi Abid,
> > >
> > > Just to clarify does your suggestion mean that the Iceberg community
> > would
> > > like to remove the iceberg-flink connector from the Iceberg codebase and
> > > maintain it under Flink instead? A new separate repo under the Flink
> > > project umbrella given the current existing effort to extract connectors
> > to
> > > their individual repos (externalize) makes sense to me.
> > >
> > > [1] https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> > >
> > > Best,
> > > Marton
> > >
> > >
> > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li  wrote:
> > >
> > > > Thanks Abid for driving.
> > > >
> > > > +1 for this.
> > > >
> > > > Can you open the permissions for
> > > >
> > > >
> > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > ?
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I would like to start a discussion about contributing Iceberg Flink
> > > > Connector to Flink.
> > > > >
> > > > > I created a doc <
> > > >
> > https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > >
> > > > with all the details following the Flink Connector template as I don’t
> > have
> > > > permissions to create a FLIP yet.
> > > > > High level details are captured below:
> > > > >
> > > > > Motivation:
> > > > >
> > > > > This FLIP aims to contribute the existing Apache Iceberg Flink
> > Connector
> > > > to Flink.
> > > > >
> > > > > Apache Iceberg is an open table format for huge analytic datasets.
> > > > Iceberg adds tables to compute engines including Spark, Trino,
> > PrestoDB,
> > > > Flink, Hive and Impala using a high-performance table format that works
> > > > just like a SQL table.
> > > > > Iceberg avoids unpleasant surprises. Schema evolution works and won’t
> > > > inadvertently un-delete data. Users don’t need to know about
> > partitioning
> > > > to get fast queries. Iceberg was designed to solve correctness
> > problems in
> > > > eventually-consistent cloud object stores.
> > > > >
> > > > > Iceberg supports both Flink’s DataStream API and Table API. Based on
> > the
> > > > guideline of the Flink community, only the latest 2 minor versions are
> > > > actively maintained. See the Multi-Engine Support#apache-flink for
> > further
> > > > details.
> > > > >
> > > > >
> > > > > Iceberg connector supports:
> > > > >
> > > > > • Source: detailed Source design <
> > > >
> > https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#
> > >,
> > > > based on FLIP-27
> > > > > • Sink: detailed Sink design and interfaces used <
> > > >
> > https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
> > > > >
> > > > > • Usable in both DataStream and Table API/SQL
> > > > > • DataStream read/append/overwrite
> > > > > • SQL create/alter/drop table, select, insert into, insert
> > > > overwrite
> > > > > • Streaming or batch read in Java API
> > > > > • Support for Flink’s Python API
> > > > >
> > > > > See Iceberg Flink  <
> > https://iceberg.apache.org/docs/latest/flink/#flink>for
> > > > detailed usage instructions.
> > > > >
> > > > > Looking forward to the discussion!
> > > > >
> > > > > Thanks
> > > > > Abid
> > > >
> > >
> 

[jira] [Created] (FLINK-29588) Add Flink Version and Application Version to Savepoint properties

2022-10-11 Thread Clara Xiong (Jira)
Clara Xiong created FLINK-29588:
---

 Summary: Add Flink Version and Application Version to Savepoint 
properties
 Key: FLINK-29588
 URL: https://issues.apache.org/jira/browse/FLINK-29588
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: Clara Xiong


It is common that users need to upgrade long-running FlinkDeployments. An 
application upgrade or a major Flink upgrade might break the application due 
to schema incompatibilities, especially the 
latter ([link|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#compatibility-table]).
 In this case, the users need to manually restore from a savepoint that is 
compatible with the version they want to try or re-process. Currently the 
Flink Operator returns a list of completed Savepoints for a FlinkDeployment. 
It would be helpful if the Savepoints in SavepointHistory had properties for 
the Flink version and the application version so users can easily determine 
which savepoint to use.

 
 * {{String flinkVersion}}
 * {{String appVersion}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-11 Thread Sergey Nuyanzin
Congratulations, Danny

On Tue, Oct 11, 2022, 15:18 Lincoln Lee  wrote:

> Congratulations Danny!
>
> Best,
> Lincoln Lee
>
>
> Congxian Qiu  wrote on Tue, Oct 11, 2022 at 19:42:
>
> > Congratulations Danny!
> >
> > Best,
> > Congxian
> >
> >
> > Leonard Xu  wrote on Tue, Oct 11, 2022 at 18:03:
> >
> > > Congratulations Danny!
> > >
> > >
> > > Best,
> > > Leonard
> > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-11 Thread Lincoln Lee
Congratulations Danny!

Best,
Lincoln Lee


Congxian Qiu  wrote on Tue, Oct 11, 2022 at 19:42:

> Congratulations Danny!
>
> Best,
> Congxian
>
>
> Leonard Xu  wrote on Tue, Oct 11, 2022 at 18:03:
>
> > Congratulations Danny!
> >
> >
> > Best,
> > Leonard
> >
> >
>


Re: State of the Rescale API

2022-10-11 Thread Chesnay Schepler
The AdaptiveScheduler is not limited to reactive mode. There are no 
deployment limitations for the scheduler itself.
In a nutshell, all that reactive mode does is crank the target 
parallelism to infinity, when usually it is the parallelism the user has 
set in the job/configuration.


I think it would be fine if a new/revised rescale API were only 
available in the Adaptive Scheduler (without reactive mode!) for starters.

We'd require way more stuff to make this useful for batch workloads.

On 10/10/2022 16:47, Maximilian Michels wrote:

Hey Gyula,

Is the Adaptive Scheduler limited to the Reactive mode? I agree that if we
move forward with the Adaptive Scheduler solution it should support all
deployment scenarios.

Thanks,
-Max

On Sun, Oct 9, 2022 at 6:10 AM Gyula Fóra  wrote:


Hi!

I think we have to make sure that the Rescale API will work also without
the adaptive scheduler (for instance when we are running Flink with the
Kubernetes Native Integration or in other cases where the adaptive
scheduler is not supported).

What do you think?

Cheers
Gyula



On Fri, Oct 7, 2022 at 3:50 PM Maximilian Michels  wrote:


We've been looking into ways to support programmatic rescaling of job
vertices. This feature is typically required for any type of Flink
autoscaler which does not merely set the default parallelism but adjusts
the parallelisms on a JobVertex level.

We've had an initial discussion here:
https://issues.apache.org/jira/browse/FLINK-29501 where Chesnay suggested
to use the infamous "rescaling" API:

https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling
This API is disabled as of
https://issues.apache.org/jira/browse/FLINK-12312
.

Since there is the Adaptive Scheduler in Flink now, it may be feasible to
re-enable the API (at least for streaming jobs) and allow overriding the
parallelism of job vertices in addition to the default parallelism.

Any thoughts?

-Max
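
For reference, the documented rescaling call discussed above has roughly the following shape today; it only accepts a single job-wide parallelism, and the request is rejected while the endpoint stays disabled (FLINK-12312). This is just a sketch; the REST address and job id are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RescaleCallSketch {
    public static void main(String[] args) throws Exception {
        String jobId = "00000000000000000000000000000000"; // placeholder job id
        // PATCH /jobs/:jobid/rescaling?parallelism=<n>, as documented in the REST API docs linked above
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/jobs/" + jobId + "/rescaling?parallelism=8"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}

Per-vertex parallelism overrides, as discussed in this thread, would be an addition on top of this job-wide call.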





[jira] [Created] (FLINK-29587) Fail to generate code for SearchOperator

2022-10-11 Thread luoyuxia (Jira)
luoyuxia created FLINK-29587:


 Summary: Fail to generate code for  SearchOperator 
 Key: FLINK-29587
 URL: https://issues.apache.org/jira/browse/FLINK-29587
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner, Table SQL / Runtime
Reporter: luoyuxia


Can be reproduced with the following code with Hive dialect
{code:java}
// hive dialect

tableEnv.executeSql("create table table1 (id int, val string, val1 string, 
dimid int)");
tableEnv.executeSql("create table table3 (id int)");

CollectionUtil.iteratorToList(
tableEnv.executeSql(
"select table1.id, table1.val, table1.val1 from table1 
left semi join"
+ " table3 on table1.dimid = table3.id and 
table3.id = 100 where table1.dimid = 200")
.collect());{code}
The plan is:
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[id, val, val1])
  LogicalProject(id=[$0], val=[$1], val1=[$2])
    LogicalFilter(condition=[=($3, 200)])
      LogicalJoin(condition=[AND(=($3, $4), =($4, 100))], joinType=[semi])
        LogicalTableScan(table=[[test-catalog, default, table1]])
        LogicalTableScan(table=[[test-catalog, default, table3]])

BatchPhysicalSink(table=[*anonymous_collect$1*], fields=[id, val, val1])
  BatchPhysicalNestedLoopJoin(joinType=[LeftSemiJoin], where=[$f1], select=[id, val, val1], build=[right])
    BatchPhysicalCalc(select=[id, val, val1], where=[=(dimid, 200)])
      BatchPhysicalTableSourceScan(table=[[test-catalog, default, table1]], fields=[id, val, val1, dimid])
    BatchPhysicalExchange(distribution=[broadcast])
      BatchPhysicalCalc(select=[SEARCH(id, Sarg[]) AS $f1])
        BatchPhysicalTableSourceScan(table=[[test-catalog, default, table3]], fields=[id])
{code}
 

But it throws an exception when generating code for it.

The exception is:
{code:java}
java.util.NoSuchElementException
    at 
com.google.common.collect.ImmutableRangeSet.span(ImmutableRangeSet.java:203)
    at org.apache.calcite.util.Sarg.isComplementedPoints(Sarg.java:148)
    at 
org.apache.flink.table.planner.codegen.calls.SearchOperatorGen$.generateSearch(SearchOperatorGen.scala:87)
    at 
org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:474)
    at 
org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:57)
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:174)
    at 
org.apache.flink.table.planner.codegen.ExprCodeGenerator.generateExpression(ExprCodeGenerator.scala:143)
    at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator$.$anonfun$generateProcessCode$4(CalcCodeGenerator.scala:140)
    at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator$.produceProjectionCode$1(CalcCodeGenerator.scala:140)
    at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateProcessCode(CalcCodeGenerator.scala:164)
    at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator$.generateCalcOperator(CalcCodeGenerator.scala:49)
    at 
org.apache.flink.table.planner.codegen.CalcCodeGenerator.generateCalcOperator(CalcCodeGenerator.scala)
    at 
org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:100)
    at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:158)
    at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:257)
    at 
org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.java:136)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29586) Let write buffer spillable

2022-10-11 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-29586:


 Summary: Let write buffer spillable
 Key: FLINK-29586
 URL: https://issues.apache.org/jira/browse/FLINK-29586
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.3.0


Column format and remote DFS may greatly affect the performance of compaction. 
We can change the writeBuffer to spillable to improve the performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29585) Migrate TableSchema to Schema for Hive connector

2022-10-11 Thread luoyuxia (Jira)
luoyuxia created FLINK-29585:


 Summary: Migrate TableSchema to Schema for Hive connector
 Key: FLINK-29585
 URL: https://issues.apache.org/jira/browse/FLINK-29585
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: luoyuxia


`TableSchema` is deprecated, we should migrate it to Schema



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29584) Upgrade java 11 version on the microbenchmark worker

2022-10-11 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-29584:
--

 Summary: Upgrade java 11 version on the microbenchmark worker
 Key: FLINK-29584
 URL: https://issues.apache.org/jira/browse/FLINK-29584
 Project: Flink
  Issue Type: Improvement
  Components: Benchmarks
Affects Versions: 1.17.0
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski
 Fix For: 1.17.0


It looks like the older JDK 11 builds have problems with JIT in the 
microbenchmarks, as for example visible 
[here|http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2]. 
Locally I was able to reproduce this problem and the issue goes away after 
upgrading to a newer JDK 11 build.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29583) Ensure correct subtaskId and checkpointId is set during committer state migration

2022-10-11 Thread Fabian Paul (Jira)
Fabian Paul created FLINK-29583:
---

 Summary: Ensure correct subtaskId and checkpointId is set during 
committer state migration
 Key: FLINK-29583
 URL: https://issues.apache.org/jira/browse/FLINK-29583
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Common
Affects Versions: 1.15.2, 1.16.0, 1.17.0
Reporter: Fabian Paul


We already discovered two problems during recovery of committers when a post 
commit topology is used

 
 * https://issues.apache.org/jira/browse/FLINK-29509
 * https://issues.apache.org/jira/browse/FLINK-29512

Both problems also apply when recovering Flink 1.14 unified sink committer 
state and migrating it to the extended unified model.

As part of this ticket we should fix both issues for the migration and also 
increase the test coverage for the migration cases, i.e. add test cases in 
CommitterOperatorTest and CommittableCollectorSerializerTest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-11 Thread Congxian Qiu
Congratulations Danny!

Best,
Congxian


Leonard Xu  wrote on Tue, Oct 11, 2022 at 18:03:

> Congratulations Danny!
>
>
> Best,
> Leonard
>
>


[jira] [Created] (FLINK-29582) SavepointWriter should be usable without any transformation

2022-10-11 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-29582:


 Summary: SavepointWriter should be usable without any 
transformation
 Key: FLINK-29582
 URL: https://issues.apache.org/jira/browse/FLINK-29582
 Project: Flink
  Issue Type: Improvement
  Components: API / State Processor
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0


The SavepointWriter of the state processor API currently enforces at least one 
transformation to be defined by the user.

This is an irritating limitation; this means you can't use the API to delete a 
state or use the new uid remapping function from FLINK-29457 without specifying 
some dummy transformation.
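
For illustration, this is the kind of usage the change would enable (a sketch against the state processor API; exact method signatures may differ between versions, and the paths and uid below are placeholders):

{code:java}
// Hypothetical usage once the restriction is lifted: drop one operator's state
// from an existing savepoint without registering any dummy transformation.
SavepointWriter
        .fromExistingSavepoint("s3://bucket/savepoints/savepoint-abc")
        .removeOperator("uid-of-removed-operator")
        .write("s3://bucket/savepoints/savepoint-abc-trimmed");
{code}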



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29581) Emit warning events for session job reconciliation exception

2022-10-11 Thread Xin Hao (Jira)
Xin Hao created FLINK-29581:
---

 Summary: Emit warning events for session job reconciliation 
exception
 Key: FLINK-29581
 URL: https://issues.apache.org/jira/browse/FLINK-29581
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Xin Hao


Same as FlinkDeployment, will be useful for monitoring.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29580) pulsar.consumer.autoUpdatePartitionsIntervalSeconds doesn't work and should be removed

2022-10-11 Thread Yufan Sheng (Jira)
Yufan Sheng created FLINK-29580:
---

 Summary: pulsar.consumer.autoUpdatePartitionsIntervalSeconds 
doesn't work and should be removed
 Key: FLINK-29580
 URL: https://issues.apache.org/jira/browse/FLINK-29580
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Pulsar
Affects Versions: 1.15.2, 1.17.0, 1.16.1
Reporter: Yufan Sheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-11 Thread Chesnay Schepler

Currently I think that would be a mistake.

Ultimately what we have here is the culmination of us never really 
considering how the numRecordsOut metric should behave for operators 
that emit data to other operators _and_ external systems. This goes 
beyond sinks.
This even applies to numRecordsIn, for cases where functions query/write 
data from/to the outside (e.g., Async IO).


Having 2 separate metrics for that, 1 exclusively for internal data 
transfers, and 1 exclusively for external data transfers, is the only 
way to get a consistent metric definition in the long-run.

We can jump back-and-forth now or just commit to it.

I don't think we can really judge this based on FLIP-33. It was IIRC 
written before the two phase sinks were added, which heavily blurred the 
lines of what a sink even is. Because it definitely is _not_ the last 
operator in a chain anymore.


What I would suggest is to stick with what we got (although I despise 
the name numRecordsSend), and alias the numRecordsOut metric for all 
non-TwoPhaseCommittingSink.


On 11/10/2022 05:54, Qingsheng Ren wrote:

Thanks for the details Chesnay!

By “alias” I mean to respect the original definition made in FLIP-33 
for numRecordsOut, which is the number of records written to the 
external system, and keep numRecordsSend as the same value as 
numRecordsOut for compatibility.


I think keeping numRecordsOut for the output to the external system is 
more intuitive to end users because in most cases the metric of data 
flow output is more essential. I agree with you that a new metric is 
required, but considering compatibility and users’ intuition I prefer 
to keep the initial definition of numRecordsOut in FLIP-33 and name a 
new metric for sink writer’s output to downstream operators. This 
might be against consistency with metrics in other operators in Flink 
but maybe it’s acceptable to have the sink as a special case.


Best,
Qingsheng
On Oct 10, 2022, 19:13 +0800, Chesnay Schepler , 
wrote:

> I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut

But that's not possible. If it were that simple there would have 
never been a need to introduce another metric in the first place.


It's a rather fundamental issue with how the new sinks work, in that 
they emit data to the external system (usually considered as 
"numRecordsOut" of sinks) while _also_ sending data to a downstream 
operator (usually considered as "numRecordsOut" of tasks).
The original issue was that the numRecordsOut of the sink counted 
both (which is completely wrong).


A new metric was always required; otherwise you inevitably end up 
breaking /some/ semantic.
Adding a new metric for what the sink writes to the external system 
is, for better or worse, more consistent with how these metrics 
usually work in Flink.


On 10/10/2022 12:45, Qingsheng Ren wrote:

Thanks everyone for joining the discussion!

> Do you have any idea what has happened in the process here?

The discussion in this PR [1] shows some details and could be 
helpful to understand the original motivation of the renaming. We do 
have a test case for guarding metrics but unfortunaly the case was 
also modified so the defense was broken.


I think the reason why both the developer and the reviewer forgot to 
trigger a discussion and gave a green pass on the change is that 
metrics are quite “trivial” to be noticed as public APIs. As 
mentioned by Martijn I couldn’t find a place noting that metrics are 
public APIs and should be treated carefully while contributing and 
reviewing.


IMHO three actions could be made to prevent this kind of changes in 
the future:


a. Add test case for metrics (which we already have in 
SinkMetricsITCase)
b. We emphasize that any public-interface breaking changes should be 
proposed by a FLIP or discussed on the mailing list, and should be 
listed in the release note.
c. We remind contributors and reviewers about what should be 
considered as public API, and include metric names in it.


For b and c these two pages [2][3] might be proper places.

About the patch to revert this, it looks like we have a consensus on 
1.16. As for 1.15 I think it's worth triggering a minor version. I 
didn't see complaints about this so far, so it should be OK to save 
the situation asap. I'm with Xintong's idea to treat numXXXSend as 
an alias of numXXXOut, considering some users could possibly have 
already adapted their systems to the new naming, and to have 
another internal metric reflecting the number of outgoing 
committable batches (actually the numRecordsIn of the sink committer 
operator should be carrying this info already).


[1] https://github.com/apache/flink/pull/18825
[2] https://flink.apache.org/contributing/contribute-code.html
[3] https://flink.apache.org/contributing/reviewing-prs.html

Best,
Qingsheng
On Oct 10, 2022, 17:40 +0800, Xintong Song , 
wrote:

+1 for reverting these changes in Flink 1.16.

For 1.15.3, can we make these metrics available via both names 

Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-11 Thread Leonard Xu
Congratulations Danny!


Best,
Leonard



[jira] [Created] (FLINK-29579) Flink parquet reader cannot read fully optional elements in a repeated list

2022-10-11 Thread Tiansu Yu (Jira)
Tiansu Yu created FLINK-29579:
-

 Summary: Flink parquet reader cannot read fully optional elements 
in a repeated list
 Key: FLINK-29579
 URL: https://issues.apache.org/jira/browse/FLINK-29579
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.13.3
Reporter: Tiansu Yu


While trying to read a parquet file containing the following field as part of 
the schema, 

 
{code:java}
 optional group attribute_values (LIST) {
repeated group list {
  optional group element {
optional binary attribute_key_id (STRING);
optional binary attribute_value_id (STRING);
optional int32 pos;
  }
}
  } {code}
 

 I encountered the following problem 

 
{code:java}
Exception in thread "main" java.lang.UnsupportedOperationException: List field 
[optional binary attribute_key_id (STRING)] in List [attribute_values] has to 
be required. 
at 
org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.convertGroupElementToArrayTypeInfo(ParquetSchemaConverter.java:338)
at 
org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.convertParquetTypeToTypeInfo(ParquetSchemaConverter.java:271)
at 
org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.convertFields(ParquetSchemaConverter.java:81)
at 
org.apache.flink.formats.parquet.utils.ParquetSchemaConverter.fromParquetType(ParquetSchemaConverter.java:61)
at 
org.apache.flink.formats.parquet.ParquetInputFormat.(ParquetInputFormat.java:120)
at 
org.apache.flink.formats.parquet.ParquetRowInputFormat.(ParquetRowInputFormat.java:39)
 {code}
The main code that raises the problem goes as follows:

 
{code:java}
private static ObjectArrayTypeInfo convertGroupElementToArrayTypeInfo(
        GroupType arrayFieldType, GroupType elementType) {
    for (Type type : elementType.getFields()) {
        if (!type.isRepetition(Type.Repetition.REQUIRED)) {
            throw new UnsupportedOperationException(
                    String.format(
                            "List field [%s] in List [%s] has to be required. ",
                            type.toString(), arrayFieldType.getName()));
        }
    }
    return ObjectArrayTypeInfo.getInfoFor(convertParquetTypeToTypeInfo(elementType));
}
{code}
I am not very familiar with the internals of the Parquet schema, but the problem 
looks to me like Flink is too restrictive on repetition types inside certain 
nested fields. Would love to hear some feedback on this (improvements, 
corrections / workarounds).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29578) Multiple executions of `runVertexCentricIteration` cause memory errors (flink gelly)

2022-10-11 Thread Shunyang Li (Jira)
Shunyang Li created FLINK-29578:
---

 Summary: Multiple executions of `runVertexCentricIteration` cause 
memory errors (flink gelly)
 Key: FLINK-29578
 URL: https://issues.apache.org/jira/browse/FLINK-29578
 Project: Flink
  Issue Type: Bug
  Components: Library / Graph Processing (Gelly)
Affects Versions: 1.15.1
 Environment: * Flink 1.15.1
 * Java 11
 * Gradle 7.4.2
Reporter: Shunyang Li
 Attachments: example.java

When the runVertexCentricIteration function is executed twice it will throw an 
error: 
java.lang.IllegalArgumentException: Too few memory segments provided. Hash 
Table needs at least 33 memory segments.
 
We solved this issue by setting the following configuration:
```
VertexCentricConfiguration parameter = new VertexCentricConfiguration();
parameter.setSolutionSetUnmanagedMemory(true);
```
 
However, it cannot be executed more than 4 times. After executing 
four times it will throw an error: java.lang.IllegalArgumentException: Too 
little memory provided to sorter to perform task. Required are at least 12 
pages. Current page size is 32768 bytes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29577) Disable rocksdb wal when restore from full snapshot

2022-10-11 Thread Cai Liuyang (Jira)
Cai Liuyang created FLINK-29577:
---

 Summary: Disable rocksdb wal when restore from full snapshot
 Key: FLINK-29577
 URL: https://issues.apache.org/jira/browse/FLINK-29577
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Cai Liuyang


For now, RocksDBFullRestoreOperation and RocksDBHeapTimersFullRestoreOperation 
don't pass RocksDB::WriteOptions to RocksDBWriteBatchWrapper when restoring 
kv-data, so RocksDBWriteBatchWrapper's default WriteOptions is used (which 
doesn't disable the RocksDB WAL explicitly, see the code below). As a result, 
during a restore from a full snapshot the WAL is enabled (using more disk and 
affecting RocksDB write performance while restoring).

 
{code:java}
// First: RocksDBHeapTimersFullRestoreOperation::restoreKVStateData() doesn't 
pass WriteOptions to RocksDBWriteBatchWrapper(null as default)

private void restoreKVStateData(
ThrowingIterator keyGroups,
Map columnFamilies,
Map> 
restoredPQStates)
throws IOException, RocksDBException, StateMigrationException {
// for all key-groups in the current state handle...
try (RocksDBWriteBatchWrapper writeBatchWrapper =
new RocksDBWriteBatchWrapper(this.rocksHandle.getDb(), 
writeBatchSize)) {
HeapPriorityQueueSnapshotRestoreWrapper 
restoredPQ = null;
ColumnFamilyHandle handle = null;
   ..
}


// Second: RocksDBWriteBatchWrapper::flush function doesn't disable wal 
explicitly when user doesn't pass WriteOptions to RocksDBWriteBatchWrapper
public void flush() throws RocksDBException {
if (options != null) {
db.write(options, batch);
} else {
// use the default WriteOptions, if wasn't provided.
try (WriteOptions writeOptions = new WriteOptions()) {
db.write(writeOptions, batch);
}
}
batch.clear();
}

{code}
 

 

As we know, RocksDB's WAL is useless for Flink, so I think we can disable the 
WAL for RocksDBWriteBatchWrapper's default WriteOptions.
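
A minimal sketch of what that could look like, following the flush() snippet above (whether this is the right place for the change is of course open for discussion):

{code:java}
// Hypothetical adjustment to RocksDBWriteBatchWrapper.flush(): when the caller did not
// provide WriteOptions, fall back to options with the WAL disabled, since Flink does not
// rely on the RocksDB WAL for recovery.
public void flush() throws RocksDBException {
    if (options != null) {
        db.write(options, batch);
    } else {
        try (WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {
            db.write(writeOptions, batch);
        }
    }
    batch.clear();
}
{code}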



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-11 Thread Martijn Visser
Hi Abid,

Thanks for the FLIP. I have a question about Iceberg's Catalog: has that
integration between Flink and Iceberg been created already and are you
looking to externalize that as well?

Thanks,

Martijn

On Tue, Oct 11, 2022 at 12:14 AM  wrote:

> Hi Marton,
>
> Yes, we are initiating this as part of the Externalize Flink Connectors
> effort. Plan is to externalize the existing Flink connector from Iceberg
> repo into a separate repo under the Flink umbrella.
>
> Sorry about the doc permissions! I was able to create a FLIP-267:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> >
> Let's use that to discuss.
>
> Thanks
> Abid
>
> On 2022/10/10 12:57:32 Márton Balassi wrote:
> > Hi Abid,
> >
> > Just to clarify does your suggestion mean that the Iceberg community
> would
> > like to remove the iceberg-flink connector from the Iceberg codebase and
> > maintain it under Flink instead? A new separate repo under the Flink
> > project umbrella given the current existing effort to extract connectors
> to
> > their individual repos (externalize) makes sense to me.
> >
> > [1] https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> >
> > Best,
> > Marton
> >
> >
> > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li  wrote:
> >
> > > Thanks Abid for driving.
> > >
> > > +1 for this.
> > >
> > > Can you open the permissions for
> > >
> > >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > ?
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > I would like to start a discussion about contributing Iceberg Flink
> > > Connector to Flink.
> > > >
> > > > I created a doc <
> > >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> >
> > > with all the details following the Flink Connector template as I don’t
> have
> > > permissions to create a FLIP yet.
> > > > High level details are captured below:
> > > >
> > > > Motivation:
> > > >
> > > > This FLIP aims to contribute the existing Apache Iceberg Flink
> Connector
> > > to Flink.
> > > >
> > > > Apache Iceberg is an open table format for huge analytic datasets.
> > > Iceberg adds tables to compute engines including Spark, Trino,
> PrestoDB,
> > > Flink, Hive and Impala using a high-performance table format that works
> > > just like a SQL table.
> > > > Iceberg avoids unpleasant surprises. Schema evolution works and won’t
> > > inadvertently un-delete data. Users don’t need to know about
> partitioning
> > > to get fast queries. Iceberg was designed to solve correctness
> problems in
> > > eventually-consistent cloud object stores.
> > > >
> > > > Iceberg supports both Flink’s DataStream API and Table API. Based on
> the
> > > guideline of the Flink community, only the latest 2 minor versions are
> > > actively maintained. See the Multi-Engine Support#apache-flink for
> further
> > > details.
> > > >
> > > >
> > > > Iceberg connector supports:
> > > >
> > > > • Source: detailed Source design <
> > >
> https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#
> >,
> > > based on FLIP-27
> > > > • Sink: detailed Sink design and interfaces used <
> > >
> https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
> > > >
> > > > • Usable in both DataStream and Table API/SQL
> > > > • DataStream read/append/overwrite
> > > > • SQL create/alter/drop table, select, insert into, insert
> > > overwrite
> > > > • Streaming or batch read in Java API
> > > > • Support for Flink’s Python API
> > > >
> > > > See Iceberg Flink  <
> https://iceberg.apache.org/docs/latest/flink/#flink>for
> > > detailed usage instructions.
> > > >
> > > > Looking forward to the discussion!
> > > >
> > > > Thanks
> > > > Abid
> > >
> >


Re: [VOTE] Apache Flink Table Store 0.2.1, release candidate #2

2022-10-11 Thread Yu Li
+1 (binding)

- Checked release notes: *Action Required*
  * The fix version of FLINK-29554 is 0.2.1 but still open, please confirm
whether this should be included or we should move it out of 0.2.1
- Checked sums and signatures: *OK*
- Checked the jars in the staging repo: *OK*
- Checked source distribution doesn't include binaries: *OK*
- Maven clean install from source: *OK*
- Checked version consistency in pom files: *OK*
- Went through the quick start: *OK*
  * Verified with both flink 1.14.5 and 1.15.1
- Checked the website updates: *OK*
  * Minor: left some comments in the PR, please check

Thanks for driving this release, Jingsong!

Best Regards,
Yu


On Sat, 8 Oct 2022 at 10:25, Jingsong Li  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version
> 0.2.1 of Apache Flink Table Store, as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Table Store canonical source distribution to be deployed to the
> release repository at dist.apache.org
> b) Table Store binary convenience releases to be deployed to the
> release repository at dist.apache.org
> c) Maven artifacts to be deployed to the Maven Central Repository
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a) and b) can be found in the corresponding dev
> repository at dist.apache.org [2]
> * All artifacts for c) can be found at the Apache Nexus Repository [3]
>
> All artifacts are signed with the key
> 2C2B6A653B07086B65E4369F7C76245E0A318150 [4]
>
> Other links for your review:
> * JIRA release notes [5]
> * source code tag "release-0.2.1-rc2" [6]
> * PR to update the website Downloads page to include Table Store links [7]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Best,
> Jingsong Lee
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release
> [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.1-rc2/
> [3]
> https://repository.apache.org/content/repositories/orgapacheflink-1539/
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352257
> [6] https://github.com/apache/flink-table-store/tree/release-0.2.1-rc2
> [7] https://github.com/apache/flink-web/pull/571
>


[SUMMARY] Flink 1.16 release sync of 2022-10-11

2022-10-11 Thread Xingbo Huang
I would like to give you a brief update of the Flink 1.16 release sync
meeting of 2022-10-11.

*rc1 is canceled because we found a metric compatibility change, which is
expected to be fixed before Thursday
https://issues.apache.org/jira/browse/FLINK-29567. We will prepare rc2 once
this blocker issue is fixed.*

*We still have some critical test stability issues [1] that need to be resolved*

For more information about Flink release 1.16, you can refer to
https://cwiki.apache.org/confluence/display/FLINK/1.16+Release

The next Flink release sync will be on Tuesday the 18th of October at 9am
CEST/ 3pm China Standard Time / 7am UTC. The link could be found on the
following page
https://cwiki.apache.org/confluence/display/FLINK/1.16+Release#id-1.16Release-Syncmeeting
.

On behalf of all the release managers,

best regards,
Xingbo

[1] https://issues.apache.org/jira/issues/?filter=12352149


[jira] [Created] (FLINK-29576) SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution failed

2022-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-29576:
-

 Summary: 
SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution failed
 Key: FLINK-29576
 URL: https://issues.apache.org/jira/browse/FLINK-29576
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.17.0
Reporter: Matthias Pohl


There's a [build 
failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41843=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=8523]
 being caused by 
{{SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution}} on {{master}}:
{code}
Oct 11 01:45:36 [ERROR] Errors: 
Oct 11 01:45:36 [ERROR]   
SourceNAryInputChainingITCase.testDirectSourcesOnlyExecution:89 » Runtime 
Fail...
Oct 11 01:45:36 [INFO] 
Oct 11 01:45:36 [ERROR] Tests run: 1931, Failures: 0, Errors: 1, Skipped: 4
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Flink falls back on to kryo serializer for GenericTypes

2022-10-11 Thread Sucheth S
Hello,

How can I avoid Flink's Kryo serializer for GenericTypes? Kryo is having some
performance issues.

Tried below but no luck.

env.getConfig().disableForceKryo();
env.getConfig().enableForceAvro();

Tried this - env.getConfig().disableGenericTypes();

getting - Generic types have been disabled in the ExecutionConfig and
type org.apache.avro.generic.GenericRecord is treated as a generic
type
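
One approach that may help is giving the operators that produce GenericRecord explicit Avro type information instead of relying on type extraction. A minimal sketch, assuming flink-avro is on the classpath and the writer schema is available; the identity map below only stands in for whatever transformation produces the records in the real job:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;

public class AvroTypeInfoSketch {
    static DataStream<GenericRecord> withExplicitAvroTypeInfo(
            DataStream<GenericRecord> input, Schema schema) {
        return input
                .map(record -> record)
                // Explicit Avro type information, so GenericRecord is not
                // handled as a generic (Kryo) type.
                .returns(new GenericRecordAvroTypeInfo(schema));
    }
}

With that in place, disableGenericTypes() should no longer complain about GenericRecord, assuming no other generic types remain in the job.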



Regards,
Sucheth Shivakumar
website : https://sucheths.com
mobile : +1(650)-576-8050
San Mateo, United States