Re: [2.0] How to handle on-going feature development in Flink 2.0?

2024-06-25 Thread David Radley
Hi,
I think this is a great question. I am not sure if this has been covered
elsewhere, but it would be good to be clear how this affects the connector and
operator repos, with potentially v1- and v2-oriented new features. I suspect
this will be a connector-by-connector investigation. I am thinking of
connectors with Hadoop eco-system dependencies (e.g. Paimon) which may not work
nicely with Java 17.

 Kind regards, David.


From: Matthias Pohl 
Date: Tuesday, 25 June 2024 at 09:57
To: dev@flink.apache.org 
Cc: Xintong Song , martijnvis...@apache.org 
, imj...@gmail.com , 
becket@gmail.com 
Subject: [EXTERNAL] [2.0] How to handle on-going feature development in Flink 
2.0?
Hi 2.0 release managers,
With the 1.20 release branch being cut [1], master is now referring to
2.0-SNAPSHOT. I remember that, initially, the community had the idea of
keeping the 2.0 release as small as possible focusing on API changes [2].

What does this mean for new features? I guess blocking them until 2.0 is
released is not a good option. Shall we treat new features as
"nice-to-have" items as documented in the 2.0 release overview [3] and
merge them to master like it was done for minor releases in the past? Do
you want to add a separate section in the 2.0 release overview [3] to list
these new features (e.g. FLIP-461 [4]) separately? That might help to
manage planned 2.0 deprecations/API removal and new features separately. Or
do you have a different process in mind?

Apologies if this was already discussed somewhere. I didn't manage to find
anything related to this topic.

Best,
Matthias

[1] https://lists.apache.org/thread/mwnfd7o10xo6ynx0n640pw9v2opbkm8l
[2] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
[3] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
[4]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler



FW: Non Nullable fields within objects

2024-06-20 Thread David Radley
Hi,
I am looking to get the Avro format to support non-nullable fields within
objects or arrays. It works with Confluent Avro. I notice
https://issues.apache.org/jira/browse/CALCITE-4085 where it looks like Calcite
was changed to allow this capability to work with Flink.
https://github.com/apache/flink/pull/24916/files#diff-c2ed59d02b6bec354790442fd97c7694676eaa4199425353d7ef6cde1304c2e0
might affect this also.

I have a table definition of:

CREATE TABLE source_7 (
  `order_id` STRING,
  `order_time` STRING,
  `buyer` ROW<
    `first_name` STRING,
    `last_name` STRING NOT NULL,
    `title` STRING NOT NULL
  >
) WITH (
  'connector' = 'kafka',
  'topic' = 'vivekavro',
  'properties.bootstrap.servers' = 'localhost:9092',
  'value.format' = 'avro',
  'value.fields-include' = 'ALL',
  'scan.startup.mode' = 'earliest-offset'
);

And an event of shape:
{
  "order_id": "12345",
  "order_time": "1234",
  "buyer": {
    "first_name": "hvcwc",
    "last_name": "hvcwc2",
    "title": "hvcwc3"
  }
}


When I issue a SELECT it fails to deserialize; internally the writer schema has

"name" : "last_name",  "type" : [ "null", "string" ],

"name" : "title",  "type" : [ "null", "string" ],

So it has lost the non-nullability. I would have expected "type" : "string" for
last_name and title.



I have done a little digging. It appears that the issue is in
createSqlTableConverter.

In the debugger I see:

- Buyer has a nullable field followed by 2 non-nullable fields.
- The FieldsDataType children are all nullable, so the NOT NULL hint has been
lost by this point.
- fromLogicalTypeToDataType creates a DataType from the LogicalType, and the
resulting fields all come back as nullable; the nullability does not appear to
be carried through the conversion.

This looks like it could be the cause of the behaviour we are seeing - or am I
missing something?
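
To illustrate where I am looking, here is a minimal standalone sketch. It uses
the public Table API factory methods (not necessarily the exact code path the
SQL DDL takes - an assumption on my part) to build the buyer ROW and print the
nullability of each nested field:

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.logical.LogicalType;

public class NullabilityCheck {
    public static void main(String[] args) {
        // The buyer ROW from the DDL above, built directly.
        DataType buyer =
                DataTypes.ROW(
                        DataTypes.FIELD("first_name", DataTypes.STRING()),
                        DataTypes.FIELD("last_name", DataTypes.STRING().notNull()),
                        DataTypes.FIELD("title", DataTypes.STRING().notNull()));

        // Print whether each nested logical type kept its NOT NULL hint.
        for (LogicalType child : buyer.getLogicalType().getChildren()) {
            System.out.println(child.asSummaryString() + " nullable=" + child.isNullable());
        }
    }
}

Built this way the NOT NULL hints survive, which is why I suspect the loss
happens in the DDL conversion path rather than in the type system itself.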



WDYT?



Kind regards, David.





Re: [ANNOUNCE] New Apache Flink Committer - Zhongqiang Gong

2024-06-17 Thread David Radley
Congratulations Zhongqiang,

Kind regards, David.

From: Feng Jin 
Date: Monday, 17 June 2024 at 12:36
To: dev@flink.apache.org 
Subject: Re: [ANNOUNCE] New Apache Flink Committer - Zhongqiang Gong
Congratulations Zhongqiang !!!

Best regards
Feng Jin

On Mon, Jun 17, 2024 at 7:01 PM Feifan Wang  wrote:

> Congratulations Zhongqiang !
>
>
> ——
>
> Best regards,
>
> Feifan Wang
>
>
>
>
> At 2024-06-17 11:20:30, "Leonard Xu"  wrote:
> >Hi everyone,
> >On behalf of the PMC, I'm happy to announce that Zhongqiang Gong has
> become a new Flink Committer!
> >
> >Zhongqiang has been an active Flink community member since November 2021,
> contributing numerous PRs to both the Flink and Flink CDC repositories. As
> a core contributor to Flink CDC, he developed the Oracle and SQL Server CDC
> Connectors and managed essential website and CI migrations during the
> donation of Flink CDC to Apache Flink.
> >
> >Beyond his technical contributions, Zhongqiang actively participates in
> discussions on the Flink dev mailing list and responds to threads on the
> user and user-zh mailing lists. As an Apache StreamPark (incubating)
> Committer, he promotes Flink SQL and Flink CDC technologies at meetups and
> within the StreamPark community.
> >
> >Please join me in congratulating Zhongqiang Gong for becoming an Apache
> Flink committer!
> >
> >Best,
> >Leonard (on behalf of the Flink PMC)
>



RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-06-14 Thread David Radley
Hi everyone,
I have talked with Chesnay and Danny offline. Danny and I were not very happy
with passing Maps around, and were looking for a neater design. Chesnay
suggested that we could move the new format to the Kafka connector, then pass
the Kafka record down to the deserialize logic so it can make use of the
headers during deserialization and serialization.

I think this is a neat idea. This would mean:
- the Kafka connector code would need to be updated to pass down the Kafka
record
- the Avro Apicurio format and its SQL support would live in the Kafka
repository. We feel it is unlikely that anyone would want to use the Apicurio
registry with files, as the plain Avro format could be used instead.

Unfortunately I have found that this is not so straightforward to implement,
as the Avro Apicurio format uses the Avro format, which is tied to the
DeserializationSchema. We were hoping to have a new decoding implementation
that would pass down the Kafka record rather than just the payload. This does
not appear possible without an Avro format change.


Inspired by this idea, I notice that KafkaValueOnlyRecordDeserializerWrapper
extends KafkaValueOnlyDeserializerWrapper, which does

deserializer.deserialize(record.topic(), record.value())

I am investigating if I can add a factory/reflection mechanism to provide an
alternative implementation that passes the record through as a byte array (the
Kafka record is not serializable, so I will pick out what we need and
serialize that).

I would need to do this 4 times (value and key, for deserialization and
serialization). To do this I would need to convert the record into a byte
array so that it fits into the existing interface (DeserializationSchema). I
think this could be a way through: it avoids using maps, avoids changing the
existing Avro format, and avoids changing any core Flink interfaces.
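
To make the "pick out what we need and serialize that" step concrete, here is
a minimal sketch; the class name and byte layout are my own illustration, not
the prototype's:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class RecordBytesEnvelope {

    /** Encode the record parts we need as: headerCount, (key, value)*, payload. */
    public static byte[] encode(ConsumerRecord<byte[], byte[]> record) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        Header[] headers = record.headers().toArray();
        out.writeInt(headers.length);
        for (Header header : headers) {
            byte[] key = header.key().getBytes(StandardCharsets.UTF_8);
            out.writeInt(key.length);
            out.write(key);
            byte[] value = header.value() == null ? new byte[0] : header.value();
            out.writeInt(value.length);
            out.write(value);
        }
        byte[] payload = record.value() == null ? new byte[0] : record.value();
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return bos.toByteArray();
    }
}

The format side would reverse this to recover the headers (and hence the
schema id) before handing the remaining payload bytes to the Avro decoding.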

I am going to prototype this idea. WDYT?

My thanks go to Chesnay and Danny for their support and insight around this 
Flip,
   Kind regards, David.






From: David Radley 
Date: Wednesday, 29 May 2024 at 11:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Danny,
Thank you for your feedback on this.

I agree that using maps has pros and cons. The maps are flexible, but do 
require the sender and receiver to know what is in the map.

When you say “That sounds like it would fit in better, I assume we cannot just 
take that approach?” The motivation behind this Flip is to support the headers 
which is the usual way that Apicurio runs. We will support the “schema id in 
the payload” as well.

I agree with you when you say “ I am not 100% happy with the solution but I
cannot offer a better option.” – this is a pragmatic way we have found to solve 
this issue. I am open to any suggestions to improve this as well.

If we are going with the maps design (which is the best we have at the
moment), it would be good to have the Flink core changes in base Flink version
2.0, as this would mean we do not need to use reflection in a Flink Kafka
version 2 connector to work out if the runtime Flink has the new methods.

At this stage we only have one committer (yourself) backing this. Do you know
of two other committers who would support this Flip?

 Kind regards, David.



From: Danny Cranmer 
Date: Friday, 24 May 2024 at 19:32
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hello,

> I am curious what you mean by abused.

I just meant we will end up adding more and more fields to this map over
time, and it may be hard to undo.

> For Apicurio it can be sent at the start of the payload like Confluent
Avro does. Confluent Avro has a magic byte followed by a 4-byte schema
id at the start of the payload. Apicurio clients and SerDe libraries can
be configured to not put the schema id in the headers, in which case there
is a magic byte followed by an 8-byte schema id at the start of the payload.
In the deserialization case, we would not need to look at the headers –
though the encoding is also in the headers.

That sounds like it would fit in better, I assume we cannot just take that
approach?

Thanks for the discussion. I am not 100% happy with the solution but I
cannot offer a better option. I would be interested to hear if others have
any suggestions. Playing devil's advocate against myself, we pass maps
around to configure connectors so it is not too far away from that.

Thanks,
Danny


On Fri, May 24, 2024 at 2:23 PM David Radley 
wrote:

> Hi Danny,
> No worries, thanks for replying. I have working prototype code that is
> being reviewed. It needs some cleaning up and more complete testing before
> it is ready, but will give you the general idea [1][2] to help to assess
> this approach.
>
>
> I am curious what you mean by abused. I guess the choice is between a
> generic map mechanism vs more particular, more granular things being
> passed that might be used by another connector.
>
> Your first ques

RE: [VOTE] Release flink-connector-kafka v3.2.0, release candidate #1

2024-06-11 Thread David Radley
Hi Martijn,
Thanks for the confirmation,
Kind regards, David.

From: Martijn Visser 
Date: Tuesday, 11 June 2024 at 15:00
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-kafka v3.2.0, release 
candidate #1
Hi David,

That's a blocker for a Flink Kafka connector 4.0, not for 3.2.0. It's not
related to this release.

Best regards,

Martijn

On Tue, Jun 11, 2024 at 3:54 PM David Radley 
wrote:

> Hi,
> Sorry I am a bit late.
> I notice https://issues.apache.org/jira/browse/FLINK-35109 is open and a
> blocker. Can I confirm that we have mitigated the impacts of this issue in
> this release?
>   Kind regards, David.
>
> From: Danny Cranmer 
> Date: Friday, 7 June 2024 at 11:46
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-kafka v3.2.0,
> release candidate #1
> Thanks all. This vote is now closed, I will announce the results in a
> separate thread.
>
> On Fri, Jun 7, 2024 at 11:45 AM Danny Cranmer 
> wrote:
>
> > +1 (binding)
> >
> > - Release notes look good
> > - Source archive checksum and signature is correct
> > - Binary checksum and signature is correct
> > - Contents of Maven repo looks good
> > - Verified there are no binaries in the source archive
> > - Builds from source, tests pass using Java 8
> > - CI run passed [1]
> > - Tag exists in repo
> > - NOTICE and LICENSE files present and correct
> >
> > Thanks,
> > Danny
> >
> > [1]
> > https://github.com/apache/flink-connector-kafka/actions/runs/8785158288
> >
> >
> > On Fri, Jun 7, 2024 at 7:19 AM Yanquan Lv  wrote:
> >
> >> +1 (non-binding)
> >>
> >> - verified gpg signatures
> >> - verified sha512 hash
> >> - built from source code with java 8/11/17
> >> - checked Github release tag
> >> - checked the CI result
> >> - checked release notes
> >>
> >> Danny Cranmer wrote on Mon, 22 Apr 2024 at 21:56:
> >>
> >> > Hi everyone,
> >> >
> >> > Please review and vote on release candidate #1 for
> flink-connector-kafka
> >> > v3.2.0, as follows:
> >> > [ ] +1, Approve the release
> >> > [ ] -1, Do not approve the release (please provide specific comments)
> >> >
> >> > This release supports Flink 1.18 and 1.19.
> >> >
> >> > The complete staging area is available for your review, which
> includes:
> >> > * JIRA release notes [1],
> >> > * the official Apache source release to be deployed to
> dist.apache.org
> >> > [2],
> >> > which are signed with the key with fingerprint 125FD8DB [3],
> >> > * all artifacts to be deployed to the Maven Central Repository [4],
> >> > * source code tag v3.2.0-rc1 [5],
> >> > * website pull request listing the new release [6].
> >> > * CI build of the tag [7].
> >> >
> >> > The vote will be open for at least 72 hours. It is adopted by majority
> >> > approval, with at least 3 PMC affirmative votes.
> >> >
> >> > Thanks,
> >> > Danny
> >> >
> >> > [1]
> >> >
> >> >
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354209
> >> > [2]
> >> >
> >> >
> >>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.2.0-rc1
> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> > [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1723
> >> > [5]
> >> >
> https://github.com/apache/flink-connector-kafka/releases/tag/v3.2.0-rc1
> >> > [6] https://github.com/apache/flink-web/pull/738
> >> > [7] https://github.com/apache/flink-connector-kafka
> >> >
> >>
> >
>



RE: [VOTE] Release flink-connector-kafka v3.2.0, release candidate #1

2024-06-11 Thread David Radley
Hi,
Sorry I am a bit late.
I notice https://issues.apache.org/jira/browse/FLINK-35109 is open and a 
blocker. Can I confirm that we have mitigated the impacts of this issue in this 
release?
  Kind regards, David.

From: Danny Cranmer 
Date: Friday, 7 June 2024 at 11:46
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-kafka v3.2.0, release 
candidate #1
Thanks all. This vote is now closed, I will announce the results in a
separate thread.

On Fri, Jun 7, 2024 at 11:45 AM Danny Cranmer 
wrote:

> +1 (binding)
>
> - Release notes look good
> - Source archive checksum and signature is correct
> - Binary checksum and signature is correct
> - Contents of Maven repo looks good
> - Verified there are no binaries in the source archive
> - Builds from source, tests pass using Java 8
> - CI run passed [1]
> - Tag exists in repo
> - NOTICE and LICENSE files present and correct
>
> Thanks,
> Danny
>
> [1]
> https://github.com/apache/flink-connector-kafka/actions/runs/8785158288
>
>
> On Fri, Jun 7, 2024 at 7:19 AM Yanquan Lv  wrote:
>
>> +1 (non-binding)
>>
>> - verified gpg signatures
>> - verified sha512 hash
>> - built from source code with java 8/11/17
>> - checked Github release tag
>> - checked the CI result
>> - checked release notes
>>
>> Danny Cranmer wrote on Mon, 22 Apr 2024 at 21:56:
>>
>> > Hi everyone,
>> >
>> > Please review and vote on release candidate #1 for flink-connector-kafka
>> > v3.2.0, as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific comments)
>> >
>> > This release supports Flink 1.18 and 1.19.
>> >
>> > The complete staging area is available for your review, which includes:
>> > * JIRA release notes [1],
>> > * the official Apache source release to be deployed to dist.apache.org
>> > [2],
>> > which are signed with the key with fingerprint 125FD8DB [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag v3.2.0-rc1 [5],
>> > * website pull request listing the new release [6].
>> > * CI build of the tag [7].
>> >
>> > The vote will be open for at least 72 hours. It is adopted by majority
>> > approval, with at least 3 PMC affirmative votes.
>> >
>> > Thanks,
>> > Danny
>> >
>> > [1]
>> >
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354209
>> > [2]
>> >
>> >
>> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.2.0-rc1
>> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> > [4]
>> https://repository.apache.org/content/repositories/orgapacheflink-1723
>> > [5]
>> > https://github.com/apache/flink-connector-kafka/releases/tag/v3.2.0-rc1
>> > [6] https://github.com/apache/flink-web/pull/738
>> > [7] https://github.com/apache/flink-connector-kafka
>> >
>>
>



RE: Flink Kubernetes Operator 1.9.0 release planning

2024-06-11 Thread David Radley
I agree – thanks for driving this Gyula.

From: Rui Fan <1996fan...@gmail.com>
Date: Tuesday, 11 June 2024 at 02:52
To: dev@flink.apache.org 
Cc: Mate Czagany 
Subject: [EXTERNAL] Re: Flink Kubernetes Operator 1.9.0 release planning
Thanks Gyula for driving this release!

> I suggest we cut the release branch this week after merging current
> outstanding smaller PRs.

It makes sense to me.

Best,
Rui

On Mon, Jun 10, 2024 at 3:05 PM Gyula Fóra  wrote:

> Hi all!
>
> I want to kick off the discussion / release process for the Flink
> Kubernetes Operator 1.9.0 version.
>
> The last, 1.8.0, version was released in March and since then we have had a
> number of important fixes. Furthermore there are some bigger pieces of
> outstanding work in the form of open PRs such as the Savepoint CRD work
> which should only be merged to 1.10.0 to gain more exposure/stability.
>
> I suggest we cut the release branch this week after merging current
> outstanding smaller PRs.
>
> I volunteer as the release manager but if someone else would like to do it,
> I would also be happy to assist.
>
> Please let me know what you think.
>
> Cheers,
> Gyula
>



Re: [DISCUSS] Connector Externalization Retrospective

2024-06-10 Thread David Radley
Hi Danny,
I think your proposal is a good one. This is the approach that we took with
the Egeria project: firstly taking the connectors out of the main repo, then
having connectors with their own versions that incremented organically rather
than being tied to the core release.

Blue sky thinking - I wonder if we could:
- have a wizard / utility where the user inputs which Flink level they want
and which connectors; the utility knows the compatibility matrix and downloads
the appropriate bundles.
- have the docs interrogate the core and connector repos to check the poms for
the Flink levels and the PR builds, to have 'live' docs showing the supported
Flink levels. PyTorch does something like this for its docs.

Kind regards, David.



From: Danny Cranmer 
Date: Monday, 10 June 2024 at 17:26
To: dev 
Subject: [EXTERNAL] [DISCUSS] Connector Externalization Retrospective
Hello Flink community,

It has been over 2 years [1] since we started externalizing the Flink
connectors to dedicated repositories from the main Flink code base. The
past discussions can be found here [2]. The community decided to
externalize the connectors to primarily 1/ improve stability and speed of
the CI, and 2/ decouple version and release lifecycle to allow the projects
to evolve independently. The outcome of this has resulted in each connector
requiring a dedicated release per Flink minor version, which is a burden on
the community. Flink 1.19.0 was released on 2024-03-18 [3], the first
supported connector followed roughly 2.5 months later on 2024-06-06 [4]
(MongoDB). There are still 5 connectors that do not support Flink 1.19 [5].

Two decisions contribute to the high lag between releases. 1/ creating one
repository per connector instead of a single flink-connector mono-repo and
2/ coupling the Flink version to the connector version [6]. A single
connector repository would reduce the number of connector releases from N
to 1, but would couple the connector CI and reduce release flexibility.
Decoupling the connector versions from Flink would eliminate the need to
release each connector for each new Flink minor version, but we would need
a new compatibility mechanism.

I propose that from each connector's next release we drop the coupling on the
Flink version. For example, instead of 3.4.0-1.20 (connector version suffixed
with the Flink version) we would release 3.4.0 (connector version only). We
can model a compatibility matrix
within the Flink docs to help users pick the correct versions. This would
mean we would usually not need to release a new connector version per Flink
version, assuming there are no breaking changes. Worst case, if breaking
changes impact all connectors we would still need to release all
connectors. However, for Flink 1.17 and 1.18 there were only a handful of
issues (breaking changes), and mostly impacting tests. We could decide to
align this with Flink 2.0, however I see no compelling reason to do so.
This was discussed previously [2] as a long term goal once the connector
APIs are stable. But I think the current compatibility rules support this
change now.
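
To make that concrete, such a compatibility matrix in the docs could look
something like this (the version pairings below are hypothetical, purely for
illustration):

Connector version | Supported Flink versions
3.3.x             | 1.18, 1.19
3.4.x             | 1.19, 1.20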

I would prefer to not create a connector mono-repo. Separate repos gives
each connector more flexibility to evolve independently, and removing
unnecessary releases will significantly reduce the release effort.

I would like to hear opinions and ideas from the community. In particular,
are there any other issues you have observed that we should consider
addressing?

Thanks,
Danny.

[1]
https://github.com/apache/flink-connector-elasticsearch/commit/3ca2e625e3149e8864a4ad478773ab4a82720241
[2] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
[3]
https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
[4] https://flink.apache.org/downloads/#apache-flink-connectors-1
[5] https://issues.apache.org/jira/browse/FLINK-35131
[6]
https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development#ExternalizedConnectordevelopment-Examples



RE: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui

2024-06-10 Thread David Radley
Congratulations, Rui!

From: Sergey Nuyanzin 
Date: Sunday, 9 June 2024 at 20:33
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui
Congratulations, Rui!

On Fri, Jun 7, 2024 at 5:36 AM Xia Sun  wrote:

> Congratulations, Rui!
>
> Best,
> Xia
>
> Paul Lam wrote on Thu, 6 Jun 2024 at 11:59:
>
> > Congrats, Rui!
> >
> > Best,
> > Paul Lam
> >
> > > On 6 Jun 2024 at 11:02, Junrui Lee wrote:
> > >
> > > Congratulations, Rui.
> > >
> > > Best,
> > > Junrui
> > >
> > > Hang Ruan wrote on Thu, 6 Jun 2024 at 10:35:
> > >
> > >> Congratulations, Rui!
> > >>
> > >> Best,
> > >> Hang
> > >>
> > >> Samrat Deb wrote on Thu, 6 Jun 2024 at 10:28:
> > >>
> > >>> Congratulations Rui
> > >>>
> > >>> Bests,
> > >>> Samrat
> > >>>
> > >>> On Thu, 6 Jun 2024 at 7:45 AM, Yuxin Tan 
> > wrote:
> > >>>
> >  Congratulations, Rui!
> > 
> >  Best,
> >  Yuxin
> > 
> > 
> >  Xuannan Su wrote on Thu, 6 Jun 2024 at 09:58:
> > 
> > > Congratulations!
> > >
> > > Best regards,
> > > Xuannan
> > >
> > > On Thu, Jun 6, 2024 at 9:53 AM Hangxiang Yu 
> > >>> wrote:
> > >>
> > >> Congratulations, Rui !
> > >>
> > >> On Thu, Jun 6, 2024 at 9:18 AM Lincoln Lee <
> lincoln.8...@gmail.com
> > >>>
> > > wrote:
> > >>
> > >>> Congratulations, Rui!
> > >>>
> > >>> Best,
> > >>> Lincoln Lee
> > >>>
> > >>>
> >  Lijie Wang wrote on Thu, 6 Jun 2024 at 09:11:
> > >>>
> >  Congratulations, Rui!
> > 
> >  Best,
> >  Lijie
> > 
> >  Rodrigo Meneses wrote on Wed, 5 Jun 2024 at 21:35:
> > 
> > > All the best
> > >
> > > On Wed, Jun 5, 2024 at 5:56 AM xiangyu feng <
> >  xiangyu...@gmail.com>
> >  wrote:
> > >
> > >> Congratulations, Rui!
> > >>
> > >> Regards,
> > >> Xiangyu Feng
> > >>
> > >> Feng Jin wrote on Wed, 5 Jun 2024 at 20:42:
> > >>
> > >>> Congratulations, Rui!
> > >>>
> > >>>
> > >>> Best,
> > >>> Feng Jin
> > >>>
> > >>> On Wed, Jun 5, 2024 at 8:23 PM Yanfei Lei <
> >  fredia...@gmail.com
> > >>
> >  wrote:
> > >>>
> >  Congratulations, Rui!
> > 
> >  Best,
> >  Yanfei
> > 
> >  Luke Chen wrote on Wed, 5 Jun 2024 at 20:08:
> > >
> > > Congrats, Rui!
> > >
> > > Luke
> > >
> > > On Wed, Jun 5, 2024 at 8:02 PM Jiabao Sun <
> > >>> jiabao...@apache.org>
> > >>> wrote:
> > >
> > >> Congrats, Rui. Well-deserved!
> > >>
> > >> Best,
> > >> Jiabao
> > >>
> > >> Zhanghao Chen wrote on Wed, 5 Jun 2024 at 19:29:
> > >>
> > >>> Congrats, Rui!
> > >>>
> > >>> Best,
> > >>> Zhanghao Chen
> > >>> 
> > >>> From: Piotr Nowojski 
> > >>> Sent: Wednesday, June 5, 2024 18:01
> > >>> To: dev ; rui fan <1996fan...@gmail.com>
> > >>> Subject: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui
> > >>>
> > >>> Hi everyone,
> > >>>
> > >>> On behalf of the PMC, I'm very happy to announce another new Apache
> > >>> Flink PMC Member - Fan Rui.
> > >>>
> > >>> Rui has been active in the community since August 2019. During this
> > >>> time he has contributed a lot of new features. Among others:
> > >>>  - Decoupling Autoscaler from Kubernetes Operator, and supporting
> > >>> Standalone Autoscaler
> > >>>  - Improvements to checkpointing, flamegraphs, restart strategies,
> > >>> watermark alignment, network shuffles
> > >>>  - Optimizing the memory and CPU usage of large operators, greatly
> > >>> reducing the risk and probability of TaskManager OOM
> > >>>
> > >>> He reviewed a significant amount of PRs and has been active both on
> > >>> the mailing lists and in Jira, helping to both maintain and grow
> > >>> Apache Flink's community. He is also our current Flink 1.20 release
> > >>> manager.
> > >>>
> > >>> In the last 12 months, Rui has been the most active contributor in
> > >>> the Flink Kubernetes Operator 

Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-10 Thread David Radley
Hi Sergio,
Sounds good. I am relatively new to this area and had some basic questions:

I notice that [1] talks of materialized views, and CREATE VIEW can already
take the AS keyword. It would be useful to me to understand when we would use
each of these.

- I assume the table will be kept up to date as the source data changes, like
a view.
- Are there any restrictions on the SELECT? Can it be a join, aggregate, or
windowed query?
- I assume generated columns are allowed?
- Is the table read-only, or can we insert/delete/update into it? If it is
read-only, how will inserts, deletes and updates fail?
- I notice Azure [2] supports ISNULL. Is there a thought to change the
nullability for the CTAS?
- I notice Amazon [3] talks of the difference between a view and CTAS: that
CTAS persists the content. Is this the approach we are taking? If so, where
are we persisting?
- Amazon [4] ignores ORDER BY clauses; is that the same for this proposal?

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables/
[2] 
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-develop-ctas#selectinto-vs-ctas
[3] 
https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-queries-vs-views
[4] 
https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-order-by-ignored
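
For what it is worth, here is a sketch of how I read the proposal being used.
The schema section syntax and the table names are hypothetical on my part;
FLIP-463 is the authoritative definition:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CtasSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical CTAS adding a user-defined primary key on top of the
        // selected columns; "orders" is assumed to be an existing table.
        tEnv.executeSql(
                "CREATE TABLE derived_orders ("
                        + "  PRIMARY KEY (order_id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'kafka'"
                        + ") AS SELECT order_id, order_time FROM orders");
    }
}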


Kind regards, David.

From: Sergio Pena 
Date: Friday, 7 June 2024 at 16:13
To: dev@flink.apache.org 
Subject: [EXTERNAL] [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS 
Statement
HI All,

I'd like to start a discussion on FLIP-463: Schema Definition in CREATE
TABLE AS Statement [1]

The proposal extends the CTAS statement to allow users to define their own
schema by adding columns, primary and partition keys, and table
distribution to the CREATE statement.

Any thoughts are welcome.

Thanks,
- Sergio Pena

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement



RE: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

2024-05-31 Thread David Radley
Hi Mate and Gyula,
Thank you very much for your clarifications; it is clearer for me now. I agree
that a reconciliation condition would be useful - maybe "Reconciled" instead
of "Ready" for the boolean, so it is very explicit.

Your suggestion of a job-related readiness condition based on the job's health
would be useful; you suggest it be user configurable - this seems closer to a
liveness / readiness probe.

Kind regards, David.

From: Mate Czagany 
Date: Thursday, 30 May 2024 at 10:39
To: dev@flink.apache.org 
Cc: morh...@apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
Hi,

I would definitely keep this as a FLIP. Not all FLIPs have to be big
changes, and this format makes it easier for others to chime in and follow.

I am not a Kubernetes expert, but my understanding is that we don't have to
follow any strict convention for the type names in the conditions, e.g.
"Ready" or "Error". And as Gyula said it doesn't add too much value in the
currently proposed way, it might even be confusing for users who have not
read this email thread or FLIP because "Ready" might suggest that the job
is running and is healthy. So my suggestion is the same as Gyulas, to have
more explicit type names instead of just "Ready" and "Error". However
"ClusterReady" sounds weird in case of FlinkSessionJobs.

Regarding appending to the conditions field: if I understand the FLIP
correctly, we would allow multiple elements of the same type to exist in
the conditions list if the message and reason fields are different. From
the Kubernetes documentation it seems like the correct way would be to use
the "type" field as the map key and merge the fields [1].


[1]
https://github.com/kubernetes/kubernetes/blob/bce55b94cdc3a4592749aa919c591fa7df7453eb/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1528

Best regards,
Mate

Gyula Fóra  ezt írta (időpont: 2024. máj. 30., Cs,
10:53):

> David,
>
> The problem is exactly that ResourceLifecycleStates do not correspond to
> specific Job statuses (JobReady condition) in most cases. Let me give you a
> concrete example:
>
> ResourceLifecycleState.STABLE means that app/job defined in the spec has
> been successfully deployed and was observed running, and this spec is now
> considered to be stable (won't be rolled back). Once a resource
> (FlinkDeployment) reached STABLE state, it won't change unless the user
> changes the spec. At the same time, this doesn't really say anything about
> job health/readiness at any given future time. 10 minutes later the job can
> go in an unrecoverable failure loop and never reach a running status, the
> ResourceLifecycleState will remain STABLE.
>
> This is actually not a problem with the ResourceLifecycleState but more
> with the understanding of it. It's called ResourceLifecycleState and not
> JobState exactly because it refers to the upgrade/rollback/suspend etc
> lifecycle of the FlinkDeployment/FlinkSessionJob resource and not the
> underlying flink job itself.
>
> But this is a crucial detail here that we need to consider otherwise the
> "Ready" condition that we may create will be practically useless.
>
> This is the reason why @morh...@apache.org  and
> I suggest separating this to at least 2 independent conditions. One could
> be the UpgradeCompleted/ReconciliationCompleted or something along these
> lines computed based on LifecycleState (as described in your proposal but
> with a different name). The other should be JobReady which could initially
> work based on the JobStatus.state field but ideally would be user
> configurable ready condition such as (job running at least 10 minutes,
> running and have taken checkpoints etcetc).
>
> These 2 conditions should be enough to start with and would actually
> provide a tangible value to users. We can probably leave out ClusterReady
> on a second thought.
>
> Cheers,
> Gyula
>
>
> On Wed, May 29, 2024 at 5:16 PM David Radley 
> wrote:
>
> > Hi Gyula,
> > Thank you for the quick response and confirmation we need a Flip. I am
> not
> > an expert at K8s, Lajith will answer in more detail. Some questions I had
> > anyway:
> >
> > I assume each of the ResourceLifecycleState do have a corresponding
> > jobReady status. You point out some mistakes in the table, for example
> that
> > STABLE should be NotReady; thankyou.  If we put a reason mentioning the
> > stable state, this would help us understand the jobStatus.
> >
> > I guess the jobReady is one perspective that we know is useful (with
> > corrected  mappings from ResourceLifecycleState and with reasons). Can I
> > check that the  2 proposed conditions would also be useful additions? I
> > assume that in your 

RE: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

2024-05-29 Thread David Radley
Hi Gyula,
Thank you for the quick response and for confirming that we need a FLIP. I am
not an expert at K8s; Lajith will answer in more detail. Some questions I had
anyway:

I assume each of the ResourceLifecycleStates does have a corresponding
jobReady status. You point out some mistakes in the table, for example that
STABLE should be NotReady; thank you. If we put a reason mentioning the stable
state, this would help us understand the jobStatus.

I guess jobReady is one perspective that we know is useful (with corrected
mappings from ResourceLifecycleState and with reasons). Can I check that the 2
proposed conditions would also be useful additions? I assume that in your
proposal, when jobReady is true, the UpgradeCompleted condition would not be
present and ClusterReady would always be true? I know conditions do not need
to be orthogonal, but I wanted to check what your thoughts are.

Kind regards, David.




From: Gyula Fóra 
Date: Wednesday, 29 May 2024 at 15:28
To: dev@flink.apache.org 
Cc: morh...@apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
Hi David!

This change definitely warrants a FLIP even if the code change is not huge,
there are quite some implications going forward.

Looping in @morh...@apache.org  for this discussion.

I have some questions / suggestions regarding the condition's meaning and
naming.

In your proposal you have:
 - Ready (True/False) -> This condition is intended for resources which are
fully ready and operational
 - Error (True) -> This condition can be used in scenarios where any
exception/error during resource reconcile process

The problem with the above is that the implementation does not well reflect
this. ResourceLifecycleState STABLE/ROLLED_BACK does not actually mean the
job is running, it just means that the resource is fully reconciled and it
will not be rolled back (so the current pending upgrade is completed). This
is mainly a fault of the ResourceLifecycleState as it doesn't capture the
job status but one could argue that it was "designed" this way.

I think we should probably have more condition types to capture the
difference:
 - JobReady (True/False) -> Flink job is running (Basically job status but
with transition time)
 - ClusterReady (True/False) -> Session / Application cluster is deployed
(Basically JM deployment status but with transition time)
-  UpgradeCompleted (True/False) -> Similar to what you call Ready now
which should correspond to the STABLE/ROLLED_BACK states and mostly tracks
in-progress CR updates

This is my best idea at the moment, not great as it feels a little
redundant with the current status fields. But maybe thats not a problem or
a way to eliminate the old fields later?

I am not so sure of the Error status and what this means in practice. Why
do we want to track the last error in 2 places? It's already in the status.

What do you think?
Gyula

On Wed, May 29, 2024 at 3:55 PM David Radley 
wrote:

> Hi,
> Thanks Lajith for raising this discussion thread under the Flip title.
>
> To summarise the concerns from the other discussion thread.
>
> “
> - I echo Gyula that including some examples and further explanations might
> ease reader's work. With the current version, the FLIP is a bit hard to
> follow. - Will the usage of Conditions be enabled by default? Or will there
> be any disadvantages for Flink users? If Conditions with the same type
> already exist in the Status Conditions
>
> - Do you think we should have clear rules about handling rules for how
> these Conditions should be managed, especially when multiple Conditions of
> the same type are present? For example, resource has multiple causes for
> the same condition (e.g., Error due to network and Error due to I/O). Then,
> overriding the old condition with the new one is not the best approach no?
> Please correct me if I misunderstood.
> “
>
> I see the Google doc link has been reformatted to match the Flip template.
>
> To explicitly answer the questions from Jeyhun and Gyula:
> - “Will the usage of Conditions be enabled by default?” Yes, but this is
> just making the status content useful, whereas before it was not useful.
> - in terms of examples, I am not sure what you would like to see, the
> table Lajith provided shows the status for various ResourceLifecycleStates.
> How the operator gets into these states is the current behaviour. The
> change just shows the appropriate corresponding high level status – that
> could be shown on the User Interfaces.
> - “will there be any disadvantages for Flink users?” None , there is just
> more information in the status, without this it is more difficult to work
> out the status of the job.
> - Multiple conditions question. The status is showing whether the job is
> ready or not, so as long as the last condition is the one that is shown -
> all 

Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

2024-05-29 Thread David Radley
Hi,
Thanks Lajith for raising this discussion thread under the Flip title.

To summarise the concerns from the other discussion thread.

“
- I echo Gyula that including some examples and further explanations might ease 
reader's work. With the current version, the FLIP is a bit hard to follow. - 
Will the usage of Conditions be enabled by default? Or will there be any 
disadvantages for Flink users? If Conditions with the same type already exist 
in the Status Conditions

- Do you think we should have clear rules about handling rules for how these 
Conditions should be managed, especially when multiple Conditions of the same 
type are present? For example, resource has multiple causes for the same 
condition (e.g., Error due to network and Error due to I/O). Then, overriding 
the old condition with the new one is not the best approach no? Please correct 
me if I misunderstood.
“

I see the Google doc link has been reformatted to match the Flip template.

To explicitly answer the questions from Jeyhun and Gyula:
- “Will the usage of Conditions be enabled by default?” Yes, but this is just 
making the status content useful, whereas before it was not useful.
- in terms of examples, I am not sure what you would like to see, the table 
Lajith provided shows the status for various ResourceLifecycleStates. How the 
operator gets into these states is the current behaviour. The change just shows 
the appropriate corresponding high level status – that could be shown on the 
User Interfaces.
- “will there be any disadvantages for Flink users?” None , there is just more 
information in the status, without this it is more difficult to work out the 
status of the job.
- Multiple conditions question. The status is showing whether the job is ready 
or not, so as long as the last condition is the one that is shown - all is as 
expected. I don’t think this needs rules for precedence and the like.
- The condition’s Reason is going to be more specific.
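
For illustration, here is a sketch of what one populated condition could look
like. I am assuming the Fabric8 model classes here (my assumption, since the
operator already depends on Fabric8); the exact types, reasons and messages
are what this proposal would pin down:

import io.fabric8.kubernetes.api.model.Condition;
import io.fabric8.kubernetes.api.model.ConditionBuilder;

import java.time.Instant;

public class ConditionSketch {
    public static void main(String[] args) {
        // One possible shape for a condition surfaced in the CR status.
        Condition ready =
                new ConditionBuilder()
                        .withType("Ready")
                        .withStatus("True")
                        .withReason("Stable")
                        .withMessage("Resource reconciled; job observed RUNNING")
                        .withLastTransitionTime(Instant.now().toString())
                        .build();
        System.out.println(ready);
    }
}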

Gyula and Jeyhun, is the google doc clear enough for you now? Do you feel your
feedback has been addressed? Lajith and I are happy to provide more details.

I wonder whether this change is big enough to warrant a FLIP, as it is so
small; we could do this in an issue instead. WDYT?

Kind regards, David.


From: Lajith Koova 
Date: Wednesday, 29 May 2024 at 13:41
To: dev@flink.apache.org 
Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
Hello ,


Discussion thread here:
https://lists.apache.org/thread/dvy8w17pyjv68c3t962w49frl9odoz4z  to
discuss a proposal to add Conditions field in the CR status of Flink
Deployment and FlinkSessionJob.


Note : Starting this new thread as discussion thread title has been
modified to follow the FLIP process.


Thank you.



RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-29 Thread David Radley
Hi Danny,
Thank you for your feedback on this.

I agree that using maps has pros and cons. The maps are flexible, but do 
require the sender and receiver to know what is in the map.

When you say “That sounds like it would fit in better, I assume we cannot just 
take that approach?” The motivation behind this Flip is to support the headers 
which is the usual way that Apicurio runs. We will support the “schema id in 
the payload” as well.

I agree with you when you say “ I am not 100% happy with the solution but I
cannot offer a better option.” – this is a pragmatic way we have found to solve 
this issue. I am open to any suggestions to improve this as well.

If we are going with the maps design (which is the best we have at the
moment), it would be good to have the Flink core changes in base Flink version
2.0, as this would mean we do not need to use reflection in a Flink Kafka
version 2 connector to work out if the runtime Flink has the new methods.

At this stage we only have one committer (yourself) backing this. Do you know
of two other committers who would support this Flip?

 Kind regards, David.



From: Danny Cranmer 
Date: Friday, 24 May 2024 at 19:32
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hello,

> I am curious what you mean by abused.

I just meant we will end up adding more and more fields to this map over
time, and it may be hard to undo.

> For Apicurio it can be sent at the start of the payload like Confluent
Avro does. Confluent Avro has a magic byte followed by a 4-byte schema
id at the start of the payload. Apicurio clients and SerDe libraries can
be configured to not put the schema id in the headers, in which case there
is a magic byte followed by an 8-byte schema id at the start of the payload.
In the deserialization case, we would not need to look at the headers –
though the encoding is also in the headers.

That sounds like it would fit in better, I assume we cannot just take that
approach?

Thanks for the discussion. I am not 100% happy with the solution but I
cannot offer a better option. I would be interested to hear if others have
any suggestions. Playing devil's advocate against myself, we pass maps
around to configure connectors so it is not too far away from that.

Thanks,
Danny


On Fri, May 24, 2024 at 2:23 PM David Radley 
wrote:

> Hi Danny,
> No worries, thanks for replying. I have working prototype code that is
> being reviewed. It needs some cleaning up and more complete testing before
> it is ready, but will give you the general idea [1][2] to help to assess
> this approach.
>
>
> I am curious what you mean by abused. I guess the choice is between a
> generic map mechanism vs more particular, more granular things being
> passed that might be used by another connector.
>
> Your first question:
> “how would this work if the schema ID is not in the Kafka headers, as
> hinted to in the FLIP "usually the global ID in a Kafka header"?
>
> For Apicurio it can be sent at the start of the payload like Confluent
> Avro does. Confluent Avro has a magic byte followed by a 4-byte schema
> id at the start of the payload. Apicurio clients and SerDe libraries can
> be configured to not put the schema id in the headers, in which case there
> is a magic byte followed by an 8-byte schema id at the start of the payload.
> In the deserialization case, we would not need to look at the headers –
> though the encoding is also in the headers.
>
> Your second question:
> “I am wondering if there are any other instances where the source would be
> aware of the schema ID and pass it through in this way?
> ”
> The examples I can think of are:
> - Avro can send the complete schema in a header, this is not recommended
> but in theory fits the need for a message payload to require something else
> to get the structure.
> - I see [2] that Apicurio Protobuf uses headers.
> - it might be that other message queuing projects like Rabbit MQ would
> need this to be able to support Apicurio Avro & protobuf.
>
> Kind regards, David,
>
>
>
>
> [1] https://github.com/apache/flink/pull/24715
> [2] https://github.com/apache/flink-connector-kafka/pull/99
> [3]
> https://www.apicur.io/registry/docs/apicurio-registry/2.5.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-json_registry
>
>
>
>
>
>
>
>
>
>
>
>
> From: Danny Cranmer 
> Date: Friday, 24 May 2024 at 12:22
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hello,
>
> Apologies, I am on vacation and have limited access to email.
>
> I can see the logic here and why you ended up where you did. I can also see
> there are other useful metadata fields that we might want to pass through,
> which mi

RE: [DISCUSS] Proposing an LTS Release for the 1.x Line

2024-05-24 Thread David Radley
Hi Martijn and Alex,
I agree with your summaries; it will be interesting to see what requests there
might be for backports.
 Kind regards, David.




From: Alexander Fedulov 
Date: Friday, 24 May 2024 at 14:07
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Proposing an LTS Release for the 1.x Line
@David
> I agree with Martijn that we only put features into version 2. Back
porting to v1 should not be business as usual for features, only for
security and stability changes.
Yep, this choice is explicitly reflected in the FLIP [1]

@Martijn
>  I think our starting point should be "We don't backport features, unless
discussed and agreed on the Dev mailing list".
I agree - the baseline is that we do not do that. Only if a very compelling
argument is made and the community reaches consensus, exceptions could
potentially be made, but we should try to avoid them.

[1] https://cwiki.apache.org/confluence/x/BApeEg

Best,
Alex

On Fri, 24 May 2024 at 14:38, Martijn Visser 
wrote:

> Hi David,
>
> > If there is a maintainer willing to merge backported features to v1, as
> it is important to some part of the community, this should be allowed, as
> different parts of the community have different priorities and timelines,
>
> I don't think this is a good idea. Backporting a feature can cause issues
> in other components that might be outside the span of expertise of the
> maintainer that backported said feature, causing the overall stability to
> be degraded. I think our starting point should be "We don't backport
> features, unless discussed and agreed on the Dev mailing list". That still
> opens up the ability to backport features but makes it clear where the bar
> lies.
>
> Best regards,
>
> Martijn
>
> On Fri, May 24, 2024 at 11:21 AM David Radley 
> wrote:
>
> > Hi,
> > I agree with Martijn that we only put features into version 2. Back
> > porting to v1 should not be business as usual for features, only for
> > security and stability changes.
> >
> > If there is a maintainer willing to merge backported features to v1, as
> it
> > is important to some part of the community, this should be allowed, as
> > different parts of the community have different priorities and timelines,
> >  Kind regards, David.
> >
> >
> > From: Alexander Fedulov 
> > Date: Thursday, 23 May 2024 at 18:50
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] Proposing an LTS Release for the 1.x
> Line
> > Good point, Xintong, I incorporated this item into the FLIP.
> >
> > Best,
> > Alex
> >
> > On Wed, 22 May 2024 at 10:37, Xintong Song 
> wrote:
> >
> > > Thanks, Alex.
> > >
> > > I see one task that needs to be done once the FLIP is approved, which
> I'd
> > > suggest to also mention in the: To explain the LTS policy to users on
> > > website / documentation (because FLIP is developer-facing) before /
> upon
> > > releasing 1.20.
> > >
> > > Other than that, the FLIP LGTM.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Tue, May 21, 2024 at 5:21 PM Alexander Fedulov <
> > > alexander.fedu...@gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > let's finalize this discussion. As Martijn suggested, I summarized
> this
> > > > thread into a FLIP [1]. Please take a look and let me know if there’s
> > > > anything important that I might have missed.
> > > >
> > > > Best,
> > > > Alex
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/BApeEg
> > > >
> > > >
> > > > On Tue, 23 Jan 2024 at 03:30, Rui Fan <1996fan...@gmail.com> wrote:
> > > >
> > > > > Thanks Martijn for the feedback!
> > > > >
> > > > > Sounds make sense to me! And I don't have strong opinion that allow
> > > > > backporting new features to 1.x.
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Mon, Jan 22, 2024 at 8:56 PM Martijn Visser <
> > > martijnvis...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Rui,
> > > > > >
> > > > > > I don't think that we should allow backporting of new features
> from
> > > > > > the first minor version of 2.x to 1.x. If a user doesn't yet want
> > to
> > > > > > upgrade to 2.0, I 

RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-24 Thread David Radley
Hi Danny,
No worries, thanks for replying. I have working prototype code that is being
reviewed. It needs some cleaning up and more complete testing before it is
ready, but it will give you the general idea [1][2] to help assess this
approach.


I am curious what you mean by abused. I guess the choice is between a generic
map mechanism vs more particular, more granular things being passed that might
be used by another connector.

Your first question:
“how would this work if the schema ID is not in the Kafka headers, as hinted to 
in the FLIP "usually the global ID in a Kafka header"?

For Apicurio it can be sent at the start of the payload, like Confluent Avro
does. Confluent Avro has a magic byte followed by a 4-byte schema id at the
start of the payload. Apicurio clients and SerDe libraries can be configured
to not put the schema id in the headers, in which case there is a magic byte
followed by an 8-byte schema id at the start of the payload. In the
deserialization case, we would not need to look at the headers - though the
encoding is also in the headers.
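
For illustration, a small sketch of how a reader could pull the id off the
front of the payload under these two layouts (the layouts as I describe them
above - treat this as my illustration rather than either SerDe's actual code):

import java.nio.ByteBuffer;

public class SchemaIdSketch {

    /** Read the schema id from the front of the payload. */
    public static long readSchemaId(byte[] payload, boolean confluentStyle) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        if (buf.get() != 0x0) {
            throw new IllegalArgumentException(
                    "No magic byte: the schema id must be elsewhere, e.g. Kafka headers");
        }
        // Confluent: 4-byte int schema id. Apicurio: 8-byte long global id.
        return confluentStyle ? buf.getInt() : buf.getLong();
    }
}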

Your second question:
“I am wondering if there are any other instances where the source would be 
aware of the schema ID and pass it through in this way?
”
The examples I can think of are:
- Avro can send the complete schema in a header, this is not recommended but in 
theory fits the need for a message payload to require something else to get the 
structure.
- I see [2] that Apicurio Protobuf uses headers.
- it might be that other message queuing projects like Rabbit MQ would need 
this to be able to support Apicurio Avro & protobuf.

Kind regards, David,




[1] https://github.com/apache/flink/pull/24715
[2] https://github.com/apache/flink-connector-kafka/pull/99
[3] 
https://www.apicur.io/registry/docs/apicurio-registry/2.5.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-json_registry


From: Danny Cranmer 
Date: Friday, 24 May 2024 at 12:22
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hello,

Apologies, I am on vacation and have limited access to email.

I can see the logic here and why you ended up where you did. I can also see
there are other useful metadata fields that we might want to pass through,
which might result in this Map being abused (Kafka Topic, Kinesis Shard,
etc).

I have a follow up question, how would this work if the schema ID is not in
the Kafka headers, as hinted to in the FLIP "usually the global ID in a
Kafka header"? I am wondering if there are any other instances where the
source would be aware of the schema ID and pass it through in this way?

Thanks,
Danny



On Wed, May 22, 2024 at 3:43 PM David Radley 
wrote:

> Hi Danny,
> Did you have a chance you have a look at my responses to your feedback? I
> am hoping to keep the momentum going on this one,   kind regards, David.
>
>
> From: David Radley 
> Date: Tuesday, 14 May 2024 at 17:21
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi Danny,
>
> Thank you very much for the feedback and your support. I have copied your
> feedback from the VOTE thread to this discussion thread, so we can continue
> our discussions off the VOTE thread.
>
>
>
> Your feedback:
>
> Thanks for Driving this David. I am +1 for adding support for the new
>
> format, however have some questions/suggestions on the details.
>
>
>
> 1. Passing around Map additionalInputProperties feels a bit
>
> dirty. It looks like this is mainly for the Kafka connector. This connector
>
> already has a de/serialization schema extension to access record
>
> headers, KafkaRecordDeserializationSchema [1], can we use this instead?
>
> 2. Can you elaborate why we need to change the SchemaCoder interface? Again
>
> I am not a fan of adding these Map parameters
>
> 3. I assume this integration will go into the core Flink repo under
>
> flink-formats [2], and not be a separate repository like the connectors?
>
>
>
> My response:
>
> Addressing 1. and 2.
>
> I agree that sending maps around is a bit dirty. If we can see a better
> way that would be great. I was looking for a way to pass this kafka header
> information in a non-Kafka way - the most obvious way I could think was as
> a map. Here are the main considerations I saw, if I have missed anything or
> could improve something I would be grateful for any further feedback.
>
>
>
>   *   I see KafkaRecordDeserializationSchema is a Kafka interface that
> works at the Kafka record level (so includes the headers). We need a
> mechanism to send over the headers from the Kafka record to Flink
>   *   Flink core is not aware of Kafka headers, and I did not want to add
> a Kafka dependancy to core flink.
>   *   The forma

RE: [DISCUSS] Proposing an LTS Release for the 1.x Line

2024-05-24 Thread David Radley
Hi,
I agree with Martijn that we only put features into version 2. Backporting to
v1 should not be business as usual for features, only for security and
stability changes.

If there is a maintainer willing to merge backported features to v1, as it is 
important to some part of the community, this should be allowed, as different 
parts of the community have different priorities and timelines,
 Kind regards, David.


From: Alexander Fedulov 
Date: Thursday, 23 May 2024 at 18:50
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Proposing an LTS Release for the 1.x Line
Good point, Xintong, I incorporated this item into the FLIP.

Best,
Alex

On Wed, 22 May 2024 at 10:37, Xintong Song  wrote:

> Thanks, Alex.
>
> I see one task that needs to be done once the FLIP is approved, which I'd
> suggest also mentioning in the FLIP: explaining the LTS policy to users on the
> website / documentation (because the FLIP is developer-facing) before / upon
> releasing 1.20.
>
> Other than that, the FLIP LGTM.
>
> Best,
>
> Xintong
>
>
>
> On Tue, May 21, 2024 at 5:21 PM Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > let's finalize this discussion. As Martijn suggested, I summarized this
> > thread into a FLIP [1]. Please take a look and let me know if there’s
> > anything important that I might have missed.
> >
> > Best,
> > Alex
> >
> > [1] https://cwiki.apache.org/confluence/x/BApeEg
> >
> >
> > On Tue, 23 Jan 2024 at 03:30, Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks Martijn for the feedback!
> > >
> > > That sounds reasonable to me! And I don't have a strong opinion on allowing
> > > backporting of new features to 1.x.
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Jan 22, 2024 at 8:56 PM Martijn Visser <
> martijnvis...@apache.org
> > >
> > > wrote:
> > >
> > > > Hi Rui,
> > > >
> > > > I don't think that we should allow backporting of new features from
> > > > the first minor version of 2.x to 1.x. If a user doesn't yet want to
> > > > upgrade to 2.0, I think that's fine since we'll have a LTS for 1.x.
> If
> > > > a newer feature becomes available in 2.x that's interesting for the
> > > > user, the user at that point can decide if they want to do the
> > > > migration. It's always a case-by-case tradeoff of effort vs benefits,
> > > > and I think with a LTS version that has bug fixes only we provide the
> > > > users with assurance that existing bugs can get fixed, and that they
> > > > can decide for themselves when they want to migrate to a newer
> version
> > > > with better/newer features.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Thu, Jan 11, 2024 at 3:50 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > > >
> > > > > Thanks everyone for discussing this topic!
> > > > >
> > > > > My question is could we make a trade-off between Flink users
> > > > > and Flink maintainers?
> > > > >
> > > > > 1. From the perspective of a Flink maintainer
> > > > >
> > > > > I strongly agree with Martijn's point of view, such as:
> > > > >
> > > > > - Allowing backporting of new features to Flink 1.x will result in
> > > users
> > > > > delaying the upgrade.
> > > > > - New features will also introduce new bugs, meaning that
> maintainers
> > > > will
> > > > > have to spend time on two release versions.
> > > > >
> > > > > Considering the simplicity of maintenance, not backporting
> > > > > new features to Flink 1.x is fine.
> > > > >
> > > > > 2. From the perspective of a Flink user
> > > > >
> > > > > In the first version of Flink 2.x, Flink will remove a lot of
> > > > > deprecated APIs and introduce some features.
> > > > >
> > > > > It's a new major version, and major version changes are much
> > > > > bigger than minor or patch version changes. Big changes
> > > > > may introduce more bugs, so I guess that a large number
> > > > > of Flink users will not use the first version of 2.x in the
> > > > > production environment. Maybe they will wait for the second
> > > > > minor version of 2.x.
> > > > >
> > > > > So, I was wondering whether we could allow backporting new features
> > > > > from the first minor version of 2.x to 1.x?
> > > > >
> > > > > It means we would allow backporting new features of 2.0.0 to 1.21.
> > > > > And 1.21.x is similar to 2.0.x; their features are the same, but
> > > > > 2.0.x removes the deprecated APIs. After 2.0.0 is released,
> > > > > all new features in 2.1.x and above are only available in 2.x.
> > > > >
> > > > > Looking forward to your opinions~
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Wed, Jan 10, 2024 at 9:39 PM Martijn Visser <
> > > martijnvis...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > I saw that I missed replying to this topic. I do think that
> Xintong
> > > > > > touched on an important topic when he mentioned that we should
> > define
> > > > > > what an LTS version means. From my point of view, I would state
> > that
> > > > > > an LTS version for Apache Flink 

Re: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-22 Thread David Radley
Hi Danny,
Did you have a chance to have a look at my responses to your feedback? I am
hoping to keep the momentum going on this one, kind regards, David.


From: David Radley 
Date: Tuesday, 14 May 2024 at 17:21
To: dev@flink.apache.org 
Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Danny,

Thank you very much for the feedback and your support. I have copied your 
feedback from the VOTE thread to this discussion thread, so we can continue our 
discussions off the VOTE thread.



Your feedback:

Thanks for Driving this David. I am +1 for adding support for the new

format, however have some questions/suggestions on the details.



1. Passing around Map<String, Object> additionalInputProperties feels a bit

dirty. It looks like this is mainly for the Kafka connector. This connector

already has a de/serialization schema extension to access record

headers, KafkaRecordDeserializationSchema [1], can we use this instead?

2. Can you elaborate why we need to change the SchemaCoder interface? Again

I am not a fan of adding these Map parameters

3. I assume this integration will go into the core Flink repo under

flink-formats [2], and not be a separate repository like the connectors?



My response:

Addressing 1. and 2.

I agree that sending maps around is a bit dirty. If we can see a better way 
that would be great. I was looking for a way to pass this Kafka header
information in a non-Kafka way - the most obvious way I could think of was as a
map. Here are the main considerations I saw; if I have missed anything or could
improve something I would be grateful for any further feedback.



  *   I see KafkaRecordDeserializationSchema is a Kafka interface that works at 
the Kafka record level (so includes the headers). We need a mechanism to send 
over the headers from the Kafka record to Flink
  *   Flink core is not aware of Kafka headers, and I did not want to add a 
Kafka dependency to core Flink.
  *   The formats are stateless so it did not appear to be in keeping with the
Flink architecture to pass through header information to stash in state in the 
format waiting for the deserialise to be subsequently called to pick up the 
header information.
  *   We could have used Thread local storage to stash the header content, but 
this would be extra state to manage; and this would seem like an obtrusive 
change.
  *   The SchemaCoder deserialise is where Confluent Avro gets the schema id 
from the payload, so it can lookup the schema. In line with this approach it 
made sense to extend the deserialise so it had the header contents so the 
Apicurio Avro format could lookup the schema.
  *   I did not want to have Apicurio-specific logic in the Kafka connector; if
we did, we could pull out the appropriate headers and only send over the schema
ids.
  *   For deserialise, the schema id we are interested in is the one in the 
Kafka headers on the message and is for the writer schema (an Avro format 
concept) currently used by the confluent-avro format in deserialize.
  *   For serialise, the schema ids need to be obtained from Apicurio and then
passed through to Kafka.
  *   For serialise there is existing logic around handling the metadata which 
includes passing the headers. But the presence of the metadata would imply we 
have a metadata column. Maybe a change to the metadata mechanism might have
allowed us to pass the headers without creating a metadata column; instead I
pass through the additional headers in a map to be appended.
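To make this concrete, here is a minimal sketch of the extended SchemaCoder idea (method and parameter names are illustrative, not the final API):

import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import org.apache.avro.Schema;

// Illustrative sketch only - not the final API. The extra Map parameter
// carries connector-specific input properties (for Kafka, the record
// headers), so the format can find the writer schema id without the core
// format interfaces knowing anything about Kafka.
interface SchemaCoderSketch {

    // existing style: the writer schema id is parsed out of the payload
    Schema readSchema(InputStream in) throws IOException;

    // proposed style: the writer schema id is taken from the passed
    // properties, e.g. an entry the Kafka connector populated from the
    // "apicurio.value.globalId" header (header name is an assumption)
    Schema readSchema(InputStream in, Map<String, Object> additionalInputProperties)
            throws IOException;
}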



3.

Yes this integration will go into the core Flink repo under

flink-formats and sit next to the confluent-avro format. The Avro format has 
the concept of a Registry and drives the confluent-avro format. The Apicurio 
Avro format will use the same approach.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [DISCUSSION] FLIP-450: Improve Runtime Configuration for Flink 2.0

2024-05-15 Thread David Radley
Hi Xuannan,
I like that you are cleaning up options that, I assume, are no longer recommended or
are little used.

I have no experience with these options. For the proposed deprecations, will
recommended alternatives be mentioned in the deprecation notices? If options are
going to be removed for v2, it would be good to explicitly document the thinking
behind, and the impact of, the deprecations and consequent removals,
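As an illustration of the kind of breadcrumb that helps here, Flink's ConfigOptions already allow a successor option to declare the old key it replaces (the option names below are made up, purely to show the mechanism):

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Hypothetical successor option that still honours the deprecated key, so
// users reading the deprecation notice know exactly where to migrate.
public class ExampleOptions {
    public static final ConfigOption<Integer> EXAMPLE_SUCCESSOR =
            ConfigOptions.key("taskmanager.network.memory.example-successor")
                    .intType()
                    .defaultValue(20)
                    .withDeprecatedKeys(
                            "taskmanager.network.memory.max-overdraft-buffers-per-gate")
                    .withDescription(
                            "Example replacement; the description is where the "
                                    + "recommended alternative can be spelled out.");
}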
  Kind regards, David.

From: Xuannan Su 
Date: Tuesday, 14 May 2024 at 02:08
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSSION] FLIP-450: Improve Runtime Configuration 
for Flink 2.0
Hi all,

Thank you for all the comments and suggestions! If there are no
further comments, I'd like to close the discussion and start the
voting in two days.

Best regards,
Xuannan



On Mon, May 13, 2024 at 3:10 PM Jeyhun Karimov  wrote:
>
> Hi Xuannan,
>
> Thanks a lot for the update. The FLIP looks good to me. +1 for it.
>
> Regards,
> Jeyhun
>
> On Mon, May 13, 2024 at 4:45 AM Xuannan Su  wrote:
>
> > Hi Jeyhun,
> >
> > Thanks for the comment!
> >
> > Yes, we intended to remove the StreamPipelineOptions in 2.0. I updated
> > the FLIP to include the information.
> >
> > Best regards,
> > Xuannan
> >
> > On Sun, May 12, 2024 at 9:16 PM Jeyhun Karimov 
> > wrote:
> > >
> > > Hi Xuannan,
> > >
> > > Thanks for driving this FLIP!
> > > I have a minor comment. Do we plan to remove StreamPipelineOptions in
> > 2.0,
> > > as it only contains deprecated options?
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Sat, May 11, 2024 at 4:40 AM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Thanks Xuannan for the update!
> > > >
> > > > LGTM, +1 for this proposal.
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Sat, May 11, 2024 at 10:20 AM Xuannan Su 
> > wrote:
> > > >
> > > > > Hi Rui,
> > > > >
> > > > > Thanks for the suggestion!
> > > > >
> > > > > I updated the description of
> > > > > taskmanager.network.memory.max-overdraft-buffers-per-gate and
> > > > > hard-coded it to 20.
> > > > >
> > > > > Best regards,
> > > > > Xuannan
> > > > >
> > > > > On Mon, May 6, 2024 at 11:28 AM Rui Fan <1996fan...@gmail.com>
> > wrote:
> > > > > >
> > > > > > Thanks Xuannan for driving this proposal!
> > > > > >
> > > > > > > taskmanager.network.memory.max-overdraft-buffers-per-gate will be
> > > > > removed
> > > > > > and hard-coded to either 10 or 20.
> > > > > >
> > > > > > Currently, it's a public option. Could we determine the value of
> > > > > > the overdraft buffer in the current FLIP?
> > > > > >
> > > > > > I vote 20 as the hard code value due to 2 reasons:
> > > > > > - Removing this option means users cannot change it, it might be
> > better
> > > > > to
> > > > > > turn it up.
> > > > > > - Most tasks don't use the overdraft buffer, so increasing it
> > > > doesn't
> > > > > > introduce more risk.
> > > > > >
> > > > > > Best,
> > > > > > Rui
> > > > > >
> > > > > > On Mon, May 6, 2024 at 10:47 AM Yuxin Tan 
> > > > > wrote:
> > > > > >
> > > > > > > Thanks for the effort, Xuannan.
> > > > > > >
> > > > > > > +1 for the proposal.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yuxin
> > > > > > >
> > > > > > >
> > > > > > > Xintong Song  于2024年4月29日周一 15:40写道:
> > > > > > >
> > > > > > > > Thanks for driving this effort, Xuannan.
> > > > > > > >
> > > > > > > > +1 for the proposed changes.
> > > > > > > >
> > > > > > > > Just one suggestion: Some of the proposed changes involve not
> > > > solely
> > > > > > > > changing the configuration options, but are bound to changing /
> > > > > removal
> > > > > > > of
> > > > > > > > certain features. E.g., the removal of hash-blocking shuffle
> > and
> > > > > legacy
> > > > > > > > hybrid shuffle mode, and the behavior change of overdraft
> > network
> > > > > > > buffers.
> > > > > > > > Therefore, it might be nicer to provide an implementation plan
> > > > with a
> > > > > > > list
> > > > > > > > of related tasks in the FLIP. This should not block the FLIP
> > > > though.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xintong
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Apr 25, 2024 at 4:35 PM Xuannan Su <
> > suxuanna...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I'd like to start a discussion on FLIP-450: Improve Runtime
> > > > > > > > > Configuration for Flink 2.0 [1]. As Flink moves toward 2.0,
> > we
> > > > have
> > > > > > > > > revisited all runtime configurations and identified several
> > > > > > > > > improvements to enhance user-friendliness and
> > maintainability. In
> > > > > this
> > > > > > > > > FLIP, we aim to refine the runtime configuration.
> > > > > > > > >
> > > > > > > > > Looking forward to everyone's feedback and suggestions. Thank
> > > > you!
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Xuannan
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > 

RE: [VOTE] FLIP-454: New Apicurio Avro format

2024-05-14 Thread David Radley
Hi Danny,
Thank you so much for supporting this FLIP and for your feedback. I have copied
your points and responded on the discussion thread here:
https://lists.apache.org/thread/rmcc67z35ysk0sv2jhz2wq0mwzjorhjb

kind regards, David.

From: Danny Cranmer 
Date: Tuesday, 14 May 2024 at 11:58
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] FLIP-454: New Apicurio Avro format
Hello all,

Thanks for Driving this David. I am +1 for adding support for the new
format, however have some questions/suggestions on the details.

1. Passing around Map<String, Object> additionalInputProperties feels a bit
dirty. It looks like this is mainly for the Kafka connector. This connector
already has a de/serialization schema extension to access record
headers, KafkaRecordDeserializationSchema [1], can we use this instead?
2. Can you elaborate why we need to change the SchemaCoder interface? Again
I am not a fan of adding these Map parameters
3. I assume this integration will go into the core Flink repo under
flink-formats [2], and not be a separate repository like the connectors?

Thanks,
Danny

[1]
https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/deserializer/KafkaRecordDeserializationSchema.java
[2] https://github.com/apache/flink/tree/master/flink-formats
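For reference, a minimal sketch of the extension point in 1.: it sees the whole ConsumerRecord, headers included (schema lookup elided; the header name is an assumption):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

// Sketch: a DataStream-level deserializer that can see Kafka headers.
public class HeaderAwareDeserializer implements KafkaRecordDeserializationSchema<String> {

    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<String> out)
            throws IOException {
        // the headers travel with the record, so a writer-schema id placed
        // there by a producer-side serde is visible here
        Header idHeader = record.headers().lastHeader("apicurio.value.globalId");
        // ... resolve the writer schema from idHeader, then decode record.value()
        out.collect(new String(record.value(), StandardCharsets.UTF_8));
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return TypeInformation.of(String.class);
    }
}

The catch discussed in the response below is that this interface lives at the connector level; the SQL formats never see the ConsumerRecord, which is the gap the additional-properties map tries to bridge.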

On Sat, May 4, 2024 at 12:46 PM Ahmed Hamdy  wrote:

> +1 (non-binding)
>
> Best Regards
> Ahmed Hamdy
>
>
> On Fri, 3 May 2024 at 15:16, Jeyhun Karimov  wrote:
>
> > +1 (non binding)
> >
> > Thanks for driving this FLIP David.
> >
> > Regards,
> > Jeyhun
> >
> > On Fri, May 3, 2024 at 2:21 PM Mark Nuttall  wrote:
> >
> > > +1, I would also like to see first class support for Avro and Apicurio
> > >
> > > -- Mark Nuttall, mnutt...@apache.org
> > > Senior Software Engineer, IBM Event Automation
> > >
> > > On 2024/05/02 09:41:09 David Radley wrote:
> > > > Hi everyone,
> > > >
> > > > I'd like to start a vote on the FLIP-454: New Apicurio Avro format
> > > > [1]. The discussion thread is here [2].
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > > > objection
> > > > or
> > > > insufficient votes.
> > > >
> > > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> > > > [2] https://lists.apache.org/thread/wtkl4yn847tdd0wrqm5xgv9wc0cb0kr8
> > > >
> > > >
> > > > Kind regards, David.
> > > >
> > > > Unless otherwise stated above:
> > > >
> > > > IBM United Kingdom Limited
> > > > Registered in England and Wales with number 741598
> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6
> 3AU
> > > >
> > >
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-14 Thread David Radley
Hi Danny,

Thank you very much for the feedback and your support. I have copied your 
feedback from the VOTE thread to this discussion thread, so we can continue our 
discussions off the VOTE thread.



Your feedback:

Thanks for Driving this David. I am +1 for adding support for the new

format, however have some questions/suggestions on the details.



1. Passing around Map<String, Object> additionalInputProperties feels a bit

dirty. It looks like this is mainly for the Kafka connector. This connector

already has a de/serialization schema extension to access record

headers, KafkaRecordDeserializationSchema [1], can we use this instead?

2. Can you elaborate why we need to change the SchemaCoder interface? Again

I am not a fan of adding these Map parameters

3. I assume this integration will go into the core Flink repo under

flink-formats [2], and not be a separate repository like the connectors?



My response:

Addressing 1. and 2.

I agree that sending maps around is a bit dirty. If we can see a better way 
that would be great. I was looking for a way to pass this Kafka header
information in a non-Kafka way - the most obvious way I could think of was as a
map. Here are the main considerations I saw; if I have missed anything or could
improve something I would be grateful for any further feedback.



  *   I see KafkaRecordDeserializationSchema is a Kafka interface that works at 
the Kafka record level (so includes the headers). We need a mechanism to send 
over the headers from the Kafka record to Flink
  *   Flink core is not aware of Kafka headers, and I did not want to add a 
Kafka dependency to core Flink.
  *   The formats are stateless so it did not appear to be in keeping with the
Flink architecture to pass through header information to stash in state in the 
format waiting for the deserialise to be subsequently called to pick up the 
header information.
  *   We could have used Thread local storage to stash the header content, but 
this would be extra state to manage; and this would seem like an obtrusive 
change.
  *   The SchemaCoder deserialise is where Confluent Avro gets the schema id 
from the payload, so it can lookup the schema. In line with this approach it 
made sense to extend the deserialise so it had the header contents so the 
Apicurio Avro format could lookup the schema.
  *   I did not want to have Apicurio-specific logic in the Kafka connector; if
we did, we could pull out the appropriate headers and only send over the schema
ids.
  *   For deserialise, the schema id we are interested in is the one in the 
Kafka headers on the message and is for the writer schema (an Avro format 
concept) currently used by the confluent-avro format in deserialize.
  *   For serialise, the schema ids need to be obtained from Apicurio and then
passed through to Kafka.
  *   For serialise there is existing logic around handling the metadata which 
includes passing the headers. But the presence of the metadata would imply we 
have a metadata column. Maybe a change to the metadata mechanism might have
allowed us to pass the headers without creating a metadata column; instead I
pass through the additional headers in a map to be appended.
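For the serialise direction in the last two bullets, the flow could look something like the sketch below (names are illustrative; the header name mirrors Apicurio's own serdes and is an assumption):

import java.util.HashMap;
import java.util.Map;

// Sketch of the serialise-side flow: after registering / looking up the
// schema in Apicurio, the format hands the resulting global id back as an
// "additional output property"; the Kafka connector appends it to the
// outgoing record as a header.
class ApicurioSerializeSketch {

    Map<String, Object> additionalOutputProperties(long globalId) {
        Map<String, Object> out = new HashMap<>();
        out.put("apicurio.value.globalId", globalId);
        return out;
    }
}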



3.

Yes this integration will go into the core Flink repo under

flink-formats and sit next to the confluent-avro format. The Avro format has 
the concept of a Registry and drives the confluent-avro format. The Apicurio 
Avro format will use the same approach.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [VOTE] FLIP-451: Introduce timeout configuration to AsyncSink

2024-05-14 Thread David Radley
Thanks for the clarification Ahmed

+1 (non-binding)

From: Ahmed Hamdy 
Date: Monday, 13 May 2024 at 19:58
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] FLIP-451: Introduce timeout configuration to 
AsyncSink
Thanks David,
I have replied to your question in the discussion thread.
Best Regards
Ahmed Hamdy


On Mon, 13 May 2024 at 16:21, David Radley  wrote:

> Hi,
> I raised a question on the discussion thread, around retriable errors, as
> a possible alternative,
>   Kind regards, David.
>
>
> From: Aleksandr Pilipenko 
> Date: Monday, 13 May 2024 at 16:07
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [VOTE] FLIP-451: Introduce timeout configuration
> to AsyncSink
> Thanks for driving this!
>
> +1 (non-binding)
>
> Thanks,
> Aleksandr
>
> On Mon, 13 May 2024 at 14:08, 
> wrote:
>
> > Thanks Ahmed!
> >
> > +1 non binding
> > On May 13, 2024 at 12:40 +0200, Jeyhun Karimov ,
> > wrote:
> > > Thanks for driving this Ahmed.
> > >
> > > +1 (non-binding)
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Mon, May 13, 2024 at 12:37 PM Muhammet Orazov
> > >  wrote:
> > >
> > > > Thanks Ahmed, +1 (non-binding)
> > > >
> > > > Best,
> > > > Muhammet
> > > >
> > > > On 2024-05-13 09:50, Ahmed Hamdy wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > Thanks for the feedback on the discussion thread[1], I would like
> > to
> > > > > > start
> > > > > > a vote on FLIP-451[2]: Introduce timeout configuration to
> AsyncSink
> > > > > >
> > > > > > The vote will be open for at least 72 hours unless there is an
> > > > > > objection or
> > > > > > insufficient votes.
> > > > > >
> > > > > > 1-
> https://lists.apache.org/thread/ft7wcw7kyftvww25n5fm4l925tlgdfg0
> > > > > > 2-
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
> > > > > > Best Regards
> > > > > > Ahmed Hamdy
> > > >
> >
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [VOTE] FLIP-451: Introduce timeout configuration to AsyncSink

2024-05-13 Thread David Radley
Hi,
I raised a question on the discussion thread, around retriable errors, as a 
possible alternative,
  Kind regards, David.


From: Aleksandr Pilipenko 
Date: Monday, 13 May 2024 at 16:07
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] FLIP-451: Introduce timeout configuration to 
AsyncSink
Thanks for driving this!

+1 (non-binding)

Thanks,
Aleksandr

On Mon, 13 May 2024 at 14:08,  wrote:

> Thanks Ahmed!
>
> +1 non binding
> On May 13, 2024 at 12:40 +0200, Jeyhun Karimov ,
> wrote:
> > Thanks for driving this Ahmed.
> >
> > +1 (non-binding)
> >
> > Regards,
> > Jeyhun
> >
> > On Mon, May 13, 2024 at 12:37 PM Muhammet Orazov
> >  wrote:
> >
> > > Thanks Ahmed, +1 (non-binding)
> > >
> > > Best,
> > > Muhammet
> > >
> > > On 2024-05-13 09:50, Ahmed Hamdy wrote:
> > > > > Hi all,
> > > > >
> > > > > Thanks for the feedback on the discussion thread[1], I would like
> to
> > > > > start
> > > > > a vote on FLIP-451[2]: Introduce timeout configuration to AsyncSink
> > > > >
> > > > > The vote will be open for at least 72 hours unless there is an
> > > > > objection or
> > > > > insufficient votes.
> > > > >
> > > > > 1-https://lists.apache.org/thread/ft7wcw7kyftvww25n5fm4l925tlgdfg0
> > > > > 2-
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
> > > > > Best Regards
> > > > > Ahmed Hamdy
> > >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [DISCUSS] FLIP-451: Refactor Async sink API

2024-05-13 Thread David Radley
Hi,
I wonder if the way that the async request fails could be classified as a retriable or
non-retriable error, so that it would retry only for retriable (transient) errors
(like IOExceptions). I see some talk on the internet around retriable SQL
errors.
If this were the case then we may need configuration to limit the number of
retries of retriable errors.
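To make the suggestion concrete, the classification could look something like this (a hypothetical helper, not an existing Flink API):

import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: only transient failures are retried, and a
// configured cap stops endless retries of a persistently failing request.
final class RetryClassifier {

    private final int maxRetries; // would come from the sink configuration

    RetryClassifier(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    boolean shouldRetry(Throwable error, int attemptsSoFar) {
        return attemptsSoFar < maxRetries && isRetriable(error);
    }

    private static boolean isRetriable(Throwable error) {
        // transient I/O and timeout problems are worth retrying;
        // anything else is treated as fatal
        return error instanceof IOException || error instanceof TimeoutException;
    }
}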
Kind regards, David


From: Muhammet Orazov 
Date: Monday, 13 May 2024 at 10:30
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-451: Refactor Async sink API
Great, thanks for clarifying!

Best,
Muhammet


On 2024-05-06 13:40, Ahmed Hamdy wrote:
> Hi Muhammet,
> Thanks for the feedback.
>
>> Could you please add more here why it is harder? Would the
>> `completeExceptionally`
>> method be related to it? Maybe you can add usage example for it also.
>>
>
> this is mainly due to the current implementation of fatal exception
> failures which depends on base `getFatalExceptionConsumer` method that
> is
> decoupled from the actual called method `submitRequestEntries`, Since
> this
> is now not the primary concern of the FLIP, I have removed it from the
> motivation so that the scope is defined around introducing the timeout
> configuration.
>
>> Should we add a list of possible connectors that this FLIP would
>> improve?
>
> Good call, I have added under migration plan.
>
> Best Regards
> Ahmed Hamdy
>
>
> On Mon, 6 May 2024 at 08:49, Muhammet Orazov 
> wrote:
>
>> Hey Ahmed,
>>
>> Thanks for the FLIP! +1 (non-binding)
>>
>> > Additionally the current interface for passing fatal exceptions and
>> > retrying records relies on java consumers which makes it harder to
>> > understand.
>>
>> Could you please add more here why it is harder? Would the
>> `completeExceptionally`
>> method be related to it? Maybe you can add usage example for it also.
>>
>> > we should proceed by adding support in all supporting connector repos.
>>
>> Should we add list of possible connectors that this FLIP would
>> improve?
>>
>> Best,
>> Muhammet
>>
>>
>> On 2024-04-29 14:08, Ahmed Hamdy wrote:
>> > Hi all,
>> > I would like to start a discussion on FLIP-451[1]
>> > The proposal comes on encountering a couple of issues while working
>> > with
>> > implementers for Async Sink.
>> > The FLIP mainly proposes a new API similar to AsyncFunction and
>> > ResultFuture as well as introducing timeout handling for AsyncSink
>> > requests.
>> > The FLIP targets 1.20 with backward compatible changes and we should
>> > proceed by adding support in all supporting connector repos.
>> >
>> > 1-
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Refactor+Async+Sink+API
>> > Best Regards
>> > Ahmed Hamdy
>>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Discussion: Condition field in the CR status

2024-05-13 Thread David Radley
Hi Lajith,
This idea for a Flip is a good addition, which I support.

As discussed:

These conditions populate a status field in the K8s UI – OpenShift in our case.
Currently the status does not contain any information. With the conditions
present, the status will be populated with meaningful information on the UI,
which means the readiness is explicitly shown on the UI, improving the user
experience.

One other observation, in your example :

reason: Ready
status: 'True'
type: Ready



Could the reason be something more granular, to give more information about the Ready
status, maybe rolledback, deployed or stable?

We also talked about whether there are any use cases that would benefit; at this
time we don't think so – but you said you would have a look at this,
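As an illustration of the granular-reason idea, using the fabric8 Condition model (sketch only; the reason values are suggestions, nothing decided):

import io.fabric8.kubernetes.api.model.Condition;
import io.fabric8.kubernetes.api.model.ConditionBuilder;
import java.time.Instant;

class ConditionSketch {

    // Sketch: a Ready condition whose reason carries more granular detail.
    static Condition ready(String granularReason) {
        return new ConditionBuilder()
                .withType("Ready")
                .withStatus("True")
                .withReason(granularReason) // e.g. "Deployed", "RolledBack", "Stable"
                .withMessage("resource is ready")
                .withLastTransitionTime(Instant.now().toString())
                .build();
    }
}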

   Kind regards, David,


From: Lajith Koova 
Date: Monday, 13 May 2024 at 10:29
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Discussion: Condition field in the CR status
Thanks for the feedback Jeyhun.

Regarding "Will the usage of Conditions be enabled by default? Or will there
be any disadvantages for Flink users?": yes, the conditions will be enabled by
default for the CR.

You are right: when there are multiple conditions of the same type, the
approach is that we will override the old condition with the new condition only if
the condition status and message are the same. If they are different, we
will add to the existing conditions.
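A small sketch of that merge rule (hypothetical helper: a same-type condition with unchanged status and message is refreshed in place, anything else is appended):

import io.fabric8.kubernetes.api.model.Condition;
import java.util.List;
import java.util.Objects;

final class ConditionMergeSketch {

    // Replace an existing condition of the same type only when status and
    // message are unchanged; otherwise keep history by appending.
    static void upsert(List<Condition> conditions, Condition incoming) {
        for (int i = 0; i < conditions.size(); i++) {
            Condition existing = conditions.get(i);
            if (Objects.equals(existing.getType(), incoming.getType())
                    && Objects.equals(existing.getStatus(), incoming.getStatus())
                    && Objects.equals(existing.getMessage(), incoming.getMessage())) {
                conditions.set(i, incoming);
                return;
            }
        }
        conditions.add(incoming);
    }
}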

Thanks
Lajith

On Fri, May 3, 2024 at 8:05 PM Jeyhun Karimov  wrote:

> Hi Lajith,
>
> Thanks a lot for driving this FLIP. Please find my comments below:
>
> - I echo Gyula that including some examples and further explanations might
> ease reader's work. With the current version, the FLIP is a bit hard to
> follow.
>
> - Will the usage of Conditions be enabled by default? Or will there be any
> disadvantages for Flink users?
>
> If Conditions with the same type already exist in the Status Conditions
> > list, then replace the existing condition with the same type if the
> > Condition status and message are different.
>
>  - Do you think we should have clear rules about handling rules for how
> these Conditions should be managed, especially when multiple Conditions of
> the same type are present?
> For example, resource has multiple causes for the same condition (e.g.,
> Error due to network and Error due to I/O). Then, overriding the old
> condition with the new one is not the best approach no?
> Please correct me if I misunderstood.
>
> Regards,
> Jeyhun
>
> On Fri, May 3, 2024 at 8:53 AM Gyula Fóra  wrote:
>
> > Hi Lajith!
> >
> > Can you please include some examples in the document to help reviewers?
> > Just some examples with the status and the proposed conditions.
> >
> > Cheers,
> > Gyula
> >
> > On Wed, May 1, 2024 at 9:06 AM Lajith Koova  wrote:
> >
> > > Hello,
> > >
> > >
> > > Starting discussion thread here to discuss a proposal to add Conditions
> > > field in the CR status of Flink Deployment and FlinkSessionJob.
> > >
> > >
> > > Here is the google doc with details. Please provide your
> thoughts/inputs.
> > >
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/12wlJCL_Vq2KZnABzK7OR7gAd1jZMmo0MxgXQXqtWODs/edit?usp=sharing
> > >
> > >
> > > Thanks
> > > Lajith
> > >
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [RESULT][VOTE] FLIP-454: New Apicurio Avro format

2024-05-09 Thread David Radley
Hi Leonard,
OK thanks – I missed that. Are any committers/PMC members prepared to back
this Flip?
  Kind regards, David.

From: Leonard Xu 
Date: Wednesday, 8 May 2024 at 16:43
To: dev 
Subject: [EXTERNAL] Re: [RESULT][VOTE] FLIP-454: New Apicurio Avro format
Thanks David for driving the FLIP forward, but we need 3 +1 (binding) votes
according to the Flink Bylaws [1] before the community can accept it.


Best,
Leonard
[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws

> On 8 May 2024, at 11:05, David Radley wrote:
>
> Hi everyone,
> I am happy to say that FLIP-454: New Apicurio Avro format [1] has been 
> accepted and voted through this thread [2].
>
> The proposal has been accepted with 4 approving votes and there
> are no vetoes:
>
> - Ahmed Hamdy (non-binding)
> - Jeyhun Karimov (non-binding)
> - Mark Nuttall (non-binding)
> - Nic Townsend (non-binding)
>
> Martijn:
> Please could you update the Flip with:
> - the voting thread link
> - the accepted status
> - the Jira number (https://issues.apache.org/jira/browse/FLINK-35311).
> As the involved committer, are you willing to assign me the Jira to work on 
> and merge once you approve the changes?
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> [2] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:apicurio
>
> Thanks to all involved.
>
> Kind regards,
> David
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[RESULT][VOTE] FLIP-454: New Apicurio Avro format

2024-05-08 Thread David Radley
Hi everyone,
I am happy to say that FLIP-454: New Apicurio Avro format [1] has been accepted 
and voted through this thread [2].

The proposal has been accepted with 4 approving votes and there
are no vetoes:

- Ahmed Hamdy (non-binding)
- Jeyhun Karimov (non-binding)
- Mark Nuttall (non-binding)
- Nic Townsend (non-binding)

Martijn:
Please could you update the Flip with:
- the voting thread link
- the accepted status
- the Jira number (https://issues.apache.org/jira/browse/FLINK-35311).
As the involved committer, are you willing to assign me the Jira to work on and 
merge once you approve the changes?

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
[2] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:apicurio

Thanks to all involved.

Kind regards,
David

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-35311) FLIP-454: New Apicurio Avro format

2024-05-08 Thread david radley (Jira)
david radley created FLINK-35311:


 Summary: FLIP-454: New Apicurio Avro format
 Key: FLINK-35311
 URL: https://issues.apache.org/jira/browse/FLINK-35311
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Affects Versions: 1.18.1, 1.19.0, 1.17.2
Reporter: david radley
 Fix For: 2.0.0, 1.20.0


This Jira is for the accepted 
[FLIP-454|https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format].
 It involves changes to 2 repositories, core Flink and the Flink Kafka 
connector  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread David Radley
Hi Martijn,
I have started a vote thread – please could you update the Flip with the link 
to the vote thread,
  Kind regards, David.


From: David Radley 
Date: Thursday, 2 May 2024 at 10:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Fabulous, thanks Martijn 

From: Martijn Visser 
Date: Thursday, 2 May 2024 at 10:08
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Done :)

On Thu, May 2, 2024 at 11:01 AM David Radley 
wrote:

> Hi Martijn,
> Thank you very much for looking at this. In response to your feedback; I
> produced a reduced version which is on this link.
>
>
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
>
> The original version you have copied is a bit out-dated and verbose.
> Please could you replace the Flip with content from the above link,
> Kind regards, David,
>
> From: Martijn Visser 
> Date: Wednesday, 1 May 2024 at 16:31
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I've copied and pasted it into
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> ;
> please take a look if it's as expected.
>
> Best regards,
>
> Martijn
>
> On Wed, May 1, 2024 at 3:47 PM David Radley 
> wrote:
>
> > Hi Martijn,
> > Any news?
> >Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 22 April 2024 at 09:48
> > To: dev@flink.apache.org 
> > Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > A gentle nudge, is this ok for you or one of the PMC or committers to
> > create a Flip now?
> >Kind regards, David.
> >
> > From: David Radley 
> > Date: Monday, 15 April 2024 at 12:29
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > Thanks for looking at this. I have used the template in a new  Google Doc
> >
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
> .
> > I have significantly reduced the content in the Flip, in line with what I
> > see as the template and its usage. If this it too much or too little, I
> can
> > amend,
> >
> > Kind regards, David.
> >
> > From: Martijn Visser 
> > Date: Friday, 12 April 2024 at 18:11
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > I tried, but the format wasn't as the FLIP template expects, so I ended
> up
> > needing to change the entire formatting and that was just too much work
> to
> > be honest. If you could make sure that especially the headers match with
> > the FLIP template, and that all of the contents from the FLIP template is
> > there, that would make things much easier.
> >
> > Thanks,
> >
> > Martijn
> >
> > On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> > wrote:
> >
> > > Hi,
> > > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > > this,
> > >   Kind regards, David.
> > >
> > >
> > > From: David Radley 
> > > Date: Monday, 8 April 2024 at 09:40
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi,
> > > I have posted a Google Doc [0] to the mailing list for a discussion
> > thread
> > > for a Flip proposal to introduce an Apicurio-avro format. The
> discussions
> > > have been resolved, please could a committer/PMC member copy the
> contents
> > > from the Google Doc, and create a FLIP number for this, as per the
> > process
> > > [1],
> > >   Kind regards, David.
> > > [0]
> > >
> > >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> > >
> > > From: Jeyhun Karimov 
> > > Date: Friday, 22 March 2024 at 13:05
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi David,
> > >
> > > Thanks a lot for clarification.
> > > Sounds goo

[VOTE] FLIP-454: New Apicurio Avro format

2024-05-02 Thread David Radley
Hi everyone,

I'd like to start a vote on the FLIP-454: New Apicurio Avro format
[1]. The discussion thread is here [2].

The vote will be open for at least 72 hours unless there is an
objection
or
insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
[2] https://lists.apache.org/thread/wtkl4yn847tdd0wrqm5xgv9wc0cb0kr8


Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread David Radley
Fabulous, thanks Martijn 

From: Martijn Visser 
Date: Thursday, 2 May 2024 at 10:08
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Done :)

On Thu, May 2, 2024 at 11:01 AM David Radley 
wrote:

> Hi Martijn,
> Thank you very much for looking at this. In response to your feedback; I
> produced a reduced version which is on this link.
>
>
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
>
> The original version you have copied is a bit out-dated and verbose.
> Please could you replace the Flip with content from the above link,
> Kind regards, David,
>
> From: Martijn Visser 
> Date: Wednesday, 1 May 2024 at 16:31
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I've copied and pasted it into
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> ;
> please take a look if it's as expected.
>
> Best regards,
>
> Martijn
>
> On Wed, May 1, 2024 at 3:47 PM David Radley 
> wrote:
>
> > Hi Martijn,
> > Any news?
> >Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 22 April 2024 at 09:48
> > To: dev@flink.apache.org 
> > Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > A gentle nudge, is this ok for you or one of the PMC or committers to
> > create a Flip now?
> >Kind regards, David.
> >
> > From: David Radley 
> > Date: Monday, 15 April 2024 at 12:29
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > Thanks for looking at this. I have used the template in a new  Google Doc
> >
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
> .
> > I have significantly reduced the content in the Flip, in line with what I
> > see as the template and its usage. If this is too much or too little, I
> can
> > amend,
> >
> > Kind regards, David.
> >
> > From: Martijn Visser 
> > Date: Friday, 12 April 2024 at 18:11
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > I tried, but the format wasn't as the FLIP template expects, so I ended
> up
> > needing to change the entire formatting and that was just too much work
> to
> > be honest. If you could make sure that especially the headers match with
> > the FLIP template, and that all of the contents from the FLIP template is
> > there, that would make things much easier.
> >
> > Thanks,
> >
> > Martijn
> >
> > On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> > wrote:
> >
> > > Hi,
> > > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > > this,
> > >   Kind regards, David.
> > >
> > >
> > > From: David Radley 
> > > Date: Monday, 8 April 2024 at 09:40
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi,
> > > I have posted a Google Doc [0] to the mailing list for a discussion
> > thread
> > > for a Flip proposal to introduce an Apicurio-avro format. The
> discussions
> > > have been resolved, please could a committer/PMC member copy the
> contents
> > > from the Google Doc, and create a FLIP number for this, as per the
> > process
> > > [1],
> > >   Kind regards, David.
> > > [0]
> > >
> > >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> > >
> > > From: Jeyhun Karimov 
> > > Date: Friday, 22 March 2024 at 13:05
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi David,
> > >
> > > Thanks a lot for clarification.
> > > Sounds good to me.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Fri, Mar 22, 2024 at 10:54 AM David Radley  >
> > > wrote:
> > >
> > > > Hi Jeyhun,
> > > > Thanks for your feedback.
> > > >
> >

RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread David Radley
Hi Martijn,
Thank you very much for looking at this. In response to your feedback, I
produced a reduced version, which is on this link.

https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing

The original version you have copied is a bit outdated and verbose. Please
could you replace the Flip with the content from the above link,
Kind regards, David,

From: Martijn Visser 
Date: Wednesday, 1 May 2024 at 16:31
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

I've copied and pasted it into
https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format;
please take a look if it's as expected.

Best regards,

Martijn

On Wed, May 1, 2024 at 3:47 PM David Radley  wrote:

> Hi Martijn,
> Any news?
>Kind regards, David.
>
>
> From: David Radley 
> Date: Monday, 22 April 2024 at 09:48
> To: dev@flink.apache.org 
> Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi Martijn,
> A gentle nudge, is this ok for you or one of the PMC or committers to
> create a Flip now?
>    Kind regards, David.
>
> From: David Radley 
> Date: Monday, 15 April 2024 at 12:29
> To: dev@flink.apache.org 
> Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi Martijn,
> Thanks for looking at this. I have used the template in a new  Google Doc
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing.
> I have significantly reduced the content in the Flip, in line with what I
> see as the template and its usage. If this is too much or too little, I can
> amend,
>
> Kind regards, David.
>
> From: Martijn Visser 
> Date: Friday, 12 April 2024 at 18:11
> To: dev@flink.apache.org 
> Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I tried, but the format wasn't as the FLIP template expects, so I ended up
> needing to change the entire formatting and that was just too much work to
> be honest. If you could make sure that especially the headers match with
> the FLIP template, and that all of the contents from the FLIP template is
> there, that would make things much easier.
>
> Thanks,
>
> Martijn
>
> On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> wrote:
>
> > Hi,
> > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > this,
> >   Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 8 April 2024 at 09:40
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi,
> > I have posted a Google Doc [0] to the mailing list for a discussion
> thread
> > for a Flip proposal to introduce an Apicurio-avro format. The discussions
> > have been resolved, please could a committer/PMC member copy the contents
> > from the Google Doc, and create a FLIP number for this, as per the
> process
> > [1],
> >   Kind regards, David.
> > [0]
> >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> >
> > From: Jeyhun Karimov 
> > Date: Friday, 22 March 2024 at 13:05
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > Thanks a lot for clarification.
> > Sounds good to me.
> >
> > Regards,
> > Jeyhun
> >
> > On Fri, Mar 22, 2024 at 10:54 AM David Radley 
> > wrote:
> >
> > > Hi Jeyhun,
> > > Thanks for your feedback.
> > >
> > > So for outbound messages, the message includes the global ID. We
> register
> > > the schema and match on the artifact id. So if the schema then evolved,
> > > adding a new  version, the global ID would still be unique and the same
> > > version would be targeted. If you wanted to change the Flink table
> > > definition in line with a higher version, then you could do this – the
> > > artifact id would need to match for it to use the same schema and a
> > higher
> > > artifact version would need to be provided. I notice that Apicurio has
> > > rules around compatibility that you can configure, I suppose if we
> > attempt
> > > to create an artifact that breaks these rules , then the register
> schema
> > > will fail and the associated operation should fail (e.g. an insert). I
&g

FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-01 Thread David Radley
Hi Martijn,
Any news?
   Kind regards, David.


From: David Radley 
Date: Monday, 22 April 2024 at 09:48
To: dev@flink.apache.org 
Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Martijn,
A gentle nudge, is this ok for you or one of the PMC or committers to create a 
Flip now?
   Kind regards, David.

From: David Radley 
Date: Monday, 15 April 2024 at 12:29
To: dev@flink.apache.org 
Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Martijn,
Thanks for looking at this. I have used the template in a new  Google Doc 
https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing.
  I have significantly reduced the content in the Flip, in line with what I see 
as the template and its usage. If this is too much or too little, I can amend,

Kind regards, David.

From: Martijn Visser 
Date: Friday, 12 April 2024 at 18:11
To: dev@flink.apache.org 
Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

I tried, but the format wasn't as the FLIP template expects, so I ended up
needing to change the entire formatting and that was just too much work to
be honest. If you could make sure that especially the headers match with
the FLIP template, and that all of the contents from the FLIP template is
there, that would make things much easier.

Thanks,

Martijn

On Fri, Apr 12, 2024 at 6:08 PM David Radley 
wrote:

> Hi,
> A gentle nudge. Please could a committer/PMC member raise the Flip for
> this,
>   Kind regards, David.
>
>
> From: David Radley 
> Date: Monday, 8 April 2024 at 09:40
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi,
> I have posted a Google Doc [0] to the mailing list for a discussion thread
> for a Flip proposal to introduce an Apicurio-avro format. The discussions
> have been resolved, please could a committer/PMC member copy the contents
> from the Google Doc, and create a FLIP number for this, as per the process
> [1],
>   Kind regards, David.
> [0]
>
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
>
> From: Jeyhun Karimov 
> Date: Friday, 22 March 2024 at 13:05
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> Thanks a lot for clarification.
> Sounds good to me.
>
> Regards,
> Jeyhun
>
> On Fri, Mar 22, 2024 at 10:54 AM David Radley 
> wrote:
>
> > Hi Jeyhun,
> > Thanks for your feedback.
> >
> > So for outbound messages, the message includes the global ID. We register
> > the schema and match on the artifact id. So if the schema then evolved,
> > adding a new  version, the global ID would still be unique and the same
> > version would be targeted. If you wanted to change the Flink table
> > definition in line with a higher version, then you could do this – the
> > artifact id would need to match for it to use the same schema and a
> higher
> > artifact version would need to be provided. I notice that Apicurio has
> > rules around compatibility that you can configure, I suppose if we
> attempt
> > to create an artifact that breaks these rules, then the register schema
> > will fail and the associated operation should fail (e.g. an insert). I
> have
> > not tried this.
> >
> >
> > For inbound messages, using the global id in the header – this targets
> one
> > version of the schema. I can create different messages on the topic built
> > with different schema versions, and I can create different tables in
> Flink,
> > as long as the reader and writer schemas are compatible as per the
> >
> https://github.com/apache/flink/blob/779459168c46b7b4c600ef52f99a5435f81b9048/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/RegistryAvroDeserializationSchema.java#L109
> > Then this should work.
> >
> > Does this address your question?
> > Kind regards, David.
> >
> >
> > From: Jeyhun Karimov 
> > Date: Thursday, 21 March 2024 at 21:06
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > Thanks for the FLIP. +1 for it.
> > I have a minor comment.
> >
> > Can you please elaborate more on mechanisms in place to ensure data
> > consistency and integrity, particularly in the event of schema conflicts?
> > Since each message includes a schema ID for inbound and outbound
> messages,
> > can you elaborate more on me

FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-04-22 Thread David Radley
Hi Martijn,
A gentle nudge, is this ok for you or one of the PMC or committers to create a 
Flip now?
   Kind regards, David.

From: David Radley 
Date: Monday, 15 April 2024 at 12:29
To: dev@flink.apache.org 
Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Martijn,
Thanks for looking at this. I have used the template in a new  Google Doc 
https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing.
  I have significantly reduced the content in the Flip, in line with what I see 
as the template and its usage. If this is too much or too little, I can amend,

Kind regards, David.

From: Martijn Visser 
Date: Friday, 12 April 2024 at 18:11
To: dev@flink.apache.org 
Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

I tried, but the format wasn't as the FLIP template expects, so I ended up
needing to change the entire formatting and that was just too much work to
be honest. If you could make sure that especially the headers match with
the FLIP template, and that all of the contents from the FLIP template is
there, that would make things much easier.

Thanks,

Martijn

On Fri, Apr 12, 2024 at 6:08 PM David Radley 
wrote:

> Hi,
> A gentle nudge. Please could a committer/PMC member raise the Flip for
> this,
>   Kind regards, David.
>
>
> From: David Radley 
> Date: Monday, 8 April 2024 at 09:40
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi,
> I have posted a Google Doc [0] to the mailing list for a discussion thread
> for a Flip proposal to introduce an Apicurio-avro format. The discussions
> have been resolved, please could a committer/PMC member copy the contents
> from the Google Doc, and create a FLIP number for this, as per the process
> [1],
>   Kind regards, David.
> [0]
>
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
>
> From: Jeyhun Karimov 
> Date: Friday, 22 March 2024 at 13:05
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> Thanks a lot for clarification.
> Sounds good to me.
>
> Regards,
> Jeyhun
>
> On Fri, Mar 22, 2024 at 10:54 AM David Radley 
> wrote:
>
> > Hi Jeyhun,
> > Thanks for your feedback.
> >
> > So for outbound messages, the message includes the global ID. We register
> > the schema and match on the artifact id. So if the schema then evolved,
> > adding a new  version, the global ID would still be unique and the same
> > version would be targeted. If you wanted to change the Flink table
> > definition in line with a higher version, then you could do this – the
> > artifact id would need to match for it to use the same schema and a
> higher
> > artifact version would need to be provided. I notice that Apicurio has
> > rules around compatibility that you can configure, I suppose if we
> attempt
> > to create an artifact that breaks these rules, then the register schema
> > will fail and the associated operation should fail (e.g. an insert). I
> have
> > not tried this.
> >
> >
> > For inbound messages, using the global id in the header targets one
> > version of the schema. I can create different messages on the topic built
> > with different schema versions, and I can create different tables in
> > Flink, as long as the reader and writer schemas are compatible as per
> > https://github.com/apache/flink/blob/779459168c46b7b4c600ef52f99a5435f81b9048/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/RegistryAvroDeserializationSchema.java#L109
> > If they are, then this should work.
> >
> > Does this address your question?
> > Kind regards, David.
> >
> >
> > From: Jeyhun Karimov 
> > Date: Thursday, 21 March 2024 at 21:06
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > Thanks for the FLIP. +1 for it.
> > I have a minor comment.
> >
> > Can you please elaborate more on mechanisms in place to ensure data
> > consistency and integrity, particularly in the event of schema conflicts?
> > Since each message includes a schema ID for inbound and outbound
> messages,
> > can you elaborate more on message consistency in the context of schema
> > evolution?
> >
> > Regards,
> > Jeyhun
> >
> >
> >
> >
> >
> > On Wed, Mar 20, 2024 at 4:34 PM David Radley 
> 

RE: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-04-16 Thread David Radley
Hi Anupam,
Thanks for your response. I was wondering about the schema id and had some 
thoughts:

I assume that for Confluent Avro, specifying the schema is not normally done, 
but could be useful to force a particular shape.

If you specify a schema id in the format configuration:
- for deserialization: does this mean the schema id in the payload has to
match it? If so, we lose the ability to have multiple versions of the schema on
a topic. For me, schemaId makes less sense for deserialization, as the existing
mechanism used by the Avro / Confluent Avro formats is working well.

- I can see it makes sense for the serialization where there is an existing 
schema in the registry you want to target.

I suggest the schemaId be called something like schemaIdForSink or
schemaIdForSerialization, to prevent confusion with the deserialization case. We
could also have the schema option, as you suggest, so we are compatible with the
Confluent Avro format. A sketch of what that option might look like is below.
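
As an illustrative sketch only (the option name, class name, and description
are my assumptions, not an existing Flink option), the serialization-side id
could be declared with Flink's ConfigOptions API like this:

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class ProtobufConfluentFormatOptions {
    // Hypothetical option: only consulted on the serialization path.
    public static final ConfigOption<Integer> SCHEMA_ID_FOR_SERIALIZATION =
            ConfigOptions.key("schema-id-for-serialization")
                    .intType()
                    .noDefaultValue()
                    .withDescription(
                            "Optional schema registry id of an existing schema to target "
                                    + "when serializing. The deserialization path keeps "
                                    + "using the id carried with each message.");
}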


WDYT?
Kind regards, David.


From: Anupam Aggarwal 
Date: Saturday, 13 April 2024 at 16:08
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf 
Confluent Format
Hi David,

Thank you for the suggestion.
IIUC, you are proposing using an explicit schema string, instead of the
schemaID.
This makes sense, as it would make the behavior consistent with Avro,
although a bit more verbose from a config standpoint.

If we go via the schema string route, the user would have to ensure that
the input schema string corresponds to an existing schemaID.
This, however, might end up registering a new id (based on
https://github.com/confluentinc/schema-registry/issues/878#issuecomment-437510493
).

How about adding both options (explicit schema string / schemaID)?
If a schema string is specified we register a new schemaID; if the user
specifies an explicit schemaID we just use it directly. A rough sketch follows.
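
A minimal sketch of that resolution order, assuming the Confluent
SchemaRegistryClient; the helper itself and the option plumbing around it are
illustrative, not existing code:

import java.io.IOException;
import javax.annotation.Nullable;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;

final class SchemaIdResolutionSketch {
    static int resolveSchemaId(
            @Nullable Integer configuredSchemaId,
            @Nullable String configuredSchemaString,
            SchemaRegistryClient client,
            String subject) throws IOException, RestClientException {
        if (configuredSchemaId != null) {
            // An explicit id wins: target the existing registered schema directly.
            return configuredSchemaId;
        }
        if (configuredSchemaString != null) {
            // Registering returns the existing id if the schema is already known
            // under this subject; otherwise a new id is created.
            return client.register(subject, new ProtobufSchema(configuredSchemaString));
        }
        throw new IllegalArgumentException(
                "Either a schema id or a schema string must be configured.");
    }
}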

Thanks
Anupam

On Wed, Apr 10, 2024 at 2:27 PM David Radley 
wrote:

> Hi,
> I notice in the draft pr that there is a schema id in the format config. I
> was wondering why? In the confluent avro and existing debezium formats,
> there is no schema id in the config, but there is the ability to specify a
> complete schema. In the protobuf format there is no schema id.
>
> I assume the schema id would be used during serialize in the case there is
> already an existing registered schema and you have its id. I see in the
> docs
> https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html
> there is a serialize example where 2 schemas are registered.
>
> I would suggest aiming to copy what the confluent DeSer libraries do
> rather than having a schema id hard coded in the config.
>
> WDYT?
> Kind regards, David.
>
> From: Kevin Lam 
> Date: Tuesday, 26 March 2024 at 20:06
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf
> Confluent Format
> Thanks Anupam! Looking forward to it.
>
> On Thu, Mar 14, 2024 at 1:50 AM Anupam Aggarwal  >
> wrote:
>
> > Hi Kevin,
> >
> > Thanks, these are some great points.
> > Just to clarify, I do agree that the subject should be an option (like in
> > the case of RegistryAvroFormatFactory).
> > We could fallback to subject and auto-register schemas, if schema-Id not
> > provided explicitly.
> > In general, I think it would be good to be more explicit about the
> schemas
> > used (
> >
> >
> https://docs.confluent.io/platform/curren/schema-registry/schema_registry_onprem_tutorial.html#auto-schema-registration
> > <
> >
> https://docs.confluent.io/platform/current/schema-registry/schema_registry_onprem_tutorial.html#auto-schema-registration
> > >
> > ).
> > This would also help prevent us from overriding the ids in incompatible
> > ways.
> >
> > Under the current implementation of FlinkToProtoSchemaConverter we might
> > end up overwriting the field-Ids.
> > If we are able to locate a prior schema, the approach you outlined makes
> > a lot of sense.
> > Let me explore this a bit further and get back (in terms of feasibility).
> >
> > Thanks again!
> > - Anupam
> >
> > On Wed, Mar 13, 2024 at 2:28 AM Kevin Lam  >
> > wrote:
> >
> > > Hi Anupam,
> > >
> > > Thanks again for your work on contributing this feature back.
> > >
> > > Sounds good re: the refactoring/re-organizing.
> > >
> > > Regarding the schema-id, in my opinion this should NOT be a
> configuration
> > > option on the format. We should be able to deterministically map the
> > Flink
> > > type to the ProtoSchema and register that with the Schema Registry.
> > >
> > > I 

Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-04-15 Thread David Radley
Hi Martijn,
Thanks for looking at this. I have used the template in a new Google Doc 
https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing.
  I have significantly reduced the content in the FLIP, in line with what I see 
as the template and its usage. If this is too much or too little, I can amend,

Kind regards, David.

From: Martijn Visser 
Date: Friday, 12 April 2024 at 18:11
To: dev@flink.apache.org 
Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

I tried, but the format wasn't what the FLIP template expects, so I ended up
needing to change the entire formatting, and that was just too much work, to
be honest. If you could make sure that especially the headers match the
FLIP template, and that all of the content from the FLIP template is
there, that would make things much easier.

Thanks,

Martijn

On Fri, Apr 12, 2024 at 6:08 PM David Radley 
wrote:

> Hi,
> A gentle nudge. Please could a committer/PMC member raise the Flip for
> this,
>   Kind regards, David.
>
>
> From: David Radley 
> Date: Monday, 8 April 2024 at 09:40
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi,
> I have posted a Google Doc [0] to the mailing list for a discussion thread
> for a FLIP proposal to introduce an Apicurio-avro format. The discussions
> have been resolved; please could a committer/PMC member copy the contents
> from the Google Doc, and create a FLIP number for this, as per the process
> [1],
>   Kind regards, David.
> [0]
>
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
>
> From: Jeyhun Karimov 
> Date: Friday, 22 March 2024 at 13:05
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> Thanks a lot for clarification.
> Sounds good to me.
>
> Regards,
> Jeyhun
>
> On Fri, Mar 22, 2024 at 10:54 AM David Radley 
> wrote:
>
> > Hi Jeyhun,
> > Thanks for your feedback.
> >
> > So for outbound messages, the message includes the global ID. We register
> > the schema and match on the artifact id. So if the schema then evolved,
> > adding a new version, the global ID would still be unique and the same
> > version would be targeted. If you wanted to change the Flink table
> > definition in line with a higher version, then you could do this – the
> > artifact id would need to match for it to use the same schema, and a
> > higher artifact version would need to be provided. I notice that Apicurio
> > has rules around compatibility that you can configure; I suppose if we
> > attempt to create an artifact that breaks these rules, then the schema
> > registration will fail and the associated operation should fail (e.g. an
> > insert). I have not tried this.
> >
> >
> > For inbound messages, using the global id in the header targets one
> > version of the schema. I can create different messages on the topic built
> > with different schema versions, and I can create different tables in
> > Flink, as long as the reader and writer schemas are compatible as per
> > https://github.com/apache/flink/blob/779459168c46b7b4c600ef52f99a5435f81b9048/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/RegistryAvroDeserializationSchema.java#L109
> > If they are, then this should work.
> >
> > Does this address your question?
> > Kind regards, David.
> >
> >
> > From: Jeyhun Karimov 
> > Date: Thursday, 21 March 2024 at 21:06
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > Thanks for the FLIP. +1 for it.
> > I have a minor comment.
> >
> > Can you please elaborate more on mechanisms in place to ensure data
> > consistenc

Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-04-12 Thread David Radley
Hi,
A gentle nudge. Please could a committer/PMC member raise the Flip for this,
  Kind regards, David.


From: David Radley 
Date: Monday, 8 April 2024 at 09:40
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi,
I have posted a Google Doc [0] to the mailing list for a discussion thread for 
a FLIP proposal to introduce an Apicurio-avro format. The discussions have been 
resolved; please could a committer/PMC member copy the contents from the Google 
Doc, and create a FLIP number for this, as per the process [1],
  Kind regards, David.
[0]
https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP

From: Jeyhun Karimov 
Date: Friday, 22 March 2024 at 13:05
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

Thanks a lot for clarification.
Sounds good to me.

Regards,
Jeyhun

On Fri, Mar 22, 2024 at 10:54 AM David Radley 
wrote:

> Hi Jeyhun,
> Thanks for your feedback.
>
> So for outbound messages, the message includes the global ID. We register
> the schema and match on the artifact id. So if the schema then evolved,
> adding a new version, the global ID would still be unique and the same
> version would be targeted. If you wanted to change the Flink table
> definition in line with a higher version, then you could do this – the
> artifact id would need to match for it to use the same schema, and a higher
> artifact version would need to be provided. I notice that Apicurio has
> rules around compatibility that you can configure; I suppose if we attempt
> to create an artifact that breaks these rules, then the schema registration
> will fail and the associated operation should fail (e.g. an insert). I have
> not tried this.
>
>
> For inbound messages, using the global id in the header targets one
> version of the schema. I can create different messages on the topic built
> with different schema versions, and I can create different tables in Flink,
> as long as the reader and writer schemas are compatible as per
> https://github.com/apache/flink/blob/779459168c46b7b4c600ef52f99a5435f81b9048/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/RegistryAvroDeserializationSchema.java#L109
> If they are, then this should work.
>
> Does this address your question?
> Kind regards, David.
>
>
> From: Jeyhun Karimov 
> Date: Thursday, 21 March 2024 at 21:06
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> Thanks for the FLIP. +1 for it.
> I have a minor comment.
>
> Can you please elaborate more on mechanisms in place to ensure data
> consistency and integrity, particularly in the event of schema conflicts?
> Since each message includes a schema ID for inbound and outbound messages,
> can you elaborate more on message consistency in the context of schema
> evolution?
>
> Regards,
> Jeyhun
>
>
>
>
>
> On Wed, Mar 20, 2024 at 4:34 PM David Radley  wrote:
>
> > Thank you very much for your feedback Mark. I have made the changes in
> the
> > latest google document. On reflection I agree with you that the
> > globalIdPlacement format configuration should apply to the
> deserialization
> > as well, so it is declarative. I am also going to have a new
> configuration
> > option to work with content IDs as well as global IDs. In line with the
> > deser Apicurio IdHandler and headerHandlers.
> >
> >  kind regards, David.
> >
> >
> > On 2024/03/20 15:18:37 Mark Nuttall wrote:
> > > +1 to this
> > >
> > > A few small comments:
> > >
> > > Currently, if users have Avro schemas in an Apicurio Registry (an open
> > source Apache 2 licensed schema registry), then the natural way to work
> > with those Avro flows is to use the schemas in the Apicurio Repository.
> > > 'those Avro flows' ... this is 

RE: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-04-10 Thread David Radley
ting
> > logic
> > > in AbstractKafkaProtobufSerializer.java
> > > <https://github.com/confluentinc/schema-registry/blob/342c8a9d3854d4253d785214f5dcfb1b6cc59a06/protobuf-serializer/src/main/java/io/confluent/kafka/serializers/protobuf/AbstractKafkaProtobufSerializer.java#L157>
> > >  (|Deserializer).
> > > Please let me know if this makes sense / or in case you have any other
> > > feedback.
> > >
> > > Thanks
> > > Anupam
> > >
> > > On Thu, Feb 29, 2024 at 8:54 PM Kevin Lam
>  > >
> > > wrote:
> > >
> > > > Hey Robert,
> > > >
> > > > Awesome thanks, that timeline works for me. Sounds good re: deciding
> on
> > > > FLIP once we have the PR, and thanks for looking into the field ids.
> > > >
> > > > Looking forward to it!
> > > >
> > > > On Thu, Feb 29, 2024 at 5:09 AM Robert Metzger 
> > > > wrote:
> > > >
> > > > > Hey Kevin,
> > > > >
> > > > > Thanks a lot. Then let's contribute the Confluent implementation to
> > > > > apache/flink. We can't start working on this immediately because
> of a
> > > > team
> > > > > event next week, but within the next two weeks, we will start
> working
> > > on
> > > > > this.
> > > > > It probably makes sense for us to open a pull request of what we
> have
> > > > > already, so that you can start reviewing and maybe also
> contributing
> > to
> > > > the
> > > > > PR.
> > > > > I hope this timeline works for you!
> > > > >
> > > > > Let's also decide if we need a FLIP once the code is public.
> > > > > We will look into the field ids.
> > > > >
> > > > >
> > > > > On Tue, Feb 27, 2024 at 8:56 PM Kevin Lam
> > >  > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey Robert,
> > > > > >
> > > > > > Thanks for your response. I have a partial implementation, just
> for
> > > the
> > > > > > decoding portion.
> > > > > >
> > > > > > The code I have is pretty rough and doesn't do any of the
> > refactors I
> > > > > > mentioned, but the decoder logic does pull the schema from the
> > schema
> > > > > > registry and use that to deserialize the DynamicMessage before
> > > > converting
> > > > > > it to RowData using a DynamicMessageToRowDataConverter class. For
> > the
> > > > > other
> > > > > > aspects, I would need to start from scratch for the encoder.
> > > > > >
> > > > > > Would be very happy to see you drive the contribution back to
> open
> > > > source
> > > > > > from Confluent, or collaborate on this.
> > > > > >
> > > > > > Another topic I had is Protobuf's field ids. Ideally in Flink it
> > > would
> > > > be
> > > > > > nice if we are idiomatic about not renumbering them in
> incompatible
> > > > ways,
> > > > > > similar to what's discussed on the Schema Registry issue here:
> > > > > > https://github.com/confluentinc/schema-registry/issues/2551
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 27, 2024 at 5:51 AM Robert Metzger <
> > rmetz...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > +1 to support the format in Flink.
> > > > > > >
> > > > > > > @Kevin: Do you already have an implementation for this inhouse
> > that
> > > > you
> > > > > > are
> > > > > > > looking to upstream, or would you start from scratch?
> > > > > > > I'm asking because my current employer, Confluent, has a
> Protobuf
> > > > > Schema
> > > > > > > registry implementation for Flink, and I could help drive
> > > > contributing
> > > > > > this
> > > > > > > back to open source.
> > > > > > > If you already have an implementation, let's decide which one
> to
> > &g

RE: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-04-09 Thread David Radley
Hi Han,
Thanks for getting back to me.

I am curious about the valid characters in a model name – I assume any 
characters are valid, as it is a quoted string in SQL, so $ could be in the 
model name. I would think that the version would be determined when the model is 
deployed (there could be other versions, associated with authoring or 
intermediate states of the model, that never get deployed) – rather than 
allocated by Flink if there is none.
I see https://github.com/onnx/onnx/blob/main/docs/Versioning.md supports 
numbers or semantic versioning and 3 different types of versioning.

It would be interesting to see how champion/challenger scenarios would play out 
– when you try a new version of the model that might perform better.
I suggest having a new optional model-version keyword, which would seem to be a 
cleaner way of specifying a model.



 Kind regards, David.

From: Hao Li 
Date: Wednesday, 3 April 2024 at 18:58
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Cross post David Radley's comments here from voting thread:

> I don’t think this counts as an objection, I have some comments. I should
have put this on the discussion thread earlier but have just got to this.
> - I suggest we can put a model version in the model resource. Versions
are notoriously difficult to add later; I don’t think we want to
proliferate differently named models as a model mutates. We may want to
work with non-latest models.
> - I see that the model name is the unique identifier. I realise this
would move away from the Oracle syntax – so may not be feasible short term;
but I wonder if we can have:
> - a uuid as the main identifier and the model name as an attribute.
> or
>  - a namespace (or something like a system of origin)
> to help organise models with the same name.
> - does the model have an owner? I assume that Flink model resource is the
master of the model? I imagine in the future that a model that comes in via
a new connector could be kept up to date with the external model and would
not be allowed to be changed by anything other than the connector.

Thanks for the comments. I agree supporting the model version is important.
I think we could support versioning without changing the overall syntax by
appending version number/name to the model name. Catalog implementations
can handle the versions. For example,

CREATE MODEL `my-model$1`...

"$1" would imply it's version 1. If no version is provided, we can auto
increment the version if the model name exists already or create the first
version if the model name doesn't exist yet.
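
A minimal sketch of how a catalog might split such an identifier, assuming the
name$version convention above (a hypothetical helper, not part of the FLIP; it
would need care if '$' is also allowed inside plain model names, as raised
earlier in this thread):

import java.util.Map;

final class ModelNameSketch {
    // Returns the bare name plus the explicit version, or -1 when no
    // numeric "$version" suffix is present.
    static Map.Entry<String, Integer> parseVersionedModelName(String identifier) {
        int idx = identifier.lastIndexOf('$');
        if (idx > 0
                && idx + 1 < identifier.length()
                && identifier.substring(idx + 1).chars().allMatch(Character::isDigit)) {
            return Map.entry(
                    identifier.substring(0, idx),
                    Integer.parseInt(identifier.substring(idx + 1)));
        }
        return Map.entry(identifier, -1);
    }
}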

As for model ownership, I'm not entirely sure about the use case and how it
should be controlled. It could be controlled from the user side through
ACL/rbac or some way in the catalog I guess. Maybe we can follow up on this
as the requirement or use case becomes more clear.

Cross post David Moravek's comments from voting thread:

> My only suggestion would be to move Catalog changes into a separate
> interface to allow us to begin with lower stability guarantees. Existing
> Catalogs would be able to opt-in by implementing it. It's a minor thing
> though, overall the FLIP is solid and the direction is pretty exciting.

I think it's fine to move model related catalog changes to a separate
interface and let the current catalog interface extend it. As model support
will be built in to Flink, the current catalog interface will need to
support model CRUD operations. For my own education, can you elaborate more
on how separate interface will allow us to begin with lower stability
guarantees?

Thanks,
Hao


On Thu, Mar 28, 2024 at 10:14 AM Hao Li  wrote:

> Thanks Timo. I'll start a vote tomorrow if no further discussion.
>
> Thanks,
> Hao
>
> On Thu, Mar 28, 2024 at 9:33 AM Timo Walther  wrote:
>
>> Hi everyone,
>>
>> I updated the FLIP according to this discussion.
>>
>> @Hao Li: Let me know if I made a mistake somewhere. I added some
>> additional explaining comments about the new PTF syntax.
>>
>> There are no further objections from my side. If nobody objects, Hao
>> feel free to start the voting tomorrow.
>>
>> Regards,
>> Timo
>>
>>
>> On 28.03.24 16:30, Jark Wu wrote:
>> > Thanks, Hao,
>> >
>> > Sounds good to me.
>> >
>> > Best,
>> > Jark
>> >
>> > On Thu, 28 Mar 2024 at 01:02, Hao Li  wrote:
>> >
>> >> Hi Jark,
>> >>
>> >> I think we can start with supporting popular model providers such as
>> >> openai, azureml, sagemaker for remote models.
>> >>
>> >> Thanks,
>> >> Hao
>> >>
>> >> On Tue, Mar 26, 2024 at 8:15 PM Jark Wu  wrote:
>> >>
>> >>> Thanks for the PoC and updating,
>> >>>
>> >>> The final syntax looks good to me, at least it is a nice and concise
>> >> first
>> >>> step.
>> >>>
>> >>> SELECT f1, f2, label FROM
>> >>> ML_PREDICT(
>> >>>   input => `my_data`,
>> >>>   model => `my_cat`.`my_db`.`classifier_model`,
>> >>>   args => DESCRIPTOR(f1, f2));
>> >>>
>> >>> Besides, what built-in models 

RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-04-08 Thread David Radley
Hi,
I have posted a Google Doc [0] to the mailing list for a discussion thread for 
a FLIP proposal to introduce an Apicurio-avro format. The discussions have been 
resolved; please could a committer/PMC member copy the contents from the Google 
Doc, and create a FLIP number for this, as per the process [1],
  Kind regards, David.
[0]
https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP

From: Jeyhun Karimov 
Date: Friday, 22 March 2024 at 13:05
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

Thanks a lot for clarification.
Sounds good to me.

Regards,
Jeyhun

On Fri, Mar 22, 2024 at 10:54 AM David Radley 
wrote:

> Hi Jeyhun,
> Thanks for your feedback.
>
> So for outbound messages, the message includes the global ID. We register
> the schema and match on the artifact id. So if the schema then evolved,
> adding a new version, the global ID would still be unique and the same
> version would be targeted. If you wanted to change the Flink table
> definition in line with a higher version, then you could do this – the
> artifact id would need to match for it to use the same schema, and a higher
> artifact version would need to be provided. I notice that Apicurio has
> rules around compatibility that you can configure; I suppose if we attempt
> to create an artifact that breaks these rules, then the schema registration
> will fail and the associated operation should fail (e.g. an insert). I have
> not tried this.
>
>
> For inbound messages, using the global id in the header targets one
> version of the schema. I can create different messages on the topic built
> with different schema versions, and I can create different tables in Flink,
> as long as the reader and writer schemas are compatible as per
> https://github.com/apache/flink/blob/779459168c46b7b4c600ef52f99a5435f81b9048/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/RegistryAvroDeserializationSchema.java#L109
> If they are, then this should work.
>
> Does this address your question?
> Kind regards, David.
>
>
> From: Jeyhun Karimov 
> Date: Thursday, 21 March 2024 at 21:06
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> Thanks for the FLIP. +1 for it.
> I have a minor comment.
>
> Can you please elaborate more on mechanisms in place to ensure data
> consistency and integrity, particularly in the event of schema conflicts?
> Since each message includes a schema ID for inbound and outbound messages,
> can you elaborate more on message consistency in the context of schema
> evolution?
>
> Regards,
> Jeyhun
>
>
>
>
>
> On Wed, Mar 20, 2024 at 4:34 PM David Radley  wrote:
>
> > Thank you very much for your feedback Mark. I have made the changes in
> the
> > latest google document. On reflection I agree with you that the
> > globalIdPlacement format configuration should apply to the
> deserialization
> > as well, so it is declarative. I am also going to have a new
> configuration
> > option to work with content IDs as well as global IDs. In line with the
> > deser Apicurio IdHandler and headerHandlers.
> >
> >  kind regards, David.
> >
> >
> > On 2024/03/20 15:18:37 Mark Nuttall wrote:
> > > +1 to this
> > >
> > > A few small comments:
> > >
> > > Currently, if users have Avro schemas in an Apicurio Registry (an open
> > source Apache 2 licensed schema registry), then the natural way to work
> > with those Avro flows is to use the schemas in the Apicurio Repository.
> > > 'those Avro flows' ... this is the first reference to flows.
> > >
> > > The new format will use the global Id to look up the Avro schema that
> > the message was written during deserialization.
> > > I get the point, phrasing is awkward. Probably you're more interested
> in
> > content than word polish at this point though.
> > >
> > > The Avro Schema Registry (apicurio-avro) format
> > > The Confluent format is called avro-confluent; this should be
> > avro-apicurio
> > >
> > > How to create tables with Apicurio-avro format
> > > s/Apicurio-avro/avro-apicurio/g
> > >
> > > HEADER – globalId is put in the header
> > > LEGACY– global Id is put in the message as a long
> > > CONFLUENT - globalId is put in the message as an int.
> > > Please could we specify 'four-byte int' and 'eight-byte long' ?
> > >
> > > For a Kafka source the

Re: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-04-08 Thread David Radley
Hi,
The discussions seem to have ended. I have made some minor changes to the 
document [1]. I would like to leave the discussion open until the end of this 
week. If there are no more discussions, I will ask for the FLIP to be copied 
into the standard FLIP location and start the voting next week.

[1] 
https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing


  Kind regards, David


From: David Radley 
Date: Wednesday, 20 March 2024 at 11:03
To: dev@flink.apache.org 
Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Apicurio-avro format
Hi,
As per the FLIP process I would like to raise a FLIP, but do not have 
authority, so have created a google doc for the Flip to introduce a new 
Apicurio Avro format. The document is 
https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing

I have prototyped a lot of the content to prove that this approach is feasible. 
I look forward to the discussion,
  Kind regards, David.



Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU



Re: [VOTE] FLIP-437: Support ML Models in Flink SQL

2024-04-03 Thread David Radley
Hi Hao,
I don’t think this counts as an objection; I have some comments. I should have 
put this on the discussion thread earlier, but have just got to this.
- I suggest we can put a model version in the model resource. Versions are 
notoriously difficult to add later; I don’t think we want to proliferate 
differently named models as a model mutates. We may want to work with 
non-latest models.
- I see that the model name is the unique identifier. I realise this would move 
away from the Oracle syntax – so may not be feasible short term; but I wonder 
if we can have:
 - a uuid as the main identifier and the model name as an attribute.
or
 - a namespace (or something like a system of origin)
to help organise models with the same name.
- does the model have an owner? I assume that the Flink model resource is the 
master of the model? I imagine in the future that a model that comes in via a 
new connector could be kept up to date with the external model and would not be 
allowed to be changed by anything other than the connector.

   Kind regards, David.

From: Hao Li 
Date: Friday, 29 March 2024 at 16:30
To: dev@flink.apache.org 
Subject: [EXTERNAL] [VOTE] FLIP-437: Support ML Models in Flink SQL
Hi devs,

I'd like to start a vote on the FLIP-437: Support ML Models in Flink
SQL [1]. The discussion thread is here [2].

The vote will be open for at least 72 hours unless there is an objection or
insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL

[2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn

Thanks,
Hao

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-03-22 Thread David Radley
Hi Jeyhun,
Thanks for your feedback.

So for outbound messages, the message includes the global ID. We register the 
schema and match on the artifact id. So if the schema then evolved, adding a 
new version, the global ID would still be unique and the same version would be 
targeted. If you wanted to change the Flink table definition in line with a 
higher version, then you could do this – the artifact id would need to match 
for it to use the same schema, and a higher artifact version would need to be 
provided. I notice that Apicurio has rules around compatibility that you can 
configure; I suppose if we attempt to create an artifact that breaks these 
rules, then the schema registration will fail and the associated operation should 
fail (e.g. an insert). I have not tried this.


For inbound messages, using the global id in the header targets one version of 
the schema. I can create different messages on the topic built with different 
schema versions, and I can create different tables in Flink, as long as the 
reader and writer schemas are compatible as per 
https://github.com/apache/flink/blob/779459168c46b7b4c600ef52f99a5435f81b9048/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/RegistryAvroDeserializationSchema.java#L109
If they are, then this should work.
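
For illustration, here is a minimal sketch of the reader/writer resolution that
the class linked above relies on, using plain Avro APIs (simplified: no caching
and no registry lookup):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

final class AvroResolutionSketch {
    static GenericRecord decode(byte[] payload, Schema writerSchema, Schema readerSchema)
            throws IOException {
        // Avro schema resolution bridges compatible differences between the
        // schema the message was written with and the table's reader schema.
        GenericDatumReader<GenericRecord> reader =
                new GenericDatumReader<>(writerSchema, readerSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        return reader.read(null, decoder);
    }
}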

Does this address your question?
Kind regards, David.


From: Jeyhun Karimov 
Date: Thursday, 21 March 2024 at 21:06
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

Thanks for the FLIP. +1 for it.
I have a minor comment.

Can you please elaborate more on mechanisms in place to ensure data
consistency and integrity, particularly in the event of schema conflicts?
Since each message includes a schema ID for inbound and outbound messages,
can you elaborate more on message consistency in the context of schema
evolution?

Regards,
Jeyhun





On Wed, Mar 20, 2024 at 4:34 PM David Radley  wrote:

> Thank you very much for your feedback Mark. I have made the changes in the
> latest google document. On reflection I agree with you that the
> globalIdPlacement format configuration should apply to the deserialization
> as well, so it is declarative. I am also going to have a new configuration
> option to work with content IDs as well as global IDs. In line with the
> deser Apicurio IdHandler and headerHandlers.
>
>  kind regards, David.
>
>
> On 2024/03/20 15:18:37 Mark Nuttall wrote:
> > +1 to this
> >
> > A few small comments:
> >
> > Currently, if users have Avro schemas in an Apicurio Registry (an open
> source Apache 2 licensed schema registry), then the natural way to work
> with those Avro flows is to use the schemas in the Apicurio Repository.
> > 'those Avro flows' ... this is the first reference to flows.
> >
> > The new format will use the global Id to look up the Avro schema that
> the message was written during deserialization.
> > I get the point, phrasing is awkward. Probably you're more interested in
> content than word polish at this point though.
> >
> > The Avro Schema Registry (apicurio-avro) format
> > The Confluent format is called avro-confluent; this should be
> avro-apicurio
> >
> > How to create tables with Apicurio-avro format
> > s/Apicurio-avro/avro-apicurio/g
> >
> > HEADER – globalId is put in the header
> > LEGACY– global Id is put in the message as a long
> > CONFLUENT - globalId is put in the message as an int.
> > Please could we specify 'four-byte int' and 'eight-byte long' ?
> >
> > For a Kafka source the globalId will be looked for in this order:
> > - In the header
> > - After a magic byte as an int
> > - After a magic byte as a long.
> > but apicurio-avro.globalid-placement has a default value of HEADER : why
> do we have a search order as well? Isn't apicurio-avro.globalid-placement
> enough? Don't the two mechanisms conflict?
> >
> > In addition to the types listed there, Flink supports reading/writing
> nullable types. Flink maps nullable types to Avro union(something, null),
> where something is the Avro type converted from Flink type.
> > Is that definitely the right way round? I know we've had multiple
> conversations about how unions work with Flink
> >
> >  This is because the writer schema is expanded, but this could not
> complete if there are circularities.
> > I understand your meaning but the sentence is awkward.
> >
> > The registered schema will be created or if it exists be updated.
> > same again
> >
> > At some stage the lowest Flink level supported by the Kafka connector
> will contain the additionalProperties methods in code flink.
> > wording
> >
> > There existing Kafka deserialization for the wri

Re: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-03-20 Thread David Radley
Thank you very much for your feedback, Mark. I have made the changes in the 
latest Google document. On reflection, I agree with you that the 
globalIdPlacement format configuration should apply to the deserialization as 
well, so it is declarative. I am also going to add a new configuration option 
to work with content IDs as well as global IDs, in line with the Apicurio deser 
IdHandler and headerHandlers.

 kind regards, David.


On 2024/03/20 15:18:37 Mark Nuttall wrote:
> +1 to this
> 
> A few small comments: 
> 
> Currently, if users have Avro schemas in an Apicurio Registry (an open source 
> Apache 2 licensed schema registry), then the natural way to work with those 
> Avro flows is to use the schemas in the Apicurio Repository.
> 'those Avro flows' ... this is the first reference to flows.
> 
> The new format will use the global Id to look up the Avro schema that the 
> message was written during deserialization.
> I get the point, phrasing is awkward. Probably you're more interested in 
> content than word polish at this point though.
> 
> The Avro Schema Registry (apicurio-avro) format
> The Confluent format is called avro-confluent; this should be avro-apicurio
> 
> How to create tables with Apicurio-avro format
> s/Apicurio-avro/avro-apicurio/g
> 
> HEADER – globalId is put in the header
> LEGACY– global Id is put in the message as a long
> CONFLUENT - globalId is put in the message as an int.
> Please could we specify 'four-byte int' and 'eight-byte long' ?
> 
> For a Kafka source the globalId will be looked for in this order:
> - In the header
> - After a magic byte as an int
> - After a magic byte as a long.
> but apicurio-avro.globalid-placement has a default value of HEADER : why do 
> we have a search order as well? Isn't apicurio-avro.globalid-placement 
> enough? Don't the two mechanisms conflict?
> 
> In addition to the types listed there, Flink supports reading/writing 
> nullable types. Flink maps nullable types to Avro union(something, null), 
> where something is the Avro type converted from Flink type.
> Is that definitely the right way round? I know we've had multiple 
> conversations about how unions work with Flink
> 
>  This is because the writer schema is expanded, but this could not complete 
> if there are circularities.
> I understand your meaning but the sentence is awkward.
> 
> The registered schema will be created or if it exists be updated.
> same again
> 
> At some stage the lowest Flink level supported by the Kafka connector will 
> contain the additionalProperties methods in code flink.
> wording
> 
> There existing Kafka deserialization for the writer schema passes down the 
> message body to be deserialised.
> wording
> 
> @Override
> public void deserialize(ConsumerRecord<byte[], byte[]> message, Collector<RowData> out)
>   throws IOException {
>   Map<String, byte[]> additionalPropertiesMap = new HashMap<>();
>   for (Header header : message.additionalProperties()) {
>   headersMap.put(header.key(), header.value());
>   }
>   deserializationSchema.deserialize(message.value(), headersMap, out);
> }
> This fails to compile at headersMap.
> 
> The input stream and additionalProperties will be sent to the Apicurio 
> SchemaCoder, which will try getting the globalId from the headers, then 4 
> bytes from the payload, then 8 bytes from the payload.
> I'm still stuck on apicurio-avro.globalid-placement having a default value of 
> HEADER . Should we try all three, or fail if this config param has a wrong 
> value?
> 
> Other considerations
> The implementation does not use the Apicurio deser libraries,
> Please can we refer to them as SerDes; this is the term used within the 
> documentation that you link to
> 
> 
> On 2024/03/20 10:09:08 David Radley wrote:
> > Hi,
> > As per the FLIP process I would like to raise a FLIP, but do not have 
> > authority, so have created a google doc for the Flip to introduce a new 
> > Apicurio Avro format. The document is 
> > https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > 
> > I have prototyped a lot of the content to prove that this approach is 
> > feasible. I look forward to the discussion,
> >   Kind regards, David.
> > 
> > 
> > 
> > Unless otherwise stated above:
> > 
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> > 
> 


[DISCUSS] FLIP-XXX Apicurio-avro format

2024-03-20 Thread David Radley
Hi,
As per the FLIP process I would like to raise a FLIP, but do not have 
authority, so have created a Google doc for the FLIP to introduce a new 
Apicurio Avro format. The document is 
https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing

I have prototyped a lot of the content to prove that this approach is feasible. 
I look forward to the discussion,
  Kind regards, David.



Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Additional metadata available for Kafka serdes

2024-03-20 Thread David Radley
Hi Balint,
Excellent, I have just put up a discussion thread on the dev list for a new 
FLIP to add this; please review, give feedback, and +1 if this is what you are 
looking for.
At the moment, I have not got the topic name in there. I wonder where we would 
use the topic name during deserialization or serialization? The only place I 
could see it being used could be in serialization, where we register a 
(potentially new) schema, but this may not be desired as the schema could be a 
nested schema.
WDYT?
 kind regards, David.

From: Balint Bene 
Date: Thursday, 14 March 2024 at 20:16
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Additional metadata available for Kafka serdes
Hi David!

I think passing the headers as a map (as opposed to
ConsumerRecord/ProducerRecord) is a great idea that should work. That way
the core Flink package doesn't have Kafka dependencies, it seems like
they're meant to be decoupled anyway. The one bonus that using the Record
objects has is that it also provides the topic name, which is a part of the
signature (but usually unused) for Kafka serdes. Do you think it's
worthwhile to also have the topic name included in the signature along with
the map?

Happy to test things out, provide feedback. I'm not working on an Apicurio
format myself, but the use case is very similar.

Thanks,
Balint

On Thu, Mar 14, 2024 at 12:41 PM David Radley 
wrote:

> Hi ,
> I am currently prototyping an Avro Apicurio format that I hope to raise as
> a FLIP very soon (hopefully by early  next week). In my prototyping , I am
> passing through the Kafka headers content as a map to the
> DeserializationSchema and have extended the SerializationSchema to pass
> back headers. I am using new default methods in the interface so as to be
> backwardly compatible. I have the deserialise working and the serialise is
> close.
>
> We did consider trying to use the Apicurio deser libraries but this is
> tricky due to the way the code is split.
>
> Let me know what you think – I hope this approach will meet your needs,
> Kind regards, David.
>
> From: Balint Bene 
> Date: Tuesday, 12 March 2024 at 22:18
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Additional metadata available for Kafka serdes
> Hello! Looking to get some guidance for a problem around the Flink formats
> used for Kafka.
>
> Flink currently uses common serdes interfaces across all formats. However,
> some data formats used in Kafka require headers for serdes.  It's the same
> problem for serialization and deserialization, so I'll just use
> DynamicKafkaDeserializationSchema
> <
> https://github.com/Shopify/shopify-flink-connector-kafka/blob/979791c4c71e944c16c51419cf9a84aa1f8fea4c/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L130
> >
> as
> an example. It has access to the Kafka record headers, but it can't pass
> them to the DeserializationSchema
> <
> https://github.com/apache/flink/blob/94b55d1ae61257f21c7bb511660e7497f269abc7/flink-core/src/main/java/org/apache/flink/api/common/serialization/DeserializationSchema.java#L81
> >
> implemented
> by the format since the interface is generic.
>
> If it were possible to pass the headers, then open source formats such as
> Apicurio could be supported. Unlike the Confluent formats which store the
> metadata (schema ID) appended to the serialized bytes in the key and value,
> the Apicurio formats store their metadata in the record headers.
>
> I have bandwidth to work on this, but it would be great to have direction
> from the community. I have a simple working prototype that's able to load a
> custom version of the format with a modified interface that can accept the
> headers (I just put the entire Apache Kafka ConsumerRecord
> <
> https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html
> >
> /ProducerRecord
> <
> https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html
> >
> for simplicity). The issues I foresee is that the class-loader
> <
> https://github.com/apache/flink/blob/94b55d1ae61257f21c7bb511660e7497f269abc7/flink-core/src/main/java/org/apache/flink/core/classloading/ComponentClassLoader.java
> >
> exists in the Flink repo along with interfaces for the formats, but these
> changes are specific to Kafka. This solution could require migrating
> formats to the Flink-connector-kafka repo which is a decent amount of work.
>
> Feedback is appreciated!
> Thanks
> Balint
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread David Radley
Congratulations!
Kind regards, David

From: Ahmed Hamdy 
Date: Monday, 18 March 2024 at 15:55
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [ANNOUNCE] Apache Flink 1.19.0 released
Congratulations!
Best Regards
Ahmed Hamdy


On Mon, 18 Mar 2024 at 12:30, Xintong Song  wrote:

> Congratulations~!
>
> Best,
>
> Xintong
>
>
>
> On Mon, Mar 18, 2024 at 7:02 PM Feng Jin  wrote:
>
> > Congratulations!
> >
> > Best,
> > Feng
> >
> > On Mon, Mar 18, 2024 at 6:18 PM Yuepeng Pan 
> wrote:
> >
> > > Congratulations!
> > >
> > >
> > > Thanks to release managers and everyone involved.
> > >
> > >
> > >
> > >
> > > Best,
> > > Yuepeng Pan
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2024-03-18 18:09:45, "Yubin Li"  wrote:
> > > >Congratulations!
> > > >
> > > >Thanks to release managers and everyone involved.
> > > >
> > > >On Mon, Mar 18, 2024 at 5:55 PM Hangxiang Yu 
> > wrote:
> > > >>
> > > >> Congratulations!
> > > >> Thanks release managers and all involved!
> > > >>
> > > >> On Mon, Mar 18, 2024 at 5:23 PM Hang Ruan 
> > > wrote:
> > > >>
> > > >> > Congratulations!
> > > >> >
> > > >> > Best,
> > > >> > Hang
> > > >> >
> > > >> > Paul Lam  于2024年3月18日周一 17:18写道:
> > > >> >
> > > >> > > Congrats! Thanks to everyone involved!
> > > >> > >
> > > >> > > Best,
> > > >> > > Paul Lam
> > > >> > >
> > > >> > > > 2024年3月18日 16:37,Samrat Deb  写道:
> > > >> > > >
> > > >> > > > Congratulations !
> > > >> > > >
> > > >> > > > On Mon, 18 Mar 2024 at 2:07 PM, Jingsong Li <
> > > jingsongl...@gmail.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > >> Congratulations!
> > > >> > > >>
> > > >> > > >> On Mon, Mar 18, 2024 at 4:30 PM Rui Fan <
> 1996fan...@gmail.com>
> > > wrote:
> > > >> > > >>>
> > > >> > > >>> Congratulations, thanks for the great work!
> > > >> > > >>>
> > > >> > > >>> Best,
> > > >> > > >>> Rui
> > > >> > > >>>
> > > >> > > >>> On Mon, Mar 18, 2024 at 4:26 PM Lincoln Lee <
> > > lincoln.8...@gmail.com>
> > > >> > > >> wrote:
> > > >> > > 
> > > >> > >  The Apache Flink community is very happy to announce the
> > > release of
> > > >> > > >> Apache Flink 1.19.0, which is the first release for the
> Apache
> > > Flink
> > > >> > > 1.19
> > > >> > > >> series.
> > > >> > > 
> > > >> > >  Apache Flink® is an open-source stream processing framework
> > for
> > > >> > > >> distributed, high-performing, always-available, and accurate
> > data
> > > >> > > streaming
> > > >> > > >> applications.
> > > >> > > 
> > > >> > >  The release is available for download at:
> > > >> > >  https://flink.apache.org/downloads.html
> > > >> > > 
> > > >> > >  Please check out the release blog post for an overview of
> the
> > > >> > > >> improvements for this bugfix release:
> > > >> > > 
> > > >> > > >>
> > > >> > >
> > > >> >
> > >
> >
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > > >> > > 
> > > >> > >  The full release notes are available in Jira:
> > > >> > > 
> > > >> > > >>
> > > >> > >
> > > >> >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353282
> > > >> > > 
> > > >> > >  We would like to thank all contributors of the Apache Flink
> > > >> > community
> > > >> > > >> who made this release possible!
> > > >> > > 
> > > >> > > 
> > > >> > >  Best,
> > > >> > >  Yun, Jing, Martijn and Lincoln
> > > >> > > >>
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> Best,
> > > >> Hangxiang.
> > >
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: Additional metadata available for Kafka serdes

2024-03-14 Thread David Radley
Hi ,
I am currently prototyping an Avro Apicurio format that I hope to raise as a 
FLIP very soon (hopefully by early next week). In my prototyping, I am 
passing the Kafka headers content through as a map to the DeserializationSchema 
and have extended the SerializationSchema to pass back headers. I am using new 
default methods in the interfaces so as to be backwardly compatible. I have the 
deserialise working and the serialise is close.
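
A minimal sketch of the backward-compatible shape this takes; the exact
signature is illustrative (Flink's real DeserializationSchema interface has
more members than shown):

import java.io.IOException;
import java.io.Serializable;
import java.util.Map;

public interface DeserializationSchema<T> extends Serializable {
    T deserialize(byte[] message) throws IOException;

    // New default overload: header-aware formats override this; existing
    // implementations inherit the delegating default and stay source-compatible.
    default T deserialize(byte[] message, Map<String, byte[]> headers) throws IOException {
        return deserialize(message);
    }
}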

We did consider trying to use the Apicurio deser libraries but this is tricky 
due to the way the code is split.

Let me know what you think – I hope this approach will meet your needs,
Kind regards, David.

From: Balint Bene 
Date: Tuesday, 12 March 2024 at 22:18
To: dev@flink.apache.org 
Subject: [EXTERNAL] Additional metadata available for Kafka serdes
Hello! Looking to get some guidance for a problem around the Flink formats
used for Kafka.

Flink currently uses common serdes interfaces across all formats. However,
some data formats used in Kafka require headers for serdes.  It's the same
problem for serialization and deserialization, so I'll just use
DynamicKafkaDeserializationSchema
(https://github.com/Shopify/shopify-flink-connector-kafka/blob/979791c4c71e944c16c51419cf9a84aa1f8fea4c/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L130)
as an example. It has access to the Kafka record headers, but it can't pass
them to the DeserializationSchema
(https://github.com/apache/flink/blob/94b55d1ae61257f21c7bb511660e7497f269abc7/flink-core/src/main/java/org/apache/flink/api/common/serialization/DeserializationSchema.java#L81)
implemented by the format since the interface is generic.

If it were possible to pass the headers, then open source formats such as
Apicurio could be supported. Unlike the Confluent formats which store the
metadata (schema ID) appended to the serialized bytes in the key and value,
the Apicurio formats store their metadata in the record headers.

I have bandwidth to work on this, but it would be great to have direction
from the community. I have a simple working prototype that's able to load a
custom version of the format with a modified interface that can accept the
headers (I just put the entire Apache Kafka ConsumerRecord
(https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html)
/ProducerRecord
(https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html)
for simplicity). The issue I foresee is that the class-loader
(https://github.com/apache/flink/blob/94b55d1ae61257f21c7bb511660e7497f269abc7/flink-core/src/main/java/org/apache/flink/core/classloading/ComponentClassLoader.java)
exists in the Flink repo along with interfaces for the formats, but these
changes are specific to Kafka. This solution could require migrating
formats to the flink-connector-kafka repo, which is a decent amount of work.

Feedback is appreciated!
Thanks
Balint

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-22 Thread David Radley
Hi Kevin,
Some thoughts on this.
I suggested an Apicurio registry format on the dev list, and was advised to 
raise a FLIP for this; I suggest the same would apply here (or the alternative 
to FLIPs if you cannot raise one). I am prototyping an Avro Apicurio format, 
prior to raising the FLIP, and notice that the readSchema in the SchemaCoder 
only takes a byte array, but I need to pass down the Kafka headers (where the 
Apicurio globalId identifying the schema lives).

I assume:

  *   for the Confluent Protobuf format you would extend the Protobuf format to 
drive some Schema Registry logic for Protobuf (similar to the way Avro does it), 
where the magic byte + schema id can be obtained and the schema looked up using 
the Confluent Schema Registry.
  *   It would be good if any Protobuf format enhancements for schema 
registries pass down the Kafka headers (I am thinking as a Map, as 
for Avro) as well as the message payload, so Apicurio Registry could work with 
this (a sketch of such a signature follows this list).
  *   It would make sense to have the Confluent schema lookup in common code, 
which is part of the SchemaCoder readSchema logic.
  *   I assume the ProtobufSchemaCoder readSchema would return a Protobuf 
Schema object.
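
A hedged sketch of the header-aware readSchema signature suggested above; the
interface name and method shape are illustrative, not an existing Flink API:

import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import org.apache.avro.Schema;

public interface HeaderAwareSchemaCoder {
    // The headers travel alongside the payload, so registries that carry the
    // schema id in a Kafka header (e.g. Apicurio) can resolve the writer schema.
    Schema readSchema(InputStream in, Map<String, byte[]> headers) throws IOException;
}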



I also wondered whether these Kafka-only formats should be moved to the Kafka 
connector repo, or whether they might in the future be used outside Kafka – 
e.g. Avro/Protobuf files in a database.
   Kind regards, David.


From: Kevin Lam 
Date: Wednesday, 21 February 2024 at 18:51
To: dev@flink.apache.org 
Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent 
Format
I would love to get some feedback from the community on this JIRA issue:
https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440

I am looking into creating a PR and would appreciate some review on the
approach.

In terms of design I think we can mirror the `debezium-avro-confluent` and
`avro-confluent` formats already available in Flink:

   1. `protobuf-confluent` format which uses DynamicMessage
   for encoding and decoding (see the sketch after this list).
   - For encoding, the Flink RowType will be used to dynamically create a
   Protobuf Schema and register it with the Confluent Schema Registry. It will
   use the same schema to construct a DynamicMessage and serialize it.
  - For decoding, the schema will be fetched from the registry and use
  DynamicMessage to deserialize and convert the Protobuf object to a Flink
  RowData.
  - Note: here there is no external .proto file
   2. `debezium-protobuf-confluent` format which unpacks the Debezium Envelope
   and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT, DELETE
   events.
  - We may be able to refactor and reuse code from the existing
  DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema, since
  the deser logic is largely delegated and these Schemas are concerned
  with handling the Debezium envelope.
   3. Move the Confluent Schema Registry Client code to a separate maven
   module, flink-formats/flink-confluent-common, and extend it to support
   ProtobufSchemaProvider.
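
A hedged sketch of the decode path described in item 1 above, using Confluent's
client and Protobuf's DynamicMessage. It is simplified: no caching, and it
assumes the Confluent wire-format prefix (magic byte, schema id, message
indexes) has already been stripped from the payload:

import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.DynamicMessage;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;

final class ProtobufConfluentDecodeSketch {
    static DynamicMessage decode(SchemaRegistryClient client, int schemaId, byte[] body)
            throws Exception {
        // Fetch the writer schema registered under this id, then parse the raw
        // Protobuf bytes against it, with no generated classes involved.
        ProtobufSchema schema = (ProtobufSchema) client.getSchemaById(schemaId);
        Descriptor descriptor = schema.toDescriptor();
        return DynamicMessage.parseFrom(descriptor, body);
    }
}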


Does anyone have any feedback or objections to this approach?

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-19 Thread David Radley
Congratulations Jiabao!

From: Swapnal Varma 
Date: Monday, 19 February 2024 at 10:14
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun
Congratulations Jiabao!

Best,
Swapnal

On Mon, 19 Feb 2024, 15:37 weijie guo,  wrote:

> Congratulations, Jiabao :)
>
> Best regards,
>
> Weijie
>
>
> Hang Ruan  于2024年2月19日周一 18:04写道:
>
> > Congratulations, Jiabao!
> >
> > Best,
> > Hang
> >
> > Qingsheng Ren  于2024年2月19日周一 17:53写道:
> >
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm happy to announce Jiabao Sun as a new Flink
> > > Committer.
> > >
> > > Jiabao began contributing in August 2022 and has contributed 60+
> commits
> > > for Flink main repo and various connectors. His most notable
> contribution
> > > is being the core author and maintainer of MongoDB connector, which is
> > > fully functional in DataStream and Table/SQL APIs. Jiabao is also the
> > > author of FLIP-377 and the main contributor of JUnit 5 migration in
> > runtime
> > > and table planner modules.
> > >
> > > Beyond his technical contributions, Jiabao is an active member of our
> > > community, participating in the mailing list and consistently
> > volunteering
> > > for release verifications and code reviews with enthusiasm.
> > >
> > > Please join me in congratulating Jiabao for becoming an Apache Flink
> > > committer!
> > >
> > > Best,
> > > Qingsheng (on behalf of the Flink PMC)
> > >
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FLINK-21672

2024-02-16 Thread David Radley
Hi,
I see https://issues.apache.org/jira/browse/FLINK-21672 has been open for a 
while. We at IBM are building Flink with the latest v11 Semeru JDK 
(https://developer.ibm.com/languages/java/semeru-runtimes/).
Flink fails to build even with tests skipped, because the 
sun.management.VMManagement class 
cannot be found at compile time. I see some logic in the Flink code to tolerate 
the lack of com.sun packages, but not this sun package. We get:


ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) 
on project flink-local-recovery-and-allocation-test: Compilation failure: 
Compilation failure:

[ERROR] 
/Users/davidradley/flinkapicurio/flink-end-to-end-tests/flink-local-recovery-and-allocation-test/src/main/java/org/apache/flink/streaming/tests/StickyAllocationAndLocalRecoveryTestJob.java:[418,23]
 cannot find symbol

[ERROR]   symbol:   class VMManagement

[ERROR]   location: package sun.management

[ERROR] 
/Users/davidradley/flinkapicurio/flink-end-to-end-tests/flink-local-recovery-and-allocation-test/src/main/java/org/apache/flink/streaming/tests/StickyAllocationAndLocalRecoveryTestJob.java:[418,59]
 cannot find symbol

[ERROR]   symbol:   class VMManagement

[ERROR]   location: package sun.management


As per the link in the issue, sun. packages are not supported or part of the 
JDK after Java 1.7.
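
For reference, a minimal sketch of what removing the dependency could look 
like: obtaining the JVM PID without sun.management (ProcessHandle is standard 
from Java 9; the RuntimeMXBean fallback works on Java 8):

import java.lang.management.ManagementFactory;

class PidSketch {
    static long currentPid() {
        return ProcessHandle.current().pid(); // Java 9+
    }

    static long currentPidJava8() {
        // On HotSpot and OpenJ9 the JVM name is conventionally "<pid>@<hostname>".
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        return Long.parseLong(jvmName.split("@")[0]);
    }
}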

I would like to have the priority raised on this Jira and would like to change 
the code so it builds successfully by removing the dependency on this old / 
unsupported sun package. I am happy to work on this if you are willing to 
support it by assigning me the Jira and merging the fix; ideally we would 
like this to be in the next release - Flink 1.19,
 Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Issues running with Flink 1.20-SNAPSHOT

2024-02-09 Thread David Radley
Hi Martijn,
Yes the Maven wrapper works, so it is a local issue. Thank you for the pointer, 
I am glad it is not anything serious,
 Kind regards, David.

From: Martijn Visser 
Date: Friday, 9 February 2024 at 16:27
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Issues running with Flink 1.20-SNAPSHOT
Hi,

There's never a flink_dist, this is most likely because of a local
build problem. See
https://repository.apache.org/content/groups/snapshots/org/apache/flink/flink-dist/1.19-SNAPSHOT/
which also doesn't exist. This is most likely a local build problem;
are you using the Maven wrapper?
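(That is, running e.g. ./mvnw clean from the repository root, so the build
uses the Maven version that the project pins.)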

Best regards,

Martijn

On Fri, Feb 9, 2024 at 4:43 PM David Radley  wrote:
>
> Hello,
> I git cloned the latest Flink into a new folder. I emptied my .m2 folder for
> Flink, then ran:
>
> mvn clean
>
> and see error
>
> [ERROR] Failed to execute goal on project flink-dist_2.12: Could not resolve 
> dependencies for project org.apache.flink:flink-dist_2.12:jar:1.20-SNAPSHOT: 
> The following artifacts could not be resolved: 
> org.apache.flink:flink-dist-scala_2.12:jar:1.20-SNAPSHOT, 
> org.apache.flink:flink-examples-streaming-state-machine:jar:1.20-SNAPSHOT: 
> Could not find artifact 
> org.apache.flink:flink-dist-scala_2.12:jar:1.20-SNAPSHOT in apache.snapshots 
> (https://repository.apache.org/snapshots ) -> [Help 1]
>
> I look in 
> https://repository.apache.org/content/groups/snapshots/org/apache/flink/  and 
> do not see flink-dist-scala_2.12:jar
>
> My environment is
>
> mvn -version
> Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
> Maven home: /Applications/apache-maven-3.8.6
> Java version: 11.0.18, vendor: Eclipse Adoptium, runtime: 
> /Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "mac os x", version: "14.2.1", arch: "aarch64", family: "mac"
> (base) davidradley@Davids-MBP-2 flink120 % mvn -version
> Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
> Maven home: /Applications/apache-maven-3.8.6
> Java version: 11.0.18, vendor: Eclipse Adoptium, runtime: 
> /Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "mac os x", version: "14.2.1", arch: "aarch64", family: "mac"
>
> It looks like the new snapshot is not completely there. I was expecting to 
> see a folder for flink-dist-scala_2.12:jar
> similar to 
> https://repository.apache.org/content/groups/snapshots/org/apache/flink/flink-core/1.20-SNAPSHOT/
>
>Kind regards, David.
>
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Issues running with Flink 1.20-SNAPSHOT

2024-02-09 Thread David Radley
Hello,
I git cloned the latest Flink into a new folder. I emptied my .m2 folder for 
Flink, then ran:

mvn clean

and see error

[ERROR] Failed to execute goal on project flink-dist_2.12: Could not resolve 
dependencies for project org.apache.flink:flink-dist_2.12:jar:1.20-SNAPSHOT: 
The following artifacts could not be resolved: 
org.apache.flink:flink-dist-scala_2.12:jar:1.20-SNAPSHOT, 
org.apache.flink:flink-examples-streaming-state-machine:jar:1.20-SNAPSHOT: 
Could not find artifact 
org.apache.flink:flink-dist-scala_2.12:jar:1.20-SNAPSHOT in apache.snapshots 
(https://repository.apache.org/snapshots) -> [Help 1]

I look in 
https://repository.apache.org/content/groups/snapshots/org/apache/flink/ and do 
not see flink-dist-scala_2.12:jar

My environment is

mvn -version
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /Applications/apache-maven-3.8.6
Java version: 11.0.18, vendor: Eclipse Adoptium, runtime: 
/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home
Default locale: en_GB, platform encoding: UTF-8
OS name: "mac os x", version: "14.2.1", arch: "aarch64", family: "mac"
(base) davidradley@Davids-MBP-2 flink120 % mvn -version
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /Applications/apache-maven-3.8.6
Java version: 11.0.18, vendor: Eclipse Adoptium, runtime: 
/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home
Default locale: en_GB, platform encoding: UTF-8
OS name: "mac os x", version: "14.2.1", arch: "aarch64", family: "mac"

It looks like the new snapshot is not completely there. I was expecting to see 
a folder for flink-dist-scala_2.12:jar
similar to 
https://repository.apache.org/content/groups/snapshots/org/apache/flink/flink-core/1.20-SNAPSHOT/

   Kind regards, David.



Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: FW: RE: [VOTE] Release flink-connector-jdbc, release candidate #3

2024-02-08 Thread David Radley
Thanks Sergey,

It looks better now.

gpg --verify flink-connector-jdbc-3.1.2-1.18.jar.asc

gpg: assuming signed data in 'flink-connector-jdbc-3.1.2-1.18.jar'

gpg: Signature made Thu  1 Feb 10:54:45 2024 GMT

gpg: using RSA key F7529FAE24811A5C0DF3CA741596BBF0726835D8

gpg: Good signature from "Sergey Nuyanzin (CODE SIGNING KEY) 
snuyan...@apache.org" [unknown]

gpg: aka "Sergey Nuyanzin (CODE SIGNING KEY) 
snuyan...@gmail.com" [unknown]

gpg: aka "Sergey Nuyanzin 
snuyan...@gmail.com" [unknown]

gpg: WARNING: This key is not certified with a trusted signature!

gpg:  There is no indication that the signature belongs to the owner.

I assume the warning is ok,
  Kind regards, David.

From: Sergey Nuyanzin 
Date: Thursday, 8 February 2024 at 14:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [VOTE] Release flink-connector-jdbc, release 
candidate #3
Hi David

it looks like in your case you don't specify the jar itself and probably it
is not in current dir
so it should be something like that (assuming that both asc and jar file
are downloaded and are in current folder)
gpg --verify flink-connector-jdbc-3.1.2-1.16.jar.asc
flink-connector-jdbc-3.1.2-1.16.jar

Here it is a more complete guide how to do it for Apache projects [1]

[1] https://www.apache.org/info/verification.html#CheckingSignatures

On Thu, Feb 8, 2024 at 12:38 PM David Radley 
wrote:

> Hi,
> I was looking more at the asc files. I imported the keys and tried.
>
>
> gpg --verify flink-connector-jdbc-3.1.2-1.16.jar.asc
>
> gpg: no signed data
>
> gpg: can't hash datafile: No data
>
> This seems to be the same for all the asc files. It does not look right; am
> I doing something incorrect?
>Kind regards, David.
>
>
> From: David Radley 
> Date: Thursday, 8 February 2024 at 10:46
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [VOTE] Release flink-connector-jdbc, release
> candidate #3
> +1 (non-binding)
>
> I assume that https://github.com/apache/flink-web/pull/707 can be
> completed after the release is out.
>
> From: Martijn Visser 
> Date: Friday, 2 February 2024 at 08:38
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-jdbc, release
> candidate #3
> +1 (binding)
>
> - Validated hashes
> - Verified signature
> - Verified that no binaries exist in the source archive
> - Build the source with Maven
> - Verified licenses
> - Verified web PRs
>
> On Fri, Feb 2, 2024 at 9:31 AM Yanquan Lv  wrote:
>
> > +1 (non-binding)
> >
> > - Validated checksum hash
> > - Verified signature
> > - Build the source with Maven and jdk8/11/17
> > - Check that the jar is built by jdk8
> > - Verified that no binaries exist in the source archive
> >
> > Sergey Nuyanzin wrote on Thursday, 1 February 2024 at 19:50:
> >
> > > Hi everyone,
> > > Please review and vote on the release candidate #3 for the version
> 3.1.2,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This version is compatible with Flink 1.16.x, 1.17.x and 1.18.x.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 1596BBF0726835D8 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v3.1.2-rc3 [5],
> > > * website pull request listing the new release [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Release Manager
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354088
> > > [2]
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.2-rc3
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1706/
> > > [5]
> > https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.2-rc3
> > > [6] https://github.com/apache/flink-web/pull/707
> > >
> >
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


--
Best regards,
Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FW: RE: [VOTE] Release flink-connector-jdbc, release candidate #3

2024-02-08 Thread David Radley
Hi,
I was looking more at the asc files. I imported the keys and tried.


gpg --verify flink-connector-jdbc-3.1.2-1.16.jar.asc

gpg: no signed data

gpg: can't hash datafile: No data

This seems to be the same for all the asc files. It does not look right; am I 
doing something incorrect?
   Kind regards, David.


From: David Radley 
Date: Thursday, 8 February 2024 at 10:46
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: [VOTE] Release flink-connector-jdbc, release candidate 
#3
+1 (non-binding)

I assume that https://github.com/apache/flink-web/pull/707 can be completed 
after the release is out.

From: Martijn Visser 
Date: Friday, 2 February 2024 at 08:38
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-jdbc, release candidate 
#3
+1 (binding)

- Validated hashes
- Verified signature
- Verified that no binaries exist in the source archive
- Build the source with Maven
- Verified licenses
- Verified web PRs

On Fri, Feb 2, 2024 at 9:31 AM Yanquan Lv  wrote:

> +1 (non-binding)
>
> - Validated checksum hash
> - Verified signature
> - Build the source with Maven and jdk8/11/17
> - Check that the jar is built by jdk8
> - Verified that no binaries exist in the source archive
>
> Sergey Nuyanzin wrote on Thursday, 1 February 2024 at 19:50:
>
> > Hi everyone,
> > Please review and vote on the release candidate #3 for the version 3.1.2,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > This version is compatible with Flink 1.16.x, 1.17.x and 1.18.x.
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > which are signed with the key with fingerprint 1596BBF0726835D8 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.1.2-rc3 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354088
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.2-rc3
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1706/
> > [5]
> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.2-rc3
> > [6] https://github.com/apache/flink-web/pull/707
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Flink jdbc connector rc3 for flink 1.18

2024-02-08 Thread David Radley
Hi Sergey,
Yes that makes sense, thanks,
Kind regards, David.

From: Sergey Nuyanzin 
Date: Wednesday, 7 February 2024 at 11:41
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Flink jdbc connector rc3 for flink 1.18
Hi David,

Thanks for testing.

Yes the jars are built from the same sources and same git tag apart from
the Flink version.

as it was mentioned in jdbc connector RC thread [1]

>The complete staging area is available for your review, which includes:
>* all artifacts to be deployed to the Maven Central Repository [2]

which contains jars for three Flink versions (1.16.x, 1.17.x, 1.18.x)

Please let  me know whether this answers your question or not

[1] https://lists.apache.org/thread/rlk5kp2vxgkmbxmq4wnco885q5vv9rtp
[2] https://repository.apache.org/content/repositories/orgapacheflink-1706/

On Wed, Feb 7, 2024 at 12:18 PM David Radley 
wrote:

> Hi ,
> I had a question on Flink jdbc connector new release. I notice for the
> last release we have
>
> https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc/3.1.1-1.16
>
> https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc/3.1.1-1.17
> It seems that the 2 jars above are identical apart from the manifest.
>
> I assume for the new Flink JDBC connector, there will be 1.16, 1.17 and
> 1.18 versions in Maven Central, each level compiled against Flink
> 1.16.0, 1.17.0 and 1.18.0 (or would it be 1.16.3, 1.17.2 and 1.18.1?)
> respectively. Either way the 3 jar files should be the same (apart from the
> manifest names) as the dependencies on core Flink are forward compatible.
>
> We are looking to use a JDBC connector that works with Flink 1.18 and
> fixes the lookup join filter issue. So we are planning to build against the
> latest 3.1 branch code against 1.18.0– unless the connector is released
> very soon – and we would pick that up.
>
>Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


--
Best regards,
Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [VOTE] Release flink-connector-jdbc, release candidate #3

2024-02-08 Thread David Radley
+1 (non-binding)

I assume that thttps://github.com/apache/flink-web/pull/707 and be completed 
after the release is out.

From: Martijn Visser 
Date: Friday, 2 February 2024 at 08:38
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-jdbc, release candidate 
#3
+1 (binding)

- Validated hashes
- Verified signature
- Verified that no binaries exist in the source archive
- Build the source with Maven
- Verified licenses
- Verified web PRs

On Fri, Feb 2, 2024 at 9:31 AM Yanquan Lv  wrote:

> +1 (non-binding)
>
> - Validated checksum hash
> - Verified signature
> - Build the source with Maven and jdk8/11/17
> - Check that the jar is built by jdk8
> - Verified that no binaries exist in the source archive
>
> Sergey Nuyanzin wrote on Thursday, 1 February 2024 at 19:50:
>
> > Hi everyone,
> > Please review and vote on the release candidate #3 for the version 3.1.2,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > This version is compatible with Flink 1.16.x, 1.17.x and 1.18.x.
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > which are signed with the key with fingerprint 1596BBF0726835D8 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.1.2-rc3 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354088
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.2-rc3
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1706/
> > [5]
> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.2-rc3
> > [6] https://github.com/apache/flink-web/pull/707
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Flink jdbc connector rc3 for flink 1.18

2024-02-07 Thread David Radley
Hi ,
I had a question on Flink jdbc connector new release. I notice for the last 
release we have
https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc/3.1.1-1.16
https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc/3.1.1-1.17
It seems that the 2 jars above are identical apart from the manifest.

I assume for the new Flink JDBC connector, there will be 1.16, 1.17 and 1.18 
versions in Maven Central, each level compiled against Flink 1.16.0, 
1.17.0 and 1.18.0 (or would it be 1.16.3, 1.17.2 and 1.18.1?) respectively. 
Either way the 3 jar files should be the same (apart from the manifest names) 
as the dependencies on core Flink are forward compatible.

We are looking to use a JDBC connector that works with Flink 1.18 and fixes the 
lookup join filter issue. So we are planning to build against the latest 3.1 
branch code against 1.18.0– unless the connector is released very soon – and we 
would pick that up.

   Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FW: RE: Flink JDBC connector release

2024-02-06 Thread David Radley
Hi Sergey,
It looks like the issue we saw was caused by a mismatch in our build levels, 
so it was a false negative and is not an issue. We are testing with the latest 
level of the branch now (the one compiled against 1.16).
  Kind regards, David.

From: David Radley 
Date: Sunday, 4 February 2024 at 22:50
To: dev@flink.apache.org 
Subject: FW: [EXTERNAL] RE: Flink JDBC connector release
Hi Sergey,
Sorry for the typos. I meant:

Yes that is right, I am talking about the jdbc connector rc3 and Flink 1.18.
I am looking into finding a simple way to reproduce the issue, and will raise a 
Jira if I can,
  Kind regards, David.

From: Sergey Nuyanzin 
Date: Friday, 2 February 2024 at 19:46
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Flink JDBC connector release
Hi David

thanks for testing
I assume you are talking about jdbc connector rc3 and Flink 1.18.

In case you think there is a bug it would make sense to raise a jira issue
and provide steps to reproduce it

On Fri, Feb 2, 2024 at 5:42 PM David Radley  wrote:

> Hi,
>
> We have been doing some testing on flink jdbc connector rc2. We are
> testing with Flink 1.18 jars and are using the TableEnvironment
> TableResult<
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableResult.html
>   >
> executeSql(String<
> http://docs.oracle.com/javase/7/docs/api/java/lang/String.html?is-external=true
>   >
> statement). We hit some strange behaviour testing the lookup join, we got a
> null pointer Exception on
> https://github.com/apache/flink-connector-jdbc/blob/390d7bc9139204fbfc48fe275a69eb60c4807fb5/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java#L192
>
> and that the constructor did not seem to be driven.  We changed the
> release that the pom file was building against to 1.18 and it works; so our
> issue appeared to be mismatching jar levels. This issue did not appear
> running the SQL client against rc2.
>
>
>
> I am attempting to put a simple java test together to show this issue.
>
>
>
> WDYT?
>
>   Kind regards.
>
>
>
>
>
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


--
Best regards,
Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FW: RE: Flink JDBC connector release

2024-02-04 Thread David Radley
Hi Sergey,
Sorry for the typos. I meant:

Yes that is right, I am talking about the jdbc connector rc3 and Flink 1.18.
I am looking into finding a simple way to reproduce the issue, and will raise a 
Jira if I can,
  Kind regards, David.

From: Sergey Nuyanzin 
Date: Friday, 2 February 2024 at 19:46
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Flink JDBC connector release
Hi David

thanks for testing
I assume you are talking about jdbc connector rc3 and Flink 1.18.

In case you think there is a bug it would make sense to raise a jira issue
and provide steps to reproduce it

On Fri, Feb 2, 2024 at 5:42 PM David Radley  wrote:

> Hi,
>
> We have been doing some testing on flink jdbc connector rc2. We are
> testing with Flink 1.18 jars and are using the TableEnvironment
> TableResult<
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableResult.html
>   >
> executeSql(String<
> http://docs.oracle.com/javase/7/docs/api/java/lang/String.html?is-external=true
>   >
> statement). We hit some strange behaviour testing the lookup join, we got a
> null pointer Exception on
> https://github.com/apache/flink-connector-jdbc/blob/390d7bc9139204fbfc48fe275a69eb60c4807fb5/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java#L192
>
> and that the constructor did not seem to be driven.  We changed the
> release that the pom file was building against to 1.18 and it works; so our
> issue appeared to be mismatching jar levels. This issue did not appear
> running the SQL client against rc2.
>
>
>
> I am attempting to put a simple java test together to show this issue.
>
>
>
> WDYT?
>
>   Kind regards.
>
>
>
>
>
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


--
Best regards,
Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Flink JDBC connector release

2024-02-04 Thread David Radley
Hi Sergey,
Yes, that is right, I am talking about the jdbc connector rc3 and Flink 1.18.
I am looking into finding a simple way to reproduce it, and will raise a Jira 
if I can,
  Kind regards, David.

From: Sergey Nuyanzin 
Date: Friday, 2 February 2024 at 19:46
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Flink JDBC connector release
Hi David

thanks for testing
I assume you are talking about jdbc connector rc3 and Flink 1.18.

In case you think there is a bug it would make sense to raise a jira issue
and provide steps to reproduce it

On Fri, Feb 2, 2024 at 5:42 PM David Radley  wrote:

> Hi,
>
> We have been doing some testing on flink jdbc connector rc2. We are
> testing with Flink 1.18 jars and are using the TableEnvironment
> TableResult<
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableResult.html
>  >
> executeSql(String<
> http://docs.oracle.com/javase/7/docs/api/java/lang/String.html?is-external=true
>  >
> statement). We hit some strange behaviour testing the lookup join, we got a
> null pointer Exception on
> https://github.com/apache/flink-connector-jdbc/blob/390d7bc9139204fbfc48fe275a69eb60c4807fb5/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java#L192
>
> and that the constructor did not seem to be driven.  We changed the
> release that the pom file was building against to 1.18 and it works; so our
> issue appeared to be mismatching jar levels. This issue did not appear
> running the SQL client against rc2.
>
>
>
> I am attempting to put a simple java test together to show this issue.
>
>
>
> WDYT?
>
>   Kind regards.
>
>
>
>
>
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


--
Best regards,
Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Flink JDBC connector release

2024-02-02 Thread David Radley
Hi,

We have been doing some testing on flink jdbc connector rc2. We are testing 
with Flink 1.18 jars and are using the TableEnvironment method 
TableResult executeSql(String statement). We hit some strange behaviour 
testing the lookup join: we got a NullPointerException at 
https://github.com/apache/flink-connector-jdbc/blob/390d7bc9139204fbfc48fe275a69eb60c4807fb5/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java#L192

and that the constructor did not seem to be driven.  We changed the release 
that the pom file was building against to 1.18 and it works; so our issue 
appeared to be mismatching jar levels. This issue did not appear running the 
SQL client against rc2.



I am attempting to put a simple java test together to show this issue.



WDYT?

  Kind regards.







Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [VOTE] Release flink-connector-jdbc, release candidate #2

2024-01-31 Thread David Radley
Hi,
[x] -1, Do not approve the release (please provide specific comments)

I wanted clarifications on our thinking around the following.

In the source code 
https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.2-rc2

  *   This release introduces support for Java 17 (I assume experimental, like 
Flink’s statement around Java compatibility, but we have not documented it as 
experimental). I see 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/java_compatibility/
 says that Java 17 support was added in 1.18, and I can’t find a backport to 1.16 
or 1.17. Should JDBC Java 17 support (FLINK-33787) be included in this release?
  *   The Flink version in the pom is 1.17, but we are saying we support Flink 
1.16.


Kind regards, David.

From: Sergey Nuyanzin 
Date: Tuesday, 30 January 2024 at 00:18
To: dev@flink.apache.org 
Subject: [EXTERNAL] [VOTE] Release flink-connector-jdbc, release candidate #2
Hi everyone,
Please review and vote on the release candidate #2 for the version
3.1.2, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

This version is compatible with Flink 1.16.x, 1.17.x and 1.18.x.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org
[2], which are signed with the key with fingerprint
1596BBF0726835D8 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.1.2-rc2 [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354088
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.2-rc2
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1704/
[5] https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.2-rc2
[6] https://github.com/apache/flink-web/pull/707

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-34211) Filtering on Column names with ?s fails for JDBC lookup join.

2024-01-23 Thread david radley (Jira)
david radley created FLINK-34211:


 Summary: Filtering on Column names with ?s fails for JDBC lookup 
join. 
 Key: FLINK-34211
 URL: https://issues.apache.org/jira/browse/FLINK-34211
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC, Table SQL / JDBC
Reporter: david radley


There is a check for the ? character in 
https://github.com/apache/flink-connector-jdbc/blob/e3dd84160cd665ae17672da8b6e742e61a72a32d/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/statement/FieldNamedPreparedStatementImpl.java#L186
(FieldNamedPreparedStatementImpl.java).

Removing this check allows column names containing ?.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34170) Include the look up join conditions in the optimised plan.

2024-01-19 Thread david radley (Jira)
david radley created FLINK-34170:


 Summary: Include the look up join conditions in the optimised plan.
 Key: FLINK-34170
 URL: https://issues.apache.org/jira/browse/FLINK-34170
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: david radley


As per 
[https://github.com/apache/flink-connector-jdbc/pull/79#discussion_r1458664773]

[~libenchao] asked that I raise this issue to include the lookup join 
conditions in the optimised plan, in line with the scan conditions. The JDBC 
and other lookup sources could then be updated to pick up the conditions from 
the plan. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34146) JDBC lookup joins fail with RDB column names containing colons

2024-01-18 Thread david radley (Jira)
david radley created FLINK-34146:


 Summary: JDBC lookup joins fail with RDB column names containing 
colons
 Key: FLINK-34146
 URL: https://issues.apache.org/jira/browse/FLINK-34146
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC, Table SQL / JDBC
Affects Versions: 1.18.1
Reporter: david radley


[https://github.com/apache/flink-connector-jdbc/pull/79] adds filter support 
for lookup joins. This was implemented using FieldNamedPreparedStatements in 
line with the way that the join key was implemented.   The 
[FieldNamedPreparedStatementImpl 
logic|https://github.com/apache/flink-connector-jdbc/blob/e3dd84160cd665ae17672da8b6e742e61a72a32d/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/statement/FieldNamedPreparedStatementImpl.java#L221]
 explicitly tests for the colon key and can incorrectly pick up column names. 
So JDBC lookup joins fail with RDB column names containing colons when used in 
filters and lookup keys.

It looks like we have used the approach from 
https://stackoverflow.com/questions/2309970/named-parameters-in-jdbc, which 
says: "Please note that the above simple example does not handle using named 
parameter twice. Nor does it handle using the : sign inside quotes." It looks 
like we could experiment with some regex patterns to see if we can get one 
that works well for us.
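
As a rough sketch (an assumption for discussion, not the proposed fix), the 
parser could track backquoted identifiers so that colons inside them are never 
treated as parameter markers:

import java.util.ArrayList;
import java.util.List;

class NamedParamParserSketch {
    // Collect ":name" parameters, ignoring ':' inside `quoted identifiers`.
    static List<String> namedParameters(String sql) {
        List<String> params = new ArrayList<>();
        boolean inQuotedIdentifier = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '`') {
                inQuotedIdentifier = !inQuotedIdentifier;
            } else if (c == ':' && !inQuotedIdentifier) {
                int start = ++i;
                while (i < sql.length()
                        && (Character.isLetterOrDigit(sql.charAt(i))
                                || sql.charAt(i) == '_')) {
                    i++;
                }
                if (i > start) {
                    params.add(sql.substring(start, i));
                }
                i--; // compensate for the loop increment
            }
        }
        return params;
    }
}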

 

A junit that shows the issue can be added to
FieldNamedPreparedStatementImplTest
 
...
private final String[] fieldNames2 =
        new String[] {
            "id?:", "name:?", "email", "ts", "field1", "field_2", "__field_3__"
        };
private final String[] keyFields2 = new String[] {"id?:", "__field_3__"};
...

@Test
void testSelectStatementWithWeirdCharacters() {
    String selectStmt = dialect.getSelectFromStatement(tableName, fieldNames2, keyFields2);
    assertThat(selectStmt)
            .isEqualTo(
                    "SELECT `id?:`, `name:?`, `email`, `ts`, `field1`, `field_2`, `__field_3__` FROM `tbl` "
                            + "WHERE `id?:` = :id?: AND `__field_3__` = :__field_3__");
    NamedStatementMatcher.parsedSql(
                    "SELECT `id?:`, `name:?`, `email`, `ts`, `field1`, `field_2`, `__field_3__` FROM `tbl` "
                            + "WHERE `id?:` = ? AND `__field_3__` = ?")
            .parameter("id", singletonList(1))
            .parameter("__field_3__", singletonList(2))
            .matches(selectStmt);
}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: [VOTE] Release flink-connector-jdbc, release candidate #1

2024-01-07 Thread David Radley
Hi ,
I am working on FLINK-33365. I am making good progress;  thanks Sergey for your 
fabulous feedback. A lot of the query cases are now working with the latest fix 
but not all. I think it is pragmatic to revert the lookup join predicate 
pushdown support, so we can release a functional JDBC connector. I can then 
work on fixing the remaining FLINK-33365 query cases, which should not take too 
long, but I am out until Thursday this week so will be looking at it then,
   Kind regards, David.


From: Martijn Visser 
Date: Friday, 5 January 2024 at 14:24
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release flink-connector-jdbc, release candidate 
#1
Hi,

Hmmm, it would have been good to mark the Jira ticket as a Blocker
then for the JDBC connector. Since it's marked as Critical, it doesn't
appear. It has also been open for multiple months, so it doesn't
really feel like a Blocker. I'm +0 with including this fix, but then
we should either get that in quickly or revert FLINK-16024, especially
since this bug ticket has been open for multiple months. Right now, it
means that we don't have a working JDBC connector for Flink 1.17 and
Flink 1.18. That shouldn't be OK.

Thanks,

Martijn

On Fri, Jan 5, 2024 at 2:31 PM Sergey Nuyanzin  wrote:
>
> Thanks for driving this
>
> the thing which makes me thinking about -1 (not sure yet and that's why
> asking here) is that there is FLINK-33365 [1]
> mentioned as a blocker for JDBC connector release at [2]
> Since the reason for that is FLINK-16024 [3] as also was explained in
> comments for [1].
>
> So should we wait for a fix of [1] or revert [3] for 3.1.x and continue
> releasing 3.1.2?
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-33365
> [2] https://lists.apache.org/thread/sdkm5qshqozow9sljz6c0qjft6kg9cwc
>
> [3] https://issues.apache.org/jira/browse/FLINK-16024
>
> On Fri, Jan 5, 2024 at 2:19 PM Martijn Visser 
> wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> > 3.1.2, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > This version is compatible with Flink 1.16.x, 1.17.x and 1.18.x.
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2], which are signed with the key with fingerprint
> > A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.1.2-rc1 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354088
> > [2]
> > https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.1.2-rc1
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1691/
> > [5] https://github.com/apache/flink-connector-jdbc/releases/tag/v3.1.2-rc1
> > [6] https://github.com/apache/flink-web/pull/707
> >
>
>
> --
> Best regards,
> Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-03 Thread David Radley
Sorry for my typo.

Many congratulations Alex!

From: David Radley 
Date: Wednesday, 3 January 2024 at 10:23
To: David Anderson 
Cc: dev@flink.apache.org 
Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander 
Fedulov
Many Congratulations David .

From: Maximilian Michels 
Date: Tuesday, 2 January 2024 at 12:16
To: dev 
Cc: Alexander Fedulov 
Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov
Happy New Year everyone,

I'd like to start the year off by announcing Alexander Fedulov as a
new Flink committer.

Alex has been active in the Flink community since 2019. He has
contributed more than 100 commits to Flink, its Kubernetes operator,
and various connectors [1][2].

Especially noteworthy are his contributions on deprecating and
migrating the old Source API functions and test harnesses, the
enhancement to flame graphs, the dynamic rescale time computation in
Flink Autoscaling, as well as all the small enhancements Alex has
contributed which make a huge difference.

Beyond code contributions, Alex has been an active community member
with his activity on the mailing lists [3][4], as well as various
talks and blog posts about Apache Flink [5][6].

Congratulations Alex! The Flink community is proud to have you.

Best,
The Flink PMC

[1] https://github.com/search?type=commits&q=author%3Aafedulov+org%3Aapache
[2] 
https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
[3] https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
[4] https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
[5] 
https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
[6] 
https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-03 Thread David Radley
Many Congratulations David .

From: Maximilian Michels 
Date: Tuesday, 2 January 2024 at 12:16
To: dev 
Cc: Alexander Fedulov 
Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov
Happy New Year everyone,

I'd like to start the year off by announcing Alexander Fedulov as a
new Flink committer.

Alex has been active in the Flink community since 2019. He has
contributed more than 100 commits to Flink, its Kubernetes operator,
and various connectors [1][2].

Especially noteworthy are his contributions on deprecating and
migrating the old Source API functions and test harnesses, the
enhancement to flame graphs, the dynamic rescale time computation in
Flink Autoscaling, as well as all the small enhancements Alex has
contributed which make a huge difference.

Beyond code contributions, Alex has been an active community member
with his activity on the mailing lists [3][4], as well as various
talks and blog posts about Apache Flink [5][6].

Congratulations Alex! The Flink community is proud to have you.

Best,
The Flink PMC

[1] https://github.com/search?type=commits&q=author%3Aafedulov+org%3Aapache
[2] 
https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
[3] https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
[4] https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
[5] 
https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
[6] 
https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-19 Thread David Radley
Hi,

I had 3 comments:
- the name of the config option is "scan.filter-push-down.enabled". This 
implies it is only for scan sources and not lookups. I suggest removing the 
scan. prefix.
- there is talk of having a numeric option, because a pushed-down filter might 
result in a full table scan. Without the filter being pushed down, I assume 
there will be a full table scan anyway - so how is the full table scan with 
pushdown worse than the one that would occur without it?
- what are the use cases for not pushing down filters when the source supports 
it? The only one I can think of is that during development you can easily 
compare query results with and without pushdown, to check they are the same. 
Are there other cases?

On 2023/10/24 14:13:38 Jiabao Sun wrote:
> Thanks Jark, Martijn, Xuyang for the valuable feedback.
> 
> Adding only the "scan.filter-push-down.enabled" configuration option would be 
> great for me as well.
> Optimization for this public behavior can be added later.
> 
> I made some modifications to the FLIP document and added the approach of 
> adding new method to the Rejected Alternatives section. 
> 
> Looking forward to your feedback again.
> 
> Best,
> Jiabao
> 
> 
> > On 24 October 2023 at 16:58, Jark Wu wrote:
> > 
> > the current interface can already satisfy your requirements.
> > The connector can reject all the filters by returning the input filters
> > as `Result#remainingFilters`.
> 
> 


Question on lookup joins

2023-12-15 Thread David Radley
Hi ,
I am working on FLINK-33365, which relates to JDBC predicate pushdown. I want to 
ensure that the same results occur with predicate pushdown as without. So I am 
asking this question outside the pr / issue.

I notice the following behaviour for lookup joins without predicate pushdown. I 
was not expecting all the <NULL>s when there is not a matching join key. ‘a’ 
is a table in Paimon and ‘db’ is a relational database.



Flink SQL> select * from a;

+----+-------------+-------------------------+
| op |          ip |                proctime |
+----+-------------+-------------------------+
| +I | 10.10.10.10 | 2023-12-15 17:36:10.028 |
| +I | 20.20.20.20 | 2023-12-15 17:36:10.030 |
| +I | 30.30.30.30 | 2023-12-15 17:36:10.031 |

^CQuery terminated, received a total of 3 rows



Flink SQL> select * from db_catalog.menagerie.e;

+----+-------------+------+-----+--------+--------+
| op |          ip | type | age | height | weight |
+----+-------------+------+-----+--------+--------+
| +I | 10.10.10.10 |    1 |  30 |    100 |    100 |
| +I | 10.10.10.10 |    2 |  40 |     90 |    110 |
| +I | 10.10.10.10 |    2 |  50 |     80 |    120 |
| +I | 10.10.10.10 |    3 |  50 |     70 |     40 |
| +I | 20.20.20.20 |    3 |  30 |     80 |     90 |
+----+-------------+------+-----+--------+--------+

Received a total of 5 rows



Flink SQL> set table.optimizer.source.predicate-pushdown-enabled=false;

[INFO] Execute statement succeed.



Flink SQL> SELECT * FROM a left join mariadb_catalog.menagerie.e FOR 
SYSTEM_TIME AS OF a.proctime on e.type = 2 and a.ip = e.ip;

+----+-------------+-------------------------+-------------+--------+--------+--------+--------+
| op |          ip |                proctime |         ip0 |   type |    age | height | weight |
+----+-------------+-------------------------+-------------+--------+--------+--------+--------+
| +I | 10.10.10.10 | 2023-12-15 17:38:05.169 | 10.10.10.10 |      2 |     40 |     90 |    110 |
| +I | 10.10.10.10 | 2023-12-15 17:38:05.169 | 10.10.10.10 |      2 |     50 |     80 |    120 |
| +I | 20.20.20.20 | 2023-12-15 17:38:05.170 |      <NULL> | <NULL> | <NULL> | <NULL> | <NULL> |
| +I | 30.30.30.30 | 2023-12-15 17:38:05.172 |      <NULL> | <NULL> | <NULL> | <NULL> | <NULL> |
Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Building Flink JDBC connector locally on M1 Mac

2023-12-08 Thread David Radley
Hi,
When I build the JDBC connector locally on my M1 Mac with a mvn clean install, 
I get the following errors:


 in (XaFacadeImpl.java:75)

[ERROR] Errors:

[ERROR]   DerbyExactlyOnceSinkE2eTest.testInsert » JobExecution Job execution 
failed.

[ERROR]   MySqlExactlyOnceSinkE2eTest » IllegalState Previous attempts to find 
a Docker ...

[ERROR]   OracleExactlyOnceSinkE2eTest » IllegalState Could not find a valid 
Docker envi...

[ERROR]   PostgresCatalogTest » IllegalState Could not find a valid Docker 
environment. ...

[ERROR]   JdbcCatalogFactoryTest » IllegalState Could not find a valid Docker 
environmen...

[ERROR]   PostgresExactlyOnceSinkE2eTest » IllegalState Could not find a valid 
Docker en...

[ERROR]   SqlServerExactlyOnceSinkE2eTest » IllegalState Previous attempts to 
find a Doc...

[INFO]

[ERROR] Tests run: 280, Failures: 1, Errors: 7, Skipped: 1

An example of an individual failure is:

Test set: 
org.apache.flink.connector.jdbc.databases.mysql.xa.MySqlExactlyOnceSinkE2eTest
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006 s <<< 
FAILURE! - in 
org.apache.flink.connector.jdbc.databases.mysql.xa.MySqlExactlyOnceSinkE2eTest
org.apache.flink.connector.jdbc.databases.mysql.xa.MySqlExactlyOnceSinkE2eTest  
Time elapsed: 0.006 s  <<< ERROR!
java.lang.IllegalStateException: Previous attempts to find a Docker environment 
failed. Will not retry. Please see logs and check configuration
   at 
org.testcontainers.dockerclient.DockerClientProviderStrategy.getFirstValidStrategy(DockerClientProviderStrategy.java:231)
   at 
org.testcontainers.DockerClientFactory.getOrInitializeStrategy(DockerClientFactory.java:150)
   at 
org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:191)
   at 
org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:104)
   at 
com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:109)
   at 
org.testcontainers.containers.GenericContainer.start(GenericContainer.java:321)




Googling, I see a suggestion: "If you're using MAC, in my case, to solve the 
problem I had to add a link. Follow the command below:"

sudo ln -s $HOME/.docker/run/docker.sock /var/run/docker.sock

But this does not work for me.

I am running rancher, so do have a docker environment. Any pointers?
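
One other suggestion I have seen for Rancher Desktop (untested here, and the 
socket path is an assumption that may differ per install) is to point 
Testcontainers at the Rancher socket via environment variables:

export DOCKER_HOST=unix://$HOME/.rd/docker.sock
export TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock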

Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FW: Re: flink-sql-connector-jdbc new release

2023-12-08 Thread David Radley
Hi,
As below, I am keen to help get the next flink JDBC connector release out - so 
we can get DB2 and Trino connectors released. When I first mentioned this, 
Jingsong mentioned that we should fix a blocking issue first. I put in pr 
https://github.com/apache/flink-connector-jdbc/pull/79 to fix the blocking 
issue; it has had some reviews and one approval; please could someone either 
merge or provide feedback as to what needs to be changed so it can be merged,
   Kind regards, David.


From: David Radley 
Date: Friday, 27 October 2023 at 16:22
To: dev@flink.apache.org 
Subject: Re: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi Martijn,
Thanks for the link. I suspect I cannot be the release manager, as I do not 
have the required access, but am happy to help this progress, kind 
regards, David.

From: Martijn Visser 
Date: Friday, 27 October 2023 at 12:16
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

The release process for connector is documented at
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release

Best regards,

Martijn

On Fri, Oct 27, 2023 at 12:00 PM David Radley  wrote:
>
> Hi Jing,
> I just spotted the mailing list that it is a regression – I agree it is a 
> blocker,
>Kind regards, David.
>
> From: David Radley 
> Date: Friday, 27 October 2023 at 10:33
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: flink-sql-connector-jdbc new release
> Hi Jing,
> thanks. Are there any processes documented around getting a release out? Out 
> of interest, what is your thinking around this being a blocker? I suspect it 
> is not a regression but a really-nice-to-have, WDYT,
> Either way it looks interesting – I am going to have a look into this issue 
> to try to move it along– could you assign it to me please,
> Kind regards,  David.
>
> From: Jingsong Li 
> Date: Friday, 27 October 2023 at 06:54
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
> Hi David,
>
> Thanks for driving this.
>
> I think https://issues.apache.org/jira/browse/FLINK-33365 should be a 
> blocker.
>
> Best,
> Jingsong
>
> On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
> >
> > Hi,
> > I propose that we do a 3.2 release of flink-sql-connector-jdbc so that 
> > there is a version matching 1.18 that includes the new dialects. I am happy 
> > to drive this, some pointers to documentation on the process and the 
> > approach to testing the various dialects would be great,
> >
> >  Kind regards, David.
> >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: [VOTE] FLIP-384: Introduce TraceReporter and use it to create checkpointing and recovery traces

2023-11-29 Thread David Radley
+1(non-binding)

From: Stefan Richter 
Date: Wednesday, 29 November 2023 at 09:44
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [VOTE] FLIP-384: Introduce TraceReporter and use it to 
create checkpointing and recovery traces
+1 (binding)

Best,
Stefan


> On 22. Nov 2023, at 11:20, Roman Khachatryan  wrote:
>
> +1 (binding)
>
> Regards,
> Roman
>
> On Wed, Nov 22, 2023, 7:08 AM Zakelly Lan  > wrote:
>
>> +1(non-binding)
>>
>> Best,
>> Zakelly
>>
>> On Wed, Nov 22, 2023 at 3:04 PM Hangxiang Yu  wrote:
>>
>>> +1 (binding)
>>> Thanks for driving this again!
>>>
>>> On Wed, Nov 22, 2023 at 10:30 AM Rui Fan <1996fan...@gmail.com> wrote:
>>>
 +1(binding)

 Best,
 Rui

 On Wed, Nov 22, 2023 at 6:43 AM Jing Ge 
 wrote:

> +1(binding) Thanks!
>
> Best regards,
> Jing
>
> On Tue, Nov 21, 2023 at 6:17 PM Piotr Nowojski >>
> wrote:
>
>> Hi All,
>>
>> I'd like to start a vote on the FLIP-384: Introduce TraceReporter
>> and
 use
>> it to create checkpointing and recovery traces [1]. The discussion
 thread
>> is here [2].
>>
>> The vote will be open for at least 72 hours unless there is an
 objection
> or
>> not enough votes.
>>
>> [1] https://cwiki.apache.org/confluence/x/TguZE
>> [2] https://lists.apache.org/thread/7lql5f5q1np68fw1wc9trq3d9l2ox8f4
>>
>>
>> Best,
>> Piotrek
>>
>

>>>
>>>
>>> --
>>> Best,
>>> Hangxiang.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Apicurio Avro format proposal

2023-11-23 Thread David Radley
Hi Ryan,
I am reasonably new to this, but here is my understanding.

If we use pure Avro and Flink SQL, when we create a table the shape of that 
table is the shape we expect the event to be. This falls down when we evolve 
the schema, i.e. create new versions of the schema. The new versions need to be 
compatible (see the schema resolution section of 
https://avro.apache.org/docs/1.11.1/specification for more details).

So if we want a topic to carry a schema that evolves, then we need to be able 
to read messages that are at different schema versions. The message needs to 
identify which schema version it was written with – hence an identifier into a 
schema registry.

In the Confluent registry wire format there is a magic byte at the start of 
the message, followed by the schema id; the Confluent schema registry can map 
this id to a schema version. Using the Confluent Avro format, the serialisers 
and deserialisers use the schema identified by the id (the writer schema) to 
deserialise, and the reader schema (e.g. in Flink, the shape of the table 
definition) to convert the message appropriately.

In Apicurio, there is a ‘global id’ that identifies the schema version in the 
Apicurio registry. See 
https://www.apicur.io/registry/docs/apicurio-registry/2.4.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-avro_registry
for more details. You will notice that the global id can be in a Kafka header 
or in the message payload. It can also be 8 bytes or the legacy 4 bytes. 
Apicurio also has an option, ENABLE_CONFLUENT_ID_HANDLER, that allows it to 
work with the Confluent form of messages (with the magic byte).
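
To make the two layouts concrete, here is a sketch of reading the ids, 
assuming the documented defaults (Confluent: magic byte 0x0 then a 4-byte 
big-endian schema id in the payload; Apicurio in payload mode: magic byte then 
an 8-byte global id, with a legacy 4-byte handler). The class is illustrative 
only, not production code:

import java.nio.ByteBuffer;

public class SchemaIdHeaderSketch {

    // Confluent wire format: 1 magic byte, then a 4-byte schema id.
    static int confluentSchemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != 0x0) {
            throw new IllegalArgumentException("Not Confluent wire format");
        }
        return buf.getInt();
    }

    // Apicurio default payload encoding: 1 magic byte, then an 8-byte global id
    // (the legacy id handler writes 4 bytes instead).
    static long apicurioGlobalId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        buf.get(); // magic byte
        return buf.getLong();
    }
}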

In terms of the issue https://github.com/apache/flink/pull/21805 - it seems to 
be a change to tolerate the presence of the Apicurio bytes (when 
ENABLE_CONFLUENT_ID_HANDLER is specified?). My suggestion is that we close this 
issue and PR; then explicitly create support for Apicurio and document the 
options for a new format in the new FLIP that the community agrees on.

In terms of a common base: it looks like we already have 
RegistryAvroDeserializationSchema and RegistryAvroSerializationSchema. There 
might be refactoring we can do when we create the second implementation.

You ask: outside of configuration options, are there different features? They 
are both schema registries that do schema evolution – I think this is the main 
feature of both that is relevant to Flink.

Does this help? If I have misrepresented anything – please let me know,

I am investigating further so I can create a well described FLIP for the 
proposed change,
Kind regards, David.


From: Ryan Skraba 
Date: Thursday, 23 November 2023 at 09:55
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Apicurio Avro format proposal
Pardon me, I forgot to include that I'd seen this before as
FLINK-26654.  There's a linked JIRA with an open PR that kind of
*plugs in* 8-byte ids . I haven't had the chance to check out Apicurio
yet, but I'm interested in schema registries in general.

All my best, Ryan

[1]: https://github.com/apache/flink/pull/21805
"[FLINK-30721][avro-confluent-registry] Enable 8byte schema id"

On Thu, Nov 23, 2023 at 10:48 AM Ryan Skraba  wrote:
>
> Hello David!
>
> In the FLIP, I'd be interested in knowing how the avro-apicurio and
> avro-confluent formats would differ!  Outside of configuration
> options, are there different features?  Would the two schema registry
> formats have a lot of common base that we could take advantage of?
>
> All my best, Ryan
>
> On Thu, Nov 23, 2023 at 10:14 AM David Radley  wrote:
> >
> > Hi Martijn,
> > Ok will do,
> >   Kind regards, David.
> >
> > From: Martijn Visser 
> > Date: Wednesday, 22 November 2023 at 21:47
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: Apicurio Avro format proposal
> > Hi David,
> >
> > Can you create a small FLIP for this?
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, Nov 22, 2023 at 6:46 PM David Radley  
> > wrote:
> > >
> > > Hi,
> > > I would like to propose a new Apicurio Avro format.
> > > The Apicurio Avro Schema Registry (avro-apicurio) format would allow you 
> > > to read records that were serialized by the 
> > > io.apicurio.registry.serde.avro.AvroKafkaSerializer and to write records 
> > > that can in turn be read by the 
> > > io.apicurio.registry.serde.avro.AvroKafkaDeserialiser.
> > >
> > > With format options including:
> > >
> > >   *   Apicurio Registry URL
> > >   *   Artifact resolver strategy
> > >   *   ID location
> > >   *   ID encoding
> > >   *   Avro datum provider
> > >   *   Avro encoding
> > >
> > >
> > >
> > > For m

RE: Apicurio Avro format proposal

2023-11-23 Thread David Radley
Hi Martijn,
Ok will do,
  Kind regards, David.

From: Martijn Visser 
Date: Wednesday, 22 November 2023 at 21:47
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Apicurio Avro format proposal
Hi David,

Can you create a small FLIP for this?

Best regards,

Martijn

On Wed, Nov 22, 2023 at 6:46 PM David Radley  wrote:
>
> Hi,
> I would like to propose a new Apicurio Avro format.
> The Apicurio Avro Schema Registry (avro-apicurio) format would allow you to 
> read records that were serialized by the 
> io.apicurio.registry.serde.avro.AvroKafkaSerializer and to write records that 
> can in turn be read by the 
> io.apicurio.registry.serde.avro.AvroKafkaDeserialiser.
>
> With format options including:
>
>   *   Apicurio Registry URL
>   *   Artifact resolver strategy
>   *   ID location
>   *   ID encoding
>   *   Avro datum provider
>   *   Avro encoding
>
>
>
> For more details see 
> https://www.apicur.io/registry/docs/apicurio-registry/2.4.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-avro_registry
>
> I am happy to work on this,
>   Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Apicurio Avro format proposal

2023-11-22 Thread David Radley
Hi,
I would like to propose a new Apicurio Avro format.
The Apicurio Avro Schema Registry (avro-apicurio) format would allow you to 
read records that were serialized by the 
io.apicurio.registry.serde.avro.AvroKafkaSerializer and to write records that 
can in turn be read by the 
io.apicurio.registry.serde.avro.AvroKafkaDeserialiser.

With format options including:

  *   Apicurio Registry URL
  *   Artifact resolver strategy
  *   ID location
  *   ID encoding
  *   Avro datum provider
  *   Avro encoding



For more details see 
https://www.apicur.io/registry/docs/apicurio-registry/2.4.x/getting-started/assembly-configuring-kafka-client-serdes.html#registry-serdes-types-avro_registry
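
To make the proposal concrete, here is a sketch of how a table definition 
using the format might look. The format name and every option key below are 
hypothetical placeholders until a FLIP settles the real names:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ApicurioFormatSketch {
    public static void main(String[] args) {
        TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // All 'avro-apicurio' option keys below are hypothetical placeholders.
        env.executeSql(
                "CREATE TABLE orders (order_id STRING, buyer STRING) WITH ("
                        + " 'connector' = 'kafka',"
                        + " 'topic' = 'orders',"
                        + " 'properties.bootstrap.servers' = 'localhost:9092',"
                        + " 'value.format' = 'avro-apicurio',"
                        + " 'value.avro-apicurio.url' = 'http://localhost:8080/apis/registry/v2',"
                        + " 'value.avro-apicurio.id-location' = 'payload',"
                        + " 'value.avro-apicurio.id-encoding' = '8-byte')");
    }
}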

I am happy to work on this,
  Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: How do I source debug the scala in the flink table planner?

2023-11-03 Thread David Radley
Hi Sergey,
Thanks Sergey. Yes I tried that.

I notice 
https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/flink-architecture/
 which shows the planner running in the Flink application, which in my case is 
the SQL client. This makes sense as the debug statements to stdout show in the 
sql client console.

I amended the sql-client script to add extra JVM options, including the debug 
port. I start the SQL client and can see it starting up the debug port – in my 
case 9000.
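
(For reference, the standard JDWP agent option – assuming that is what the 
amended script passes – looks like 
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:9000 ; with 
suspend=y the JVM instead waits for the debugger to attach before running, 
which can help when breakpoints would otherwise be passed during startup.)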

I go into IntelliJ, which has the Scala plugin, and attach to the process – 
which I can see. But the Scala breakpoints do not fire. I wonder if it relates 
to the planner being loaded in a different classloader.

 Kind regards, David.





From: Sergey Nuyanzin 
Date: Thursday, 2 November 2023 at 19:58
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: How do I source debug the scala in the flink table 
planner?
Hi David

thanks for working on it.

Maybe I don't fully understand your issue; however, if you are using IntelliJ 
IDEA and the Scala plugin (at least the readme mentions it is recommended) then 
you should be able to download the corresponding sources (at least for 
table-planner), set breakpoints in both Java and Scala, and debug.
Or what is the issue with this?

On Thu, Nov 2, 2023 at 6:59 PM David Radley  wrote:

> Hi,
> I am working on issue https://issues.apache.org/jira/browse/FLINK-33365
> which has been marked as critical and a blocker for the next release of the
> jdbc connector. I can recreate an issue locally using code I built from
> source, so I can add in println’s which are coming out – but this is slow
> and tedious.
>
> Ideally I would like to be able to source debug the scala in the flink
> table planner; any advice would be fab?
>
>  Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


--
Best regards,
Sergey

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


How do I source debug the scala in the flink table planner?

2023-11-02 Thread David Radley
Hi,
I am working on issue https://issues.apache.org/jira/browse/FLINK-33365 which 
has been marked as critical and a blocker for the next release of the jdbc 
connector. I can recreate an issue locally using code I built from source, so I 
can add in println’s which are coming out – but this is slow and tedious.

Ideally I would like to be able to source debug the scala in the flink table 
planner; any advice would be fab?

 Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: flink-sql-connector-jdbc new release

2023-10-27 Thread David Radley
Hi Martijn,
Thanks for the link. I suspect I cannot be the release manager, as I do not 
have the required access, but am happy to help this progress, kind 
regards, David.

From: Martijn Visser 
Date: Friday, 27 October 2023 at 12:16
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

The release process for connector is documented at
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release

Best regards,

Martijn

On Fri, Oct 27, 2023 at 12:00 PM David Radley  wrote:
>
> Hi Jing,
> I just spotted the mailing list that it is a regression – I agree it is a 
> blocker,
>Kind regards, David.
>
> From: David Radley 
> Date: Friday, 27 October 2023 at 10:33
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: flink-sql-connector-jdbc new release
> Hi Jing,
> thanks are there any processes documented around getting a release out. Out 
> of interest what is your thinking around this being a blocker? I suspect it 
> is not a regression, but a really nice to have, WDYT,
> Either way it looks interesting – I am going to have a look into this issue 
> to try to move it along– could you assign it to me please,
> Kind regards,  David.
>
> From: Jingsong Li 
> Date: Friday, 27 October 2023 at 06:54
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
> Hi David,
>
> Thanks for driving this.
>
> I think https://issues.apache.org/jira/browse/FLINK-33365 should be a
> blocker.
>
> Best,
> Jingsong
>
> On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
> >
> > Hi,
> > I propose that we do a 3.2 release of flink-sql-connector-jdbc so that 
> > there is a version matching 1.18 that includes the new dialects. I am happy 
> > to drive this, some pointers to documentation on the process and the 
> > approach to testing the various dialects would be great,
> >
> >  Kind regards, David.
> >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-33384) MySQL JDBC driver is deprecated

2023-10-27 Thread david radley (Jira)
david radley created FLINK-33384:


 Summary: MySQL JDBC driver is deprecated
 Key: FLINK-33384
 URL: https://issues.apache.org/jira/browse/FLINK-33384
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / JDBC
Reporter: david radley


I see when running tests on the JDBC connector, I get a warning

_Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver 
class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via 
the SPI and manual loading of the driver class is generally unnecessary._
 

I suggest we change the class to be loaded from the old to the new non 
deprecated class name.

 

I am happy to implement this and do testing on it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: flink-sql-connector-jdbc new release

2023-10-27 Thread David Radley
Hi Jing,
I just spotted on the mailing list that it is a regression – I agree it is a 
blocker,
   Kind regards, David.

From: David Radley 
Date: Friday, 27 October 2023 at 10:33
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: flink-sql-connector-jdbc new release
Hi Jing,
thanks, are there any processes documented around getting a release out? Out of 
interest, what is your thinking around this being a blocker? I suspect it is 
not a regression, but a really-nice-to-have, WDYT?
Either way it looks interesting – I am going to have a look into this issue to 
try to move it along – could you assign it to me please,
Kind regards, David.

From: Jingsong Li 
Date: Friday, 27 October 2023 at 06:54
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

Thanks for driving this.

I think https://issues.apache.org/jira/browse/FLINK-33365 should be a blocker.

Best,
Jingsong

On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
>
> Hi,
> I propose that we do a 3.2 release of flink-sql-connector-jdbc so that there 
> is a version matching 1.18 that includes the new dialects. I am happy to 
> drive this, some pointers to documentation on the process and the approach to 
> testing the various dialects would be great,
>
>  Kind regards, David.
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: flink-sql-connector-jdbc new release

2023-10-27 Thread David Radley
Hi Jing,
thanks, are there any processes documented around getting a release out? Out of 
interest, what is your thinking around this being a blocker? I suspect it is 
not a regression, but a really-nice-to-have, WDYT?
Either way it looks interesting – I am going to have a look into this issue to 
try to move it along – could you assign it to me please,
Kind regards, David.

From: Jingsong Li 
Date: Friday, 27 October 2023 at 06:54
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: flink-sql-connector-jdbc new release
Hi David,

Thanks for driving this.

I think https://issues.apache.org/jira/browse/FLINK-33365 should be a blocker.

Best,
Jingsong

On Thu, Oct 26, 2023 at 11:43 PM David Radley  wrote:
>
> Hi,
> I propose that we do a 3.2 release of flink-sql-connector-jdbc so that there 
> is a version matching 1.18 that includes the new dialects. I am happy to 
> drive this, some pointers to documentation on the process and the approach to 
> testing the various dialects would be great,
>
>  Kind regards, David.
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


flink-sql-connector-jdbc new release

2023-10-26 Thread David Radley
Hi,
I propose that we do a 3.2 release of flink-sql-connector-jdbc so that there is 
a version matching 1.18 that includes the new dialects. I am happy to drive 
this, some pointers to documentation on the process and the approach to testing 
the various dialects would be great,

 Kind regards, David.


Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.1, release candidate #1

2023-10-26 Thread David Radley
Hi,
I downloaded the artifacts.

  *   I did an install of the operator and ran the basic sample
  *   I checked the checksums
  *   Checked the GPG signatures
  *   Ran the UI
  *   Ran a Twistlock scan
  *   I installed 1.6 then did a helm upgrade
  *   I have not managed to do the source build and subsequent install yet.
I wanted to check these 2 things are what you would expect:

  1.  I followed link 
https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/139454270?tag=51eeae1
And notice that it does not have a description. Is this correct?

  2.  I get this in the gpg verification. Is this ok?


gpg --verify flink-kubernetes-operator-1.6.1-src.tgz.asc

gpg: assuming signed data in 'flink-kubernetes-operator-1.6.1-src.tgz'

gpg: Signature made Fri 20 Oct 2023 04:07:48 PDT

gpg: using RSA key B2D64016B940A7E0B9B72E0D7D0528B28037D8BC

gpg: Good signature from "Rui Fan fan...@apache.org" 
[unknown]

gpg: WARNING: This key is not certified with a trusted signature!

gpg:  There is no indication that the signature belongs to the owner.

Primary key fingerprint: B2D6 4016 B940 A7E0 B9B7  2E0D 7D05 28B2 8037 D8BC
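
(For reference: this WARNING is the normal gpg output when the signing key has 
been imported – e.g. from the KEYS file – but not certified in your local web 
of trust. The important parts are the "Good signature" line and that the 
fingerprint matches the one published in KEYS.)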




Hi Everyone,

Please review and vote on the release candidate #1 for the version 1.6.1 of
Apache Flink Kubernetes Operator,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Kubernetes Operator canonical source distribution (including the
Dockerfile), to be deployed to the release repository at dist.apache.org
b) Kubernetes Operator Helm Chart to be deployed to the release repository
at dist.apache.org
c) Maven artifacts to be deployed to the Maven Central Repository
d) Docker image to be pushed to dockerhub

**Staging Areas to Review**

The staging areas containing the above mentioned artifacts are as follows,
for your review:
* All artifacts for a,b) can be found in the corresponding dev repository
at dist.apache.org [1]
* All artifacts for c) can be found at the Apache Nexus Repository [2]
* The docker image for d) is staged on github [3]

All artifacts are signed with the
key B2D64016B940A7E0B9B72E0D7D0528B28037D8BC [4]

Other links for your review:
* source code tag "release-1.6.1-rc1" [5]
* PR to update the website Downloads page to
include Kubernetes Operator links [6]
* PR to update the doc version of flink-kubernetes-operator[7]

**Vote Duration**

The voting time will run for at least 72 hours.
It is adopted by majority approval, with at least 3 PMC affirmative votes.

**Note on Verification**

You can follow the basic verification guide here[8].
Note that you don't need to verify everything yourself, but please make
note of what you have tested together with your +- vote.

[1]
https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.1-rc1/
[2] https://repository.apache.org/content/repositories/orgapacheflink-1663/
[3]
https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/139454270?tag=51eeae1
[4] https://dist.apache.org/repos/dist/release/flink/KEYS
[5]
https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.1-rc1
[6] https://github.com/apache/flink-web/pull/690
[7] https://github.com/apache/flink-kubernetes-operator/pull/687
[8]
https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release

Best,
Rui

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FW: Maven and java version variables

2023-10-25 Thread David Radley
Hi,
I notice another pr, 
https://github.com/apache/flink/pull/23594/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8
 is going to cause a conflict again, unless this is merged,

Are we ok to merge https://github.com/apache/flink/pull/23469  – so I do not 
need to resolve another conflict?
Kind regards,  David.


From: David Radley 
Date: Monday, 23 October 2023 at 12:25
To: dev@flink.apache.org 
Subject: [EXTERNAL] Maven and java version variables
Hi,

I have an open pr in the backlog that improves the pom.xml by introducing some 
Maven variables. The pr is https://github.com/apache/flink/pull/23469
It has been reviewed but not merged. In the meantime another pom change has 
been added that caused a conflict. I have amended the code in my pr to 
implement the new logic, introducing a new java upper bounds version variable.
I notice that the pom change that was added introduced this comment:

[XML comment not preserved by the mailing-list archive]

I am not sure what the CI setup means and where in the Flink Release wiki the 
java range is mentioned. It would be great if the comment could be extended to 
include links to this information. I am happy to do that as part of this pr , 
if needed, if I can be supplied the links.  I think this pr should be merged 
asap, so subsequent pom file changes use the Maven variables.

  WDYT

Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Maven and java version variables

2023-10-25 Thread David Radley
Hi Matthias,
That sounds reasonable,
Kind regards, David

From: Matthias Pohl 
Date: Monday, 23 October 2023 at 16:41
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Maven and java version variables
Hi David,
The change that caused the conflict in your PR is caused by FLINK-33291
[1]. I was thinking about adding links to the comments to make the
navigation to the corresponding resources easier as you rightfully
mentioned. I didn't do it in the end because I was afraid that
documentation might be moved in the future and those links wouldn't be
valid anymore. That is why I tried to make the comments descriptive instead.

But I agree: We could definitely do better with the documentation.
...especially (but not only) for CI.

Best,
Matthias

[1] https://issues.apache.org/jira/browse/FLINK-33291

On Mon, Oct 23, 2023 at 2:53 PM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> (under "Prepare for the release")
>
> As for CI:
>
> https://github.com/apache/flink/blob/78b5ddb11dfd2a3a00b58079fe9ee29a80555988/tools/ci/maven-utils.sh#L84
>
> https://github.com/apache/flink/blob/9b63099964b36ad9d78649bb6f5b39473e0031bd/tools/azure-pipelines/build-apache-repo.yml#L39
>
> https://github.com/apache/flink/blob/9b63099964b36ad9d78649bb6f5b39473e0031bd/azure-pipelines.yml#L39
>
> Best,
> Alexander Fedulov
>
>
> On Mon, 23 Oct 2023 at 14:44, Jing Ge  wrote:
>
> > Hi David,
> >
> > Please check [1] in the section Verify Java and Maven Version. Thanks!
> >
> > Best regards,
> > Jing
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> >
> > On Mon, Oct 23, 2023 at 1:25 PM David Radley 
> > wrote:
> >
> > > Hi,
> > >
> > > I have an open pr in the backlog that improves the pom.xml by
> introducing
> > > some Maven variables. The pr is
> > https://github.com/apache/flink/pull/23469
> > > It has been reviewed but not merged. In the meantime another pom change
> > > has been added that caused a conflict. I have amended the code in my pr
> > to
> > > implement the new logic, introducing a new java upper bounds version
> > > variable.
> > > I notice that the pom change that was added introduced this comment:
> > >
> > > [XML comment not preserved by the mailing-list archive]
> > >
> > > I am not sure what the CI setup means and where in the Flink Release
> wiki
> > > the java range is mentioned. It would be great if the comment could be
> > > extended to include links to this information. I am happy to do that as
> > > part of this pr , if needed, if I can be supplied the links.  I think
> > this
> > > pr should be merged asap, so subsequent pom file changes use the Maven
> > > variables.
> > >
> > >   WDYT
> > >
> > > Kind regards, David.
> > >
> > > Unless otherwise stated above:
> > >
> > > IBM United Kingdom Limited
> > > Registered in England and Wales with number 741598
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> > >
> >
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [VOTE] Add JSON encoding to Avro serialization

2023-10-25 Thread David Radley
Looks good to me +1

From: Ryan Skraba 
Date: Wednesday, 25 October 2023 at 17:19
To: dev@flink.apache.org 
Subject: [EXTERNAL] [VOTE] Add JSON encoding to Avro serialization
Hello!

I'm reviewing a new feature of another contributor (Dale Lane) on
FLINK-33058 that adds JSON-encoding in addition to the binary Avro
serialization format.  He addressed my original objections that JSON
encoding isn't _generally_ a best practice for Avro messages.

The discussion is pretty well-captured in the JIRA and PR, but I
wanted to give it a bit of visibility and see if there were any strong
opinions on the subject! Given the minor nature of this feature, I
don't think it requires a FLIP.

*TL;DR*:  JSON-encoded Avro might not be ideal for production, but it
has a place for small systems and especially setting up and testing
before making the switch to binary-encoding.
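
As a minimal sketch of what the JSON encoding looks like (plain Avro, not the 
Flink wiring; the class name and inline schema are just for illustration):

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class JsonAvroSketch {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
                        + "{\"name\":\"order_id\",\"type\":\"string\"}]}");
        GenericRecord record = new GenericData.Record(schema);
        record.put("order_id", "12345");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        System.out.println(out); // prints {"order_id":"12345"} – human-readable
    }
}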

All my best, Ryan

[Jira]: https://issues.apache.org/jira/browse/FLINK-33058
[PR]: https://github.com/apache/flink/pull/23395

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Operator 1.6 to Olm

2023-10-25 Thread David Radley
Hi,
Fyi with some expert direction from James Busche, I have published the 1.6 OLM 
and operatorhub.io versions of the Flink operator.  When 1.6.1 is out I will do 
the same again,
 Kind regards, David.



From: Gyula Fóra 
Date: Tuesday, 10 October 2023 at 13:27
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Operator 1.6 to Olm
That would be great David, thank you!

Gyula

On Tue, 10 Oct 2023 at 14:13, David Radley  wrote:

> Hi,
> I notice that the latest version in olm of the operator is 1.5. I plan to
> run the scripts to publish the 1.6 Flink operator to olm,
>  Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: FLIP-233

2023-10-24 Thread David Radley
Hi Martijn,
Yes I am happy to continue to improve the existing FLIP.

Hi Jing,
I was looking to continue the discussion and update the FLIP content.

Are we OK to reopen FLIP-233? I will update it and the discussion thread there.

  Kind regards, David.


From: Martijn Visser 
Date: Tuesday, 24 October 2023 at 11:56
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FLIP-233
Hi,

There is already https://github.com/getindata/flink-http-connector  -
Why do we need to create another one, instead of improving the
existing one?

Best regards,

Martijn

On Tue, Oct 24, 2023 at 12:28 PM Jing Ge  wrote:
>
> Hi David,
>
> Thanks for picking this up. I was wondering if you have any concrete plan
> like upgrading the FLIP or directly starting a new discussion with the
> current FLIP as it is? Looking forward to having this connector.
>
> Best regards,
> Jing
>
> On Tue, Oct 24, 2023 at 10:55 AM David Radley 
> wrote:
>
> > Thanks Leonard,
> > Hopefully this will be reopened, as we would very much like this
> > capability and want to take over the FLIP, continue the discussion to get a
> > consensus, then implement,
> >Kind regards, David
> >
> > From: Leonard Xu 
> > Date: Tuesday, 24 October 2023 at 02:56
> > To: dev 
> > Cc: jd...@amazon.com 
> > Subject: [EXTERNAL] Re: FLIP-233
> > +1 to reopen the FLIP, the FLIP  has been stalled for more than a year due
> > to the author's time slot.
> >
> > Glad to see the developers from IBM would like to take over the FLIP, we
> > can continue the discussion in FLIP-233 discussion thread [1]
> >
> > Best,
> > Leonard
> >
> > [1] https://lists.apache.org/thread/cd60ln4pjgml7sv4kh23o1fohcfwvjcz
> >
> > > 2023年10月24日 上午12:41,David Radley  写道:
> > >
> > > Hi,
> > > I notice
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
> > has been abandoned , due to lack of capacity. I work for IBM and my team is
> > interested in helping to get this connector contributed into Flink. Can we
> > open this Flip again and we can look to get agreement in the discussion
> > thread please,
> > >
> > > Kind regards, David.
> > >
> > > Unless otherwise stated above:
> > >
> > > IBM United Kingdom Limited
> > > Registered in England and Wales with number 741598
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: FLIP-233

2023-10-24 Thread David Radley
Thanks Leonard,
Hopefully this will be reopened, as we would very much like this capability and 
want to take over the FLIP, continue the discussion to get a consensus, then 
implement,
   Kind regards, David

From: Leonard Xu 
Date: Tuesday, 24 October 2023 at 02:56
To: dev 
Cc: jd...@amazon.com 
Subject: [EXTERNAL] Re: FLIP-233
+1 to reopen the FLIP; the FLIP has been stalled for more than a year due to 
the author's time slot.

Glad to see the developers from IBM would like to take over the FLIP, we can 
continue the discussion in FLIP-233 discussion thread [1]

Best,
Leonard

[1] https://lists.apache.org/thread/cd60ln4pjgml7sv4kh23o1fohcfwvjcz

> 2023年10月24日 上午12:41,David Radley  写道:
>
> Hi,
> I notice 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
>   has been abandoned , due to lack of capacity. I work for IBM and my team is 
> interested in helping to get this connector contributed into Flink. Can we 
> open this Flip again and we can look to get agreement in the discussion 
> thread please,
>
> Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


FLIP-233

2023-10-23 Thread David Radley
Hi,
I notice 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
 has been abandoned, due to lack of capacity. I work for IBM and my team is 
interested in helping to get this connector contributed into Flink. Can we open 
this FLIP again so we can look to get agreement in the discussion thread, 
please?

 Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: Backport strategy

2023-10-23 Thread David Radley
Hi Martijn,
Thanks for the pointer; that makes sense – many (most?) projects only provide 
fixes on the current release (apart from exceptional circumstances – possibly 
some high-priority security fixes); I am curious why Flink fixes two streams of 
code.

One thing that I wondered about is the use of the word ‘support’. In previous 
open source projects we have been keen to stress that the open source community 
does not provide support, in line with the Apache 2 license, which talks of the 
code being supplied as-is with no warranty. What do you think about not using 
the word support in case it is misleading?
  Kind regards, David.


From: Martijn Visser 
Date: Monday, 23 October 2023 at 16:18
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Backport strategy
Hi David,

The policy is that the current and and previous minor release are
supported, and it's documented at
https://flink.apache.org/downloads/#update-policy-for-old-releases
One of the reasons for decoupling the connectors from Flink is that it
could be possible to support older versions of Flink as well. That
depends of course on the complexity of the backport etc which is a
case-by-case situation.

Best regards,

Martijn

On Mon, Oct 23, 2023 at 4:16 PM David Radley  wrote:
>
> Hi,
> I am relatively new to the Flink community. I notice that critical fixes are 
> backported to previous versions. Do we have a documented backport strategy 
> and set of principles?
>
> The reason I ask is that we recently moved removed the Kafka connector from 
> the core repository, so the Kafka connector should be picked up from its own 
> repository. I noticed this removal and updated issues in the core repo to 
> indicate the code has moved to another repo. One of the issues was 
> https://github.com/apache/flink/pull/21226#issuecomment-1775121605  . This is 
> a critical issue and the request is to backport it to 1.15.3. I assume a 
> backport would involved a 3rd number change 1.15.4 in this case.
>
> It seems to me that we should look to create fixes in the stand alone Kafka 
> connector where possible and to list the compatible Flink versions it can be 
> used with, this could include patch levels of 1.15 1.16 1.17.
>
> WDYT?
>   Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Backport strategy

2023-10-23 Thread David Radley
Hi,
I am relatively new to the Flink community. I notice that critical fixes are 
backported to previous versions. Do we have a documented backport strategy and 
set of principles?

The reason I ask is that we recently removed the Kafka connector from the core 
repository, so the Kafka connector should be picked up from its own repository. 
I noticed this removal and updated issues in the core repo to indicate the code 
has moved to another repo. One of the issues was 
https://github.com/apache/flink/pull/21226#issuecomment-1775121605 . This is a 
critical issue and the request is to backport it to 1.15.3. I assume a backport 
would involve a third-digit version change, 1.15.4 in this case.

It seems to me that we should look to create fixes in the stand-alone Kafka 
connector where possible and to list the compatible Flink versions it can be 
used with; this could include patch levels of 1.15, 1.16 and 1.17.

WDYT?
  Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Maven and java version variables

2023-10-23 Thread David Radley
Hi,

I have an open pr in the backlog that improves the pom.xml by introducing some 
Maven variables. The pr is https://github.com/apache/flink/pull/23469
It has been reviewed but not merged. In the meantime another pom change has 
been added that caused a conflict. I have amended the code in my pr to 
implement the new logic, introducing a new java upper bounds version variable.
I notice that the pom change that was added introduced this comment:





I am not sure what the CI setup means and where in the Flink Release wiki the 
java range is mentioned. It would be great if the comment could be extended to 
include links to this information. I am happy to do that as part of this pr , 
if needed, if I can be supplied the links.  I think this pr should be merged 
asap, so subsequent pom file changes use the Maven variables.

  WDYT

Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu

2023-10-16 Thread David Radley
Congratulations Ron!

From: Jark Wu 
Date: Sunday, 15 October 2023 at 18:57
To: dev 
Cc: ron9@gmail.com 
Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Ron Liu
Hi, everyone

On behalf of the PMC, I'm very happy to announce Ron Liu as a new Flink
Committer.

Ron has been continuously contributing to the Flink project for many years,
authored and reviewed a lot of codes. He mainly works on Flink SQL parts
and drove several important FLIPs, e.g., USING JAR (FLIP-214), Operator
Fusion CodeGen (FLIP-315), Runtime Filter (FLIP-324). He has a great
knowledge of the Batch SQL and improved a lot of batch performance in the
past several releases. He is also quite active in mailing lists,
participating in discussions and answering user questions.

Please join me in congratulating Ron Liu for becoming a Flink Committer!

Best,
Jark Wu (on behalf of the Flink PMC)

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: FW: RE: [DISCUSS] FLIP-368 Reorganize the exceptions thrown in state interfaces

2023-10-13 Thread David Radley
Hi,
It seems that migrating to the next dot version and finding you need to 
recompile would be frustrating and cause unexpected work; I suspect a jar file 
using the old API could give an exception around not finding the method – I am 
not sure how this would surface in typical applications at runtime.
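
For what it is worth, the throws clause is not part of the JVM method 
descriptor, so already-compiled jars should still link; the break is at compile 
time, in code like this sketch (assuming the checked exception is simply 
dropped from the signature – the class name is illustrative):

import java.io.IOException;

import org.apache.flink.api.common.state.ValueState;

class StateCallSiteSketch {
    void useState(ValueState<Long> count) {
        try {
            Long current = count.value(); // today: declares "throws IOException"
        } catch (IOException e) {
            // Once value() no longer declares IOException, javac rejects this
            // catch: "exception IOException is never thrown in the corresponding
            // try statement" – hence the rewrite and recompile.
        }
    }
}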

Technically, because this is tagged as @PublicEvolving, we can make this 
breaking change.

So there will be migration issues if people are using this API – do we have an 
idea of how many of our users are using it?

If we use @PublicEvolving then maybe we should have stable binaries that only 
include public APIs and then another bleeding-edge package containing 
@PublicEvolving content, so users can choose.

Organisations I have worked with would not tend to want to or expect to have to 
recompile their applications on a dot version – as this would normally mean a 
lot more testing for them.

On balance, as I am risk-averse, I would suggest delaying this to v2 as Jing 
has proposed. This is a cleaner API; is there demand for this in a dot version? 
If the community thinks this is too risk-averse, then we could go with 1.19.
WDYT?

Kind regards, David.



From: Jing Ge 
Date: Friday, 13 October 2023 at 14:30
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-368 Reorganize the exceptions 
thrown in state interfaces
HI Zakelly,

What about the jobs that catch those exceptions? Will these downstream
callers that expect this exception to be thrown break, since the exceptions
are part of the method's contract?

Best regards,
Jing

On Fri, Oct 13, 2023 at 11:01 AM Zakelly Lan  wrote:

> Hi Yuan,
>
> Thanks for sharing your thoughts!
>
> In Flink 2.0 I believe we could design brand-new state APIs, which is
> uncertain and may take some time. However, this proposal primarily
> focuses on improving the consistency of interface definitions and
> enhancing the user experience in error handling for the current APIs.
> Therefore, I would prefer to make it in version 1.19. Furthermore, the
> impact of this API change can be controlled since most Flink users do
> not actively catch these exceptions. For them, a simple code recompile
> is sufficient and acceptable when migrating to a new minor release.
> The change is not that big that we need a major version to apply.
>
>
> Best,
> Zakelly
>
> On Fri, Oct 13, 2023 at 3:03 PM Yuan Mei  wrote:
> >
> > +1 for the proposal
> >
> > But "Since the signature of the public state API has been changed", I was
> > wondering whether this would be more fittable in Flink 2.0, instead of
> 1.19?
> >
> > WDYT?
> >
> > Best
> > Yuan
> >
> > On Wed, Oct 11, 2023 at 4:34 PM David Radley 
> > wrote:
> >
> > > Hi Zakelly,
> > > Thanks for making this clear for me.  We should document the impact on
> the
> > > user in the release notes, which will be a minimal rewrite and
> recompile of
> > > any java using the old APIs.
> > > I think it is a good point you make about if there are future
> > > implementations that are
> > > worth retrying (such as network access) – then there could be retries.
> I
> > > agree we should not be trying to create code now for an implementation
> > > consideration that is not there yet,
> > >
> > > +1 from me ,
> > >  Kind regards, David.
> > >
> > > From: Zakelly Lan 
> > > Date: Wednesday, 11 October 2023 at 04:25
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-368 Reorganize the
> > > exceptions thrown in state interfaces
> > > Hi David,
> > >
> > > Thanks for your response.
> > >
> > > The exceptions thrown by state interfaces are NOT retriable. For
> > > example, there may be some elements sent to the wrong subtask due to a
> > > non-deterministic hashCode() algorithm and the key group is not
> > > matching. Or the rocksdb may fail to read a file if it has been
> > > deleted by the user. If there are future implementations that are
> > > worth retrying (such as network access), it would be better to let the
> > > implementation itself handle the retries and provide a configuration
> > > for this, rather than requiring users to catch these exceptions.
> > >
> > > Regarding the release and documentation, I have mentioned that this
> > > change is targeted for version 1.19 with proper documentation. You may
> > > have noticed that state interfaces are annotated with @PublicEvolving,
> > > which means these interfaces may change across versions. The changes
> > > are suitable for a minor release (1.18.0 c

[jira] [Created] (FLINK-33269) light and dark scss files do not have apache licenses at the top.

2023-10-13 Thread david radley (Jira)
david radley created FLINK-33269:


 Summary: light and dark scss files  do not have apache licenses at 
the top.
 Key: FLINK-33269
 URL: https://issues.apache.org/jira/browse/FLINK-33269
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / Web Frontend
Reporter: david radley


I notice that 

[https://github.com/apache/flink-web/blob/asf-site/docs/assets/_code-dark.scss]

and 

[https://github.com/apache/flink-web/blob/asf-site/docs/assets/_code-light.scss]

 

recently added for https://issues.apache.org/jira/browse/FLINK-33046 do not 
have the Apache  license at the top like the other scss files. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: FW: RE: Close orphaned/stale PRs

2023-10-12 Thread David Radley
Hi everyone,
Martijn, I like your ideas. I think these labels will help make it obvious 
what work is actionable. I really feel these sorts of process improvements will 
incrementally help work to flow through appropriately.

2 additional thoughts – I hope these help this discussion:

  *   A triaged label on the issue would indicate that a maintainer has agreed 
this is a valid issue – this would be a better pool of issues for contributors 
to pick up. I am not sure if maintainers currently do this sort of work.
  *   I like the codeowners idea; did you find a way through this within the 
Apache rules? An extension to this is that increasingly we are moving parts of 
the code out of the main Flink repository to other repositories; would this be 
doable there? Could experts in those repositories be given write access to 
those repos, so that each non-core repo can work through its issues and merge 
its PRs more independently? This is how the LF project Egeria works with its 
connectors and UIs; I guess the concern is that in the ASF these people would 
need to be committers – or could they be committers on a subset of repos? 
Another way to manage who can merge PRs is to gate the PR process using git 
actions, so that if an approved approver indicates a PR is good then the raiser 
can merge – this would give us granularity on write access; PyTorch follows 
this sort of process.

  kind regards, David.


From: Martijn Visser 
Date: Thursday, 12 October 2023 at 10:32
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: Close orphaned/stale PRs
Hi everyone,

I'm overall +1 on Ryan's comment.
When we're talking about component ownership, I've started a
discussion on the Infra mailing list in the beginning of the year on
it. In principle, the "codeowners" idea goes against ASF principles.

Let's summarize things:
1. From a project perspective, we can have a discussion about closing
PRs automatically that a) are not followed-up within X number of days
after a review and/or b) PRs that don't have a passing build and/or
don't follow contribution guidelines and/or C) need to be rebased
2. In order to help understand which PRs are OK to get reviewed, we
could consider automatically adding a label "Ready for review" in case
1b (passing build/contribution guidelines met) is the case.
3. In order to help contributors, we could consider automatically
adding a label in case their PR isn't mergeable for the situations
that are displayed in situation 1

When that's done, we can see what the effect is on the PRs queue.

Best regards,

Martijn

On Wed, Oct 4, 2023 at 5:13 PM David Radley  wrote:
>
> Hi Ryan,
>
> I agree that good communication is key to determining what can be worked on.
>
> In terms of metrics, we can use the gh CLI to list PRs and we can export
> issues from Jira. For a view across them, you could join on the Flink issue
> (at the start of the PR comment) and the Flink issue itself – you could then
> see which PRs have an assigned Jira and would be expected to be reviewed.
> There is no explicit reviewer field in the Jira issue; I am not sure if we
> can easily get this info without having a custom field (which others have
> tried).
>
> In terms of what prs a committer could / should review – I would think that 
> component ownership helps scope the subset of prs to review / merge.
>
> Kind regards, David.
>
>
> From: Ryan Skraba 
> Date: Wednesday, 4 October 2023 at 15:09
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: FW: RE: Close orphaned/stale PRs
> Hey, this has been an interesting discussion -- this is something that
> has been on my mind as an open source contributor and committer (I'm
> not a Flink committer).
>
> A large number of open PRs doesn't _necessarily_ mean a project is
> unhealthy or has technical debt. If it's fun and easy to get your
> contribution accepted and committed, even for a small fix, you're more
> likely to raise another PR, and another.  I wouldn't be surprised if
> there's a natural equilibrium where adding capacity to smoothly review
> and manage more PRs cause more PRs to be submitted.  Everyone wins!
>
> I don't think there's a measure for the "average PR lifetime", or
> "time to first comment", but those would be more interesting things to
> know and those are the worrisome ones.
>
> As a contributor, I'm pretty willing to wait as long as necessary (and
> rebase and fix merge conflicts) if there's good communication in
> place. I'm pretty patient, especially if I knew that the PR would be
> looked at and merged for a specific fix version (for example).  I'd
> expect simple and obvious fixes with limited scope to take less time
> than a more complex, far-reaching change.  I'd probably appreciate
> that the boring-cyborg welcomes me on my first PR, but I'd be pretty
> irritat

RE: FW: RE: [DISCUSS] FLIP-368 Reorganize the exceptions thrown in state interfaces

2023-10-11 Thread David Radley
Hi Zakelly,
Thanks for making this clear for me. We should document the impact on the user 
in the release notes, which will be a minimal rewrite and recompile of any Java 
using the old APIs.
I think it is a good point you make about if there are future implementations 
that are
worth retrying (such as network access) – then there could be retries. I agree 
we should not be trying to create code now for an implementation consideration 
that is not there yet,

+1 from me ,
 Kind regards, David.

From: Zakelly Lan 
Date: Wednesday, 11 October 2023 at 04:25
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-368 Reorganize the exceptions 
thrown in state interfaces
Hi David,

Thanks for your response.

The exceptions thrown by state interfaces are NOT retriable. For
example, there may be some elements sent to the wrong subtask due to a
non-deterministic hashCode() algorithm and the key group is not
matching. Or the rocksdb may fail to read a file if it has been
deleted by the user. If there are future implementations that are
worth retrying (such as network access), it would be better to let the
implementation itself handle the retries and provide a configuration
for this, rather than requiring users to catch these exceptions.

Regarding the release and documentation, I have mentioned that this
change is targeted for version 1.19 with proper documentation. You may
have noticed that state interfaces are annotated with @PublicEvolving,
which means these interfaces may change across versions. The changes
are suitable for a minor release (1.18.0 currently to 1.19.0 in the
future) as defined by the API compatibility guarantees of Flink[1].



Best,
Zakelly


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#api-compatibility-guarantees

On Tue, Oct 10, 2023 at 6:19 PM David Radley  wrote:
>
> Hi,
> I notice 
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/ValueState.html
>   is an external API. I am concerned that this change will break existing 
> applications using the old interface, they are likely to have catches / 
> throws around the existing checked Exceptions.
>
> If we go with RunTimeException, I would suggest that this sort of breaking 
> change should be done on a Flink version change, where it is appropriate to 
> make breaking changes to the API with associated documentation.
>
> If we want this change on a minor release,  we could create a new class 
> ValueState2– that is used internally with the cleaned up Exceptions, but 
> still expose the old class and Exceptions for existing external applications. 
> I guess new applications could use the new ValueState2 .
>
> What do you think?
> Kind regards, David.
>
>
> From: David Radley 
> Date: Tuesday, 10 October 2023 at 09:49
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-368 Reorganize the exceptions thrown 
> in state interfaces
> Hi ,
> The argument seems to be that the errors cannot be acted on so should be 
> runtime exceptions. I want to confirm that none of these errors could / 
> should be retriable. If there is a possibility that the state is available at 
> some time later then I assume a checked retriable Exception would be 
> appropriate for those cases; and be part of the contract with the caller. Can 
> we be sure that there is no possibility that the state will become available; 
> if so then I agree that a runtime Exception is appropriate. What do you think?
>
>
>
> Kind regards, David.
>
>
> From: Zakelly Lan 
> Date: Monday, 9 October 2023 at 18:12
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-368 Reorganize the exceptions thrown 
> in state interfaces
> Hi everyone,
>
> It seems we're gradually reaching a consensus. So I would like to
> start a vote after 72 hours if there are no further discussions.
>
> Please let me know if you have any concerns, thanks!
>
>
> Best,
> Zakelly
>
>
> On Sat, Oct 7, 2023 at 4:07 PM Zakelly Lan  wrote:
> >
> > Hi Jing,
> >
> > Sorry for the late reply! I agree with you that we do not expect users
> > to do anything with Flink and we won't "bother" them with those
> > exceptions. However, users can still catch the `Throwable` and perform
> > any necessary logging activities, similar to how they use Java
> > Collection interfaces.
> >
> >
> > Thanks for your insights!
> >
> > Best,
> > Zakelly
> >
> > On Thu, Sep 21, 2023 at 8:43 PM Jing Ge  wrote:
> > >
> > > Fair enough! Thanks Zakelly for the information. Afaic, even users can do
> > > nothing with Flink, they still can do something in their territory, at
> > > least doing some logging 
