Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-16 Thread Venkatakrishnan Sowrirajan
+1. Thanks for driving this proposal, Ferenc!

Regards
Venkata krishnan


On Thu, Jun 13, 2024 at 10:54 AM Jeyhun Karimov 
wrote:

> Thanks for driving this.
> +1 (non-binding)
>
> Regards,
> Jeyhun
>
> On Thu, Jun 13, 2024 at 5:23 PM Gabor Somogyi 
> wrote:
>
> > +1 (binding)
> >
> > G
> >
> >
> > On Wed, Jun 12, 2024 at 5:23 PM Ferenc Csaky  >
> > wrote:
> >
> > > Hello devs,
> > >
> > > I would like to start a vote about FLIP-464 [1]. The FLIP proposes to
> > > merge the "flink run-application" functionality back into "flink run",
> > > so that the latter will be capable of deploying jobs in all deployment
> > > modes. More details are in the FLIP. Discussion thread [2].
> > >
> > > The vote will be open for at least 72 hours (until 2024 March 23 14:03
> > > UTC) unless there
> > > are any objections or insufficient votes.
> > >
> > > Thanks,
> > > Ferenc
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > [2]
> https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> >
>


[jira] [Created] (FLINK-35622) Filter out noisy "Coordinator of operator xxxx does not exist" exceptions in batch mode

2024-06-16 Thread Junrui Li (Jira)
Junrui Li created FLINK-35622:
-

 Summary: Filter out noisy "Coordinator of operator xxxx does not 
exist" exceptions in batch mode
 Key: FLINK-35622
 URL: https://issues.apache.org/jira/browse/FLINK-35622
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Junrui Li


In batch mode, the Flink JobManager logs frequently print "Coordinator of 
operator xxxx does not exist or the job vertex this operator belongs to is not 
initialized." exceptions when using the collect() method.

This exception is caused by the collect sink attempting to fetch data from the 
corresponding operator coordinator on the JM based on the operator ID. However, 
batch jobs do not initialize all job vertices up front; they initialize them 
sequentially. If a job vertex has not been initialized yet, its coordinator 
cannot be found, which leads to this message being logged.

These exceptions are harmless and do not affect the job execution, but they can 
clutter the logs and make it difficult to find relevant information, especially 
for large-scale batch jobs.
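
For context, a minimal sketch of the kind of batch job that can surface these 
messages (standard DataStream API; this is illustrative, not part of the 
proposed fix):

{code:java}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.CloseableIterator;

public class BatchCollectExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // collect() polls the operator coordinator on the JM while some job
        // vertices may not have been initialized yet, producing the noisy logs.
        try (CloseableIterator<Long> it = env.fromSequence(1, 1000).executeAndCollect()) {
            while (it.hasNext()) {
                it.next();
            }
        }
    }
}
{code}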
 
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Request for unsubscribing

2024-06-15 Thread Harshodai kolluru
Hey Admin, I am getting a bunch of emails from Apache Flink; please remove
my subscription.

Thanks!


Re: [ANNOUNCE] Flink 1.20 feature freeze

2024-06-15 Thread Zakelly Lan
Hi Robert, Rui, Ufuk and Weijie,

Thanks for the update!

FYI: This PR [1] fixes & cleans up the left-over checkpoint directories for
file-merging on TM exit, and its second commit fixes a wrong state handle
usage. We encountered several unexpected CI failures, so we missed the feature
freeze deadline. It would be good to have this PR in 1.20, so I will merge it
if you agree. Thanks.


[1] https://github.com/apache/flink/pull/24933

Best,
Zakelly

On Sat, Jun 15, 2024 at 6:00 AM weijie guo 
wrote:

> Hi everyone,
>
>
> The feature freeze of 1.20 has started now. That means that no new features
> or improvements should now be merged into the master branch unless you first
> ask the release managers, except for PRs that have already been approved or
> are only pending CI. Bug fixes and documentation PRs can still be merged.
>
>
>
> - *Cutting release branch*
>
>
> Currently we have no blocker issues (besides the tickets used for
> release testing).
>
> We are planning to cut the release branch next Friday (June 21) if no
> new test instabilities appear, and we'll make another announcement on the
> dev mailing list then.
>
>
>
> - *Cross-team testing*
>
>
> The release testing will start right after we cut the release branch, which
> is expected to happen next week. As a prerequisite, we have created the
> corresponding instruction ticket FLINK-35602 [1]. Please check and complete
> the documentation and test instructions for your new feature, and mark the
> related JIRA issue in the 1.20 release wiki page [2] before we start
> testing; this will be quite helpful for other developers validating your
> features.
>
>
>
> Best regards,
>
> Robert, Rui, Ufuk and Weijie
>
>
> [1]https://issues.apache.org/jira/browse/FLINK-35602
>
> [2] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
>


[ANNOUNCE] Flink 1.20 feature freeze

2024-06-14 Thread weijie guo
Hi everyone,


The feature freeze of 1.20 has started now. That means that no new features
or improvements should now be merged into the master branch unless you first
ask the release managers, except for PRs that have already been approved or
are only pending CI. Bug fixes and documentation PRs can still be merged.



- *Cutting release branch*


Currently we have no blocker issues (besides the tickets used for
release testing).

We are planning to cut the release branch next Friday (June 21) if no
new test instabilities appear, and we'll make another announcement on the
dev mailing list then.



- *Cross-team testing*


The release testing will start right after we cut the release branch, which
is expected to happen next week. As a prerequisite, we have created the
corresponding instruction ticket FLINK-35602 [1]. Please check and complete
the documentation and test instructions for your new feature, and mark the
related JIRA issue in the 1.20 release wiki page [2] before we start
testing; this will be quite helpful for other developers validating your
features.



Best regards,

Robert, Rui, Ufuk and Weijie


[1]https://issues.apache.org/jira/browse/FLINK-35602

[2] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release


[jira] [Created] (FLINK-35621) Release Testing Instructions: Verify FLIP-436: Introduce Catalog-related Syntax

2024-06-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35621:
--

 Summary: Release Testing Instructions: Verify FLIP-436: Introduce 
Catalog-related Syntax
 Key: FLINK-35621
 URL: https://issues.apache.org/jira/browse/FLINK-35621
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Common
Reporter: Weijie Guo
Assignee: Ahmed Hamdy
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-35435



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35620) Parquet writer creates wrong file for nested fields

2024-06-14 Thread Vicky Papavasileiou (Jira)
Vicky Papavasileiou created FLINK-35620:
---

 Summary: Parquet writer creates wrong file for nested fields
 Key: FLINK-35620
 URL: https://issues.apache.org/jira/browse/FLINK-35620
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.19.0
Reporter: Vicky Papavasileiou


After PR [https://github.com/apache/flink/pull/24795], which added support for 
nested arrays, got merged, the parquet writer produces wrong parquet files that 
cannot be read. Note that the readers (both Flink and Iceberg) don't throw an 
exception but return `null` for the nested field.

The error is in how the field `max_definition_level` is populated for nested 
fields.

Consider Avro schema:

```
{
  "namespace": "com.test",
  "type": "record",
  "name": "RecordData",
  "fields": [
    {
      "name": "Field1",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "NestedField2",
          "fields": [
            { "name": "NestedField3", "type": "double" }
          ]
        }
      }
    }
  ]
}
```

Consider the excerpt below of a parquet file produced by Flink for the above 
schema:

```

 Column(SegmentStartTime) 
name: NestedField3
path: Field1.list.element.NestedField3
max_definition_level: 1
max_repetition_level: 1
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE
compression: SNAPPY (space_saved: 7%)

```

The max_definition_level should be 4 but is 1: each optional or repeated node 
along the path Field1.list.element.NestedField3 contributes one definition 
level, so the leaf should end up with a max_definition_level of 4.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35619) Window rank query fails with "must call validate first"

2024-06-14 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-35619:


 Summary: Window rank query fails with "must call validate first"
 Key: FLINK-35619
 URL: https://issues.apache.org/jira/browse/FLINK-35619
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


A program:

{code}
static final TableTestProgram WINDOW_RANK_HOP_TVF_NAMED_MIN_TOP_1 =
        TableTestProgram.of(
                        "window-rank-hop-tvf-named-min-top-n",
                        "validates window min top-n follows after hop window")
                .setupTableSource(
                        SourceTestStep.newBuilder("bid_t")
                                .addSchema(
                                        "ts STRING",
                                        "price DECIMAL(10,2)",
                                        "supplier_id STRING",
                                        "`bid_time` AS TO_TIMESTAMP(`ts`)",
                                        "WATERMARK for `bid_time` AS `bid_time` - INTERVAL '1' SECOND")
                                .producedValues(
                                        Row.of(
                                                "2020-04-15 08:00:05",
                                                new BigDecimal(4.00),
                                                "supplier1"))
                                .build())
                .setupTableSink(
                        SinkTestStep.newBuilder("sink_t")
                                .addSchema("bid_time TIMESTAMP(3)", "supplier_id STRING")
                                .consumedValues(
                                        "+I[2020-04-15T08:00:05, supplier1]",
                                        "+I[2020-04-15T08:00:05, supplier1]")
                                .build())
                .runSql(
                        "INSERT INTO sink_t(bid_time, supplier_id) "
                                + "SELECT bid_time, supplier_id\n"
                                + "  FROM (\n"
                                + "SELECT\n"
                                + " bid_time,\n"
                                + " supplier_id,\n"
                                + " ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price ASC) AS row_num\n"
                                + "FROM TABLE(HOP(\n"
                                + "  DATA => TABLE bid_t,\n"
                                + "  TIMECOL => DESCRIPTOR(`bid_time`),\n"
                                + "  SLIDE => INTERVAL '5' SECOND,\n"
                                + "  SIZE => INTERVAL '10' SECOND))\n"
                                + "  ) WHERE row_num <= 3")
                .build();
{code}

fails with:
{code}
java.lang.AssertionError: must call validate first

	at org.apache.calcite.sql.validate.IdentifierNamespace.resolve(IdentifierNamespace.java:256)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2871)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2464)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2378)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2323)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:730)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:716)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3880)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryOrInList(SqlToRelConverter.java:1912)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertExists(SqlToRelConverter.java:1895)
	at org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1421)
	at org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1161)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertCollectionTable(SqlToRelConverter.java:2928)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2511)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2378)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2323)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:730)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:716)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3880)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2490)
	at 

Re: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0

2024-06-14 Thread Xiqian YU
Thanks Qingsheng for driving this release!

+1 (non-binding)


  *   Verified tarball checksum
  *   Compiled CDC from source
  *   Confirmed jars were compiled with JDK 8
  *   Ran savepoint migration test from CDC 3.0.x
  *   Ran Pipeline E2e tests with Flink 1.18.1 / 1.19.0

Regards,
Xiqian

From: Qingsheng Ren 
Date: Friday, 14 June 2024 at 15:06
To: dev 
Subject: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0
Hi everyone,

Please review and vote on the release candidate #0 for the version 3.1.1 of
Apache Flink CDC, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Flink CDC source release to be deployed to dist.apache.org
b) Maven artifacts to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above mentioned artifacts are as follows,
for your review:
* All artifacts for a) can be found in the corresponding dev repository at
dist.apache.org [1], which are signed with the key with fingerprint
A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2]
* All artifacts for b) can be found at the Apache Nexus Repository [3]

Other links for your review:
* JIRA release notes [4]
* Source code tag "release-3.1.1-rc0" with commit hash
b163b8e1589184bd7716cf6d002f3472766eb5db [5]
* PR for release announcement blog post of Flink CDC 3.1.1 in flink-web [6]

**Vote Duration**

The voting time will run for at least 72 hours.
It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Qingsheng Ren

[1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.1.1-rc0
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1739
[4]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354763
[5] https://github.com/apache/flink-cdc/releases/tag/release-3.1.1-rc0
[6] https://github.com/apache/flink-web/pull/746


[jira] [Created] (FLINK-35618) Flink CDC add MongoDB pipeline data sink connector

2024-06-14 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-35618:
--

 Summary: Flink CDC add MongoDB pipeline data sink connector
 Key: FLINK-35618
 URL: https://issues.apache.org/jira/browse/FLINK-35618
 Project: Flink
  Issue Type: New Feature
  Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: Jiabao Sun
Assignee: Jiabao Sun


Flink CDC add MongoDB pipeline data sink connector



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35617) Support object reuse in async state execution

2024-06-14 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35617:
---

 Summary: Support object reuse in async state execution
 Key: FLINK-35617
 URL: https://issues.apache.org/jira/browse/FLINK-35617
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: Zakelly Lan
Assignee: Zakelly Lan


The record processor of {{AEC}} in the async state execution model should 
consider object reuse and copy records if needed.
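
For illustration, a hedged sketch of the pattern in question (names are 
illustrative, not the actual AEC code): with object reuse enabled, the same 
record instance may be mutated upstream, so anything buffered for asynchronous 
processing must be defensively copied first.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

import org.apache.flink.api.common.typeutils.TypeSerializer;

// Sketch only: a reuse-aware buffer for records awaiting async state access.
class ReuseAwareBuffer<T> {
    private final TypeSerializer<T> serializer;
    private final boolean objectReuseEnabled;
    private final Queue<T> buffer = new ArrayDeque<>();

    ReuseAwareBuffer(TypeSerializer<T> serializer, boolean objectReuseEnabled) {
        this.serializer = serializer;
        this.objectReuseEnabled = objectReuseEnabled;
    }

    void enqueue(T record) {
        // Copy only when object reuse makes holding the reference unsafe.
        buffer.add(objectReuseEnabled ? serializer.copy(record) : record);
    }
}
{code}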



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35616) Support upsert into sharded collections

2024-06-14 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-35616:
--

 Summary: Support upsert into sharded collections
 Key: FLINK-35616
 URL: https://issues.apache.org/jira/browse/FLINK-35616
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / MongoDB
Affects Versions: mongodb-1.2.0
Reporter: Jiabao Sun
Assignee: Jiabao Sun


{panel:}
For a db.collection.update() operation that includes upsert: true and is on a 
sharded collection, the full sharded key must be included in the filter:

* For an update operation.
* For a replace document operation (starting in MongoDB 4.2).
{panel}

https://www.mongodb.com/docs/manual/reference/method/db.collection.update/#upsert-on-a-sharded-collection

We need to allow users to configure the full shard key field names to upsert 
into the sharded collection.
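
For illustration, a hedged sketch with the MongoDB Java driver (the collection 
and field names are hypothetical): an upsert against a sharded collection must 
carry the full shard key in its filter.

{code:java}
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

class ShardedUpsertExample {
    static void upsert(MongoCollection<Document> orders) {
        // Assume the collection is sharded on {region: 1, orderId: 1}; both
        // fields must appear in the filter for upsert(true) to be accepted.
        orders.updateOne(
                Filters.and(
                        Filters.eq("region", "emea"),
                        Filters.eq("orderId", 42L)),
                Updates.set("status", "shipped"),
                new UpdateOptions().upsert(true));
    }
}
{code}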



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-06-14 Thread David Radley
Hi everyone,
I have talked with Chesnay and Danny offline. Danny and I were not very happy 
with passing Maps around, and were looking for a neater design. Chesnay 
suggested that we could move the new format to the Kafka connector, then pass 
the Kafka record down to the deserialisation logic so it can make use of the 
headers during deserialisation and serialisation.

I think this is a neat idea. This would mean:
- the Kafka connector code would need to be updated to pass down the Kafka 
record
- there would be the Avro Apicurio format and SQL in the kafka repository. We 
feel it is unlikely to want to use the Apicurio registry with files, as the 
Avro format could be used.

Unfortunately I have found that this is not so straightforward to implement, as 
the Avro Apicurio format uses the Avro format, which is tied to the 
DeserializationSchema. We were hoping to have a new decoding implementation 
that would pass down the Kafka record rather than just the payload. This does 
not appear possible without an Avro format change.


Inspired by this idea, I notice that KafkaValueOnlyRecordDeserializerWrapper 
extends KafkaValueOnlyDeserializerWrapper, which does:

deserializer.deserialize(record.topic(), record.value())



I am investigating if I can add a factory/reflection mechanism to provide an 
alternative implementation that will pass the record (the Kafka record is not 
serializable, so I will pick out what we need and serialise that) as a byte 
array.

I would need to do this 4 times (value and key, for deserialisation and 
serialisation). To do this I would need to convert the record into a byte 
array so it fits into the existing interface (DeserializationSchema). I think 
this could be a way through, to avoid using maps, avoid changing the existing 
Avro format, and avoid changing any core Flink interfaces.

I am going to prototype this idea. WDYT?
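
To make the shape of this idea concrete, here is a rough sketch (all names are 
hypothetical; this is not existing Flink or Kafka connector API):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

// Hypothetical helper: flatten the non-serializable ConsumerRecord into a
// byte[] (headers + value) so it can flow through the existing byte[]-based
// DeserializationSchema without changing core Flink interfaces.
final class RecordFlattener {
    static byte[] flatten(ConsumerRecord<byte[], byte[]> record) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (Header header : record.headers()) {
            out.writeUTF(header.key());
            byte[] v = header.value();
            out.writeInt(v == null ? -1 : v.length);
            if (v != null) {
                out.write(v);
            }
        }
        out.writeUTF(""); // end-of-headers marker (assumes no empty header keys)
        if (record.value() != null) {
            out.write(record.value());
        }
        return bos.toByteArray();
    }
}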

My thanks go to Chesnay and Danny for their support and insight around this 
Flip,
   Kind regards, David.






From: David Radley 
Date: Wednesday, 29 May 2024 at 11:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Danny,
Thank you for your feedback on this.

I agree that using maps has pros and cons. The maps are flexible, but do 
require the sender and receiver to know what is in the map.

When you say “That sounds like it would fit in better, I assume we cannot just 
take that approach?”: the motivation behind this FLIP is to support the 
headers, which is the usual way that Apicurio runs. We will support the 
“schema id in the payload” as well.

I agree with you when you say “ I am not 100% happy with the solution but I
cannot offer a better option.” – this is a pragmatic way we have found to solve 
this issue. I am open to any suggestions to improve this as well.

If we are going with the maps design (which is the best we have at the moment) 
; it would be good to have the Flink core changes in base Flink version 2.0 as 
this would mean we do not need to use reflection in a Flink Kafka version 2 
connector to work out if the runtime Flink has the new methods.

At this stage we only have one committer (yourself) backing this. Do you know 
of two other committers who would support this FLIP?

 Kind regards, David.



From: Danny Cranmer 
Date: Friday, 24 May 2024 at 19:32
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hello,

> I am curious what you mean by abused.

I just meant we will end up adding more and more fields to this map over
time, and it may be hard to undo.

> For Apicurio it can be sent at the start of the payload like Confluent
Avro does. Confluent Avro has a magic byte followed by 4 bytes of schema
id at the start of the payload. Apicurio clients and SerDe libraries can
be configured to not put the schema id in the headers, in which case there
is a magic byte followed by an 8-byte schema id at the start of the payload.
In the deserialization case, we would not need to look at the headers –
though the encoding is also in the headers.

That sounds like it would fit in better, I assume we cannot just take that
approach?

Thanks for the discussion. I am not 100% happy with the solution but I
cannot offer a better option. I would be interested to hear if others have
any suggestions. Playing devil's advocate against myself, we pass maps
around to configure connectors so it is not too far away from that.

Thanks,
Danny


On Fri, May 24, 2024 at 2:23 PM David Radley 
wrote:

> Hi Danny,
> No worries, thanks for replying. I have working prototype code that is
> being reviewed. It needs some cleaning up and more complete testing before
> it is ready, but will give you the general idea [1][2] to help to assess
> this approach.
>
>
> I am curious what you mean by abused. I guess the approaches are between
> generic map, mechanism vs a more particular more granular things being
> passed that might be used by another connector.
>
> Your first question:
> “how would this work if the schema 

[jira] [Created] (FLINK-35615) CDC 3.0: the mysql-doris pipeline does not support the MySQL field type timestamp(6)

2024-06-14 Thread Dylan Zhao (Jira)
Dylan Zhao created FLINK-35615:
--

 Summary: CDC 3.0: the mysql-doris pipeline does not support the 
MySQL field type timestamp(6)
 Key: FLINK-35615
 URL: https://issues.apache.org/jira/browse/FLINK-35615
 Project: Flink
  Issue Type: Bug
Reporter: Dylan Zhao


@吕宴全: Currently, in the CDC 3.0 Doris pipeline sync, a MySQL field of type 
timestamp(6) (a timestamp type with precision) causes an array index 
out-of-bounds error.

!https://static.dingtalk.com/media/lQLPJw-K4zrLMjnNA6LNB-Swvk-76DoWxAUGVxnmewT2AA_2020_930.png|width=2020,height=930!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35614) Release Testing Instructions: Verify FLIP-443: Interruptible timers firing

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35614:
---

 Summary: Release Testing Instructions: Verify  FLIP-443: 
Interruptible timers firing 
 Key: FLINK-35614
 URL: https://issues.apache.org/jira/browse/FLINK-35614
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35613) Release Testing Instructions: Verify [FLIP-451] Introduce timeout configuration to AsyncSink

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35613:
---

 Summary: Release Testing Instructions: Verify [FLIP-451] Introduce 
timeout configuration to AsyncSink
 Key: FLINK-35613
 URL: https://issues.apache.org/jira/browse/FLINK-35613
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35612) Release Testing Instructions: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35612:
---

 Summary: Release Testing Instructions: Verify FLIP-445: Support 
dynamic parallelism inference for HiveSource
 Key: FLINK-35612
 URL: https://issues.apache.org/jira/browse/FLINK-35612
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35611) Release Testing Instructions: Verify [FLIP-453] Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35611:
---

 Summary: Release Testing Instructions: Verify [FLIP-453] Promote 
Unified Sink API V2 to Public and Deprecate SinkFunction
 Key: FLINK-35611
 URL: https://issues.apache.org/jira/browse/FLINK-35611
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35610) Release Testing Instructions: Verify FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35610:
---

 Summary: Release Testing Instructions: Verify FLIP-448: Introduce 
Pluggable Workflow Scheduler Interface for Materialized Table
 Key: FLINK-35610
 URL: https://issues.apache.org/jira/browse/FLINK-35610
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35609) Release Testing Instructions: Verify FLIP-435: Introduce a New Materialized Table for Simplifying Data Pipelines

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35609:
---

 Summary: Release Testing Instructions: Verify FLIP-435: Introduce 
a New Materialized Table for Simplifying Data Pipelines
 Key: FLINK-35609
 URL: https://issues.apache.org/jira/browse/FLINK-35609
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35608) CLONE - Release Testing Instructions: Verify FLIP-376: Add DISTRIBUTED BY clause

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35608:
---

 Summary: CLONE - Release Testing Instructions: Verify FLIP-376: 
Add DISTRIBUTED BY clause
 Key: FLINK-35608
 URL: https://issues.apache.org/jira/browse/FLINK-35608
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35607) Release Testing Instructions: Verify FLIP-441: Show the JobType and remove Execution Mode on Flink WebUI

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35607:
---

 Summary: Release Testing Instructions: Verify  FLIP-441: Show the 
JobType and remove Execution Mode on Flink WebUI 
 Key: FLINK-35607
 URL: https://issues.apache.org/jira/browse/FLINK-35607
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35606) Release Testing Instructions: Verify FLINK-26050 Too many small sst files in rocksdb state backend when using time window created in ascending order

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35606:
---

 Summary: Release Testing Instructions: Verify FLINK-26050 Too many 
small sst files in rocksdb state backend when using time window created in 
ascending order
 Key: FLINK-35606
 URL: https://issues.apache.org/jira/browse/FLINK-35606
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: Rui Fan
Assignee: Roman Khachatryan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-26050



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35605) Release Testing Instructions: Verify FLIP-306 Unified File Merging Mechanism for Checkpoints

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35605:
---

 Summary: Release Testing Instructions: Verify FLIP-306 Unified 
File Merging Mechanism for Checkpoints
 Key: FLINK-35605
 URL: https://issues.apache.org/jira/browse/FLINK-35605
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Rui Fan
Assignee: Zakelly Lan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-32070



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35604) Release Testing Instructions: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35604:
---

 Summary: Release Testing Instructions: Verify FLIP-383: Support 
Job Recovery from JobMaster Failures for Batch Jobs
 Key: FLINK-35604
 URL: https://issues.apache.org/jira/browse/FLINK-35604
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Rui Fan
Assignee: Yuxin Tan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35603) Release Testing Instructions: Verify FLINK-35533(FLIP-459): Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35603:
---

 Summary: Release Testing Instructions: Verify 
FLINK-35533(FLIP-459): Support Flink hybrid shuffle integration with Apache 
Celeborn
 Key: FLINK-35603
 URL: https://issues.apache.org/jira/browse/FLINK-35603
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Rui Fan
Assignee: Yuxin Tan


Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35602) [Umbrella] Test Flink Release 1.20

2024-06-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35602:
--

 Summary: [Umbrella] Test Flink Release 1.20
 Key: FLINK-35602
 URL: https://issues.apache.org/jira/browse/FLINK-35602
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-14 Thread weijie guo
Hi all,

Thanks for all the feedback and suggestions so far.

If there is no further comment, we will open the voting thread next Monday.

Best regards,

Weijie


weijie guo  wrote on Friday, 14 June 2024 at 15:49:

> Thanks Lincoln for the quick response.
>
> > Since we've decided to extend a new hint option 'shuffle' to the current
> `LOOKUP` join hint, do we support hash shuffle as well?(It seems like it
> shouldn't require a lot of extra work, right?) This will deliver a
> complete new feature to users,  also because
> FLIP-204 is stale for now and this new extension will give user a more
> simpler way to achieve the goal, WDYT?
>
> Yes, I think this makes more sense.
>
> In a word: If the target dim table implements SupportsLookupCustomShuffle,
> the planner will try its best to apply the custom partitioning for the input
> stream. Otherwise, the planner will try its best to apply a hash
> partitioning.
>
> As for FLIP-204, I think we can discuss whether it should be discarded or
> refactored in a separate thread. TBH, I think the current approach can
> completely cover it and be much easier to use.
> > "upsert mode" should be "updating stream" or "non-insert-only stream".
>
> Thanks, updated the FLIP.
>
>
>
> Best regards,
>
> Weijie
>
>
> Lincoln Lee  wrote on Thursday, 13 June 2024 at 23:08:
>
>> Thanks Weijie & Wencong for your update including the conclusions of
>> the offline discussion.
>>
>> There's one thing need to be confirmed in the FLIP:
>> > The hint only provides a suggestion to the optimizer, it is not an
>> enforcer. As a result, If the target dim table not implements
>> SupportsLookupCustomShuffle, planner will ignore this newly introduced
>> shuffle option.
>>
>> Since we've decided to extend a new hint option 'shuffle' to the current
>> `LOOKUP` join hint, do we support hash shuffle as well?(It seems like it
>> shouldn't require a lot of extra work, right?)
>> This will deliver a complete new feature to users,  also because
>> FLIP-204 is stale for now and this new extension will give user a more
>> simpler way to achieve the goal, WDYT?
>>
>> Another small comment for the new interface:
>> > "... planner may not apply this partitioner in upsert mode ..."
>> > default boolean isDeterministic()
>> "upsert mode" should be "updating stream" or "non-insert-only stream".
>>
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Wencong Liu  wrote on Wednesday, 12 June 2024 at 21:43:
>>
>> > Hi Jingsong,
>> >
>> >
>> > Some of the points you mentioned are currently clarified in
>> > the updated FLIP. Please check it out.
>> >
>> >
>> > 1. Enabling custom data distribution can be done through the
>> > LOOKUP SQL Hint. There are detailed examples provided in the FLIP.
>> >
>> >
>> > 2. We will add the isDeterministic method to the `InputDataPartitioner`
>> > interface, which will return true by default. If the
>> > `InputDataPartitioner`
>> > is not deterministic, the connector developer need to override the
>> > isDeterministic method to return false. If the connector developer
>> > cannot ensure this protocol, they will need to bear the correctness
>> > issues that arise.
>> >
>> >
>> > 3. Yes, this feature will work in batch mode as well.
>> >
>> >
>> > Best regards,
>> > Wencong
>> >
>> >
>> >
>> >
>> >
>> > At 2024-06-11 23:47:40, "Jingsong Li"  wrote:
>> > >Hi all,
>> > >
>> > >+1 to this FLIP, very thanks all for your proposal.
>> > >
>> > >isDeterministic looks good to me too.
>> > >
>> > >We can consider stating the following points:
>> > >
>> > >1. How to enable custom data distribution? Is it a dynamic hint? Can
>> > >you provide an SQL example.
>> > >
>> > >2. What impact will it have when the mainstream is changelog? Causing
>> > >disorder? This may need to be emphasized.
>> > >
>> > >3. Does this feature work in batch mode too?
>> > >
>> > >Best,
>> > >Jingsong
>> > >
>> > >On Tue, Jun 11, 2024 at 8:22 PM Wencong Liu 
>> wrote:
>> > >>
>> > >> Hi Lincoln,
>> > >>
>> > >>
>> > >> Thanks for your reply. Weijie and I discussed these two issues
>> offline,
>> > >> and here are the results of our discussion:
>> > >> 1. When the user utilizes the hash lookup join hint introduced by
>> > FLIP-204[1],
>> > >> the `SupportsLookupCustomShuffle` interface should be ignored. This
>> is
>> > because
>> > >> the hash lookup join hint is directly specified by the user through a
>> > SQL HINT,
>> > >> which is more in line with user intuition. WDYT?
>> > >> 2. We agree with the introduction of the `isDeterministic` method.
>> The
>> > >> `SupportsLookupCustomShuffle` interface introduces a custom shuffle,
>> > which
>> > >> can cause ADD/UPDATE_AFTER events (+I, +U) to appear
>> > >> after UPDATE_BEFORE/DELETE events (-D, -U), thus breaking the current
>> > >> limitations of the Flink Sink Operator[2]. If `isDeterministic`
>> returns
>> > false and the
>> > >> changelog event type is not insert-only, the Planner should not apply
>> > the shuffle
>> > >> provided by `SupportsLookupCustomShuffle`.
>> > >>
>> > >>
>> > >> [1]
>> >
>> 

Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-14 Thread weijie guo
Thanks Lincoln for the quick response.

> Since we've decided to extend a new hint option 'shuffle' to the current
`LOOKUP` join hint, do we support hash shuffle as well?(It seems like it
shouldn't require a lot of extra work, right?) This will deliver a complete
new feature to users,  also because
FLIP-204 is stale for now and this new extension will give user a more
simpler way to achieve the goal, WDYT?

Yes, I think this makes more sense.

In a word: If the target dim table implements SupportsLookupCustomShuffle, the
planner will try its best to apply the custom partitioning for the input
stream. Otherwise, the planner will try its best to apply a hash partitioning.

As for FLIP-204, I think we can discuss whether it should be discarded or
refactored in a separate thread. TBH, I think the current approach can
completely cover it and be much easier to use.
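
For readers following along, a hedged sketch of the shape being discussed (the
exact signatures live in FLIP-462; the parameter lists below are assumptions):

import java.io.Serializable;
import org.apache.flink.table.data.RowData;

// Sketch of the FLIP-462 direction; see the FLIP for the real definitions.
interface SupportsLookupCustomShuffle {
    // Connector-provided partitioner for the lookup join's input stream.
    InputDataPartitioner getPartitioner();
}

interface InputDataPartitioner extends Serializable {
    // Route an input row (by its join key) to one of the downstream subtasks.
    int partition(RowData joinKey, int numPartitions);

    // If this returns false and the input is not insert-only, the planner
    // must not apply the partitioner: -U/-D events could otherwise be routed
    // differently from their matching +I/+U events.
    default boolean isDeterministic() {
        return true;
    }
}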
> "upsert mode" should be "updating stream" or "non-insert-only stream".

Thanks, updated the FLIP.



Best regards,

Weijie


Lincoln Lee  wrote on Thursday, 13 June 2024 at 23:08:

> Thanks Weijie & Wencong for your update including the conclusions of
> the offline discussion.
>
> There's one thing need to be confirmed in the FLIP:
> > The hint only provides a suggestion to the optimizer, it is not an
> enforcer. As a result, If the target dim table not implements
> SupportsLookupCustomShuffle, planner will ignore this newly introduced
> shuffle option.
>
> Since we've decided to extend a new hint option 'shuffle' to the current
> `LOOKUP` join hint, do we support hash shuffle as well?(It seems like it
> shouldn't require a lot of extra work, right?)
> This will deliver a complete new feature to users,  also because
> FLIP-204 is stale for now and this new extension will give user a more
> simpler way to achieve the goal, WDYT?
>
> Another small comment for the new interface:
> > "... planner may not apply this partitioner in upsert mode ..."
> > default boolean isDeterministic()
> "upsert mode" should be "updating stream" or "non-insert-only stream".
>
>
> Best,
> Lincoln Lee
>
>
> Wencong Liu  wrote on Wednesday, 12 June 2024 at 21:43:
>
> > Hi Jingsong,
> >
> >
> > Some of the points you mentioned are currently clarified in
> > the updated FLIP. Please check it out.
> >
> >
> > 1. Enabling custom data distribution can be done through the
> > LOOKUP SQL Hint. There are detailed examples provided in the FLIP.
> >
> >
> > 2. We will add the isDeterministic method to the `InputDataPartitioner`
> > interface, which will return true by default. If the
> > `InputDataPartitioner`
> > is not deterministic, the connector developer need to override the
> > isDeterministic method to return false. If the connector developer
> > cannot ensure this protocol, they will need to bear the correctness
> > issues that arise.
> >
> >
> > 3. Yes, this feature will work in batch mode as well.
> >
> >
> > Best regards,
> > Wencong
> >
> >
> >
> >
> >
> > At 2024-06-11 23:47:40, "Jingsong Li"  wrote:
> > >Hi all,
> > >
> > >+1 to this FLIP, very thanks all for your proposal.
> > >
> > >isDeterministic looks good to me too.
> > >
> > >We can consider stating the following points:
> > >
> > >1. How to enable custom data distribution? Is it a dynamic hint? Can
> > >you provide an SQL example.
> > >
> > >2. What impact will it have when the mainstream is changelog? Causing
> > >disorder? This may need to be emphasized.
> > >
> > >3. Does this feature work in batch mode too?
> > >
> > >Best,
> > >Jingsong
> > >
> > >On Tue, Jun 11, 2024 at 8:22 PM Wencong Liu 
> wrote:
> > >>
> > >> Hi Lincoln,
> > >>
> > >>
> > >> Thanks for your reply. Weijie and I discussed these two issues
> offline,
> > >> and here are the results of our discussion:
> > >> 1. When the user utilizes the hash lookup join hint introduced by
> > FLIP-204[1],
> > >> the `SupportsLookupCustomShuffle` interface should be ignored. This is
> > because
> > >> the hash lookup join hint is directly specified by the user through a
> > SQL HINT,
> > >> which is more in line with user intuition. WDYT?
> > >> 2. We agree with the introduction of the `isDeterministic` method. The
> > >> `SupportsLookupCustomShuffle` interface introduces a custom shuffle,
> > which
> > >> can cause ADD/UPDATE_AFTER events (+I, +U) to appear
> > >> after UPDATE_BEFORE/DELETE events (-D, -U), thus breaking the current
> > >> limitations of the Flink Sink Operator[2]. If `isDeterministic`
> returns
> > false and the
> > >> changelog event type is not insert-only, the Planner should not apply
> > the shuffle
> > >> provided by `SupportsLookupCustomShuffle`.
> > >>
> > >>
> > >> [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >> [2]
> >
> https://www.ververica.com/blog/flink-sql-secrets-mastering-the-art-of-changelog-event-out-of-orderness
> > >>
> > >>
> > >> Best,
> > >> Wencong
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> At 2024-06-11 00:02:57, "Lincoln Lee"  wrote:
> > >> >Hi Weijie,
> > >> >
> > >> 

Re: [DISCUSS] Add a JDBC Sink Plugin to Flink-CDC-Pipeline

2024-06-14 Thread Jerry
The content has been edited in FLIP format, but I don't have wiki
permissions and can't create a wiki document.

https://docs.google.com/document/d/1bgKV9Teq8ktHZOAJ7EKzgSqP11Q4ZT2EM8BndOtDgy0/edit

Leonard Xu  wrote on Tuesday, 21 May 2024 at 22:50:

> Thanks Jerry for kicking off this thread. The idea makes sense to me: a JDBC
> Sink is something users need, and the Flink CDC project should support it soon.
>
> Could you share your design doc (FLIP) first [1]? Then we can
> continue the design discussion.
>
> Please feel free to ping me if you have any concerns about FLIP process or
> Flink CDC design part.
>
> Best,
> Leonard
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>
>
> > On 15 May 2024, at 3:06 PM, Jerry  wrote:
> >
> > Hi all
> > My name is ZhengjunZhou, a user and developer of FlinkCDC. In my recent
> > projects, I realized that we could enhance the capabilities of
> > Flink-CDC-Pipeline by introducing a JDBC Sink plugin, enabling FlinkCDC
> to
> > directly output change data capture (CDC) to various JDBC-supported
> > database systems.
> >
> > Currently, while FlinkCDC offers support for a wide range of data
> sources,
> > there is no direct solution for sinks, especially for common relational
> > databases. I believe that adding a JDBC Sink plugin will significantly
> > boost its applicability in data integration scenarios.
> >
> > Specifically, this plugin would allow users to configure database
> > connections and stream data directly to SQL databases via the standard
> JDBC
> > interface. This could be used for data migration tasks as well as
> real-time
> > data synchronization.
> >
> > To further discuss this proposal and gather feedback from the community,
> I
> > have prepared a preliminary design draft and hope to discuss it in detail
> > in the upcoming community meeting. Please consider the potential value of
> > this feature and provide your insights and guidance.
> >
> > Thank you for your time and consideration. I look forward to your active
> > feedback and further discussion.
> >
> > [1] https://github.com/apache/flink-connector-jdbc
>
>


[jira] [Created] (FLINK-35601) InitOutputPathTest.testErrorOccursUnSynchronized failed due to NoSuchFieldException

2024-06-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35601:
--

 Summary: InitOutputPathTest.testErrorOccursUnSynchronized failed 
due to NoSuchFieldException
 Key: FLINK-35601
 URL: https://issues.apache.org/jira/browse/FLINK-35601
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] Apache Flink CDC Release 3.1.1, release candidate #0

2024-06-14 Thread Qingsheng Ren
Hi everyone,

Please review and vote on the release candidate #0 for the version 3.1.1 of
Apache Flink CDC, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Flink CDC source release to be deployed to dist.apache.org
b) Maven artifacts to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above mentioned artifacts are as follows,
for your review:
* All artifacts for a) can be found in the corresponding dev repository at
dist.apache.org [1], which are signed with the key with fingerprint
A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2]
* All artifacts for b) can be found at the Apache Nexus Repository [3]

Other links for your review:
* JIRA release notes [4]
* Source code tag "release-3.1.1-rc0" with commit hash
b163b8e1589184bd7716cf6d002f3472766eb5db [5]
* PR for release announcement blog post of Flink CDC 3.1.1 in flink-web [6]

**Vote Duration**

The voting time will run for at least 72 hours.
It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Qingsheng Ren

[1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.1.1-rc0
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1739
[4]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354763
[5] https://github.com/apache/flink-cdc/releases/tag/release-3.1.1-rc0
[6] https://github.com/apache/flink-web/pull/746


[DISCUSS] FLINK-35571: ProfilingServiceTest improper test isolation

2024-06-14 Thread Grace Grimwood
Hi devs,

I created a ticket FLINK-35571 - ProfilingServiceTest.testRollingDeletion
intermittently fails due to improper test isolation[1] yesterday, and my
colleagues and I were hoping to get some feedback on how best for us to fix
it. We'd like to raise a PR to fix it, just want to be sure we understand
what we're doing and aren't breaking anything in doing so.

To summarise the issue briefly: this test fails sometimes with too many
files in the tempDir used by the ProfilingService instance. This happens
because the instance retrieved by ProfilingServiceTest.setUp is sometimes
an old instance with different configuration that was created as part of
other tests outside this test class. There's more detail on the ticket, but
that's the gist.

My colleagues and I have been looking into this, and we've got a handful of
ideas for things we could change here.
In test code:
  1. Add a call to close any existing instances of ProfilingService in
ProfilingServiceTest.setUp before instantiating its own
  2. Modify ProfilingServiceTest.testRollingDeletion so that it is able to
handle being given a previously-used instance of ProfilingService and any
files it might leave laying around
  3. Modify
ProfilingServiceTest.testProfilingConfigurationWorkingAsExpected (or add
another test) so that it covers cases where there is an existing instance
with different config
  4. Add a close method to TestingMiniCluster which calls close on any
existing ProfilingService (prevents tests using TaskExecutor via
TestingMiniCluster from leaking unclosed instances of ProfilingService)

In application code:
  1. Add a check in ProfilingService.getInstance to ensure the instance
returned is correctly configured
  2. Make ProfilingService.close synchronize on ProfilingService.class to
ensure it cannot race with getInstance
  3. Modify ProfilingService so it is no longer a global singleton

We aren't familiar enough with these components of Flink to know what the
implications are for the issue or our suggested fixes. For example, we
don't think that allowing multiple instances of ProfilingService to exist
would cause issues with the application code (especially if AsyncProfiler
was kept as a singleton), but we don't know for certain because we're not
very familiar with how this all fits together. We would appreciate any
input anyone has on this.
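
As a concrete illustration of test-side idea 1 (hedged: this assumes the
ProfilingService.getInstance(Configuration)/close() API shape described in the
ticket, so treat it as pseudocode rather than a drop-in patch):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.util.profiler.ProfilingService; // assumed package
import org.junit.jupiter.api.BeforeEach;

class ProfilingServiceTestSketch {
    private final Configuration config = new Configuration();
    private ProfilingService profilingService;

    @BeforeEach
    void setUp() throws Exception {
        // Idea 1: close whatever singleton another test may have leaked, then
        // acquire a fresh instance configured for this test class.
        ProfilingService.getInstance(config).close();
        profilingService = ProfilingService.getInstance(config);
    }
}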

[1] https://issues.apache.org/jira/browse/FLINK-35571

Thanks,
Grace

-- 

Grace Grimwood

She/They

Senior Software Engineer

Red Hat 




[jira] [Created] (FLINK-35600) Data read duplication during the full-to-incremental conversion phase

2024-06-13 Thread Di Wu (Jira)
Di Wu created FLINK-35600:
-

 Summary: Data read duplication during the full-to-incremental 
conversion phase
 Key: FLINK-35600
 URL: https://issues.apache.org/jira/browse/FLINK-35600
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: Di Wu


Assume that the table has been split into 3 Chunks

Timeline
t1: chunk1 is read
t2: a piece of data A belonging to chunk2 is inserted in MySQL
t3: chunk2 is read, and data A has been sent downstream
t4: chunk3 is read

At this time, startOffset will be set to the low watermark
t5: BinlogSplitReader.pollSplitRecords receives data A, and uses the method 
shouldEmit to determine whether the data is sent downstream

In this method
{code:java}
private boolean hasEnterPureBinlogPhase(TableId tableId, BinlogOffset position) {
    if (pureBinlogPhaseTables.contains(tableId)) {
        return true;
    }
    // the existed tables those have finished snapshot reading
    if (maxSplitHighWatermarkMap.containsKey(tableId)
            && position.isAtOrAfter(maxSplitHighWatermarkMap.get(tableId))) {
        pureBinlogPhaseTables.add(tableId);
        return true;
    }
    return false;
}
{code}

*maxSplitHighWatermarkMap.get(tableId)* returns the high watermark without the 
ts_sec field set, so it defaults to 0, and 
*position.isAtOrAfter(maxSplitHighWatermarkMap.get(tableId))* therefore 
evaluates to true.

*Data A continues to be sent downstream, and the data is repeated*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35599) test

2024-06-13 Thread ZhengJunZhou (Jira)
ZhengJunZhou created FLINK-35599:


 Summary: test
 Key: FLINK-35599
 URL: https://issues.apache.org/jira/browse/FLINK-35599
 Project: Flink
  Issue Type: Bug
Reporter: ZhengJunZhou






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35598) Fix error comparison type ExtendedSqlRowTypeNameSpec#equalsDeep

2024-06-13 Thread Frank Wong (Jira)
Frank Wong created FLINK-35598:
--

 Summary: Fix error comparison type 
ExtendedSqlRowTypeNameSpec#equalsDeep
 Key: FLINK-35598
 URL: https://issues.apache.org/jira/browse/FLINK-35598
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Legacy Planner
Affects Versions: 1.20.0
Reporter: Frank Wong


{code:java}
ExtendedSqlRowTypeNameSpec nameSpec = new ExtendedSqlRowTypeNameSpec(
SqlParserPos.ZERO,
Arrays.asList(
new SqlIdentifier("column1", SqlParserPos.ZERO),
new SqlIdentifier("column2", SqlParserPos.ZERO)),
Arrays.asList(
new SqlDataTypeSpec(new SqlBasicTypeNameSpec(
SqlTypeName.INTEGER,
SqlParserPos.ZERO), SqlParserPos.ZERO),
new SqlDataTypeSpec(new SqlBasicTypeNameSpec(
SqlTypeName.INTEGER,
SqlParserPos.ZERO), SqlParserPos.ZERO)),
Collections.emptyList(), true
);
// Throw exception
nameSpec.equalsDeep(nameSpec, Litmus.THROW);{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35597) Fix unstable LocatableSplitAssignerTest#testConcurrentSplitAssignmentForMultipleHosts

2024-06-13 Thread Yubin Li (Jira)
Yubin Li created FLINK-35597:


 Summary: Fix unstable 
LocatableSplitAssignerTest#testConcurrentSplitAssignmentForMultipleHosts
 Key: FLINK-35597
 URL: https://issues.apache.org/jira/browse/FLINK-35597
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.20.0
Reporter: Yubin Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35596) Flink application fails with The implementation of the BlockElement is not serializable

2024-06-13 Thread Venkata krishnan Sowrirajan (Jira)
Venkata krishnan Sowrirajan created FLINK-35596:
---

 Summary: Flink application fails with The implementation of the 
BlockElement is not serializable
 Key: FLINK-35596
 URL: https://issues.apache.org/jira/browse/FLINK-35596
 Project: Flink
  Issue Type: Bug
Reporter: Venkata krishnan Sowrirajan


 
Flink application fails with 
_org.apache.flink.api.common.InvalidProgramException: The implementation of the 
BlockElement is not serializable. The object probably contains or references 
non serializable fields._

Looks like, as part of [[FLINK-33058][formats] Add encoding option to Avro 
format|https://github.com/apache/flink/pull/23395/files#top], a new 
_AvroEncoding_ enum was introduced, but it also uses TextElement to format the 
description for the Javadocs.

This is internally used in _AvroRowDataSerializationSchema_ and 
_AvroRowDataDeserializationSchema_, which need to be serialized, while 
_BlockElement_ is not serializable.
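
A minimal repro sketch of the failure mode (the holder class below is 
hypothetical; it only mirrors how the schema ends up referencing a 
BlockElement, and leads to the stack trace that follows):

{code:java}
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.java.ClosureCleaner;
import org.apache.flink.configuration.description.BlockElement;
import org.apache.flink.configuration.description.TextElement;

public class BlockElementSerializationRepro {
    // Stand-in for a serialization schema that (transitively) holds a
    // BlockElement, as AvroRowDataSerializationSchema does via AvroEncoding.
    static class SchemaLikeHolder implements java.io.Serializable {
        final BlockElement description = TextElement.text("javadoc-only text");
    }

    public static void main(String[] args) {
        // Expected to fail with InvalidProgramException: BlockElement
        // implementations are not Serializable.
        ClosureCleaner.clean(
                new SchemaLikeHolder(), ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true);
    }
}
{code}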
{code:java}
org.apache.flink.api.common.InvalidProgramException: The implementation of the 
BlockElement is not serializable. The object probably contains or references 
non serializable fields. at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:164) at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:132) at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:132) at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:132) at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:132) at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69) at 
org.apache.flink.connector.kafka.sink.KafkaSinkBuilder.setRecordSerializer(KafkaSinkBuilder.java:152)
 at 
org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink.getSinkRuntimeProvider(KafkaDynamicSink.java:207)
 at 
org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecSink.createSinkTransformation(CommonExecSink.java:150)
 at 
org.apache.flink.table.planner.plan.nodes.exec.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.java:176)
 at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:159)
 at 
org.apache.flink.table.planner.delegation.StreamPlanner.$anonfun$translateToPlan$1(StreamPlanner.scala:85)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) 
at scala.collection.Iterator.foreach(Iterator.scala:937) at 
scala.collection.Iterator.foreach$(Iterator.scala:937) at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at 
scala.collection.IterableLike.foreach(IterableLike.scala:70) at 
scala.collection.IterableLike.foreach$(IterableLike.scala:69) at 
scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
scala.collection.TraversableLike.map(TraversableLike.scala:233) at 
scala.collection.TraversableLike.map$(TraversableLike.scala:226) at 
scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:84)
 at 
org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:197)
 at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1733)
 at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:825)
 at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:918)
 at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:730)
 at 
org.apache.flink.streaming.connectors.kafka.table.KafkaTableITCase.testKafkaSourceSink(KafkaTableITCase.java:140)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 

[jira] [Created] (FLINK-35595) MailboxProcessor#processMailsWhenDefaultActionUnavailable is allocation intensive

2024-06-13 Thread David Schlosnagle (Jira)
David Schlosnagle created FLINK-35595:
-

 Summary: MailboxProcessor#processMailsWhenDefaultActionUnavailable 
is allocation intensive
 Key: FLINK-35595
 URL: https://issues.apache.org/jira/browse/FLINK-35595
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Reporter: David Schlosnagle


While investigating allocation stalls and GC pressure of a Flink streaming 
pipeline, I noticed significant allocations of {{Optional}} in JFRs from 
{{org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor#processMailsWhenDefaultActionUnavailable()}}.
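
For illustration, a hedged sketch of the allocation pattern and a 
non-allocating alternative (the nullable variant shown here is hypothetical, 
not an existing TaskMailbox method):

{code:java}
import java.util.Optional;
import javax.annotation.Nullable;

// Sketch only: each poll through an Optional-returning API wraps the
// (possibly absent) mail in a fresh Optional object on the hot path.
interface MailQueue {
    Optional<Mail> tryTake(int priority);   // allocates one Optional per call

    @Nullable
    Mail tryTakeOrNull(int priority);       // hypothetical allocation-free variant
}

final class Mail {}

class DrainLoop {
    static void drain(MailQueue mailbox) {
        Mail mail;
        // Allocation-free polling loop using the nullable variant.
        while ((mail = mailbox.tryTakeOrNull(0)) != null) {
            // process mail ...
        }
    }
}
{code}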



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-13 Thread Jeyhun Karimov
Thanks Sergio and Timo for your answers.
Sounds good to me.
Looking forward to this feature.

Regards,
Jeyhun

On Thu, Jun 13, 2024 at 4:48 PM Sergio Pena 
wrote:

> Sure Yuxia, I just added the support for RTAS statements too.
>
> - Sergio
>
> On Wed, Jun 12, 2024 at 8:22 PM yuxia  wrote:
>
> > Hi, Sergio.
> > Thanks for driving the FLIP. Given we also has REPLACE TABLE AS
> > Statement[1] and it's almost same with CREATE TABLE AS Statement,
> > would you mind also supporting schema definition for REPLACE TABLE AS
> > Statement in this FLIP? It'll be a great to align REPLACE TABLE AS
> Statement
> > to CREATE TABLE AS Statement
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Timo Walther" 
> > To: "dev" 
> > Sent: Wednesday, June 12, 2024, 10:19:14 PM
> > Subject: Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS
> Statement
> >
> > > I just noticed the CREATE TABLE LIKE statement allows the definition
> >  > of new columns in the CREATE part. The difference
> >  > with this CTAS proposal is that TABLE LIKE appends the new columns at
> >  > the end of the schema instead of adding them
> >  > at the beginning like this proposal and Mysql do.
> >
> > This should be fine. The LIKE rather "extends from" the right table.
> > Whereas the SELECT in CTAS rather extends the left schema definition.
> > Given that "the extended part" is always appended, we could argue that
> > the current CTAS behavior in the FLIP is acceptable.
> >
> >  > If you want to rename a column in the create part, then that column
> >  > position must be in the same position as the query column.
> >  > I didn't like the Postgres approach because it does not let us add
> >  > columns that do not exist in the query schema.
> >
> > Flink offers similar functionality in INSERT INTO. INSERT INTO also
> > allows syntax like: `INSERT INTO (b, c) SELECT * FROM t`. So given that
> > our CTAS can be seen as a CREATE TABLE + INSERT INTO. I would adopt
> > Jeyhun comment in the FLIP. If users don't want to add additional schema
> > parts, the same column reordering should be available as if one would
> > write a INSERT INTO.
> >
> > Regards,
> > Timo
> >
> >
> >
> >
> > On 12.06.24 04:30, Yanquan Lv wrote:
> > > Hi Sergio, thanks for driving it, +1 for this.
> > >
> > > I have some comments:
> > > 1. If we have a source table with primary keys and partition keys
> > defined,
> > > what is the default behavior if PARTITIONED and DISTRIBUTED not
> specified
> > > in the CTAS statement, It should not be inherited by default?
> > > 2. I suggest providing a complete syntax that includes table_properties
> > > like FLIP-218.
> > >
> > >
> > > Sergio Pena  于2024年6月12日周三 03:54写道:
> > >
> > >> I just noticed the CREATE TABLE LIKE statement allows the definition
> of
> > new
> > >> columns in the CREATE part. The difference
> > >> with this CTAS proposal is that TABLE LIKE appends the new columns at
> > the
> > >> end of the schema instead of adding them
> > >> at the beginning like this proposal and Mysql do.
> > >>
> > >>> create table t1(id int, name string);
> >  create table s1(a int, b string) like t1;
> >  describe s1;
> > >>
> > >> +-+---+--++
> > >>> | Column Name | Data Type | Nullable | Extras |
> > >>> +-+---+--++
> > >>> | id  | INT   | NULL ||
> > >>> | name| STRING| NULL ||
> > >>> | a   | INT   | NULL ||
> > >>> | b   | STRING| NULL ||
> > >>> +-+---+--++
> > >>
> > >>
> > >>
> > >> The CREATE TABLE LIKE also does not let the definition of existing
> > columns
> > >> in the CREATE part. The statement fails
> > >> that the column already exists.
> > >>
> > >>> create table t1(id int, name string);
> > >>
> > >>> create table s1(id double) like t1;
> > >>> A column named 'id' already exists in the base table.
> > >>>
> > >>
> > >> What do you guys think of making it similar to the CREATE TABLE LIKE?
> > Seems
> > >> the best approach in order to
> > >> be compatible with it.
> > >>
> > >> - Sergio
> > >>
> > >> On Tue, Jun 11, 2024 at 2:10 PM Sergio Pena 
> > wrote:
> > >>
> > >>> Thanks Timo for answering Jeyhun questions.
> > >>>
> > >>> To add info more about your questions Jeyhun. This proposal is not
> > >>> handling NULL/NOT_NULL types. I noticed that
> > >>> the current CTAS impl. (as Timo said) adds this constraint as part of
> > the
> > >>> resulting schema. And when defining
> > >>> a primary key in the CREATE part, if the resulting schema does not
> > have a
> > >>> NOT NULL in the column then the CTAS
> > >>> will fail. This is similar to the CREATE TABLE LIKE which expects the
> > >> LIKE
> > >>> table to have a NOT NULL column if
> >>> the user defines a primary key in the CREATE part.

Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-13 Thread Jeyhun Karimov
Thanks for driving this.
+1 (non-binding)

Regards,
Jeyhun

On Thu, Jun 13, 2024 at 5:23 PM Gabor Somogyi 
wrote:

> +1 (binding)
>
> G
>
>
> On Wed, Jun 12, 2024 at 5:23 PM Ferenc Csaky 
> wrote:
>
> > Hello devs,
> >
> > I would like to start a vote about FLIP-464 [1]. The FLIP is about to
> > merge back the
> > "flink run-application" functionality to "flink run", so the latter will
> > be capable to deploy jobs in
> > all deployment modes. More details in the FLIP. Discussion thread [2].
> >
> > The vote will be open for at least 72 hours (until 2024 March 23 14:03
> > UTC) unless there
> > are any objections or insufficient votes.
> >
> > Thanks,Ferenc
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > [2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
>


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-13 Thread Gabor Somogyi
+1 (binding)

G


On Wed, Jun 12, 2024 at 5:23 PM Ferenc Csaky 
wrote:

> Hello devs,
>
> I would like to start a vote about FLIP-464 [1]. The FLIP is about to
> merge back the
> "flink run-application" functionality to "flink run", so the latter will
> be capable to deploy jobs in
> all deployment modes. More details in the FLIP. Discussion thread [2].
>
> The vote will be open for at least 72 hours (until 2024 March 23 14:03
> UTC) unless there
> are any objections or insufficient votes.
>
> Thanks,Ferenc
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> [2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g


Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-13 Thread Lincoln Lee
Thanks Weijie & Wencong for your update including the conclusions of
the offline discussion.

There's one thing that needs to be confirmed in the FLIP:
> The hint only provides a suggestion to the optimizer, it is not an
enforcer. As a result, If the target dim table not implements
SupportsLookupCustomShuffle, planner will ignore this newly introduced
shuffle option.

Since we've decided to extend the current `LOOKUP` join hint with a new
'shuffle' option, do we support hash shuffle as well? (It seems like it
shouldn't require a lot of extra work, right?)
This would deliver a complete new feature to users; also, because
FLIP-204 is stale for now, this new extension would give users a
simpler way to achieve the goal. WDYT?

Another small comment for the new interface:
> "... planner may not apply this partitioner in upsert mode ..."
> default boolean isDeterministic()
"upsert mode" should be "updating stream" or "non-insert-only stream".


Best,
Lincoln Lee


Wencong Liu  wrote on Wed, Jun 12, 2024 at 21:43:

> Hi Jingsong,
>
>
> Some of the points you mentioned are currently clarified in
> the updated FLIP. Please check it out.
>
>
> 1. Enabling custom data distribution can be done through the
> LOOKUP SQL Hint. There are detailed examples provided in the FLIP.
>
>
> 2. We will add the isDeterministic method to the `InputDataPartitioner`
> interface, which will return true by default. If the
> `InputDataPartitioner`
> is not deterministic, the connector developer need to override the
> isDeterministic method to return false. If the connector developer
> cannot ensure this protocol, they will need to bear the correctness
> issues that arise.
>
>
> 3. Yes, this feature will work in batch mode as well.
>
>
> Best regards,
> Wencong
>
>
>
>
>
> At 2024-06-11 23:47:40, "Jingsong Li"  wrote:
> >Hi all,
> >
> >+1 to this FLIP, very thanks all for your proposal.
> >
> >isDeterministic looks good to me too.
> >
> >We can consider stating the following points:
> >
> >1. How to enable custom data distribution? Is it a dynamic hint? Can
> >you provide an SQL example.
> >
> >2. What impact will it have when the mainstream is changelog? Causing
> >disorder? This may need to be emphasized.
> >
> >3. Does this feature work in batch mode too?
> >
> >Best,
> >Jingsong
> >
> >On Tue, Jun 11, 2024 at 8:22 PM Wencong Liu  wrote:
> >>
> >> Hi Lincoln,
> >>
> >>
> >> Thanks for your reply. Weijie and I discussed these two issues offline,
> >> and here are the results of our discussion:
> >> 1. When the user utilizes the hash lookup join hint introduced by
> FLIP-204[1],
> >> the `SupportsLookupCustomShuffle` interface should be ignored. This is
> because
> >> the hash lookup join hint is directly specified by the user through a
> SQL HINT,
> >> which is more in line with user intuition. WDYT?
> >> 2. We agree with the introduction of the `isDeterministic` method. The
> >> `SupportsLookupCustomShuffle` interface introduces a custom shuffle,
> which
> >> can cause ADD/UPDATE_AFTER events (+I, +U) to appear
> >> after UPDATE_BEFORE/DELETE events (-D, -U), thus breaking the current
> >> limitations of the Flink Sink Operator[2]. If `isDeterministic` returns
> false and the
> >> changelog event type is not insert-only, the Planner should not apply
> the shuffle
> >> provided by `SupportsLookupCustomShuffle`.
> >>
> >>
> >> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> >> [2]
> https://www.ververica.com/blog/flink-sql-secrets-mastering-the-art-of-changelog-event-out-of-orderness
> >>
> >>
> >> Best,
> >> Wencong
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> At 2024-06-11 00:02:57, "Lincoln Lee"  wrote:
> >> >Hi Weijie,
> >> >
> >> >Thanks for your proposal, this will be a useful advanced optimization
> for
> >> >connector developers!
> >> >
> >> >I have two questions:
> >> >
> >> >1. FLIP-204[1] hash lookup join hint is mentioned in this FLIP, what's
> the
> >> >apply ordering of the two feature? For example, a connector that
> >> >implements the `SupportsLookupCustomShuffle` interface also has a
> >> >`SHUFFLE_HASH` lookup join hint specified by the user in sql, what's
> >> >the expected behavior?
> >> >
> >> >2. This FLIP considers the relationship with NDU processing, and I
> agree
> >> >with the current choice to prioritize NDU first. However, we should
> also
> >> >consider another issue: out-of-orderness of the changelog events in
> >> >streaming[2]. If the connector developer supplies a non-deterministic
> >> >partitioner, e.g., a random partitioner for anti-skew purpose, then
> it'll
> >> >break the assumption relied by current SQL operators in streaming: the
> >> >ADD/UDPATE_AFTER events (+I, +U) always occur before its related
> >> >UDPATE_BEFORE/DELETE events (-D, -U) and they are always
> >> >processed by the same task even if a data shuffle is involved. So a
> >> >straightforward approach would be to add method `isDeterministic` to
> >> >the `InputDataPartitioner` interface to 
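
For readers skimming the thread, here is a rough sketch of the shape under 
discussion, based only on the names mentioned above (`SupportsLookupCustomShuffle`, 
`InputDataPartitioner`, `isDeterministic`); the actual FLIP-462 signatures may differ:

{code:java}
// Rough sketch of the interfaces discussed in this thread; based only on the
// names mentioned here, NOT the final FLIP-462 API.
import java.io.Serializable;
import java.util.Optional;
import org.apache.flink.table.data.RowData;

/** Ability interface a lookup table source could implement. */
interface SupportsLookupCustomShuffle {
    /** Partitioner the planner would apply to the input stream of the lookup join. */
    Optional<InputDataPartitioner> getPartitioner();
}

interface InputDataPartitioner extends Serializable {
    /** Maps the join-key columns of an input row to a downstream subtask index. */
    int partition(RowData joinKeys, int numPartitions);

    /**
     * Whether the same key always maps to the same partition. If false, the
     * planner must not apply the partitioner to non-insert-only (updating)
     * streams, since -U/-D and +I/+U events for the same key could otherwise
     * be processed by different subtasks, breaking changelog ordering.
     */
    default boolean isDeterministic() {
        return true;
    }
}
{code}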

Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-13 Thread Sergio Pena
Sure Yuxia, I just added the support for RTAS statements too.

- Sergio

On Wed, Jun 12, 2024 at 8:22 PM yuxia  wrote:

> Hi, Sergio.
> Thanks for driving the FLIP. Given we also has REPLACE TABLE AS
> Statement[1] and it's almost same with CREATE TABLE AS Statement,
> would you mind also supporting schema definition for REPLACE TABLE AS
> Statement in this FLIP? It'll be a great to align REPLACE TABLE AS Statement
> to CREATE TABLE AS Statement
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Timo Walther" 
> To: "dev" 
> Sent: Wednesday, June 12, 2024, 10:19:14 PM
> Subject: Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement
>
> > I just noticed the CREATE TABLE LIKE statement allows the definition
>  > of new columns in the CREATE part. The difference
>  > with this CTAS proposal is that TABLE LIKE appends the new columns at
>  > the end of the schema instead of adding them
>  > at the beginning like this proposal and Mysql do.
>
> This should be fine. The LIKE rather "extends from" the right table.
> Whereas the SELECT in CTAS rather extends the left schema definition.
> Given that "the extended part" is always appended, we could argue that
> the current CTAS behavior in the FLIP is acceptable.
>
>  > If you want to rename a column in the create part, then that column
>  > position must be in the same position as the query column.
>  > I didn't like the Postgres approach because it does not let us add
>  > columns that do not exist in the query schema.
>
> Flink offers similar functionality in INSERT INTO. INSERT INTO also
> allows syntax like: `INSERT INTO (b, c) SELECT * FROM t`. So given that
> our CTAS can be seen as a CREATE TABLE + INSERT INTO. I would adopt
> Jeyhun comment in the FLIP. If users don't want to add additional schema
> parts, the same column reordering should be available as if one would
> write a INSERT INTO.
>
> Regards,
> Timo
>
>
>
>
> On 12.06.24 04:30, Yanquan Lv wrote:
> > Hi Sergio, thanks for driving it, +1 for this.
> >
> > I have some comments:
> > 1. If we have a source table with primary keys and partition keys
> defined,
> > what is the default behavior if PARTITIONED and DISTRIBUTED not specified
> > in the CTAS statement, It should not be inherited by default?
> > 2. I suggest providing a complete syntax that includes table_properties
> > like FLIP-218.
> >
> >
> > Sergio Pena  于2024年6月12日周三 03:54写道:
> >
> >> I just noticed the CREATE TABLE LIKE statement allows the definition of
> new
> >> columns in the CREATE part. The difference
> >> with this CTAS proposal is that TABLE LIKE appends the new columns at
> the
> >> end of the schema instead of adding them
> >> at the beginning like this proposal and Mysql do.
> >>
> >>> create table t1(id int, name string);
>  create table s1(a int, b string) like t1;
>  describe s1;
> >>
> >> +-+---+--++
> >>> | Column Name | Data Type | Nullable | Extras |
> >>> +-+---+--++
> >>> | id  | INT   | NULL ||
> >>> | name| STRING| NULL ||
> >>> | a   | INT   | NULL ||
> >>> | b   | STRING| NULL ||
> >>> +-+---+--++
> >>
> >>
> >>
> >> The CREATE TABLE LIKE also does not let the definition of existing
> columns
> >> in the CREATE part. The statement fails
> >> that the column already exists.
> >>
> >>> create table t1(id int, name string);
> >>
> >>> create table s1(id double) like t1;
> >>> A column named 'id' already exists in the base table.
> >>>
> >>
> >> What do you guys think of making it similar to the CREATE TABLE LIKE?
> Seems
> >> the best approach in order to
> >> be compatible with it.
> >>
> >> - Sergio
> >>
> >> On Tue, Jun 11, 2024 at 2:10 PM Sergio Pena 
> wrote:
> >>
> >>> Thanks Timo for answering Jeyhun questions.
> >>>
> >>> To add info more about your questions Jeyhun. This proposal is not
> >>> handling NULL/NOT_NULL types. I noticed that
> >>> the current CTAS impl. (as Timo said) adds this constraint as part of
> the
> >>> resulting schema. And when defining
> >>> a primary key in the CREATE part, if the resulting schema does not
> have a
> >>> NOT NULL in the column then the CTAS
> >>> will fail. This is similar to the CREATE TABLE LIKE which expects the
> >> LIKE
> >>> table to have a NOT NULL column if
> >>> the user defines a primary key in the CREATE part.
> >>>
>  In some cases, redefining the column types might be redundant,
> >> especially
>  when users dont change the column type. A user just wants to change
> the
>  column name from the SELECT clause. Should we also support this
> >> scenario,
>  similar to postgres?
> >>>
> >>> I looked into Postgres too. Postgres matches the columns based on the
> >>> order defined in the create and select part.
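
To make the discussed shape concrete, here is a rough, illustrative sketch of a 
CTAS statement with a schema part via the Table API. The proposed statement below 
does not parse on current Flink; the final grammar is whatever FLIP-463 specifies, 
and the connector choices are placeholders:

{code:java}
// Illustrative sketch only: shows roughly what FLIP-463's CTAS-with-schema
// would look like. The second statement does not parse on current Flink.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CtasSchemaSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tEnv.executeSql(
                "CREATE TABLE t1 (id INT NOT NULL, name STRING) WITH ('connector' = 'datagen')");

        // Proposed: declare constraints/extra columns in the CREATE part of a
        // CTAS; the query columns are appended after the declared part.
        tEnv.executeSql(
                "CREATE TABLE s1 (PRIMARY KEY (id) NOT ENFORCED) "
                        + "WITH ('connector' = 'blackhole') "
                        + "AS SELECT id, name FROM t1");
    }
}
{code}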

[jira] [Created] (FLINK-35594) Downscaling doesn't release TaskManagers.

2024-06-13 Thread Aviv Dozorets (Jira)
Aviv Dozorets created FLINK-35594:
-

 Summary: Downscaling doesn't release TaskManagers.
 Key: FLINK-35594
 URL: https://issues.apache.org/jira/browse/FLINK-35594
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: 1.18.1
 Environment: * Flink 1.18.1 (Java 11, Temurin).
 * Kubernetes Operator 1.8
 * Kubernetes version v1.28.9-eks-036c24b (AWS EKS).

 

Autoscaling configuration:
{code:java}
jobmanager.scheduler: adaptive
job.autoscaler.enabled: "true"
job.autoscaler.metrics.window: 15m
job.autoscaler.stabilization.interval: 15m
job.autoscaler.scaling.effectiveness.threshold: 0.2
job.autoscaler.target.utilization: "0.75"
job.autoscaler.target.utilization.boundary: "0.25"
job.autoscaler.metrics.busy-time.aggregator: "AVG"
job.autoscaler.restart.time-tracking.enabled: "true"{code}
Reporter: Aviv Dozorets
 Attachments: Screenshot 2024-06-10 at 12.50.37 PM.png

(Follow-up of Slack conversation on #troubleshooting channel).

Recently I've observed a behavior that should be improved:

A Flink DataStream job that runs with the autoscaler (backed by the Kubernetes 
operator) and the adaptive scheduler doesn't release a node (TaskManager) when 
scaling down. In my example, the job started with an initial parallelism of 64 
on 4 TaskManagers with 16 slots each (1:1 core:slot) and scaled down to 16.

My expectation: 1 TaskManager should be up and running.

Reality: all 4 initial TaskManagers are still running, with unequal numbers of 
available slots.

 

I didn't find an existing configuration option to change this behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35593) Apache Kubernetes Operator Docker image does not contain Apache LICENSE

2024-06-13 Thread Anupam Aggarwal (Jira)
Anupam Aggarwal created FLINK-35593:
---

 Summary: Apache Kubernetes Operator Docker image does not contain 
Apache LICENSE
 Key: FLINK-35593
 URL: https://issues.apache.org/jira/browse/FLINK-35593
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: 1.8.0
Reporter: Anupam Aggarwal


The Apache 
[LICENSE|https://github.com/apache/flink-kubernetes-operator/blob/main/LICENSE] 
is not bundled along with the Apache Flink Kubernetes Operator docker image.


{code:java}
❯ docker run -it  apache/flink-kubernetes-operator:1.8.0 bash
flink@cc372b31d067:/flink-kubernetes-operator$ ls -latr
total 104732
-rw-r--r-- 1 flink flink     40962 Mar 14 15:19 
flink-kubernetes-standalone-1.8.0.jar
-rw-r--r-- 1 flink flink 107055161 Mar 14 15:21 
flink-kubernetes-operator-1.8.0-shaded.jar
-rw-r--r-- 1 flink flink     62402 Mar 14 15:21 
flink-kubernetes-webhook-1.8.0-shaded.jar
-rw-r--r-- 1 flink flink     63740 Mar 14 15:21 NOTICE
drwxr-xr-x 2 flink flink      4096 Mar 14 15:21 licenses
drwxr-xr-x 1 root  root       4096 Mar 14 15:21 .
drwxr-xr-x 1 root  root       4096 Jun 13 12:49 .. {code}

By contrast, the Apache Flink Docker image bundles the license (LICENSE):
{code:java}
❯ docker run -it apache/flink:latest bash
sed: can't read /config.yaml: No such file or directory
flink@24c2dff32a45:~$ ls -latr
total 224
-rw-r--r--  1 flink flink   1309 Mar  4 15:34 README.txt
drwxrwxr-x  2 flink flink   4096 Mar  4 15:34 log
-rw-r--r--  1 flink flink  11357 Mar  4 15:34 LICENSE
drwxrwxr-x  2 flink flink   4096 Mar  7 05:49 lib
drwxrwxr-x  6 flink flink   4096 Mar  7 05:49 examples
drwxrwxr-x  1 flink flink   4096 Mar  7 05:49 conf
drwxrwxr-x  2 flink flink   4096 Mar  7 05:49 bin
drwxrwxr-x 10 flink flink   4096 Mar  7 05:49 plugins
drwxrwxr-x  3 flink flink   4096 Mar  7 05:49 opt
-rw-rw-r--  1 flink flink 156327 Mar  7 05:49 NOTICE
drwxrwxr-x  2 flink flink   4096 Mar  7 05:49 licenses
drwxr-xr-x  1 root  root    4096 Mar 19 05:01 ..
drwxr-xr-x  1 flink flink   4096 Mar 19 05:02 .
flink@24c2dff32a45:~$ {code}




 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35592) MysqlDebeziumTimeConverter miss timezone convert to timestamp

2024-06-13 Thread ZhengYu Chen (Jira)
ZhengYu Chen created FLINK-35592:


 Summary: MysqlDebeziumTimeConverter miss timezone convert to 
timestamp
 Key: FLINK-35592
 URL: https://issues.apache.org/jira/browse/FLINK-35592
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: ZhengYu Chen
 Fix For: cdc-3.1.1


MysqlDebeziumTimeConverter misses the timezone conversion for timestamps: if a 
timestamp is converted to an `mmddhhmmss`-style date-time string, the timezone 
conversion is lost.
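
To illustrate the class of bug described (this is not the converter's actual 
code): formatting an instant without threading the configured zone through 
silently falls back to the JVM default zone.

{code:java}
// Illustrates the described bug class, not the actual MysqlDebeziumTimeConverter
// code: formatting an instant without an explicit zone loses the offset.
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class TimestampConvertSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Buggy shape: implicitly uses the JVM default zone, so the same instant
    // renders differently depending on where the job runs.
    static String formatWithDefaultZone(Instant ts) {
        return FMT.withZone(ZoneId.systemDefault()).format(ts);
    }

    // Fixed shape: thread the configured server/database timezone through.
    static String formatWithConfiguredZone(Instant ts, ZoneId serverZone) {
        return FMT.withZone(serverZone).format(ts);
    }
}
{code}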



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35591) Azure Pipelines not running for master since c9def981

2024-06-13 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35591:
--

 Summary: Azure Pipelines not running for master since c9def981
 Key: FLINK-35591
 URL: https://issues.apache.org/jira/browse/FLINK-35591
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35589) Support MemorySize type in FlinkCDC ConfigOptions

2024-06-13 Thread LvYanquan (Jira)
LvYanquan created FLINK-35589:
-

 Summary: Support MemorySize type in FlinkCDC ConfigOptions 
 Key: FLINK-35589
 URL: https://issues.apache.org/jira/browse/FLINK-35589
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: LvYanquan
 Fix For: cdc-3.2.0


This allows users to define MemorySize-typed config options, like Flink does.
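
For reference, Flink's own ConfigOptions builder already supports this via 
memoryType(); a sketch of what an equivalent CDC-side option could look like 
(the option name below is made up, and the CDC change would mirror this in the 
CDC ConfigOptions classes):

{code:java}
// Sketch using Flink's own ConfigOptions builder for illustration; the option
// name is hypothetical.
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.MemorySize;

public final class MemorySizeOptionSketch {
    public static final ConfigOption<MemorySize> WRITE_BUFFER_SIZE =
            ConfigOptions.key("sink.write-buffer.size")
                    .memoryType()
                    .defaultValue(MemorySize.parse("64mb"))
                    .withDescription("Accepts values like '32kb', '64mb' or '1gb'.");
}
{code}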

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35590) Cleanup deprecated options usage in docs about state and checkpoint

2024-06-13 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35590:
---

 Summary: Cleanup deprecated options usage in docs about state and 
checkpoint 
 Key: FLINK-35590
 URL: https://issues.apache.org/jira/browse/FLINK-35590
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.20.0
Reporter: Zakelly Lan
Assignee: Zakelly Lan


Currently, there is remaining usage of deprecated options in docs, such as 
'state.backend', which should be replaced.
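
For example, docs snippets would move to the non-deprecated key (assuming 
`state.backend.type` is the current replacement for `state.backend`, as in 
recent Flink docs):

{code:java}
// Minimal sketch: prefer the non-deprecated key in docs/examples. Assumes
// 'state.backend.type' replaced the deprecated 'state.backend'.
import org.apache.flink.configuration.Configuration;

public final class StateBackendDocsSketch {
    public static Configuration stateBackendConfig() {
        Configuration conf = new Configuration();
        conf.setString("state.backend.type", "rocksdb"); // was: state.backend
        return conf;
    }
}
{code}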



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Connector Externalization Retrospective

2024-06-13 Thread Danny Cranmer
Thanks all for the feedback.

@David

> have a wizard / utility so the user inputs which Flink level they want
and which connectors; the utility knows the compatibility matrix and
downloads the appropriate bundles.

My colleagues developed a Maven plugin [1] that performs static checks.
Something like this might work, however it requires users to actually use
it, and keep it up to date. We could provide a Flink Maven bom or similar
that manages the dependencies on their behalf.

@Xintong

> This would allow us to immediately release flink-connectors 1.19.0 right
after flink 1.19.0 is out, excluding connectors that are no longer
compatible with flink 1.19.

While this is convenient, this coupling is something we were trying to
avoid. With this approach we cannot make breaking changes without waiting
for a Flink major release.

@Sergey

> The thing I would suggest: since we already have nightly/weekly jobs for
connectors testing against Flink main repo master branch we could add a
requirement before the release of Flink itself having these job results
also green.

It depends how they get green. If we change the connector code to get it
green, this implies the last connector version does not support the new
Flink version and we require a connector release. This is good information
to have, but still results in a connector release.

@Muhammet

> I have mixed opinion with dropping the Flink version. Usually, large
production migrations happen on Flink versions and users want also
naturally update the connectors compatible for that Flink version.

I agree it is less convenient to have arbitrary versions, however the
version coupling does not scale well for the community. We already follow
the version decoupling strategy for other Flink projects, such as Statefun,
Flink ML and the Flink Kubernetes Operator [2].

@Ahmed

> A question would be what do you think the best approach to when we do
introduce backward compatible changes to connectors API like in this PR, in
this case existing connectors would still work with the newly released
Flink version but would rather accumulate technical debt and removing it
would be an Adhoc task for maintainers which I believe is an accepted
tradeoff but would love to hear the feedback.

I would not change our current process. Create a Jira task to update each
connector, do the work, and then this is included as part of the next
connector release. I am not sure how this is impacted by decoupling
versions or monorepo discussions.

@Chesnay

> We technically can't do this because we don't provide binary
compatibility across minor versions.

We can still perform the same compatibility checks we do today until we
achieve full backwards compatibility. Currently we perform these checks and
then do a heavyweight version release. The new process would be gated by
the compatibility matrix update. Therefore we can still perform the same
compatibility checks, however when the checks pass, the process of updating
the compatibility matrix is much lighter than a release. Once we achieve
full binary compatibility this will increase our confidence and allow an
even more streamlined process. For example, the compatibility matrix might
say "supports 1.19+", rather than "supports 1.19".

> That's the entire reason we did this coupling in the first place, and imo
/we/ shouldn't take a shortcut but still have our users face that very
problem.

I would think of this as an incremental improvement rather than a shortcut.
I agree the user experience is not as nice having to look at a
compatibility matrix vs encoded in the version. However, more timely
connector support outweighs this in my opinion. By gating the compatibility
matrix update by our compatibility checks we can provide the same level of
compatibility guarantees we do today.

> We knew this was gonna by annoying for us; that was intentional and meant
to finally push us towards binary compatibility /guarantees/.

The lag between connector and core Flink releases is the biggest problem at the
moment. And yes, it is annoying.

@Thomas

> Would it be possible to verify that by running e2e tests of connector
binaries built against an earlier Flink minor version against the latest
Flink minor release candidate as part of the release?

We already build all connectors against the next Flink snapshot in the
nightly/weekly builds. So we can/do get early sight of breaking
changes/incompatibilities.

Thanks,
Danny

[1] https://github.com/awslabs/static-checker-flink
[2] https://flink.apache.org/downloads/


On Tue, Jun 11, 2024 at 8:12 PM Thomas Weise  wrote:

> Thanks for bringing this discussion back.
>
> When we decided to decouple the connectors, we already discussed that we
> will only realize the full benefit when the connectors actually become
> independent from the Flink minor releases. Until that happens we have a ton
> of extra work but limited gain. Based on the assumption that getting to the
> binary compatibility guarantee is our goal - 

[jira] [Created] (FLINK-35588) flink sql redis connector

2024-06-13 Thread cris niu (Jira)
cris niu created FLINK-35588:


 Summary: flink sql redis connector
 Key: FLINK-35588
 URL: https://issues.apache.org/jira/browse/FLINK-35588
 Project: Flink
  Issue Type: New Feature
Reporter: cris niu


Flink SQL does not have a Redis connector. I think we should develop a SQL Redis 
connector to make development easier.

I have written a little code for a SQL Redis connector, and my thoughts are as 
follows (a rough sketch of the factory follows the list):

 

source:

            1. write a factory class that implements DynamicTableSourceFactory

            2. write a class that implements ScanTableSource and define how its 
schema is written

            3. write a class that extends RichSourceFunction and implements 
ResultTypeQueryable

            4. use the Java SPI mechanism to register the factory class and 
related code

Finally, the Redis sink follows the same steps as the source, but with the 
sink-side factory (DynamicTableSinkFactory).
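
A minimal, illustrative sketch of such a factory (class and option names are 
hypothetical, and the runtime source function of step 3 is omitted):

{code:java}
// Minimal, illustrative sketch of steps 1-2 and 4; names are hypothetical.
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.factories.DynamicTableSourceFactory;
import org.apache.flink.table.factories.FactoryUtil;

public class RedisDynamicTableFactory implements DynamicTableSourceFactory {

    // Hypothetical option; a real connector would declare host/port/command etc.
    public static final ConfigOption<String> HOSTNAME =
            ConfigOptions.key("hostname").stringType().noDefaultValue();

    @Override
    public String factoryIdentifier() {
        return "redis"; // matched against 'connector' = 'redis'
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        Set<ConfigOption<?>> options = new HashSet<>();
        options.add(HOSTNAME);
        return options;
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return Collections.emptySet();
    }

    @Override
    public DynamicTableSource createDynamicTableSource(Context context) {
        FactoryUtil.TableFactoryHelper helper =
                FactoryUtil.createTableFactoryHelper(this, context);
        helper.validate(); // validates the declared options
        return new RedisScanSource(helper.getOptions().get(HOSTNAME));
    }

    /** Step 2: the ScanTableSource; the runtime source function is omitted. */
    private static final class RedisScanSource implements ScanTableSource {
        private final String hostname;

        RedisScanSource(String hostname) {
            this.hostname = hostname;
        }

        @Override
        public ChangelogMode getChangelogMode() {
            return ChangelogMode.insertOnly();
        }

        @Override
        public ScanRuntimeProvider getScanRuntimeProvider(ScanContext ctx) {
            // Step 3 would return e.g. a SourceFunctionProvider wrapping the
            // RichSourceFunction; omitted in this sketch.
            throw new UnsupportedOperationException("sketch only");
        }

        @Override
        public DynamicTableSource copy() {
            return new RedisScanSource(hostname);
        }

        @Override
        public String asSummaryString() {
            return "Redis scan source (sketch)";
        }
    }
}
// Step 4 (SPI): list the factory's fully qualified class name in
// META-INF/services/org.apache.flink.table.factories.Factory
{code}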



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35587) Job fails with "The read buffer is null in credit-based input channel" on TPC-DS 10TB benchmark

2024-06-12 Thread Junrui Li (Jira)
Junrui Li created FLINK-35587:
-

 Summary: Job fails with "The read buffer is null in credit-based 
input channel" on TPC-DS 10TB benchmark
 Key: FLINK-35587
 URL: https://issues.apache.org/jira/browse/FLINK-35587
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.20.0
Reporter: Junrui Li


While running TPC-DS 10TB benchmark on the latest master branch locally, I've 
encountered a failure in certain queries, specifically query 75, resulting in 
the error "The read buffer is null in credit-based input channel".

Using a binary search approach, I identified the offending commit as 
FLINK-33668. After reverting FLINK-33668 and subsequent commits, the issue 
disappears. Re-applying FLINK-33668 to the branch re-introduces the error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Yuepeng Pan
+1 (non-binding)

Thanks for driving it!


Best,
Yuepeng Pan
On 2024-06-13 10:47:11, "Zakelly Lan"  wrote:
>+1 (binding)
>
>Thanks for driving this!
>
>
>Best,
>Zakelly
>
>On Thu, Jun 13, 2024 at 10:05 AM Junrui Lee  wrote:
>
>> +1 (non-binding)
>>
>> Best,
>> Junrui
>>
>> Biao Geng  于2024年6月13日周四 09:54写道:
>>
>> > Thanks for driving this.
>> > +1 (non-binding)
>> >
>> > Best,
>> > Biao Geng
>> >
>> >
>> > weijie guo  于2024年6月13日周四 09:48写道:
>> >
>> > > Thanks for driving this!
>> > >
>> > > +1(binding)
>> > >
>> > > Best regards,
>> > >
>> > > Weijie
>> > >
>> > >
>> > > Xintong Song  于2024年6月13日周四 09:04写道:
>> > >
>> > > > +1(binding)
>> > > >
>> > > > Best,
>> > > >
>> > > > Xintong
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi <
>> > balassi.mar...@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > +1 (binding)
>> > > > >
>> > > > > On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás <
>> > matyas.orh...@gmail.com
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Sounds reasonable,
>> > > > > > +1
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Matyas
>> > > > > >
>> > > > > >
>> > > > > > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany > >
>> > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > Thank you for driving this Ferenc,
>> > > > > > > +1 (non-binding)
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Mate
>> > > > > > >
>> > > > > > > Ferenc Csaky  ezt írta (időpont:
>> > 2024.
>> > > > > jún.
>> > > > > > > 12., Sze, 17:23):
>> > > > > > >
>> > > > > > > > Hello devs,
>> > > > > > > >
>> > > > > > > > I would like to start a vote about FLIP-464 [1]. The FLIP is
>> > > about
>> > > > to
>> > > > > > > > merge back the
>> > > > > > > > "flink run-application" functionality to "flink run", so the
>> > > latter
>> > > > > > will
>> > > > > > > > be capable to deploy jobs in
>> > > > > > > > all deployment modes. More details in the FLIP. Discussion
>> > thread
>> > > > > [2].
>> > > > > > > >
>> > > > > > > > The vote will be open for at least 72 hours (until 2024 March
>> > 23
>> > > > > 14:03
>> > > > > > > > UTC) unless there
>> > > > > > > > are any objections or insufficient votes.
>> > > > > > > >
>> > > > > > > > Thanks,Ferenc
>> > > > > > > >
>> > > > > > > > [1]
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
>> > > > > > > > [2]
>> > > > https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Zakelly Lan
+1 (binding)

Thanks for driving this!


Best,
Zakelly

On Thu, Jun 13, 2024 at 10:05 AM Junrui Lee  wrote:

> +1 (non-binding)
>
> Best,
> Junrui
>
> Biao Geng  于2024年6月13日周四 09:54写道:
>
> > Thanks for driving this.
> > +1 (non-binding)
> >
> > Best,
> > Biao Geng
> >
> >
> > weijie guo  于2024年6月13日周四 09:48写道:
> >
> > > Thanks for driving this!
> > >
> > > +1(binding)
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Xintong Song  于2024年6月13日周四 09:04写道:
> > >
> > > > +1(binding)
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi <
> > balassi.mar...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás <
> > matyas.orh...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Sounds reasonable,
> > > > > > +1
> > > > > >
> > > > > > Cheers,
> > > > > > Matyas
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany  >
> > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thank you for driving this Ferenc,
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Regards,
> > > > > > > Mate
> > > > > > >
> > > > > > > Ferenc Csaky  ezt írta (időpont:
> > 2024.
> > > > > jún.
> > > > > > > 12., Sze, 17:23):
> > > > > > >
> > > > > > > > Hello devs,
> > > > > > > >
> > > > > > > > I would like to start a vote about FLIP-464 [1]. The FLIP is
> > > about
> > > > to
> > > > > > > > merge back the
> > > > > > > > "flink run-application" functionality to "flink run", so the
> > > latter
> > > > > > will
> > > > > > > > be capable to deploy jobs in
> > > > > > > > all deployment modes. More details in the FLIP. Discussion
> > thread
> > > > > [2].
> > > > > > > >
> > > > > > > > The vote will be open for at least 72 hours (until 2024 March
> > 23
> > > > > 14:03
> > > > > > > > UTC) unless there
> > > > > > > > are any objections or insufficient votes.
> > > > > > > >
> > > > > > > > Thanks,Ferenc
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > > > > > > [2]
> > > > https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Junrui Lee
+1 (non-binding)

Best,
Junrui

Biao Geng  wrote on Thu, Jun 13, 2024 at 09:54:

> Thanks for driving this.
> +1 (non-binding)
>
> Best,
> Biao Geng
>
>
> weijie guo  于2024年6月13日周四 09:48写道:
>
> > Thanks for driving this!
> >
> > +1(binding)
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xintong Song  于2024年6月13日周四 09:04写道:
> >
> > > +1(binding)
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi <
> balassi.mar...@gmail.com
> > >
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás <
> matyas.orh...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Sounds reasonable,
> > > > > +1
> > > > >
> > > > > Cheers,
> > > > > Matyas
> > > > >
> > > > >
> > > > > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany 
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thank you for driving this Ferenc,
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Regards,
> > > > > > Mate
> > > > > >
> > > > > > Ferenc Csaky  ezt írta (időpont:
> 2024.
> > > > jún.
> > > > > > 12., Sze, 17:23):
> > > > > >
> > > > > > > Hello devs,
> > > > > > >
> > > > > > > I would like to start a vote about FLIP-464 [1]. The FLIP is
> > about
> > > to
> > > > > > > merge back the
> > > > > > > "flink run-application" functionality to "flink run", so the
> > latter
> > > > > will
> > > > > > > be capable to deploy jobs in
> > > > > > > all deployment modes. More details in the FLIP. Discussion
> thread
> > > > [2].
> > > > > > >
> > > > > > > The vote will be open for at least 72 hours (until 2024 March
> 23
> > > > 14:03
> > > > > > > UTC) unless there
> > > > > > > are any objections or insufficient votes.
> > > > > > >
> > > > > > > Thanks,Ferenc
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > > > > > [2]
> > > https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Biao Geng
Thanks for driving this.
+1 (non-binding)

Best,
Biao Geng


weijie guo  wrote on Thu, Jun 13, 2024 at 09:48:

> Thanks for driving this!
>
> +1(binding)
>
> Best regards,
>
> Weijie
>
>
> Xintong Song  于2024年6月13日周四 09:04写道:
>
> > +1(binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi  >
> > wrote:
> >
> > > +1 (binding)
> > >
> > > On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás  >
> > > wrote:
> > >
> > > > Sounds reasonable,
> > > > +1
> > > >
> > > > Cheers,
> > > > Matyas
> > > >
> > > >
> > > > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany 
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for driving this Ferenc,
> > > > > +1 (non-binding)
> > > > >
> > > > > Regards,
> > > > > Mate
> > > > >
> > > > > Ferenc Csaky  ezt írta (időpont: 2024.
> > > jún.
> > > > > 12., Sze, 17:23):
> > > > >
> > > > > > Hello devs,
> > > > > >
> > > > > > I would like to start a vote about FLIP-464 [1]. The FLIP is
> about
> > to
> > > > > > merge back the
> > > > > > "flink run-application" functionality to "flink run", so the
> latter
> > > > will
> > > > > > be capable to deploy jobs in
> > > > > > all deployment modes. More details in the FLIP. Discussion thread
> > > [2].
> > > > > >
> > > > > > The vote will be open for at least 72 hours (until 2024 March 23
> > > 14:03
> > > > > > UTC) unless there
> > > > > > are any objections or insufficient votes.
> > > > > >
> > > > > > Thanks,Ferenc
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > > > > [2]
> > https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread weijie guo
Thanks for driving this!

+1(binding)

Best regards,

Weijie


Xintong Song  wrote on Thu, Jun 13, 2024 at 09:04:

> +1(binding)
>
> Best,
>
> Xintong
>
>
>
> On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi 
> wrote:
>
> > +1 (binding)
> >
> > On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás 
> > wrote:
> >
> > > Sounds reasonable,
> > > +1
> > >
> > > Cheers,
> > > Matyas
> > >
> > >
> > > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thank you for driving this Ferenc,
> > > > +1 (non-binding)
> > > >
> > > > Regards,
> > > > Mate
> > > >
> > > > Ferenc Csaky  ezt írta (időpont: 2024.
> > jún.
> > > > 12., Sze, 17:23):
> > > >
> > > > > Hello devs,
> > > > >
> > > > > I would like to start a vote about FLIP-464 [1]. The FLIP is about
> to
> > > > > merge back the
> > > > > "flink run-application" functionality to "flink run", so the latter
> > > will
> > > > > be capable to deploy jobs in
> > > > > all deployment modes. More details in the FLIP. Discussion thread
> > [2].
> > > > >
> > > > > The vote will be open for at least 72 hours (until 2024 March 23
> > 14:03
> > > > > UTC) unless there
> > > > > are any objections or insufficient votes.
> > > > >
> > > > > Thanks,Ferenc
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > > > [2]
> https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> > > >
> > >
> >
>


Re: Flink Kubernetes Operator 1.9.0 release planning

2024-06-12 Thread Leonard Xu
+1 for the release plan and release manager candidate, thanks Gyula.

Best,
Leonard

> 2024年6月12日 下午11:10,Peter Huang  写道:
> 
> +1 Thanks Gyula for driving this release!
> 
> 
> Best Regards
> Peter Huang
> 
> On Tue, Jun 11, 2024 at 12:28 PM Márton Balassi 
> wrote:
> 
>> +1 for cutting the release and Gyula as the release manager.
>> 
>> On Tue, Jun 11, 2024 at 10:41 AM David Radley 
>> wrote:
>> 
>>> I agree – thanks for driving this Gyula.
>>> 
>>> From: Rui Fan <1996fan...@gmail.com>
>>> Date: Tuesday, 11 June 2024 at 02:52
>>> To: dev@flink.apache.org 
>>> Cc: Mate Czagany 
>>> Subject: [EXTERNAL] Re: Flink Kubernetes Operator 1.9.0 release planning
>>> Thanks Gyula for driving this release!
>>> 
 I suggest we cut the release branch this week after merging current
 outstanding smaller PRs.
>>> 
>>> It makes sense to me.
>>> 
>>> Best,
>>> Rui
>>> 
>>> On Mon, Jun 10, 2024 at 3:05 PM Gyula Fóra  wrote:
>>> 
 Hi all!
 
 I want to kick off the discussion / release process for the Flink
 Kubernetes Operator 1.9.0 version.
 
 The last, 1.8.0, version was released in March and since then we have
>>> had a
 number of important fixes. Furthermore there are some bigger pieces of
 outstanding work in the form of open PRs such as the Savepoint CRD work
 which should only be merged to 1.10.0 to gain more exposure/stability.
 
 I suggest we cut the release branch this week after merging current
 outstanding smaller PRs.
 
 I volunteer as the release manager but if someone else would like to do
>>> it,
 I would also be happy to assist.
 
 Please let me know what you think.
 
 Cheers,
 Gyula
 
>>> 
>>> Unless otherwise stated above:
>>> 
>>> IBM United Kingdom Limited
>>> Registered in England and Wales with number 741598
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>>> 
>> 



Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-12 Thread yuxia
Hi, Sergio.
Thanks for driving the FLIP. Given we also have the REPLACE TABLE AS statement [1], 
and it's almost the same as the CREATE TABLE AS statement,
would you mind also supporting schema definition for the REPLACE TABLE AS statement 
in this FLIP? It would be great to align the REPLACE TABLE AS statement
with the CREATE TABLE AS statement.


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement

Best regards,
Yuxia

- Original Message -
From: "Timo Walther" 
To: "dev" 
Sent: Wednesday, June 12, 2024, 10:19:14 PM
Subject: Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

> I just noticed the CREATE TABLE LIKE statement allows the definition
 > of new columns in the CREATE part. The difference
 > with this CTAS proposal is that TABLE LIKE appends the new columns at
 > the end of the schema instead of adding them
 > at the beginning like this proposal and Mysql do.

This should be fine. The LIKE rather "extends from" the right table. 
Whereas the SELECT in CTAS rather extends the left schema definition. 
Given that "the extended part" is always appended, we could argue that 
the current CTAS behavior in the FLIP is acceptable.

 > If you want to rename a column in the create part, then that column
 > position must be in the same position as the query column.
 > I didn't like the Postgres approach because it does not let us add
 > columns that do not exist in the query schema.

Flink offers similar functionality in INSERT INTO. INSERT INTO also 
allows syntax like: `INSERT INTO (b, c) SELECT * FROM t`. So given that 
our CTAS can be seen as a CREATE TABLE + INSERT INTO. I would adopt 
Jeyhun comment in the FLIP. If users don't want to add additional schema 
parts, the same column reordering should be available as if one would 
write a INSERT INTO.

Regards,
Timo




On 12.06.24 04:30, Yanquan Lv wrote:
> Hi Sergio, thanks for driving it, +1 for this.
> 
> I have some comments:
> 1. If we have a source table with primary keys and partition keys defined,
> what is the default behavior if PARTITIONED and DISTRIBUTED not specified
> in the CTAS statement, It should not be inherited by default?
> 2. I suggest providing a complete syntax that includes table_properties
> like FLIP-218.
> 
> 
> Sergio Pena  于2024年6月12日周三 03:54写道:
> 
>> I just noticed the CREATE TABLE LIKE statement allows the definition of new
>> columns in the CREATE part. The difference
>> with this CTAS proposal is that TABLE LIKE appends the new columns at the
>> end of the schema instead of adding them
>> at the beginning like this proposal and Mysql do.
>>
>>> create table t1(id int, name string);
 create table s1(a int, b string) like t1;
 describe s1;
>>
>> +-+---+--++
>>> | Column Name | Data Type | Nullable | Extras |
>>> +-+---+--++
>>> | id  | INT   | NULL ||
>>> | name| STRING| NULL ||
>>> | a   | INT   | NULL ||
>>> | b   | STRING| NULL ||
>>> +-+---+--++
>>
>>
>>
>> The CREATE TABLE LIKE also does not let the definition of existing columns
>> in the CREATE part. The statement fails
>> that the column already exists.
>>
>>> create table t1(id int, name string);
>>
>>> create table s1(id double) like t1;
>>> A column named 'id' already exists in the base table.
>>>
>>
>> What do you guys think of making it similar to the CREATE TABLE LIKE? Seems
>> the best approach in order to
>> be compatible with it.
>>
>> - Sergio
>>
>> On Tue, Jun 11, 2024 at 2:10 PM Sergio Pena  wrote:
>>
>>> Thanks Timo for answering Jeyhun questions.
>>>
>>> To add info more about your questions Jeyhun. This proposal is not
>>> handling NULL/NOT_NULL types. I noticed that
>>> the current CTAS impl. (as Timo said) adds this constraint as part of the
>>> resulting schema. And when defining
>>> a primary key in the CREATE part, if the resulting schema does not have a
>>> NOT NULL in the column then the CTAS
>>> will fail. This is similar to the CREATE TABLE LIKE which expects the
>> LIKE
>>> table to have a NOT NULL column if
>>> the user defines a primary key in the CREATE part.
>>>
 In some cases, redefining the column types might be redundant,
>> especially
 when users dont change the column type. A user just wants to change the
 column name from the SELECT clause. Should we also support this
>> scenario,
 similar to postgres?
>>>
>>> I looked into Postgres too. Postgres matches the columns based on the
>>> order defined in the create and select part.
>>> If you want to rename a column in the create part, then that column
>>> position must be in the same position as the query column.
>>> I didn't like the Postgres approach because it does not let us add
>> columns
>>> that do not exist in the query schema.
>>>
>>> i.e. query has schema (a int, b string), now the `a` column is renamed to
>>> `id` 

[jira] [Created] (FLINK-35586) Detected conflict when using Paimon as pipeline sink with parallelism > 1

2024-06-12 Thread LvYanquan (Jira)
LvYanquan created FLINK-35586:
-

 Summary: Detected conflict when using Paimon as pipeline sink with 
parallelism > 1
 Key: FLINK-35586
 URL: https://issues.apache.org/jira/browse/FLINK-35586
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.2.0


When submitting a FlinkCDC pipeline job using a YAML file like:
{code:java}
source:
  type: mysql
  name: MySQL Source
  hostname: 127.0.0.1
  port: 3306
  username: root
  password: 123456
  tables: inventory.t1

sink:
  type: paimon
  name: Paimon Sink
  catalog.properties.metastore: filesystem
  catalog.properties.warehouse: /mypath

pipeline:
  name: MySQL to Paimon Pipeline
  parallelism: 2 {code}
I got the following error message: 
{code:java}
Caused by: java.lang.RuntimeException: LSM conflicts detected! Give up 
committing. Conflict files are:, bucket 0, level 5, file 
data-6bcac56a-2df2-4c85-97f2-2db91f6d8099-0.orc, bucket 0, level 5, file 
data-351fd27d-4a65-4354-9ce9-c153ba715569-0.orc {code}
This causes the job to restart constantly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Xintong Song
+1(binding)

Best,

Xintong



On Thu, Jun 13, 2024 at 5:15 AM Márton Balassi 
wrote:

> +1 (binding)
>
> On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás 
> wrote:
>
> > Sounds reasonable,
> > +1
> >
> > Cheers,
> > Matyas
> >
> >
> > On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany  wrote:
> >
> > > Hi,
> > >
> > > Thank you for driving this Ferenc,
> > > +1 (non-binding)
> > >
> > > Regards,
> > > Mate
> > >
> > > Ferenc Csaky  ezt írta (időpont: 2024.
> jún.
> > > 12., Sze, 17:23):
> > >
> > > > Hello devs,
> > > >
> > > > I would like to start a vote about FLIP-464 [1]. The FLIP is about to
> > > > merge back the
> > > > "flink run-application" functionality to "flink run", so the latter
> > will
> > > > be capable to deploy jobs in
> > > > all deployment modes. More details in the FLIP. Discussion thread
> [2].
> > > >
> > > > The vote will be open for at least 72 hours (until 2024 March 23
> 14:03
> > > > UTC) unless there
> > > > are any objections or insufficient votes.
> > > >
> > > > Thanks,Ferenc
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > > [2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> > >
> >
>


[jira] [Created] (FLINK-35585) Add documentation for distribution

2024-06-12 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-35585:
--

 Summary: Add documentation for distribution
 Key: FLINK-35585
 URL: https://issues.apache.org/jira/browse/FLINK-35585
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes


Add documentation for ALTER TABLE, CREATE TABLE, and the sink abilities.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Márton Balassi
+1 (binding)

On Wed, Jun 12, 2024 at 7:25 PM Őrhidi Mátyás 
wrote:

> Sounds reasonable,
> +1
>
> Cheers,
> Matyas
>
>
> On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany  wrote:
>
> > Hi,
> >
> > Thank you for driving this Ferenc,
> > +1 (non-binding)
> >
> > Regards,
> > Mate
> >
> > Ferenc Csaky  ezt írta (időpont: 2024. jún.
> > 12., Sze, 17:23):
> >
> > > Hello devs,
> > >
> > > I would like to start a vote about FLIP-464 [1]. The FLIP is about to
> > > merge back the
> > > "flink run-application" functionality to "flink run", so the latter
> will
> > > be capable to deploy jobs in
> > > all deployment modes. More details in the FLIP. Discussion thread [2].
> > >
> > > The vote will be open for at least 72 hours (until 2024 March 23 14:03
> > > UTC) unless there
> > > are any objections or insufficient votes.
> > >
> > > Thanks,Ferenc
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > > [2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
> >
>


Re: [DISCUSS] Proposing an LTS Release for the 1.x Line

2024-06-12 Thread Ufuk Celebi
Hi folks!

1) Big +1 on being strict with limiting the LTS version to patches and no 
feature backports. I think that it would open up a big can of worms. What does 
it even mean to backport a feature to an LTS version? My understanding is that 
we want to designate a single 1.x version as LTS. But if we backport a feature, 
we would technically need to bump the minor version and then we are basically 
back at maintaining features for 1.x. Let me know if I’m misunderstanding 
something.

2) It also seems fine to me to stick to 1.x and 2.x concretely in the FLIP 
since it’ll be our first LTS version and we don’t anticipate (as of today) many 
overlapping LTS versions. It actually makes it clearer to me what we’re talking 
about.

3) 2 years seems like a good starting point to me. If anything, I’m wondering 
whether it should be longer given that there are many users on old versions 
even today. It’s probably best to stick to this number and expand if needed 
down the line (as opposed to starting with a longer duration and then running 
into problems).

Cheers,

Ufuk

> On 11. Jun 2024, at 15:44, Alexander Fedulov  
> wrote:
> 
> Hi Matthias,
> 
> I think we can include this generic semantic into the writeup of the LTS
> definition for the Flink website (last item in the Migration Plan).
> Talking about 1.x and 2.x feels more natural than about N.x and N+1.x - I'd
> prefer not to overcomplicate things here.
> Should the gap before the next major release match this one (eight years),
> it would be appropriate to reconsider the project stance and vote anew if
> required.
> 
> Best,
> Alex
> 
> On Mon, 27 May 2024 at 09:02, Matthias Pohl  wrote:
> 
>> Hi Alex,
>> thanks for creating the FLIP. One question: Is it intentionally done that
>> the FLIP only covers the 1.x LTS version? Why don't we make the FLIP
>> generic to also apply to other major version upgrades?
>> 
>> Best,
>> Matthias
>> 
>> On Sat, May 25, 2024 at 4:55 PM Xintong Song 
>> wrote:
>> 
 
 I think our starting point should be "We don't backport features,
>> unless
 discussed and agreed on the Dev mailing list".
>>> 
>>> 
>>> This makes perfect sense. In general, I think we want to encourage users
>> to
>>> upgrade in order to use the new features. Alternatively, users can also
>>> choose to maintain their own custom patches based on the LTS version. I
>>> personally don't see why backporting new features to old releases is
>>> necessary. In case of exceptions, a mailing list discussion is always a
>>> good direction to go.
>>> 
>>> 
>>> Best,
>>> 
>>> Xintong
>>> 
>>> 
>>> 
>>> On Fri, May 24, 2024 at 9:35 PM David Radley 
>>> wrote:
>>> 
 Hi Martjin and Alex,
 I agree with your summaries, it will be interesting to see what
>> requests
 there might be for back ports.
 Kind regards, David.
 
 
 
 
 From: Alexander Fedulov 
 Date: Friday, 24 May 2024 at 14:07
 To: dev@flink.apache.org 
 Subject: [EXTERNAL] Re: [DISCUSS] Proposing an LTS Release for the 1.x
>>> Line
 @David
> I agree with Martijn that we only put features into version 2. Back
 porting to v1 should not be business as usual for features, only for
 security and stability changes.
 Yep, this choice is explicitly reflected in the FLIP [1]
 
 @Martijn
> I think our starting point should be "We don't backport features,
>>> unless
 discussed and agreed on the Dev mailing list".
 I agree - the baseline is that we do not do that. Only if a very
>>> compelling
 argument is made and the community reaches consensus, exceptions could
 potentially be made, but we should try to avoid them.
 
 [1] https://cwiki.apache.org/confluence/x/BApeEg
 
 Best,
 Alex
 
 On Fri, 24 May 2024 at 14:38, Martijn Visser >> 
 wrote:
 
> Hi David,
> 
>> If there is a maintainer willing to merge backported features to
>> v1,
>>> as
> it is important to some part of the community, this should be
>> allowed,
>>> as
> different parts of the community have different priorities and
>>> timelines,
> 
> I don't think this is a good idea. Backporting a feature can cause
>>> issues
> in other components that might be outside the span of expertise of
>> the
> maintainer that backported said feature, causing the overall
>> stability
>>> to
> be degraded. I think our starting point should be "We don't backport
> features, unless discussed and agreed on the Dev mailing list". That
 still
> opens up the ability to backport features but makes it clear where
>> the
 bar
> lies.
> 
> Best regards,
> 
> Martijn
> 
> On Fri, May 24, 2024 at 11:21 AM David Radley <
>> david_rad...@uk.ibm.com
 
> wrote:
> 
>> Hi,
>> I agree with Martijn that we only put features into version 2. Back
>> porting to v1 should not be business as usual for features, only
>> for
>> security and stability 

Re: Flink Kubernetes Operator 1.9.0 release planning

2024-06-12 Thread Ufuk Celebi
+1 to release 1.9.0

> On 12. Jun 2024, at 17:10, Peter Huang  wrote:
> 
> +1 Thanks Gyula for driving this release!
> 
> 
> Best Regards
> Peter Huang
> 
> On Tue, Jun 11, 2024 at 12:28 PM Márton Balassi 
> wrote:
> 
>> +1 for cutting the release and Gyula as the release manager.
>> 
>> On Tue, Jun 11, 2024 at 10:41 AM David Radley 
>> wrote:
>> 
>>> I agree – thanks for driving this Gyula.
>>> 
>>> From: Rui Fan <1996fan...@gmail.com>
>>> Date: Tuesday, 11 June 2024 at 02:52
>>> To: dev@flink.apache.org 
>>> Cc: Mate Czagany 
>>> Subject: [EXTERNAL] Re: Flink Kubernetes Operator 1.9.0 release planning
>>> Thanks Gyula for driving this release!
>>> 
 I suggest we cut the release branch this week after merging current
 outstanding smaller PRs.
>>> 
>>> It makes sense to me.
>>> 
>>> Best,
>>> Rui
>>> 
>>> On Mon, Jun 10, 2024 at 3:05 PM Gyula Fóra  wrote:
>>> 
 Hi all!
 
 I want to kick off the discussion / release process for the Flink
 Kubernetes Operator 1.9.0 version.
 
 The last, 1.8.0, version was released in March and since then we have
>>> had a
 number of important fixes. Furthermore there are some bigger pieces of
 outstanding work in the form of open PRs such as the Savepoint CRD work
 which should only be merged to 1.10.0 to gain more exposure/stability.
 
 I suggest we cut the release branch this week after merging current
 outstanding smaller PRs.
 
 I volunteer as the release manager but if someone else would like to do
>>> it,
 I would also be happy to assist.
 
 Please let me know what you think.
 
 Cheers,
 Gyula
 
>>> 
>>> Unless otherwise stated above:
>>> 
>>> IBM United Kingdom Limited
>>> Registered in England and Wales with number 741598
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>>> 
>> 



Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Őrhidi Mátyás
Sounds reasonable,
+1

Cheers,
Matyas


On Wed, Jun 12, 2024 at 8:54 AM Mate Czagany  wrote:

> Hi,
>
> Thank you for driving this Ferenc,
> +1 (non-binding)
>
> Regards,
> Mate
>
> Ferenc Csaky  ezt írta (időpont: 2024. jún.
> 12., Sze, 17:23):
>
> > Hello devs,
> >
> > I would like to start a vote about FLIP-464 [1]. The FLIP is about to
> > merge back the
> > "flink run-application" functionality to "flink run", so the latter will
> > be capable to deploy jobs in
> > all deployment modes. More details in the FLIP. Discussion thread [2].
> >
> > The vote will be open for at least 72 hours (until 2024 March 23 14:03
> > UTC) unless there
> > are any objections or insufficient votes.
> >
> > Thanks,Ferenc
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> > [2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g
>


Re: [VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Mate Czagany
Hi,

Thank you for driving this Ferenc,
+1 (non-binding)

Regards,
Mate

Ferenc Csaky  ezt írta (időpont: 2024. jún.
12., Sze, 17:23):

> Hello devs,
>
> I would like to start a vote about FLIP-464 [1]. The FLIP is about to
> merge back the
> "flink run-application" functionality to "flink run", so the latter will
> be capable to deploy jobs in
> all deployment modes. More details in the FLIP. Discussion thread [2].
>
> The vote will be open for at least 72 hours (until 2024 March 23 14:03
> UTC) unless there
> are any objections or insufficient votes.
>
> Thanks,Ferenc
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
> [2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g


[VOTE] FLIP-464: Merge "flink run" and "flink run-application"

2024-06-12 Thread Ferenc Csaky
Hello devs,

I would like to start a vote about FLIP-464 [1]. The FLIP is about to merge 
back the
"flink run-application" functionality to "flink run", so the latter will be 
capable to deploy jobs in
all deployment modes. More details in the FLIP. Discussion thread [2].

The vote will be open for at least 72 hours (until 2024 March 23 14:03 UTC) 
unless there
are any objections or insufficient votes.

Thanks,Ferenc

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311626179
[2] https://lists.apache.org/thread/xh58xs0y58kqjmfvd4yor79rv6dlcg5g

Re: Flink Kubernetes Operator 1.9.0 release planning

2024-06-12 Thread Peter Huang
+1 Thanks Gyula for driving this release!


Best Regards
Peter Huang

On Tue, Jun 11, 2024 at 12:28 PM Márton Balassi 
wrote:

> +1 for cutting the release and Gyula as the release manager.
>
> On Tue, Jun 11, 2024 at 10:41 AM David Radley 
> wrote:
>
> > I agree – thanks for driving this Gyula.
> >
> > From: Rui Fan <1996fan...@gmail.com>
> > Date: Tuesday, 11 June 2024 at 02:52
> > To: dev@flink.apache.org 
> > Cc: Mate Czagany 
> > Subject: [EXTERNAL] Re: Flink Kubernetes Operator 1.9.0 release planning
> > Thanks Gyula for driving this release!
> >
> > > I suggest we cut the release branch this week after merging current
> > > outstanding smaller PRs.
> >
> > It makes sense to me.
> >
> > Best,
> > Rui
> >
> > On Mon, Jun 10, 2024 at 3:05 PM Gyula Fóra  wrote:
> >
> > > Hi all!
> > >
> > > I want to kick off the discussion / release process for the Flink
> > > Kubernetes Operator 1.9.0 version.
> > >
> > > The last, 1.8.0, version was released in March and since then we have
> > had a
> > > number of important fixes. Furthermore there are some bigger pieces of
> > > outstanding work in the form of open PRs such as the Savepoint CRD work
> > > which should only be merged to 1.10.0 to gain more exposure/stability.
> > >
> > > I suggest we cut the release branch this week after merging current
> > > outstanding smaller PRs.
> > >
> > > I volunteer as the release manager but if someone else would like to do
> > it,
> > > I would also be happy to assist.
> > >
> > > Please let me know what you think.
> > >
> > > Cheers,
> > > Gyula
> > >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui

2024-06-12 Thread Peter Huang
Congratulations Rui


Best Regards
Peter Huang

On Tue, Jun 11, 2024 at 10:31 PM 王刚  wrote:

> Congratulations Rui
>
> Best Regards,
> Gang Wang
>
> Jacky Lau  于2024年6月11日周二 13:04写道:
>
> > Congratulations Rui, well deserved!
> >
> > Regards,
> > Jacky Lau
> >
> > Jeyhun Karimov 于2024年6月11日 周二03:49写道:
> >
> > > Congratulations Rui, well deserved!
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Mon, Jun 10, 2024, 10:21 Ahmed Hamdy  wrote:
> > >
> > > > Congratulations Rui!
> > > > Best Regards
> > > > Ahmed Hamdy
> > > >
> > > >
> > > > On Mon, 10 Jun 2024 at 09:10, David Radley 
> > > > wrote:
> > > >
> > > > > Congratulations, Rui!
> > > > >
> > > > > From: Sergey Nuyanzin 
> > > > > Date: Sunday, 9 June 2024 at 20:33
> > > > > To: dev@flink.apache.org 
> > > > > Subject: [EXTERNAL] Re: [ANNOUNCE] New Apache Flink PMC Member -
> Fan
> > > Rui
> > > > > Congratulations, Rui!
> > > > >
> > > > > On Fri, Jun 7, 2024 at 5:36 AM Xia Sun 
> wrote:
> > > > >
> > > > > > Congratulations, Rui!
> > > > > >
> > > > > > Best,
> > > > > > Xia
> > > > > >
> > > > > > Paul Lam  于2024年6月6日周四 11:59写道:
> > > > > >
> > > > > > > Congrats, Rui!
> > > > > > >
> > > > > > > Best,
> > > > > > > Paul Lam
> > > > > > >
> > > > > > > > 2024年6月6日 11:02,Junrui Lee  写道:
> > > > > > > >
> > > > > > > > Congratulations, Rui.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Junrui
> > > > > > > >
> > > > > > > > Hang Ruan  于2024年6月6日周四 10:35写道:
> > > > > > > >
> > > > > > > >> Congratulations, Rui!
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Hang
> > > > > > > >>
> > > > > > > >> Samrat Deb  于2024年6月6日周四 10:28写道:
> > > > > > > >>
> > > > > > > >>> Congratulations Rui
> > > > > > > >>>
> > > > > > > >>> Bests,
> > > > > > > >>> Samrat
> > > > > > > >>>
> > > > > > > >>> On Thu, 6 Jun 2024 at 7:45 AM, Yuxin Tan <
> > > tanyuxinw...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > >  Congratulations, Rui!
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Yuxin
> > > > > > > 
> > > > > > > 
> > > > > > >  Xuannan Su  于2024年6月6日周四 09:58写道:
> > > > > > > 
> > > > > > > > Congratulations!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Xuannan
> > > > > > > >
> > > > > > > > On Thu, Jun 6, 2024 at 9:53 AM Hangxiang Yu <
> > > > master...@gmail.com
> > > > > >
> > > > > > > >>> wrote:
> > > > > > > >>
> > > > > > > >> Congratulations, Rui !
> > > > > > > >>
> > > > > > > >> On Thu, Jun 6, 2024 at 9:18 AM Lincoln Lee <
> > > > > > lincoln.8...@gmail.com
> > > > > > > >>>
> > > > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Congratulations, Rui!
> > > > > > > >>>
> > > > > > > >>> Best,
> > > > > > > >>> Lincoln Lee
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Lijie Wang  于2024年6月6日周四
> > > 09:11写道:
> > > > > > > >>>
> > > > > > >  Congratulations, Rui!
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Lijie
> > > > > > > 
> > > > > > >  Rodrigo Meneses  于2024年6月5日周三
> > > 21:35写道:
> > > > > > > 
> > > > > > > > All the best
> > > > > > > >
> > > > > > > > On Wed, Jun 5, 2024 at 5:56 AM xiangyu feng <
> > > > > > >  xiangyu...@gmail.com>
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >> Congratulations, Rui!
> > > > > > > >>
> > > > > > > >> Regards,
> > > > > > > >> Xiangyu Feng
> > > > > > > >>
> > > > > > > >> Feng Jin  于2024年6月5日周三
> > 20:42写道:
> > > > > > > >>
> > > > > > > >>> Congratulations, Rui!
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Best,
> > > > > > > >>> Feng Jin
> > > > > > > >>>
> > > > > > > >>> On Wed, Jun 5, 2024 at 8:23 PM Yanfei Lei <
> > > > > > >  fredia...@gmail.com
> > > > > > > >>
> > > > > > >  wrote:
> > > > > > > >>>
> > > > > > >  Congratulations, Rui!
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Yanfei
> > > > > > > 
> > > > > > >  Luke Chen  于2024年6月5日周三
> 20:08写道:
> > > > > > > >
> > > > > > > > Congrats, Rui!
> > > > > > > >
> > > > > > > > Luke
> > > > > > > >
> > > > > > > > On Wed, Jun 5, 2024 at 8:02 PM Jiabao Sun <
> > > > > > > >>> jiabao...@apache.org>
> > > > > > > >>> wrote:
> > > > > > > >
> > > > > > > >> Congrats, Rui. Well-deserved!
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Jiabao
> > > > > > > >>
> > > > > > > >> Zhanghao Chen 
> > > > > > > >>> 于2024年6月5日周三
> > > > > > >  19:29写道:
> > > > > > > >>
> > > > > > > >>> Congrats, Rui!
> > > > > > > >>>
> > > > > > > 

Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-12 Thread Timo Walther

> I just noticed the CREATE TABLE LIKE statement allows the definition
> of new columns in the CREATE part. The difference
> with this CTAS proposal is that TABLE LIKE appends the new columns at
> the end of the schema instead of adding them
> at the beginning like this proposal and Mysql do.

This should be fine. The LIKE rather "extends from" the right table,
whereas the SELECT in CTAS rather extends the left schema definition.
Given that "the extended part" is always appended, we could argue that
the current CTAS behavior in the FLIP is acceptable.


> If you want to rename a column in the create part, then that column
> position must be in the same position as the query column.
> I didn't like the Postgres approach because it does not let us add
> columns that do not exist in the query schema.

Flink offers similar functionality in INSERT INTO. INSERT INTO also
allows syntax like: `INSERT INTO (b, c) SELECT * FROM t`. So, given that
our CTAS can be seen as a CREATE TABLE + INSERT INTO, I would adopt
Jeyhun's comment in the FLIP. If users don't want to add additional schema
parts, the same column reordering should be available as if one had
written an INSERT INTO.
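
For illustration, a minimal sketch of that correspondence (tables t and s
are hypothetical, and the CTAS line uses the FLIP-463 proposed syntax,
which is an assumption and not yet supported):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CtasReorderSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Existing behavior: the explicit column list maps the two query
        // columns onto columns b and c of the hypothetical sink table s
        // (assumes t and s are already registered in the catalog).
        tEnv.executeSql("INSERT INTO s (b, c) SELECT * FROM t");
        // The analogous mapping for CTAS under FLIP-463 (proposed syntax,
        // assumed here for illustration only):
        tEnv.executeSql(
                "CREATE TABLE s2 (b STRING, c INT) AS SELECT b, c FROM t");
    }
}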


Regards,
Timo




On 12.06.24 04:30, Yanquan Lv wrote:

Hi Sergio, thanks for driving it, +1 for this.

I have some comments:
1. If we have a source table with primary keys and partition keys defined,
what is the default behavior if PARTITIONED and DISTRIBUTED are not
specified in the CTAS statement? Should they not be inherited by default?
2. I suggest providing a complete syntax that includes table_properties
like FLIP-218.


Sergio Pena  于2024年6月12日周三 03:54写道:


I just noticed the CREATE TABLE LIKE statement allows the definition of new
columns in the CREATE part. The difference
with this CTAS proposal is that TABLE LIKE appends the new columns at the
end of the schema instead of adding them
at the beginning like this proposal and Mysql do.


create table t1(id int, name string);

create table s1(a int, b string) like t1;
describe s1;


+-------------+-----------+----------+--------+
| Column Name | Data Type | Nullable | Extras |
+-------------+-----------+----------+--------+
| id          | INT       | NULL     |        |
| name        | STRING    | NULL     |        |
| a           | INT       | NULL     |        |
| b           | STRING    | NULL     |        |
+-------------+-----------+----------+--------+




The CREATE TABLE LIKE statement also does not allow redefining existing
columns in the CREATE part. The statement fails because the column
already exists.


create table t1(id int, name string);



create table s1(id double) like t1;
A column named 'id' already exists in the base table.



What do you guys think of making it similar to the CREATE TABLE LIKE? Seems
the best approach in order to
be compatible with it.

- Sergio

On Tue, Jun 11, 2024 at 2:10 PM Sergio Pena  wrote:


Thanks Timo for answering Jeyhun questions.

To add more info about your questions, Jeyhun: this proposal is not
handling NULL/NOT_NULL types. I noticed that
the current CTAS impl. (as Timo said) adds this constraint as part of the
resulting schema. And when defining
a primary key in the CREATE part, if the resulting schema does not have a
NOT NULL in the column then the CTAS
will fail. This is similar to the CREATE TABLE LIKE which expects the

LIKE

table to have a NOT NULL column if
the user defines a primary key in the CREATE part.


In some cases, redefining the column types might be redundant, especially
when users don't change the column type. A user just wants to change the
column name from the SELECT clause. Should we also support this scenario,
similar to Postgres?


I looked into Postgres too. Postgres matches the columns based on the
order defined in the create and select part.
If you want to rename a column in the create part, then that column
must be in the same position as the query column.
I didn't like the Postgres approach because it does not let us add
columns that do not exist in the query schema.

i.e. query has schema (a int, b string), now the `a` column is renamed to
`id` because both are in the same position 0
`create table s1(id int) as select a, b from t1`;
results in: [id int, b string]

I think if users want to rename, they can use a different alias in the
select part. They could also do explicit casting for changing the data
types, which now makes it redundant (as you said) to allow redefining the
query columns again. But perhaps there are cases where explicit casting
does not work and just defining the column would, i.e. making a nullable
type NOT NULL? I couldn't make `cast(c1 as int not null)` work, for
instance, but it may work in the create part?


Could you also mention the casting rules in the FLIP for this case?


I mentioned they're the same as insert/select when doing implicit casting.
I will search for more info about the insert/select behavior and add the
casting rules to the FLIP.

- Sergio



Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-12 Thread Sergio Pena
Hey Yanquan,
I just added the WITH clause to the CTAS syntax in the flip.

> If we have a source table with primary keys and partition keys defined,
> what is the default behavior if PARTITIONED and DISTRIBUTED not specified
> in the CTAS statement, It should not be inherited by default?

You're right. The default behavior does not infer the primary key from the
query schema.
It only brings over column names, column types, and nullability
constraints, which is why supporting the primary key definition in the
CREATE part is important.
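
A minimal sketch of what that could look like (FLIP-463 proposed syntax,
assumed here and not yet supported; the table names and connector option
are hypothetical):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CtasPrimaryKeySketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // The primary key is declared explicitly because it is not
        // inferred from the query schema; id must resolve to NOT NULL.
        tEnv.executeSql(
                "CREATE TABLE s1 ("
                        + "  PRIMARY KEY (id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'blackhole'"
                        + ") AS SELECT id, name FROM t1");
    }
}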

- Sergio


On Tue, Jun 11, 2024 at 9:32 PM Yanquan Lv  wrote:

> Hi Sergio, thanks for driving it, +1 for this.
>
> I have some comments:
> 1. If we have a source table with primary keys and partition keys defined,
> what is the default behavior if PARTITIONED and DISTRIBUTED not specified
> in the CTAS statement, It should not be inherited by default?
> 2. I suggest providing a complete syntax that includes table_properties
> like FLIP-218.
>
>
> Sergio Pena  于2024年6月12日周三 03:54写道:
>
> > I just noticed the CREATE TABLE LIKE statement allows the definition of
> new
> > columns in the CREATE part. The difference
> > with this CTAS proposal is that TABLE LIKE appends the new columns at the
> > end of the schema instead of adding them
> > at the beginning like this proposal and Mysql do.
> >
> > > create table t1(id int, name string);
> > > > create table s1(a int, b string) like t1;
> > > > describe s1;
> >
> > +-+---+--++
> > > | Column Name | Data Type | Nullable | Extras |
> > > +-+---+--++
> > > | id  | INT   | NULL ||
> > > | name| STRING| NULL ||
> > > | a   | INT   | NULL ||
> > > | b   | STRING| NULL ||
> > > +-+---+--++
> >
> >
> >
> > The CREATE TABLE LIKE also does not let the definition of existing
> columns
> > in the CREATE part. The statement fails
> > that the column already exists.
> >
> > > create table t1(id int, name string);
> >
> > > create table s1(id double) like t1;
> > > A column named 'id' already exists in the base table.
> > >
> >
> > What do you guys think of making it similar to the CREATE TABLE LIKE?
> Seems
> > the best approach in order to
> > be compatible with it.
> >
> > - Sergio
> >
> > On Tue, Jun 11, 2024 at 2:10 PM Sergio Pena  wrote:
> >
> > > Thanks Timo for answering Jeyhun questions.
> > >
> > > To add info more about your questions Jeyhun. This proposal is not
> > > handling NULL/NOT_NULL types. I noticed that
> > > the current CTAS impl. (as Timo said) adds this constraint as part of
> the
> > > resulting schema. And when defining
> > > a primary key in the CREATE part, if the resulting schema does not
> have a
> > > NOT NULL in the column then the CTAS
> > > will fail. This is similar to the CREATE TABLE LIKE which expects the
> > LIKE
> > > table to have a NOT NULL column if
> > > the user defines a primary key in the CREATE part.
> > >
> > > > In some cases, redefining the column types might be redundant,
> > especially
> > > > when users dont change the column type. A user just wants to change
> the
> > > > column name from the SELECT clause. Should we also support this
> > scenario,
> > > > similar to postgres?
> > >
> > > I looked into Postgres too. Postgres matches the columns based on the
> > > order defined in the create and select part.
> > > If you want to rename a column in the create part, then that column
> > > position must be in the same position as the query column.
> > > I didn't like the Postgres approach because it does not let us add
> > columns
> > > that do not exist in the query schema.
> > >
> > > i.e. query has schema (a int, b string), now the `a` column is renamed
> to
> > > `id` because both are in the same position 0
> > > `create table s1(id int) as select a, b from t1`;
> > > results in: [id int, b string]
> > >
> > > I think, if users want to rename then they can use a different alias in
> > > the select part. They could also do explicit casting
> > > for changing the data types, which now makes it redundant (as you said)
> > to
> > > allow redefining the query columns again. But
> > > perhaps there are cases where explicit casting does not work and just
> > > defining the column would? i.e. making a nullable
> > > type to not null? I couldn't make `cast(c1 as int not null)` to work
> for
> > > instance, but it may work in the create part?
> > >
> > > > Could you also mention the casting rules in the FLIP for this case?
> > >
> > > I mentioned they're the same as insert/select when doing implicit
> > casting.
> > > I will search for more info about the insert/select
> > > and add the casting rules in the flip..
> > >
> > > - Sergio
> > >
> > >
> > > On Tue, Jun 11, 2024 at 12:59 AM Timo Walther 
> > wrote:
> > >
> > >> Hi Sergio,
> > >>
> > >> thanks for proposing this FLIP for finalizing the CTAS 

Re:Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-12 Thread Wencong Liu
Hi Jingsong,


Some of the points you mentioned are now clarified in
the updated FLIP. Please check it out.


1. Enabling custom data distribution can be done through the
LOOKUP SQL Hint. There are detailed examples provided in the FLIP.


2. We will add the isDeterministic method to the `InputDataPartitioner`
interface, which will return true by default. If the `InputDataPartitioner`
is not deterministic, the connector developer needs to override the
isDeterministic method to return false. If the connector developer
cannot ensure this protocol, they will need to bear responsibility for any
correctness issues that arise (see the sketch after this list).


3. Yes, this feature will work in batch mode as well.
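
For reference, a minimal sketch of how that contract could look. The
interface name is taken from this thread, but the method signatures (and
the idea of wiring it up via the LOOKUP hint mentioned in point 1) are
assumptions for illustration only; see FLIP-462 for the authoritative
definition:

import java.io.Serializable;
import org.apache.flink.table.data.RowData;

public interface InputDataPartitioner extends Serializable {

    // Maps an input row of the lookup join to a subtask index in
    // [0, numPartitions).
    int partition(RowData input, int numPartitions);

    // Whether the same row always maps to the same subtask. Partitioners
    // that are not deterministic (e.g. a random anti-skew partitioner)
    // must return false, so the planner can reject them when the input
    // changelog is not insert-only.
    default boolean isDeterministic() {
        return true;
    }
}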


Best regards,
Wencong





At 2024-06-11 23:47:40, "Jingsong Li"  wrote:
>Hi all,
>
>+1 to this FLIP, very thanks all for your proposal.
>
>isDeterministic looks good to me too.
>
>We can consider stating the following points:
>
>1. How to enable custom data distribution? Is it a dynamic hint? Can
>you provide an SQL example.
>
>2. What impact will it have when the mainstream is changelog? Causing
>disorder? This may need to be emphasized.
>
>3. Does this feature work in batch mode too?
>
>Best,
>Jingsong
>
>On Tue, Jun 11, 2024 at 8:22 PM Wencong Liu  wrote:
>>
>> Hi Lincoln,
>>
>>
>> Thanks for your reply. Weijie and I discussed these two issues offline,
>> and here are the results of our discussion:
>> 1. When the user utilizes the hash lookup join hint introduced by 
>> FLIP-204[1],
>> the `SupportsLookupCustomShuffle` interface should be ignored. This is 
>> because
>> the hash lookup join hint is directly specified by the user through a SQL 
>> HINT,
>> which is more in line with user intuition. WDYT?
>> 2. We agree with the introduction of the `isDeterministic` method. The
>> `SupportsLookupCustomShuffle` interface introduces a custom shuffle, which
>> can cause ADD/UPDATE_AFTER events (+I, +U) to appear
>> after UPDATE_BEFORE/DELETE events (-D, -U), thus breaking the current
>> limitations of the Flink Sink Operator[2]. If `isDeterministic` returns 
>> false and the
>> changelog event type is not insert-only, the Planner should not apply the 
>> shuffle
>> provided by `SupportsLookupCustomShuffle`.
>>
>>
>> [1] 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>> [2] 
>> https://www.ververica.com/blog/flink-sql-secrets-mastering-the-art-of-changelog-event-out-of-orderness
>>
>>
>> Best,
>> Wencong
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2024-06-11 00:02:57, "Lincoln Lee"  wrote:
>> >Hi Weijie,
>> >
>> >Thanks for your proposal, this will be a useful advanced optimization for
>> >connector developers!
>> >
>> >I have two questions:
>> >
>> >1. FLIP-204[1] hash lookup join hint is mentioned in this FLIP, what's the
>> >apply ordering of the two feature? For example, a connector that
>> >implements the `SupportsLookupCustomShuffle` interface also has a
>> >`SHUFFLE_HASH` lookup join hint specified by the user in sql, what's
>> >the expected behavior?
>> >
>> >2. This FLIP considers the relationship with NDU processing, and I agree
>> >with the current choice to prioritize NDU first. However, we should also
>> >consider another issue: out-of-orderness of the changelog events in
>> >streaming[2]. If the connector developer supplies a non-deterministic
>> >partitioner, e.g., a random partitioner for anti-skew purpose, then it'll
>> >break the assumption relied by current SQL operators in streaming: the
>> >ADD/UDPATE_AFTER events (+I, +U) always occur before its related
>> >UDPATE_BEFORE/DELETE events (-D, -U) and they are always
>> >processed by the same task even if a data shuffle is involved. So a
>> >straightforward approach would be to add method `isDeterministic` to
>> >the `InputDataPartitioner` interface to explicitly tell the planner whether
>> >the partitioner is deterministic or not(then the planner can reject the
>> >non-deterministic custom partitioner for correctness requirements).
>> >
>> >[1]
>> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>> >[2]
>> >https://www.ververica.com/blog/flink-sql-secrets-mastering-the-art-of-changelog-event-out-of-orderness
>> >
>> >
>> >Best,
>> >Lincoln Lee
>> >
>> >
>> >Xintong Song  于2024年6月7日周五 13:53写道:
>> >
>> >> +1 for this proposal.
>> >>
>> >> This FLIP will make it possible for each lookup join parallel task to only
>> >> access and cache a subset of the data. This will significantly improve the
>> >> performance and reduce the overhead when using Paimon for the dimension
>> >> table. And it's general enough to also be leveraged by other connectors.
>> >>
>> >> Best,
>> >>
>> >> Xintong
>> >>
>> >>
>> >>
>> >> On Fri, Jun 7, 2024 at 10:01 AM weijie guo 
>> >> wrote:
>> >>
>> >> > Hi devs,
>> >> >
>> >> >
>> >> > I'd like to start a discussion about FLIP-462[1]: Support Custom Data
>> >> > Distribution for Input Stream of Lookup Join.
>> >> >
>> >> >
>> >> > Lookup Join is an important feature in Flink, 

[jira] [Created] (FLINK-35584) Support Java 21 in flink-docker

2024-06-12 Thread Josh England (Jira)
Josh England created FLINK-35584:


 Summary: Support Java 21 in flink-docker
 Key: FLINK-35584
 URL: https://issues.apache.org/jira/browse/FLINK-35584
 Project: Flink
  Issue Type: Improvement
  Components: flink-docker
Reporter: Josh England


Support Java 21. Base images are available for 8, 11, and 17, but since Apache
Flink now supports Java 21 (albeit in beta) it would be good to have a base
image for that too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35583) flink-connector-jdbc is expected to support DDL synchronization for MySQL

2024-06-12 Thread linux (Jira)
linux created FLINK-35583:
-

 Summary: flink-connector-jdbc is expected to support DDL 
synchronization for MySQL
 Key: FLINK-35583
 URL: https://issues.apache.org/jira/browse/FLINK-35583
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: linux


I strongly hope that flink-connector-jdbc will support DDL synchronization 
for MySQL. This use case is very common and is a feature that many users now 
expect to have. I hope the maintainers can add this capability. Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-12 Thread Yuxin Tan
Hi, all

I am happy to say that FLIP-459 Support Flink
hybrid shuffle integration with Apache Celeborn [1]
has been accepted.

There are 11 votes, of which 4 are binding[2].

Xintong Song (binding)
Zhu Zhu (binding)
weijie guo (binding)
Jim Hughes (non-binding)
Jeyhun Karimov (non-binding)
Ahmed Hamdy (non-binding)
Venkatakrishnan Sowrirajan (non-binding)
Junrui Lee (non-binding)
Muhammet Orazov (non-binding)
Rui Fan (binding)
Yuepeng Pan (non-binding)

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
[2] https://lists.apache.org/thread/l64ykk3n8c2gc40gjbowt0ozs0x0jmqm

Best,
Yuxin


[jira] [Created] (FLINK-35582) Marking ingestDB as the default recovery mode for rescaling

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35582:
--

 Summary: Marking ingestDB as the default recovery mode for 
rescaling
 Key: FLINK-35582
 URL: https://issues.apache.org/jira/browse/FLINK-35582
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35580) Fix ingestDB recovery mode related bugs

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35580:
--

 Summary: Fix ingestDB recovery mode related bugs
 Key: FLINK-35580
 URL: https://issues.apache.org/jira/browse/FLINK-35580
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35581) Remove comments from the code related to ingestDB

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35581:
--

 Summary: Remove comments from the code related to ingestDB
 Key: FLINK-35581
 URL: https://issues.apache.org/jira/browse/FLINK-35581
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35579) Update the FrocksDB version in FLINK

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35579:
--

 Summary: Update the FrocksDB version in FLINK
 Key: FLINK-35579
 URL: https://issues.apache.org/jira/browse/FLINK-35579
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35578) Release Frocksdb-8.10.0 official products

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35578:
--

 Summary: Release Frocksdb-8.10.0 official products
 Key: FLINK-35578
 URL: https://issues.apache.org/jira/browse/FLINK-35578
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-12 Thread Yuxin Tan
Hi all,

Thanks for all your votes, I hereby close the vote and I'll announce
the results in a separate email.

Best,
Yuxin


Zhu Zhu  于2024年6月12日周三 11:15写道:

> +1 (binding)
>
> Thanks,
> Zhu
>
> Yuepeng Pan  于2024年6月11日周二 17:04写道:
>
> > +1 (non-binding)
> >
> > Best regards,
> > Yuepeng Pan
> >
> > At 2024-06-11 16:34:12, "Rui Fan" <1996fan...@gmail.com> wrote:
> > >+1(binding)
> > >
> > >Best,
> > >Rui
> > >
> > >On Tue, Jun 11, 2024 at 4:14 PM Muhammet Orazov
> > > wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Thanks Yuxin for driving this!
> > >>
> > >> Best,
> > >> Muhammet
> > >>
> > >>
> > >> On 2024-06-07 08:02, Yuxin Tan wrote:
> > >> > Hi everyone,
> > >> >
> > >> > Thanks for all the feedback about the FLIP-459 Support Flink
> > >> > hybrid shuffle integration with Apache Celeborn[1].
> > >> > The discussion thread is here [2].
> > >> >
> > >> > I'd like to start a vote for it. The vote will be open for at least
> > >> > 72 hours unless there is an objection or insufficient votes.
> > >> >
> > >> > [1]
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > >> > [2]
> https://lists.apache.org/thread/gy7sm7qrf7yrv1rl5f4vtk5fo463ts33
> > >> >
> > >> > Best,
> > >> > Yuxin
> > >>
> >
>


[jira] [Created] (FLINK-35577) Setup the CI environment for FRocksDB-8.10

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35577:
--

 Summary: Setup the CI environment for FRocksDB-8.10
 Key: FLINK-35577
 URL: https://issues.apache.org/jira/browse/FLINK-35577
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35576) FRocksDB: cherry-pick commits required by ingestDB

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35576:
--

 Summary: FRocksDB: cherry-pick commits required by ingestDB
 Key: FLINK-35576
 URL: https://issues.apache.org/jira/browse/FLINK-35576
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0


We support the ingestDB-related API in FRocksDB-8.10.0, but many of the 
ingestDB fixes were only integrated into later RocksDB versions. So we need 
to cherry-pick these fix commits into FRocksDB.
These mainly include:
[https://github.com/facebook/rocksdb/pull/11646]
[https://github.com/facebook/rocksdb/pull/11868]
[https://github.com/facebook/rocksdb/pull/11811]
[https://github.com/facebook/rocksdb/pull/11381]
[https://github.com/facebook/rocksdb/pull/11379]
[https://github.com/facebook/rocksdb/pull/11378]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35575) FRocksDB supports disabling perf context during compilation

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35575:
--

 Summary: FRocksDB supports disabling perf context during 
compilation
 Key: FLINK-35575
 URL: https://issues.apache.org/jira/browse/FLINK-35575
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0


In FRocksDB 6, the thread-local perf context is disabled by reverting a specific 
commit (FLINK-19710). However, this creates conflicts and makes upgrading more 
difficult. We found that disabling *PERF_CONTEXT* can improve the performance 
of the state benchmark by about 5%, and it doesn't create any conflicts. So we 
plan to support disabling the perf context at compile time in the new FRocksDB 
version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35574) Set up base branch for FrocksDB-8.10

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35574:
--

 Summary: Set up base branch for FrocksDB-8.10
 Key: FLINK-35574
 URL: https://issues.apache.org/jira/browse/FLINK-35574
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0


As the first part of FLINK-35573, we need to prepare a base branch for 
FRocksDB-8.10.0 first. Mainly, it needs to be checked out from version 8.10.0 
of the RocksDB community. Then cherry-pick the commits used by Flink from 
FRocksDB-6.20.3 onto 8.10.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35573) [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-06-12 Thread Yue Ma (Jira)
Yue Ma created FLINK-35573:
--

 Summary: [FLIP-447] Upgrade FRocksDB from 6.20.3 to 8.10.0
 Key: FLINK-35573
 URL: https://issues.apache.org/jira/browse/FLINK-35573
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 2.0.0
Reporter: Yue Ma
 Fix For: 2.0.0


The FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
 

*_This FLIP proposes upgrading the version of FRocksDB in the Flink Project 
from 6.20.3 to 8.10.0._*

_RocksDBStateBackend is widely used by Flink users in large-state scenarios. The 
last upgrade of FRocksDB was in Flink 1.14, which mainly added features such as 
ARM platform support, the deleteRange API, periodic compaction, etc. It has been 
a long time since then, and RocksDB has now been released up to version 8.x. The 
main motivation for this upgrade is to leverage the features of higher versions 
of RocksDB to make the Flink RocksDBStateBackend more powerful. As RocksDB also 
continuously optimizes and fixes bugs, we hope to keep FRocksDB more or less in 
sync with RocksDB and upgrade it periodically._



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release 1.19.1, release candidate #1

2024-06-12 Thread Hong Liang
Thanks all for the testing and votes.

The RC is approved and this thread is now closed. See results in [1].

[1] https://lists.apache.org/thread/yqr3jv4wr85brnz2ylzqo9pqn453jqvq

Regards,
Hong


On Tue, Jun 11, 2024 at 9:39 AM Hang Ruan  wrote:

> +1(non-binding)
>
> - Verified signatures
> - Verified hashsums
> - Checked Github release tag
> - Source archives with no binary files
> - Reviewed the flink-web PR
> - Checked the jar build with jdk 1.8
>
> Best,
> Hang
>
> gongzhongqiang  于2024年6月11日周二 15:53写道:
>
> > +1(non-binding)
> >
> > - Verified signatures and sha512
> > - Checked Github release tag exists
> > - Source archives with no binary files
> > - Build the source with jdk8 on ubuntu 22.04 succeed
> > - Reviewed the flink-web PR
> >
> > Best,
> > Zhongqiang Gong
> >
> > Hong Liang  于2024年6月6日周四 23:39写道:
> >
> > > Hi everyone,
> > > Please review and vote on the release candidate #1 for the flink
> v1.19.1,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release and binary convenience releases to
> > be
> > > deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint B78A5EA1 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-1.19.1-rc1" [5],
> > > * website pull request listing the new release and adding announcement
> > blog
> > > post [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Hong
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354399
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.19.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1736/
> > > [5] https://github.com/apache/flink/releases/tag/release-1.19.1-rc1
> > > [6] https://github.com/apache/flink-web/pull/745
> > >
> >
>


[RESULT] [VOTE] Release 1.19.1, release candidate #1

2024-06-12 Thread Hong Liang
Hi all,

I'm happy to announce that we have unanimously approved this release. [1]

There are 11 approving votes, 5 of which are binding:
* Rui Fan (binding)
* Xiqian Yu (non-binding)
* Weijie Guo (binding)
* Jeyhun (non-binding)
* Ahmed Hamdy (non-binding)
* Xintong Song (binding)
* Matthias Pohl (binding)
* Sergey Nuyanzin (non-binding)
* Leonard Xu (binding)
* Zhongqiang Gong (non-binding)
* Hang Ruan (non-binding)

There are no disapproving votes.

Thanks everyone!

[1] https://lists.apache.org/thread/hrptj22y6rjt61flzdzngxdsw134osk4

Regards,
Hong


[jira] [Created] (FLINK-35572) flink db2 cdc default value error

2024-06-12 Thread junxin lai (Jira)
junxin  lai created FLINK-35572:
---

 Summary: flink db2 cdc default value error 
 Key: FLINK-35572
 URL: https://issues.apache.org/jira/browse/FLINK-35572
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: junxin  lai


I am using the Flink DB2 CDC connector to sync a database in real time, but it 
fails to handle default values in the schema when making the snapshot.

After digging deeper into the problem, I found that this seems to be a bug in 
Debezium that was fixed in 2.0.0.CR1 
([https://issues.redhat.com/browse/DBZ-4990]). The latest Flink CDC 3.1 uses 
Debezium version 1.9.8.Final.

Default values are a common configuration in DB2. Is there a way we can 
backport this patch to 1.9.8.Final?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-12 Thread Zhanghao Chen
Thanks for the clarification, that makes sense. +1 for the proposal.

Best,
Zhanghao Chen

From: weijie guo 
Sent: Wednesday, June 12, 2024 14:20
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input 
Stream of Lookup Join

Hi Zhanghao,

Thanks for the reply!

> Could you give a more concrete example in production on when a custom
partitioning strategy will outperform partitioning by key

The key point here is that the partitioning logic cannot be fully expressed
with all or part of the join key. That is, even if we know which fields are
used to calculate buckets, we still have to face the following problems:

1. The mapping from the bucket fields to the bucket id is not necessarily
done via hashcode, and even if it is, the hash algorithm may be different
from the one used in Flink. The planner can't know how to do this mapping.
2. In order to get the bucket id, we have to take the hash modulo the
bucket number, but the planner has no notion of the bucket number.



Best regards,

Weijie


Zhanghao Chen  于2024年6月12日周三 13:55写道:

> Thanks for driving this, Weijie. Usually, the data distribution of the
> external system is closely related to the keys, e.g. computing the bucket
> index by key hashcode % bucket num, so I'm not sure about how much
> difference there are between partitioning by key and a custom partitioning
> strategy. Could you give a more concrete example in production on when a
> custom partitioning strategy will outperform partitioning by key? Since
> you've mentioned Paimon in doc, maybe an example on Paimon.
>
> Best,
> Zhanghao Chen
> 
> From: weijie guo 
> Sent: Friday, June 7, 2024 9:59
> To: dev 
> Subject: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input
> Stream of Lookup Join
>
> Hi devs,
>
>
> I'd like to start a discussion about FLIP-462[1]: Support Custom Data
> Distribution for Input Stream of Lookup Join.
>
>
> Lookup Join is an important feature in Flink, It is typically used to
> enrich a table with data that is queried from an external system.
> If we interact with the external systems for each incoming record, we
> incur significant network IO and RPC overhead.
>
> Therefore, most connectors introduce caching to reduce the per-record
> level query overhead. However, because the data distribution of Lookup
> Join's input stream is arbitrary, the cache hit rate is sometimes
> unsatisfactory.
>
>
> We want to introduce a mechanism for the connector to tell the Flink
> planner its desired input stream data distribution or partitioning
> strategy. This can significantly reduce the amount of cached data and
> improve performance of Lookup Join.
>
>
> You can find more details in this FLIP[1]. Looking forward to hearing
> from you, thanks!
>
>
> Best regards,
>
> Weijie
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join
>


[jira] [Created] (FLINK-35571) ProfilingServiceTest.testRollingDeletion intermittently fails due to improper test isolation

2024-06-12 Thread Grace Grimwood (Jira)
Grace Grimwood created FLINK-35571:
--

 Summary: ProfilingServiceTest.testRollingDeletion intermittently 
fails due to improper test isolation
 Key: FLINK-35571
 URL: https://issues.apache.org/jira/browse/FLINK-35571
 Project: Flink
  Issue Type: Bug
  Components: Tests
 Environment: *Git revision:*
{code:bash}
$ git show
commit b8d527166e095653ae3ff5c0431bf27297efe229 (HEAD -> master)
{code}

*Java info:*
{code:bash}
$ java -version
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode)
{code}

{code:bash}
$ sdk current
Using:
java: 17.0.11-tem
maven: 3.8.6
scala: 2.12.19
{code}

*OS info:*
{code:bash}
$ uname -av
Darwin MacBook-Pro 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:14:38 PDT 
2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020 arm64
{code}

*Hardware info:*
{code:bash}
$ sysctl -a | grep -e 'machdep\.cpu\.brand_string\:' -e 
'machdep\.cpu\.core_count\:' -e 'hw\.memsize\:'
hw.memsize: 34359738368
machdep.cpu.core_count: 12
machdep.cpu.brand_string: Apple M2 Pro
{code}
Reporter: Grace Grimwood
 Attachments: 
20240612_181148_mvn-clean-package_flink-runtime_also-make.log

*Symptom:*
The test *{{ProfilingServiceTest.testRollingDeletion}}* fails with the 
following error:
{code:java}
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.32 s 
<<< FAILURE! -- in org.apache.flink.runtime.util.profiler.ProfilingServiceTest
[ERROR] 
org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion 
-- Time elapsed: 9.264 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <3> but was: <6>
at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at 
org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
at 
org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:175)
at 
org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
{code}
The number of extra files found varies from failure to failure.

*Cause:*
Many of the tests in *{{ProfilingServiceTest}}* rely on a specific 
configuration of the *{{ProfilingService}}* instance, but 
*{{ProfilingService.getInstance}}* does not check whether an existing 
instance's config matches the provided config before returning it. Because of 
this, and because JUnit does not guarantee a specific ordering of tests (unless 
they are specifically annotated), it is possible for these tests to receive an 
instance that does not behave in the expected way and therefore fail.

*Analysis:*
In troubleshooting the test failure, we tried adding an extra assertion to 
*{{ProfilingServiceTest.setUp}}* to validate the directories being written to 
were correct:
{code:java}
Assertions.assertEquals(tempDir.toString(), 
profilingService.getProfilingResultDir());
{code}
That assert produced the following failure:
{code:java}
org.opentest4j.AssertionFailedError: expected: 
 but 
was: 
{code}
This failure shows that the *{{ProfilingService}}* returned by 
*{{ProfilingService.getInstance}}* in the setup is not using the correct 
directory, and therefore cannot be the correct instance for this test class 
because it has the wrong config.

This is because the static method *{{ProfilingService.getInstance}}* attempts 
to reuse any existing instance of *{{ProfilingService}}* before it creates a 
new one and disregards any differences in config in doing so, which means that 
if another test instantiates a *{{ProfilingService}}* with different config 
first and does not close it, that previous instance will be provided to 
*{{ProfilingServiceTest}}* rather than the new instance those tests seem to 
expect. This only happens with the first test run in 

Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-12 Thread weijie guo
Hi Zhanghao,

Thanks for the reply!

> Could you give a more concrete example in production on when a custom
partitioning strategy will outperform partitioning by key

The key point here is that the partitioning logic cannot be fully expressed
with all or part of the join key. That is, even if we know which fields are
used to calculate buckets, we still have to face the following problems:

1. The mapping from the bucket fields to the bucket id is not necessarily
done via hashcode, and even if it is, the hash algorithm may be different
from the one used in Flink. The planner can't know how to do this mapping.
2. In order to get the bucket id, we have to take the hash modulo the
bucket number, but the planner has no notion of the bucket number.
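
To make the two points concrete, a minimal sketch of a connector-side
partitioner that mirrors the storage system's own bucketing (interface
name per FLIP-462; the signatures and the storageHash helper are
assumptions for illustration only):

import org.apache.flink.table.data.RowData;

public class BucketMirroringPartitioner implements InputDataPartitioner {

    // The bucket count is known only to the connector, not to the planner.
    private final int numBuckets;

    public BucketMirroringPartitioner(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    @Override
    public int partition(RowData input, int numPartitions) {
        // Point 1: use the storage system's own hash, not Flink's hashcode.
        int bucket = Math.floorMod(storageHash(input), numBuckets);
        // Point 2: take the bucket modulo and map each bucket to a subtask.
        return bucket % numPartitions;
    }

    // Hypothetical stand-in for the external system's bucket hash; a real
    // connector would reuse the storage system's exact hashing code.
    private int storageHash(RowData row) {
        return Integer.hashCode(row.getInt(0));
    }
}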



Best regards,

Weijie


Zhanghao Chen  于2024年6月12日周三 13:55写道:

> Thanks for driving this, Weijie. Usually, the data distribution of the
> external system is closely related to the keys, e.g. computing the bucket
> index by key hashcode % bucket num, so I'm not sure about how much
> difference there are between partitioning by key and a custom partitioning
> strategy. Could you give a more concrete example in production on when a
> custom partitioning strategy will outperform partitioning by key? Since
> you've mentioned Paimon in doc, maybe an example on Paimon.
>
> Best,
> Zhanghao Chen
> 
> From: weijie guo 
> Sent: Friday, June 7, 2024 9:59
> To: dev 
> Subject: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input
> Stream of Lookup Join
>
> Hi devs,
>
>
> I'd like to start a discussion about FLIP-462[1]: Support Custom Data
> Distribution for Input Stream of Lookup Join.
>
>
> Lookup Join is an important feature in Flink, It is typically used to
> enrich a table with data that is queried from an external system.
> If we interact with the external systems for each incoming record, we
> incur significant network IO and RPC overhead.
>
> Therefore, most connectors introduce caching to reduce the per-record
> level query overhead. However, because the data distribution of Lookup
> Join's input stream is arbitrary, the cache hit rate is sometimes
> unsatisfactory.
>
>
> We want to introduce a mechanism for the connector to tell the Flink
> planner its desired input stream data distribution or partitioning
> strategy. This can significantly reduce the amount of cached data and
> improve performance of Lookup Join.
>
>
> You can find more details in this FLIP[1]. Looking forward to hearing
> from you, thanks!
>
>
> Best regards,
>
> Weijie
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join
>


Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-11 Thread Zhanghao Chen
Thanks for driving this, Weijie. Usually, the data distribution of the external 
system is closely related to the keys, e.g. computing the bucket index by key 
hashcode % bucket num, so I'm not sure about how much difference there is 
between partitioning by key and a custom partitioning strategy. Could you give 
a more concrete example in production on when a custom partitioning strategy 
will outperform partitioning by key? Since you've mentioned Paimon in doc, 
maybe an example on Paimon.

Best,
Zhanghao Chen

From: weijie guo 
Sent: Friday, June 7, 2024 9:59
To: dev 
Subject: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream 
of Lookup Join

Hi devs,


I'd like to start a discussion about FLIP-462[1]: Support Custom Data
Distribution for Input Stream of Lookup Join.


Lookup Join is an important feature in Flink, It is typically used to
enrich a table with data that is queried from an external system.
If we interact with the external systems for each incoming record, we
incur significant network IO and RPC overhead.

Therefore, most connectors introduce caching to reduce the per-record
level query overhead. However, because the data distribution of Lookup
Join's input stream is arbitrary, the cache hit rate is sometimes
unsatisfactory.


We want to introduce a mechanism for the connector to tell the Flink
planner its desired input stream data distribution or partitioning
strategy. This can significantly reduce the amount of cached data and
improve performance of Lookup Join.


You can find more details in this FLIP[1]. Looking forward to hearing
from you, thanks!


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join


Re: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui

2024-06-11 Thread 王刚
Congratulations Rui

Best Regards,
Gang Wang

Jacky Lau  于2024年6月11日周二 13:04写道:

> Congratulations Rui, well deserved!
>
> Regards,
> Jacky Lau
>
> Jeyhun Karimov 于2024年6月11日 周二03:49写道:
>
> > Congratulations Rui, well deserved!
> >
> > Regards,
> > Jeyhun
> >
> > On Mon, Jun 10, 2024, 10:21 Ahmed Hamdy  wrote:
> >
> > > Congratulations Rui!
> > > Best Regards
> > > Ahmed Hamdy
> > >
> > >
> > > On Mon, 10 Jun 2024 at 09:10, David Radley 
> > > wrote:
> > >
> > > > Congratulations, Rui!
> > > >
> > > > From: Sergey Nuyanzin 
> > > > Date: Sunday, 9 June 2024 at 20:33
> > > > To: dev@flink.apache.org 
> > > > Subject: [EXTERNAL] Re: [ANNOUNCE] New Apache Flink PMC Member - Fan
> > Rui
> > > > Congratulations, Rui!
> > > >
> > > > On Fri, Jun 7, 2024 at 5:36 AM Xia Sun  wrote:
> > > >
> > > > > Congratulations, Rui!
> > > > >
> > > > > Best,
> > > > > Xia
> > > > >
> > > > > Paul Lam  于2024年6月6日周四 11:59写道:
> > > > >
> > > > > > Congrats, Rui!
> > > > > >
> > > > > > Best,
> > > > > > Paul Lam
> > > > > >
> > > > > > > 2024年6月6日 11:02,Junrui Lee  写道:
> > > > > > >
> > > > > > > Congratulations, Rui.
> > > > > > >
> > > > > > > Best,
> > > > > > > Junrui
> > > > > > >
> > > > > > > Hang Ruan  于2024年6月6日周四 10:35写道:
> > > > > > >
> > > > > > >> Congratulations, Rui!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Hang
> > > > > > >>
> > > > > > >> Samrat Deb  于2024年6月6日周四 10:28写道:
> > > > > > >>
> > > > > > >>> Congratulations Rui
> > > > > > >>>
> > > > > > >>> Bests,
> > > > > > >>> Samrat
> > > > > > >>>
> > > > > > >>> On Thu, 6 Jun 2024 at 7:45 AM, Yuxin Tan <
> > tanyuxinw...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>>
> > > > > >  Congratulations, Rui!
> > > > > > 
> > > > > >  Best,
> > > > > >  Yuxin
> > > > > > 
> > > > > > 
> > > > > >  Xuannan Su  于2024年6月6日周四 09:58写道:
> > > > > > 
> > > > > > > Congratulations!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Xuannan
> > > > > > >
> > > > > > > On Thu, Jun 6, 2024 at 9:53 AM Hangxiang Yu <
> > > master...@gmail.com
> > > > >
> > > > > > >>> wrote:
> > > > > > >>
> > > > > > >> Congratulations, Rui !
> > > > > > >>
> > > > > > >> On Thu, Jun 6, 2024 at 9:18 AM Lincoln Lee <
> > > > > lincoln.8...@gmail.com
> > > > > > >>>
> > > > > > > wrote:
> > > > > > >>
> > > > > > >>> Congratulations, Rui!
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Lincoln Lee
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Lijie Wang  于2024年6月6日周四
> > 09:11写道:
> > > > > > >>>
> > > > > >  Congratulations, Rui!
> > > > > > 
> > > > > >  Best,
> > > > > >  Lijie
> > > > > > 
> > > > > >  Rodrigo Meneses  于2024年6月5日周三
> > 21:35写道:
> > > > > > 
> > > > > > > All the best
> > > > > > >
> > > > > > > On Wed, Jun 5, 2024 at 5:56 AM xiangyu feng <
> > > > > >  xiangyu...@gmail.com>
> > > > > >  wrote:
> > > > > > >
> > > > > > >> Congratulations, Rui!
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Xiangyu Feng
> > > > > > >>
> > > > > > >> Feng Jin  于2024年6月5日周三
> 20:42写道:
> > > > > > >>
> > > > > > >>> Congratulations, Rui!
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Feng Jin
> > > > > > >>>
> > > > > > >>> On Wed, Jun 5, 2024 at 8:23 PM Yanfei Lei <
> > > > > >  fredia...@gmail.com
> > > > > > >>
> > > > > >  wrote:
> > > > > > >>>
> > > > > >  Congratulations, Rui!
> > > > > > 
> > > > > >  Best,
> > > > > >  Yanfei
> > > > > > 
> > > > > >  Luke Chen  于2024年6月5日周三 20:08写道:
> > > > > > >
> > > > > > > Congrats, Rui!
> > > > > > >
> > > > > > > Luke
> > > > > > >
> > > > > > > On Wed, Jun 5, 2024 at 8:02 PM Jiabao Sun <
> > > > > > >>> jiabao...@apache.org>
> > > > > > >>> wrote:
> > > > > > >
> > > > > > >> Congrats, Rui. Well-deserved!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Jiabao
> > > > > > >>
> > > > > > >> Zhanghao Chen 
> > > > > > >>> 于2024年6月5日周三
> > > > > >  19:29写道:
> > > > > > >>
> > > > > > >>> Congrats, Rui!
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Zhanghao Chen
> > > > > > >>> 
> > > > > > >>> From: Piotr Nowojski 
> > > > > > >>> Sent: Wednesday, June 5, 2024 18:01
> > > > > > >>> To: dev ; rui fan <1996fan...@gmail.com>
> > > > > > >>> Subject: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui

Re: [June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync 11/06/2024

2024-06-11 Thread weijie guo
Thanks Zhanghao for the feedback.

Please feel free to change the state of this one to `won't make it`.


Best regards,

Weijie


On Wed, Jun 12, 2024 at 13:18, Zhanghao Chen  wrote:

> Hi Rui,
>
> Thanks for the summary! A quick update here: FLIP-398 was decided not to
> go into 1.20, as it was just found that adding dedicated serialization
> support for Maps, Sets, and Lists would break state compatibility.
> I will revert the relevant changes soon.
>
> Best,
> Zhanghao Chen
> 
> From: Rui Fan <1996fan...@gmail.com>
> Sent: Wednesday, June 12, 2024 12:59
> To: dev 
> Subject: [June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync
> 11/06/2024
>
> Dear devs,
>
> This is the sixth meeting of the Flink 1.20 release cycle [1].
>
> I'd like to share the information synced in the meeting.
>
> - Feature Freeze
>
> It is worth noting that there are only 3 days left until the
> feature freeze (June 15, 2024, 00:00 CEST (UTC+2)),
> so developers should keep this deadline in mind.
>
> After checking with all contributors of 1.20 FLIPs, we don't need
> to postpone the feature freeze. Please reply to this email
> if there are other valuable features that should still be merged in 1.20,
> thanks.
>
> - Features:
>
> So far we've tracked 16 FLIPs/features:
> - 6 FLIPs/features are done
> - 8 FLIPs/features are in progress, and the release managers have checked
> with the corresponding contributors
>   - 7 of these FLIPs/features can be completed before June 15, 2024, 00:00
> CEST (UTC+2)
>   - We were unable to contact the contributor of FLIP-436
> - 2 FLIPs/features won't make it into 1.20
>
> - Blockers:
>
> We don't have any blockers right now; thanks to everyone who fixed
> the earlier ones.
>
> - Sync meeting[2]:
>
> The next meeting is on 18/06/2024 at 10am (UTC+2) / 4pm (UTC+8); please
> feel free to join us.
>
> Lastly, we encourage attendees to add the topics to be discussed to the
> bottom of the 1.20 wiki page [1] a day in advance, to make it easier for
> everyone to understand the background of the topics, thanks!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
> [2] https://meet.google.com/mtj-huez-apu
>
> Best,
> Robert, Weijie, Ufuk and Rui
>


Re: [June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync 11/06/2024

2024-06-11 Thread Zhanghao Chen
Hi Rui,

Thanks for the summary! A quick update here: FLIP-398 was decided not to go
into 1.20, as it was just found that adding dedicated serialization support
for Maps, Sets, and Lists would break state compatibility. I will revert
the relevant changes soon.
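
To illustrate the root cause, here is a minimal sketch. It is illustrative
only, not the FLIP-398 code: the class name is made up, and it relies only
on the long-standing flink-core type APIs. A List-typed value without a
dedicated TypeInformation falls back to Kryo, while the dedicated
ListSerializer writes a different binary layout, so a snapshot taken with
one layout cannot be restored with the other:

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;

import java.util.List;

public class CollectionSerializerSketch {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        ExecutionConfig config = new ExecutionConfig();

        // Without dedicated collection support, a raw List type is handled
        // as a generic type and serialized through Kryo:
        TypeInformation<List<String>> generic =
                new GenericTypeInfo<>((Class<List<String>>) (Class<?>) List.class);
        System.out.println(
                generic.createSerializer(config).getClass().getSimpleName());
        // prints: KryoSerializer

        // With dedicated support, the same logical type resolves to a
        // ListSerializer, which uses a different binary layout:
        TypeInformation<List<String>> dedicated = Types.LIST(Types.STRING);
        System.out.println(
                dedicated.createSerializer(config).getClass().getSimpleName());
        // prints: ListSerializer

        // A snapshot written in the Kryo layout cannot be read back with
        // the dedicated serializer (and vice versa), which is the
        // state-compatibility break described above.
    }
}

In other words, jobs that already hold state in the Kryo layout would fail
to restore if the default serializers changed silently, so reverting is the
safe choice for 1.20.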

Best,
Zhanghao Chen

From: Rui Fan <1996fan...@gmail.com>
Sent: Wednesday, June 12, 2024 12:59
To: dev 
Subject: [June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync 11/06/2024

Dear devs,

This is the sixth meeting of the Flink 1.20 release cycle [1].

I'd like to share the information synced in the meeting.

- Feature Freeze

It is worth noting that there are only 3 days left until the
feature freeze (June 15, 2024, 00:00 CEST (UTC+2)),
so developers should keep this deadline in mind.

After checking with all contributors of 1.20 FLIPs, we don't need
to postpone the feature freeze. Please reply to this email
if there are other valuable features that should still be merged in 1.20, thanks.

- Features:

So far we've tracked 16 FLIPs/features:
- 6 FLIPs/features are done
- 8 FLIPs/features are in progress, and the release managers have checked
with the corresponding contributors
  - 7 of these FLIPs/features can be completed before June 15, 2024, 00:00
CEST (UTC+2)
  - We were unable to contact the contributor of FLIP-436
- 2 FLIPs/features won't make it into 1.20

- Blockers:

We don't have any blockers right now; thanks to everyone who fixed
the earlier ones.

- Sync meeting[2]:

The next meeting is on 18/06/2024 at 10am (UTC+2) / 4pm (UTC+8); please
feel free to join us.

Lastly, we encourage attendees to add the topics to be discussed to the
bottom of the 1.20 wiki page [1] a day in advance, to make it easier for
everyone to understand the background of the topics, thanks!

[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
[2] https://meet.google.com/mtj-huez-apu

Best,
Robert, Weijie, Ufuk and Rui


Re: [June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync 11/06/2024

2024-06-11 Thread weijie guo
Thanks Rui for the summary!

Best regards,

Weijie


On Wed, Jun 12, 2024 at 13:00, Rui Fan <1996fan...@gmail.com> wrote:

> Dear devs,
>
> This is the sixth meeting of the Flink 1.20 release cycle [1].
>
> I'd like to share the information synced in the meeting.
>
> - Feature Freeze
>
> It is worth noting that there are only 3 days left until the
> feature freeze (June 15, 2024, 00:00 CEST (UTC+2)),
> so developers should keep this deadline in mind.
>
> After checking with all contributors of 1.20 FLIPs, we don't need
> to postpone the feature freeze. Please reply to this email
> if there are other valuable features that should still be merged in 1.20,
> thanks.
>
> - Features:
>
> So far we've tracked 16 FLIPs/features:
> - 6 FLIPs/features are done
> - 8 FLIPs/features are in progress, and the release managers have checked
> with the corresponding contributors
>   - 7 of these FLIPs/features can be completed before June 15, 2024, 00:00
> CEST (UTC+2)
>   - We were unable to contact the contributor of FLIP-436
> - 2 FLIPs/features won't make it into 1.20
>
> - Blockers:
>
> We don't have any blockers right now; thanks to everyone who fixed
> the earlier ones.
>
> - Sync meeting[2]:
>
> The next meeting is on 18/06/2024 at 10am (UTC+2) / 4pm (UTC+8); please
> feel free to join us.
>
> Lastly, we encourage attendees to add the topics to be discussed to the
> bottom of the 1.20 wiki page [1] a day in advance, to make it easier for
> everyone to understand the background of the topics, thanks!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
> [2] https://meet.google.com/mtj-huez-apu
>
> Best,
> Robert, Weijie, Ufuk and Rui
>


[June 15 Feature Freeze][SUMMARY] Flink 1.20 Release Sync 11/06/2024

2024-06-11 Thread Rui Fan
Dear devs,

This is the sixth meeting of the Flink 1.20 release cycle [1].

I'd like to share the information synced in the meeting.

- Feature Freeze

It is worth noting that there are only 3 days left until the
feature freeze (June 15, 2024, 00:00 CEST (UTC+2)),
so developers should keep this deadline in mind.

After checking with all contributors of 1.20 FLIPs, we don't need
to postpone the feature freeze. Please reply to this email
if there are other valuable features that should still be merged in 1.20, thanks.

- Features:

So far we've tracked 16 FLIPs/features:
- 6 FLIPs/features are done
- 8 FLIPs/features are in progress, and the release managers have checked
with the corresponding contributors
  - 7 of these FLIPs/features can be completed before June 15, 2024, 00:00
CEST (UTC+2)
  - We were unable to contact the contributor of FLIP-436
- 2 FLIPs/features won't make it into 1.20

- Blockers:

We don't have any blockers right now; thanks to everyone who fixed
the earlier ones.

- Sync meeting[2]:

The next meeting is on 18/06/2024 at 10am (UTC+2) / 4pm (UTC+8); please
feel free to join us.

Lastly, we encourage attendees to add the topics to be discussed to the
bottom of the 1.20 wiki page [1] a day in advance, to make it easier for
everyone to understand the background of the topics, thanks!

[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
[2] https://meet.google.com/mtj-huez-apu

Best,
Robert, Weijie, Ufuk and Rui

