[jira] [Created] (FLINK-35169) Recycle buffers to freeSegments before releasing data buffer for sort accumulator

2024-04-18 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-35169:
-

 Summary: Recycle buffers to freeSegments before releasing data 
buffer for sort accumulator
 Key: FLINK-35169
 URL: https://issues.apache.org/jira/browse/FLINK-35169
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.20.0
Reporter: Yuxin Tan
Assignee: Yuxin Tan


When using the SortBufferAccumulator, we should recycle buffers to freeSegments 
before releasing the data buffer. The reason is that reading buffers from the 
DataBuffer may require more buffers than are currently available in 
freeSegments. Consequently, to ensure the DataBuffer has adequate buffers, the 
flushed and recycled buffers should also be returned to freeSegments for reuse.
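The ordering described above can be sketched as follows. This is a minimal illustration with hypothetical names; the real fix would touch Flink's SortBufferAccumulator internals, not these classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the proposed ordering: flushed buffers are recycled back into
// the free-segment pool *before* the data buffer is released, so reads from
// the DataBuffer can draw on them when freeSegments alone is too small.
public class RecycleOrderSketch {
    private final Deque<byte[]> freeSegments = new ArrayDeque<>();

    void releaseDataBuffer(Deque<byte[]> flushedBuffers) {
        // 1) Return flushed buffers to the pool first ...
        while (!flushedBuffers.isEmpty()) {
            freeSegments.push(flushedBuffers.pop());
        }
        // 2) ... and only then release the data buffer itself.
        // dataBuffer.release();  // placeholder for the actual release call
    }

    int freeSegmentCount() {
        return freeSegments.size();
    }

    public static void main(String[] args) {
        RecycleOrderSketch sketch = new RecycleOrderSketch();
        Deque<byte[]> flushed = new ArrayDeque<>();
        flushed.push(new byte[32]);
        flushed.push(new byte[32]);
        sketch.releaseDataBuffer(flushed);
        System.out.println(sketch.freeSegmentCount()); // 2
    }
}
```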



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-04-18 Thread Anupam Aggarwal
Thanks David.
That's a great idea. For deserialization, the external schema ID will be
used to obtain a dynamic message, so in a way it has to be in line with the
writer schema.
We could limit it to serialization and rename it according to your
suggestion.

Thanks
Anupam
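The option resolution discussed in this thread (use an explicit schemaID directly for serialization; otherwise register the supplied schema string to obtain an id) could be sketched roughly as follows. All names here are illustrative, not the actual connector API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolution logic for the two proposed format options:
// an explicit schemaId wins; otherwise the schema string is registered
// (or looked up) to obtain an id, possibly creating a new one.
public class SchemaIdResolutionSketch {
    static final Map<String, Integer> registry = new HashMap<>();
    static int nextId = 1;

    static int resolveForSerialization(Integer explicitSchemaId, String schemaString) {
        if (explicitSchemaId != null) {
            return explicitSchemaId; // use the pre-registered schema directly
        }
        // register (or look up) the schema string, possibly minting a new id
        return registry.computeIfAbsent(schemaString, s -> nextId++);
    }

    public static void main(String[] args) {
        System.out.println(resolveForSerialization(42, null));   // 42
        System.out.println(resolveForSerialization(null, "s1")); // 1
        System.out.println(resolveForSerialization(null, "s1")); // 1 (reused)
    }
}
```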
On Tue, Apr 16, 2024 at 3:38 PM David Radley 
wrote:

> Hi Anupam,
> Thanks for your response. I was wondering around the schema id and had
> some thoughts:
>
> I assume that for Confluent Avro, specifying the schema is not normally
> done, but could be useful to force a particular shape.
>
> If you specify a schema id in the format configuration:
> - for deserialization: does this mean the schema id in the payload has to
> match it? If so, we lose the ability to have multiple versions of the schema
> on a topic. For me, schemaId makes less sense for deserialization, as the
> existing mechanism used by the Avro / Confluent Avro formats is working well.
>
> - I can see it makes sense for the serialization where there is an
> existing schema in the registry you want to target.
>
> I suggest the schemaId be called something like schemaIdForSink or
> schemaIdForSerialization, to prevent confusion with the deserialization
> case. We could have the schema as you suggest so we are compatible with the
> confluent avro format.
>
>
> WDYT?
> Kind regards, David.
>
>
> From: Anupam Aggarwal 
> Date: Saturday, 13 April 2024 at 16:08
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf
> Confluent Format
> Hi David,
>
> Thank you for the suggestion.
> IIUC, you are proposing using an explicit schema string, instead of the
> schemaID.
> This makes sense, as it would make the behavior consistent with Avro,
> although a bit more verbose from a config standpoint.
>
> If we go via the schema string route, the user would have to ensure that
> the input schema string corresponds to an existing schemaID.
> This however, might end up registering a new id (based on
>
> https://github.com/confluentinc/schema-registry/issues/878#issuecomment-437510493
> ).
>
> How about adding both options (explicit schema string / schemaID)?
> If a schema string is specified, we register a new schemaID; if the user
> specifies an explicit schemaID, we just use it directly.
>
> Thanks
> Anupam
>
> On Wed, Apr 10, 2024 at 2:27 PM David Radley 
> wrote:
>
> > Hi,
> > I notice in the draft PR that there is a schema id in the format config. I
> > was wondering why? In the Confluent Avro and existing Debezium formats,
> > there is no schema id in the config, but there is the ability to specify a
> > complete schema. In the protobuf format there is no schema id.
> >
> > I assume the schema id would be used during serialization in the case there
> > is already an existing registered schema and you have its id. I see in the
> > docs
> > https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html
> > there is a serialize example where 2 schemas are registered.
> >
> > I would suggest aiming to copy what the confluent DeSer libraries do
> > rather than having a schema id hard coded in the config.
> >
> > WDYT?
> > Kind regards, David.
> >
> > From: Kevin Lam 
> > Date: Tuesday, 26 March 2024 at 20:06
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf
> > Confluent Format
> > Thanks Anupam! Looking forward to it.
> >
> > On Thu, Mar 14, 2024 at 1:50 AM Anupam Aggarwal <anupam.aggar...@gmail.com>
> > wrote:
> >
> > > Hi Kevin,
> > >
> > > Thanks, these are some great points.
> > > Just to clarify, I do agree that the subject should be an option (like in
> > > the case of RegistryAvroFormatFactory).
> > > We could fall back to subject and auto-register schemas, if schema-Id is
> > > not provided explicitly.
> > > In general, I think it would be good to be more explicit about the
> > > schemas used (
> > > https://docs.confluent.io/platform/current/schema-registry/schema_registry_onprem_tutorial.html#auto-schema-registration
> > > ).
> > > This would also help prevent us from overriding the ids in incompatible
> > > ways.
> > >
> > > Under the current implementation of FlinkToProtoSchemaConverter we might
> > > end up overwriting the field-Ids.
> > > If we are able to locate a prior schema, the approach you outlined makes a
> > > lot of sense.
> > > Let me explore this a bit further and get back (in terms of feasibility).
> > >
> > > Thanks again!
> > > - Anupam
> > >
> > > On Wed, Mar 13, 2024 at 2:28 AM Kevin Lam wrote:
> > >
> > > > Hi Anupam,
> > > >
> > > > Thanks again for your work on contributing this feature back.
> > > >
> > > > Sounds good re: the refactoring/re-organizing.
> > > >
> > > > Regarding the schema-id, in my opinion this should 

[jira] [Created] (FLINK-35168) Basic State Iterator for async processing

2024-04-18 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35168:
---

 Summary: Basic State Iterator for async processing
 Key: FLINK-35168
 URL: https://issues.apache.org/jira/browse/FLINK-35168
 Project: Flink
  Issue Type: Sub-task
Reporter: Zakelly Lan
Assignee: Zakelly Lan








Re: [VOTE] FLIP-442: General Improvement to Configuration for Flink 2.0

2024-04-18 Thread gongzhongqiang
+1 (non-binding)


Best,
Zhongqiang Gong

Xuannan Su wrote on Wed, Apr 17, 2024 at 13:02:

> Hi everyone,
>
> Thanks for all the feedback about the FLIP-442: General Improvement to
> Configuration for Flink 2.0 [1] [2].
>
> I'd like to start a vote for it. The vote will be open for at least 72
> hours (excluding weekends, until Apr 22, 12:00 AM GMT) unless there is an
> objection or an insufficient number of votes.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-442%3A+General+Improvement+to+Configuration+for+Flink+2.0
> [2] https://lists.apache.org/thread/15k0stwyoknhxvd643ctwjw3fd17pqwk
>
>
> Best regards,
> Xuannan
>


Re: [VOTE] FLIP-435: Introduce a New Materialized Table for Simplifying Data Pipelines

2024-04-18 Thread Jingsong Li
+1

On Thu, Apr 18, 2024 at 11:09 AM Yun Tang  wrote:
>
> +1 (binding)
>
> Best,
> Yun Tang
> 
> From: Jark Wu 
> Sent: Thursday, April 18, 2024 9:54
> To: dev@flink.apache.org 
> Subject: Re: [VOTE] FLIP-435: Introduce a New Materialized Table for 
> Simplifying Data Pipelines
>
> +1 (binding)
>
> Best,
> Jark
>
> On Wed, 17 Apr 2024 at 20:52, Leonard Xu  wrote:
>
> > +1(binding)
> >
> > Best,
> > Leonard
> >
> > > On Apr 17, 2024, at 8:31 PM, Lincoln Lee wrote:
> > >
> > > +1(binding)
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Ferenc Csaky wrote on Wed, Apr 17, 2024 at 19:58:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Best,
> > >> Ferenc
> > >>
> > >>
> > >>
> > >>
> > >> On Wednesday, April 17th, 2024 at 10:26, Ahmed Hamdy <
> > hamdy10...@gmail.com>
> > >> wrote:
> > >>
> > >>>
> > >>>
> > >>> + 1 (non-binding)
> > >>>
> > >>> Best Regards
> > >>> Ahmed Hamdy
> > >>>
> > >>>
> > >>> On Wed, 17 Apr 2024 at 08:28, Yuepeng Pan panyuep...@apache.org wrote:
> > >>>
> >  +1(non-binding).
> > 
> >  Best,
> >  Yuepeng Pan
> > 
> >  At 2024-04-17 14:27:27, "Ron liu" ron9@gmail.com wrote:
> > 
> > > Hi Dev,
> > >
> > > Thank you to everyone for the feedback on FLIP-435: Introduce a New
> > > Materialized Table for Simplifying Data Pipelines[1][2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least
> > >> 72
> > > hours unless there is an objection or not enough votes.
> > >
> > > [1]
> > 
> > 
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines
> > 
> > > [2] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
> > >
> > > Best,
> > > Ron
> > >>
> >
> >


[jira] [Created] (FLINK-35167) [CDC] Introduce MaxCompute pipeline DataSink

2024-04-18 Thread zhangdingxin (Jira)
zhangdingxin created FLINK-35167:


 Summary: [CDC] Introduce MaxCompute pipeline DataSink
 Key: FLINK-35167
 URL: https://issues.apache.org/jira/browse/FLINK-35167
 Project: Flink
  Issue Type: New Feature
  Components: Flink CDC
Reporter: zhangdingxin


By integrating the MaxCompute DataSink, we enable the precise and efficient 
synchronization of data from Flink's Change Data Capture (CDC) into MaxCompute.





Re: [VOTE] FLIP-442: General Improvement to Configuration for Flink 2.0

2024-04-18 Thread Muhammet Orazov

Thanks Xuannan! +1 (non-binding)

Best,
Muhammet

On 2024-04-17 05:00, Xuannan Su wrote:

Hi everyone,

Thanks for all the feedback about the FLIP-442: General Improvement to
Configuration for Flink 2.0 [1] [2].

I'd like to start a vote for it. The vote will be open for at least 72
hours (excluding weekends, until Apr 22, 12:00 AM GMT) unless there is an
objection or an insufficient number of votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-442%3A+General+Improvement+to+Configuration+for+Flink+2.0

[2] https://lists.apache.org/thread/15k0stwyoknhxvd643ctwjw3fd17pqwk


Best regards,
Xuannan


[jira] [Created] (FLINK-35166) Improve the performance of Hybrid Shuffle when enable memory decoupling

2024-04-18 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-35166:
-

 Summary: Improve the performance of Hybrid Shuffle when enable 
memory decoupling
 Key: FLINK-35166
 URL: https://issues.apache.org/jira/browse/FLINK-35166
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Reporter: Jiang Xin
 Fix For: 1.20.0


Currently, the tiered result partition creates the SortBufferAccumulator with 
the number of expected buffers as min(numSubpartitions+1, 512), so the 
SortBufferAccumulator may obtain very few buffers when the parallelism is 
small. We can simply make the number of expected buffers 512 by default to 
achieve better performance when buffers are sufficient.
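The buffer-count policy described above can be illustrated with a small sketch. The method names are hypothetical; only the min(numSubpartitions+1, 512) formula and the proposed 512 default come from the description:

```java
// Illustrative comparison of the current and proposed expected-buffer counts
// for the SortBufferAccumulator (not the actual Flink internals).
public class ExpectedBuffersSketch {
    static int currentExpectedBuffers(int numSubpartitions) {
        // today: tied to parallelism, so low-parallelism jobs get very few
        return Math.min(numSubpartitions + 1, 512);
    }

    static int proposedExpectedBuffers(int numSubpartitions) {
        return 512; // decoupled from parallelism when buffers are sufficient
    }

    public static void main(String[] args) {
        System.out.println(currentExpectedBuffers(4));    // 5
        System.out.println(currentExpectedBuffers(1024)); // 512
        System.out.println(proposedExpectedBuffers(4));   // 512
    }
}
```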





Re: Question around Flink's AdaptiveBatchScheduler

2024-04-18 Thread Venkatakrishnan Sowrirajan
Filed https://issues.apache.org/jira/browse/FLINK-35165 to address the
above described issue. Will share the PR here once it is ready for review.

Regards
Venkata krishnan


On Wed, Apr 17, 2024 at 5:32 AM Junrui Lee  wrote:

> Thanks Venkata and Xia for providing further clarification. I think your
> example illustrates the significance of this proposal very well. Please
> feel free to go ahead and address the concerns.
>
> Best,
> Junrui
>
> Venkatakrishnan Sowrirajan wrote on Tue, Apr 16, 2024 at 07:01:
>
> > Thanks for adding your thoughts to this discussion.
> >
> > If we all agree that the source vertex parallelism shouldn't be bound by
> > the downstream max parallelism
> > (jobmanager.adaptive-batch-scheduler.max-parallelism)
> > based on the rationale and the issues described above, I can take a stab at
> > addressing the issue.
> >
> > Let me file a ticket to track this issue. Otherwise, I'm looking forward to
> > hearing more thoughts from others as well, especially Lijie and Junrui who
> > have more context on the AdaptiveBatchScheduler.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Mon, Apr 15, 2024 at 12:54 AM Xia Sun  wrote:
> >
> > > Hi Venkat,
> > > I agree that the parallelism of the source vertex should not be upper
> > > bounded by the job's global max parallelism. The case you mentioned, "High
> > > filter selectivity with huge amounts of data to read", excellently
> > > supports this viewpoint. (In fact, in the current implementation, if the
> > > source parallelism is pre-specified at job create stage, rather than
> > > relying on the dynamic parallelism inference of the
> > > AdaptiveBatchScheduler, the source vertex's parallelism can indeed exceed
> > > the job's global max parallelism.)
> > >
> > > As Lijie and Junrui pointed out, the key issue is "semantic consistency."
> > > Currently, if a vertex has not set maxParallelism, the
> > > AdaptiveBatchScheduler will use
> > > `execution.batch.adaptive.auto-parallelism.max-parallelism` as the
> > > vertex's maxParallelism. Since the current implementation does not
> > > distinguish between source vertices and downstream vertices, source
> > > vertices are also subject to this limitation.
> > >
> > > Therefore, I believe that if the issue of "semantic consistency" can be
> > > well explained in the code and configuration documentation, the
> > > AdaptiveBatchScheduler should support that the parallelism of source
> > > vertices can exceed the job's global max parallelism.
> > >
> > > Best,
> > > Xia
> > >
> > > Venkatakrishnan Sowrirajan wrote on Sun, Apr 14, 2024 at 10:31:
> > >
> > > > Let me state why I think
> > > > "jobmanager.adaptive-batch-scheduler.default-source-parallelism" should
> > > > not be bound by "jobmanager.adaptive-batch-scheduler.max-parallelism".
> > > >
> > > > - Source vertex is unique and does not have any upstream vertices
> > > > - Downstream vertices read shuffled data partitioned by key, which is
> > > > not the case for the Source vertex
> > > > - Limiting source parallelism by downstream vertices' max parallelism is
> > > > incorrect
> > > >
> > > > If we say for "semantic consistency" the source vertex parallelism has
> > > > to be bound by the overall job's max parallelism, it can lead to the
> > > > following issues:
> > > >
> > > > - High filter selectivity with huge amounts of data to read - setting
> > > > high "jobmanager.adaptive-batch-scheduler.max-parallelism" so that
> > > > source parallelism can be set higher can lead to small blocks and
> > > > sub-optimal performance.
> > > > - Setting high "jobmanager.adaptive-batch-scheduler.max-parallelism"
> > > > requires careful tuning of network buffer configurations, which is
> > > > unnecessary in cases where it is not required, just so that the source
> > > > parallelism can be set high.
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > > > On Thu, Apr 11, 2024 at 9:30 PM Junrui Lee 
> > wrote:
> > > >
> > > > > Hello Venkata krishnan,
> > > > >
> > > > > I think the term "semantic inconsistency" defined by
> > > > > jobmanager.adaptive-batch-scheduler.max-parallelism refers to
> > > > > maintaining a uniform upper limit on parallelism across all vertices
> > > > > within a job. As the source vertices are part of the global execution
> > > > > graph, they should also respect this rule to ensure consistent
> > > > > application of parallelism constraints.
> > > > >
> > > > > Best,
> > > > > Junrui
> > > > >
> > > > > Venkatakrishnan Sowrirajan wrote on Fri, Apr 12, 2024 at 02:10:
> > > > >
> > > > > > Gentle bump on this question. cc @Becket Qin <becket@gmail.com>
> > > > > > as well.
> > > > > >
> > > > > > Regards
> > > > > > Venkata krishnan
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 12, 2024 at 10:11 PM Venkatakrishnan Sowrirajan <
> > > > > > vsowr...@asu.edu> wrote:
> > > > > >
> > > > > > > Thanks for 

[jira] [Created] (FLINK-35165) AdaptiveBatch Scheduler should not restrict the default source parallelism to the max parallelism set

2024-04-18 Thread Venkata krishnan Sowrirajan (Jira)
Venkata krishnan Sowrirajan created FLINK-35165:
---

 Summary: AdaptiveBatch Scheduler should not restrict the default 
source parallelism to the max parallelism set
 Key: FLINK-35165
 URL: https://issues.apache.org/jira/browse/FLINK-35165
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: Venkata krishnan Sowrirajan


Copy-pasting the reasoning mentioned on this [discussion 
thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].

Let me state why I think 
"{_}jobmanager.adaptive-batch-scheduler.default-source-parallelism{_}" should 
not be bound by the "{_}jobmanager.adaptive-batch-scheduler.max-parallelism{_}".
 * Source vertex is unique and does not have any upstream vertices
 * Downstream vertices read shuffled data partitioned by key, which is not the 
case for the Source vertex
 * Limiting source parallelism by downstream vertices' max parallelism is 
incorrect
 * If we say for "semantic consistency" the source vertex parallelism has to 
be bound by the overall job's max parallelism, it can lead to following issues:
 ** High filter selectivity with huge amounts of data to read
 ** Setting high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" so 
that source parallelism can be set higher can lead to small blocks and 
sub-optimal performance.
 ** Setting high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
requires careful tuning of network buffer configurations which is unnecessary 
in cases where it is not required just so that the source parallelism can be 
set high.
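The clamping behavior this issue questions can be sketched minimally as follows. The method names are hypothetical and this is not the actual scheduler code; it only illustrates the bound being discussed:

```java
// Illustrative sketch: the inferred/default source parallelism is currently
// capped by the global max-parallelism, which this issue argues should not
// apply to source vertices.
public class SourceParallelismSketch {
    static int currentBehavior(int defaultSourceParallelism, int globalMaxParallelism) {
        // today: the source is bound by the job-wide cap
        return Math.min(defaultSourceParallelism, globalMaxParallelism);
    }

    static int proposedBehavior(int defaultSourceParallelism, int globalMaxParallelism) {
        return defaultSourceParallelism; // sources unbounded by the global cap
    }

    public static void main(String[] args) {
        // e.g. default-source-parallelism=2000 with max-parallelism=1000
        System.out.println(currentBehavior(2000, 1000));  // 1000
        System.out.println(proposedBehavior(2000, 1000)); // 2000
    }
}
```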





Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-18 Thread Swathi C
Hi Gyula and Ahmed,

Thanks for reviewing this.

@gyula.f...@gmail.com , since our aim in this FLIP was only to fail the
cluster when the JobManager/Flink has issues that leave the cluster no longer
usable, we proposed only changes related to that.
You're right that it covers only job main-class errors, JobManager runtime
failures, and cases where the JobManager fails to write metadata to another
system (ABFS, S3, ...); job-level failures will not be covered.

FLIP-304 mainly provides failure enrichers for job failures, while this FLIP
is mainly for Flink JobManager failures. Let us know if we can leverage the
strengths of both: extend FLIP-304 with our plugin implementation to cover
job-level issues (propagating the failure info to /dev/termination-log, so
that the container status reports it for Flink on K8s, by implementing the
FailureEnricher interface and processFailure()), and use this FLIP's proposal
for generic Flink cluster (JobManager/cluster) failures.
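The core mechanism being discussed (surfacing a fatal error via the Kubernetes termination log) could look roughly like this. Class and method names are illustrative, not the actual FLIP or Flink API; the real target path /dev/termination-log is parameterized here so the sketch runs anywhere:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: on a fatal JobManager error, write a short reason to
// /dev/termination-log so the Kubernetes container status surfaces it.
public class TerminationLogSketch {
    static final Path TERMINATION_LOG = Path.of("/dev/termination-log");

    static void reportFatalError(Throwable t, Path target) {
        String message = t.getClass().getName() + ": " + t.getMessage();
        try {
            Files.writeString(target, message,
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            // Best effort: never let termination reporting mask the real failure.
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        // stand-in for /dev/termination-log so the sketch runs outside a pod
        Path tmp = Files.createTempFile("termination", ".log");
        reportFatalError(new IllegalStateException("JobManager failed to start"), tmp);
        System.out.println(Files.readString(tmp));
    }
}
```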

Regards,
Swathi C

On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy  wrote:

> Hi Swathi!
> Thanks for the proposal.
> Could you please elaborate on what this FLIP offers beyond FLIP-304 [1]?
> FLIP-304 proposes a pluggable mechanism for enriching job failures; if I am
> not mistaken, this proposal looks like a subset of it.
>
> 1-
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
>
> Best Regards
> Ahmed Hamdy
>
>
> On Thu, 18 Apr 2024 at 08:23, Gyula Fóra  wrote:
>
> > Hi Swathi!
> >
> > Thank you for creating this proposal. I really like the general idea of
> > increasing the K8s native observability of Flink job errors.
> >
> > I took a quick look at your reference PR, the termination log related
> logic
> > is contained completely in the ClusterEntrypoint. What type of errors
> will
> > this actually cover?
> >
> > To me this seems to cover only:
> >  - Job main class errors (ie startup errors)
> >  - JobManager failures
> >
> > Would regular job errors (that cause only job failover but not JM errors)
> > be reported somehow with this plugin?
> >
> > Thanks
> > Gyula
> >
> > On Tue, Apr 16, 2024 at 8:21 AM Swathi C 
> > wrote:
> >
> > > Hi All,
> > >
> > > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing
> Flink
> > > Failure Management in Kubernetes with Dynamic Termination Log
> > Integration.
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > >
> > >
> > > This FLIP proposes an improvement plugin and focuses mainly on Flink on
> > > K8S but can be used as a generic plugin and add further enhancements.
> > >
> > > Looking forward to everyone's feedback and suggestions. Thank you !!
> > >
> > > Best Regards,
> > > Swathi Chandrashekar
> > >
> >
>


[jira] [Created] (FLINK-35164) Support `ALTER CATALOG RESET` syntax

2024-04-18 Thread Yubin Li (Jira)
Yubin Li created FLINK-35164:


 Summary: Support `ALTER CATALOG RESET` syntax
 Key: FLINK-35164
 URL: https://issues.apache.org/jira/browse/FLINK-35164
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Yubin Li
 Attachments: image-2024-04-18-23-26-59-854.png

h3. ALTER CATALOG catalog_name RESET (key1, key2, ...)

Reset one or more properties to their default values in the specified catalog.

!image-2024-04-18-23-26-59-854.png|width=781,height=527!





Re: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee

2024-04-18 Thread Lincoln Lee
Thanks everyone : )

Best,
Lincoln Lee


Feifan Wang wrote on Tue, Apr 16, 2024 at 12:44:

> Congratulations Lincoln !
>
>
> ——
>
> Best regards,
>
> Feifan Wang
>
>
>
>
> At 2024-04-12 15:59:00, "Jark Wu"  wrote:
> >Hi everyone,
> >
> >On behalf of the PMC, I'm very happy to announce that Lincoln Lee has
> >joined the Flink PMC!
> >
> >Lincoln has been an active member of the Apache Flink community for
> >many years. He mainly works on Flink SQL component and has driven
> >/pushed many FLIPs around SQL, including FLIP-282/373/415/435 in
> >the recent versions. He has a great technical vision of Flink SQL and
> >participated in plenty of discussions in the dev mailing list. Besides
> >that,
> >he is community-minded, such as being the release manager of 1.19,
> >verifying releases, managing release syncs, writing the release
> >announcement etc.
> >
> >Congratulations and welcome Lincoln!
> >
> >Best,
> >Jark (on behalf of the Flink PMC)
>


Re: [VOTE] FLIP-442: General Improvement to Configuration for Flink 2.0

2024-04-18 Thread Ahmed Hamdy
+1 (non-binding)
Great work!


Best Regards
Ahmed Hamdy


On Wed, 17 Apr 2024 at 14:36, Jeyhun Karimov  wrote:

> +1 (non binding)
>
> Regards,
> Jeyhun
>
> On Wed, Apr 17, 2024 at 2:22 PM Zhu Zhu  wrote:
>
> > +1 (binding)
> >
> > Thanks,
> > Zhu
> >
> > Yuxin Tan wrote on Wed, Apr 17, 2024 at 18:36:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Zakelly Lan wrote on Wed, Apr 17, 2024 at 16:51:
> > >
> > > > +1 binding
> > > >
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > > On Wed, Apr 17, 2024 at 2:05 PM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > >
> > > > > +1(binding)
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Wed, Apr 17, 2024 at 1:02 PM Xuannan Su 
> > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Thanks for all the feedback about the FLIP-442: General
> Improvement
> > > to
> > > > > > Configuration for Flink 2.0 [1] [2].
> > > > > >
> > > > > > I'd like to start a vote for it. The vote will be open for at least
> > > > > > 72 hours (excluding weekends, until Apr 22, 12:00 AM GMT) unless
> > > > > > there is an objection or an insufficient number of votes.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-442%3A+General+Improvement+to+Configuration+for+Flink+2.0
> > > > > > [2]
> > https://lists.apache.org/thread/15k0stwyoknhxvd643ctwjw3fd17pqwk
> > > > > >
> > > > > >
> > > > > > Best regards,
> > > > > > Xuannan
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-18 Thread Ahmed Hamdy
Hi Swathi!
Thanks for the proposal.
Could you please elaborate on what this FLIP offers beyond FLIP-304 [1]?
FLIP-304 proposes a pluggable mechanism for enriching job failures; if I am
not mistaken, this proposal looks like a subset of it.

1-
https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers

Best Regards
Ahmed Hamdy


On Thu, 18 Apr 2024 at 08:23, Gyula Fóra  wrote:

> Hi Swathi!
>
> Thank you for creating this proposal. I really like the general idea of
> increasing the K8s native observability of Flink job errors.
>
> I took a quick look at your reference PR, the termination log related logic
> is contained completely in the ClusterEntrypoint. What type of errors will
> this actually cover?
>
> To me this seems to cover only:
>  - Job main class errors (ie startup errors)
>  - JobManager failures
>
> Would regular job errors (that cause only job failover but not JM errors)
> be reported somehow with this plugin?
>
> Thanks
> Gyula
>
> On Tue, Apr 16, 2024 at 8:21 AM Swathi C 
> wrote:
>
> > Hi All,
> >
> > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing Flink
> > Failure Management in Kubernetes with Dynamic Termination Log
> Integration.
> >
> >
> >
> https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> >
> >
> > This FLIP proposes an improvement plugin and focuses mainly on Flink on
> > K8S but can be used as a generic plugin and add further enhancements.
> >
> > Looking forward to everyone's feedback and suggestions. Thank you !!
> >
> > Best Regards,
> > Swathi Chandrashekar
> >
>


Re: [DISCUSS][QUESTION] Drop jdk 8 support for Flink connector Opensearch

2024-04-18 Thread Danny Cranmer
Hey Sergey,

Given the circumstances I think it is ok to drop JDK 8 support with the
opensearch v2.0.0 connector. However, in parallel we should still support
the 1.x line for Flink 1.x series with JDK 8. This would mean two releases:
1/ flink-connector-opensearch v2.0.0 for Flink 1.18/1.19, opensearch
1.x/2.x and JDK 11
2/ flink-connector-opensearch v1.2.0 (or maybe just 1.1.0-1.19) for Flink
1.18/1.19, opensearch 1.x and JDK 8

What do you think?

Thanks,
Danny

On Wed, Apr 17, 2024 at 10:07 AM Sergey Nuyanzin 
wrote:

> Hi everyone
>
> I'm working on support for Opensearch v2.x for Flink connector
> Opensearch[1].
> Unfortunately, after several breaking changes (e.g. [2], [3]) on the
> Opensearch side, it is no longer possible to use the same connector build
> for both Opensearch v1 and v2.
> This forces us to go a similar way as for Elasticsearch 6/7 and build a
> dedicated Opensearch v2 module.
>
> However, the main pain point here is that Opensearch 2.x is built with jdk11
> and requires jdk11 to build and use the Flink connector as well.
> Also in README[4] of most of the connectors it is mentioned explicitly that
> jdk11 is required to build connectors.
>
> At the same time it looks like we need to release a connector for
> Opensearch v1 with jdk8 and for Opensearch v2 with jdk11.
>
> The suggestion is to drop support of jdk8 for the Opensearch connector to
> make the release/testing for both modules (for Opensearch v1 and Openseach
> v2) easier.
>
> Other opinions are welcome
>
> [1] https://github.com/apache/flink-connector-opensearch/pull/38
> [2] opensearch-project/OpenSearch#9082
> [3] opensearch-project/OpenSearch#5902
> [4]
>
> https://github.com/apache/flink-connector-opensearch/blob/main/README.md?plain=1#L18
>
> --
> Best regards,
> Sergey
>


[jira] [Created] (FLINK-35163) Utilize ForSt's native MultiGet API to optimize remote state access

2024-04-18 Thread Jinzhong Li (Jira)
Jinzhong Li created FLINK-35163:
---

 Summary: Utilize ForSt's native MultiGet API to optimize remote 
state access
 Key: FLINK-35163
 URL: https://issues.apache.org/jira/browse/FLINK-35163
 Project: Flink
  Issue Type: Sub-task
Reporter: Jinzhong Li
 Fix For: 2.0.0








[jira] [Created] (FLINK-35162) Support grouping state get and put access

2024-04-18 Thread Jinzhong Li (Jira)
Jinzhong Li created FLINK-35162:
---

 Summary: Support grouping state get and put access
 Key: FLINK-35162
 URL: https://issues.apache.org/jira/browse/FLINK-35162
 Project: Flink
  Issue Type: Sub-task
Reporter: Jinzhong Li








[jira] [Created] (FLINK-35161) Implement StateExecutor for ForStStateBackend

2024-04-18 Thread Jinzhong Li (Jira)
Jinzhong Li created FLINK-35161:
---

 Summary: Implement StateExecutor for ForStStateBackend
 Key: FLINK-35161
 URL: https://issues.apache.org/jira/browse/FLINK-35161
 Project: Flink
  Issue Type: Sub-task
Reporter: Jinzhong Li
 Fix For: 2.0.0








Re: [VOTE] Release flink-connector-mongodb v1.2.0, release candidate #2

2024-04-18 Thread Ahmed Hamdy
+1 (non-binding)

-  verified hashes and checksums
- verified signature
- verified source contains no binaries
- tag exists in github
- reviewed web PR


Best Regards
Ahmed Hamdy


On Thu, 18 Apr 2024 at 11:21, Danny Cranmer  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for v1.2.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> This release supports Flink 1.18 and 1.19.
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint 125FD8DB [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v1.2.0-rc2 [5],
> * website pull request listing the new release [6].
> * CI build of tag [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Danny
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354192
> [2]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-mongodb-1.2.0-rc2
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1719/
> [5]
> https://github.com/apache/flink-connector-mongodb/releases/tag/v1.2.0-rc2
> [6] https://github.com/apache/flink-web/pull/735
> [7]
> https://github.com/apache/flink-connector-mongodb/actions/runs/8735987710
>


Re: [VOTE] Release flink-connector-jdbc v3.2.0, release candidate #2

2024-04-18 Thread Ahmed Hamdy
+1 (non-binding)

- Verified Checksums and hashes
- Verified Signatures
- No binaries in source
- Build source
- Github tag exists
- Reviewed Web PR


Best Regards
Ahmed Hamdy


On Thu, 18 Apr 2024 at 11:22, Danny Cranmer  wrote:

> Sorry for typos:
>
> > Please review and vote on the release candidate #1 for the version 3.2.0,
> as follows:
> Should be "release candidate #2"
>
> > * source code tag v3.2.0-rc1 [5],
> Should be "source code tag v3.2.0-rc2"
>
> Thanks,
> Danny
>
> On Thu, Apr 18, 2024 at 11:19 AM Danny Cranmer 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version 3.2.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > This release supports Flink 1.18 and 1.19.
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2], which are signed with the key with fingerprint 125FD8DB [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.2.0-rc1 [5],
> > * website pull request listing the new release [6].
> > * CI run of tag [7].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Danny
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353143
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.2.0-rc2
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1718/
> > [5]
> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.2.0-rc2
> > [6] https://github.com/apache/flink-web/pull/734
> > [7]
> https://github.com/apache/flink-connector-jdbc/actions/runs/8736019099
> >
>


[jira] [Created] (FLINK-35160) Support for Thread Dump provides a convenient way to display issues of thread deadlocks in tasks

2024-04-18 Thread elon_X (Jira)
elon_X created FLINK-35160:
--

 Summary: Support for Thread Dump provides a convenient way to 
display issues of thread deadlocks in tasks
 Key: FLINK-35160
 URL: https://issues.apache.org/jira/browse/FLINK-35160
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Affects Versions: 1.18.1, 1.19.0, 1.17.1, 1.16.0
Reporter: elon_X
 Attachments: image-2024-04-18-20-57-52-440.png, 
image-2024-04-18-20-58-09-872.png, image-2024-04-18-21-00-04-532.png, 
image-2024-04-18-21-01-22-881.png, image-2024-04-18-21-34-41-014.png

After receiving feedback from the business side about performance issues in 
their tasks, we attempted to troubleshoot and discovered that their tasks had 
thread deadlocks. However, the Thread Dump entry on the Flink page only shows 
thread stacks. Since the users are not very familiar with Java stacks, they 
couldn't clearly identify that the deadlocks were caused by their business 
logic code and mistakenly thought they were problems with the Flink framework.

!image-2024-04-18-20-57-52-440.png!

!image-2024-04-18-20-58-09-872.png!

The JVM's jstack command can clearly display thread deadlocks; unfortunately, 
the business team does not have permission to log into the machines. Here is 
the jstack log:

!image-2024-04-18-21-00-04-532.png!

Flame graphs are excellent for visualizing performance bottlenecks and 
hotspots in application profiling, but they are not designed to pinpoint the 
exact lines of code where thread deadlocks occur.

!image-2024-04-18-21-01-22-881.png!

Perhaps we could enhance the Thread Dump feature to display thread deadlocks, 
similar to what the {{jstack}} command provides.

 

!image-2024-04-18-21-34-41-014.png!
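For reference, the JDK exposes the same deadlock detection that jstack performs through ThreadMXBean, which a REST handler could build on. A minimal, illustrative sketch (class and method names are hypothetical, not the actual Flink REST code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {

    /** Returns a human-readable report of deadlocked threads, or a note that none were found. */
    public static String findDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Same monitor/ownable-synchronizer deadlock detection that jstack reports.
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) {
            return "No deadlocked threads detected.";
        }
        StringBuilder report =
                new StringBuilder("Found " + ids.length + " deadlocked thread(s):\n");
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            report.append(info.toString());
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(findDeadlocks());
    }
}
```

A Thread Dump REST endpoint could prepend such a report to the raw stacks, so non-Java users see the deadlock called out explicitly.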



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Ahmed Hamdy
Hi Xia,
I have read through the FLIP and discussion and the new version of the FLIP
looks better.
+1 for the proposal.
Best Regards
Ahmed Hamdy


On Thu, 18 Apr 2024 at 12:21, Ron Liu  wrote:

> Hi, Xia
>
> Thanks for updating, looks good to me.
>
> Best,
> Ron
>
> Xia Sun  于2024年4月18日周四 19:11写道:
>
> > Hi Ron,
> > Yes, presenting it in a table might be more intuitive. I have already
> added
> > the table in the "Public Interfaces | New Config Option" chapter of FLIP.
> > PTAL~
> >
> > Ron Liu  于2024年4月18日周四 18:10写道:
> >
> > > Hi, Xia
> > >
> > > Thanks for your reply.
> > >
> > > > That means, in terms
> > > of priority, `table.exec.hive.infer-source-parallelism` >
> > > `table.exec.hive.infer-source-parallelism.mode`.
> > >
> > > I still have some confusion, if the
> > > `table.exec.hive.infer-source-parallelism`
> > > >`table.exec.hive.infer-source-parallelism.mode`, currently
> > > `table.exec.hive.infer-source-parallelism` default value is true, that
> > > means always static parallelism inference work? Or perhaps after this
> > FLIP,
> > > we changed the default behavior of
> > > `table.exec.hive.infer-source-parallelism` to indicate dynamic
> > parallelism
> > > inference when enabled.
> > > I think you should list the various behaviors of these two options that
> > > coexist in FLIP by a table, only then users can know how the dynamic
> and
> > > static parallelism inference work.
> > >
> > > Best,
> > > Ron
> > >
> > > Xia Sun  于2024年4月18日周四 16:33写道:
> > >
> > > > Hi Ron and Lijie,
> > > > Thanks for joining the discussion and sharing your suggestions.
> > > >
> > > > > the InferMode class should also be introduced in the Public
> > Interfaces
> > > > > section!
> > > >
> > > >
> > > > Thanks for the reminder, I have now added the InferMode class to the
> > > Public
> > > > Interfaces section as well.
> > > >
> > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> > > through
> > > > > the code that the default value is 1000?
> > > >
> > > >
> > > > I have checked and the default value of
> > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This
> has
> > > > been corrected in the FLIP.
> > > >
> > > > > how are`table.exec.hive.infer-source-parallelism` and
> > > > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > >
> > > >
> > > > This is indeed a critical point. The current plan is to deprecate
> > > > `table.exec.hive.infer-source-parallelism` but still utilize it as
> the
> > > main
> > > > switch for enabling automatic parallelism inference. That means, in
> > terms
> > > > of priority, `table.exec.hive.infer-source-parallelism` >
> > > > `table.exec.hive.infer-source-parallelism.mode`. In future versions,
> if
> > > > `table.exec.hive.infer-source-parallelism` is removed, this logic
> will
> > > also
> > > > need to be revised, leaving only
> > > > `table.exec.hive.infer-source-parallelism.mode` as the basis for
> > deciding
> > > > whether to enable parallelism inference. I have also added this
> > > description
> > > > to the FLIP.
> > > >
> > > >
> > > > > In FLIP-367 it is supported to be able to set the Source's
> > parallelism
> > > > > individually, if in the future HiveSource also supports this
> feature,
> > > > > however, the default value of
> > > > > `table.exec.hive.infer-source-parallelism.mode` is
> > `InferMode.DYNAMIC`,
> > > > at
> > > > > this point will the parallelism be dynamically derived or will the
> > > > manually
> > > > > set parallelism take effect, and who has the higher priority?
> > > >
> > > >
> > > > From my understanding, 'manually set parallelism' has the higher
> > > priority,
> > > > just like one of the preconditions for the effectiveness of dynamic
> > > > parallelism inference in the AdaptiveBatchScheduler is that the
> > vertex's
> > > > parallelism isn't set. I believe whether it's static inference or
> > dynamic
> > > > inference, the manually set parallelism by the user should be
> > respected.
> > > >
> > > > > The `InferMode.NONE` option.
> > > >
> > > > Currently, 'adding InferMode.NONE' seems to be the prevailing
> opinion.
> > I
> > > > will add InferMode.NONE as one of the Enum options in InferMode
> class.
> > > >
> > > > Best,
> > > > Xia
> > > >
> > > > Lijie Wang  于2024年4月18日周四 13:50写道:
> > > >
> > > > > Thanks for driving the discussion.
> > > > >
> > > > > +1 for the proposal and +1 for the `InferMode.NONE` option.
> > > > >
> > > > > Best,
> > > > > Lijie
> > > > >
> > > > > Ron liu  于2024年4月18日周四 11:36写道:
> > > > >
> > > > > > Hi, Xia
> > > > > >
> > > > > > Thanks for driving this FLIP.
> > > > > >
> > > > > > This proposal looks good to me overall. However, I have the
> > following
> > > > > minor
> > > > > > questions:
> > > > > >
> > > > > > 1. FLIP introduced
> `table.exec.hive.infer-source-parallelism.mode`
> > > as a
> > > > > new
> > > > > > parameter, and the value is the enum class `InferMode`, I think
> the
> > > > > > InferMode class should also be introduced in 

Re: [DISCUSS] FLIP-438: Make Flink's Hadoop and YARN configuration probing consistent

2024-04-18 Thread Ferenc Csaky
Hi Venkata krishnan,

My general point was that personally I do not think that the
current implementation is wrong or confusing. And the main thing
here is that how we define consistent in this case is subjective.
From the proposed point of view, consistent means we use the same
prefix. But we can also consistently use the least required prefix
groups to identify a subsystem property, which is how it works
right now. The property naming conventions of these dependent
systems are different, and so are their prefixes in Flink.

It is very possible I am in minority with my view,
but I do not think duplicating the `yarn` prefix to make it
conform with Hadoop props would be a better UX as it is now, just
different. 

One thing that is maybe less opinionated is whether the proposed
solution simplifies the property loading logic. Currently, Hadoop props from
the Flink conf are parsed in `HadoopUtils` (flink-hadoop-fs
module), while Yarn props in `Utils` (flink-yarn module). Maybe
`org.apache.flink.configuration.Configuration` could have a helper
to extract all prefixed properties to a `Map`, or another
`Configuration` object (the latter could be easily added as a
resource to the dependent system config objects).
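Such a helper could look roughly like the following — purely illustrative, using a plain Map in place of `org.apache.flink.configuration.Configuration` and a hypothetical method name:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixedProperties {

    /**
     * Extracts all entries whose key starts with the given prefix, with the
     * prefix stripped — e.g. with prefix "flink.hadoop.", the key
     * "flink.hadoop.dfs.replication" becomes "dfs.replication".
     */
    public static Map<String, String> extractPrefixed(Map<String, String> conf, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = new HashMap<>();
        flinkConf.put("flink.yarn.application.classpath", "/opt/hadoop/*");
        flinkConf.put("flink.hadoop.dfs.replication", "2");
        flinkConf.put("taskmanager.numberOfTaskSlots", "4");
        // Hadoop-style loading strips "flink.hadoop.":
        System.out.println(extractPrefixed(flinkConf, "flink.hadoop.")); // prints {dfs.replication=2}
    }
}
```

With one shared helper, the YARN path (prefix "flink.") and the Hadoop path (prefix "flink.hadoop.") would only differ in the prefix constant they pass in.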

That simplification could make the overall prefix-based loading logic
cleaner IMO, and that is something that would be useful.

WDYT?

Best,
Ferenc





On Monday, April 15th, 2024 at 20:55, Venkatakrishnan Sowrirajan 
 wrote:

> 
> 
> Sorry for the late reply, Ferenc.
> 
> I understand the rationale behind the current implementation as the problem
> is slightly different b/w yarn (always prefixed with `yarn`) and hadoop (it
> is not guaranteed all `hadoop` configs will be prefixed by `hadoop`)
> configs.
> 
> From the dev UX perspective, it is confusing, and it is only evident if you
> really pay close attention to the docs. I understand your point on the added
> complexity until Flink 3.0, but if we agree it should be made consistent, it
> has to be done at some point in time, right?
> 
> Regards
> Venkata krishnan
> 
> 
> On Wed, Apr 3, 2024 at 4:51 AM Ferenc Csaky ferenc.cs...@pm.me.invalid
> 
> wrote:
> 
> > Hi Venkata,
> > 
> > Thank you for opening the discussion about this!
> > 
> > After taking a look at the YARN and Hadoop configurations, the
> > reason why it was implemented this way is that, in case of YARN,
> > every YARN-specific property is prefixed with "yarn.", so to get
> > the final, YARN-side property it is enough to remove the "flink."
> > prefix.
> > 
> > In case of Hadoop, there are properties that not prefixed with
> > "hadoop.", e.g. "dfs.replication" so to identify and get the
> > Hadoop-side property it is necessary to duplicate the "hadoop" part
> > in the properties.
> > 
> > Taking this into consideration I would personally say -0 to this
> > change. IMO the current behavior can be justified as giving
> > slightly different solutions to slightly different problems, which
> > are well documented. Handling both prefixes would complicate the
> > parsing logic until the APIs can be removed, which as it looks at
> > the moment would only be possible in Flink 3.0, which probably will
> > not happen in the foreseeable future, so I do not see the benefit
> > of the added complexity.
> > 
> > Regarding the FLIP, in the "YARN configuration override example"
> > part, I think you should present an example that works correctly
> > at the moment: "flink.yarn.application.classpath" ->
> > "yarn.application.classpath".
> > 
> > Best,
> > Ferenc
> > 
> > On Friday, March 29th, 2024 at 23:45, Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu> wrote:
> > 
> > > Hi Flink devs,
> > > 
> > > I would like to start a discussion on FLIP-XXX: Make Flink's Hadoop and
> > > YARN configuration probing consistent
> > 
> > https://docs.google.com/document/d/1I2jBFI0eVkofAVCAEeajNQRfOqKGJsRfZd54h79AIYc/edit?usp=sharing
> > .
> > 
> > > This stems from an earlier discussion thread here
> > 
> > https://lists.apache.org/thread/l2fh5shbf59fjgbt1h73pmmsqj038ppv
> > .
> > 
> > > This FLIP is proposing to make the configuration probing behavior between
> > > Hadoop and YARN configuration to be consistent.
> > > 
> > > Regards
> > > Venkata krishnan


[jira] [Created] (FLINK-35159) CreatingExecutionGraph can leak CheckpointCoordinator and cause JM crash

2024-04-18 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-35159:


 Summary: CreatingExecutionGraph can leak CheckpointCoordinator and 
cause JM crash
 Key: FLINK-35159
 URL: https://issues.apache.org/jira/browse/FLINK-35159
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.18.2, 1.20.0, 1.19.1


When a task manager dies while the JM is generating an ExecutionGraph in the 
background then {{CreatingExecutionGraph#handleExecutionGraphCreation}} can 
transition back into WaitingForResources if the TM hosted one of the slots that 
we planned to use in {{tryToAssignSlots}}.

At this point the ExecutionGraph was already transitioned to running, which 
implicitly kicks off periodic checkpointing by the CheckpointCoordinator, 
without the operator coordinator holders being initialized yet (as this happens 
after we assigned slots).

This effectively leaks that CheckpointCoordinator, including the timer thread 
that will continue to try triggering checkpoints, which will naturally fail to 
trigger.
This can cause a JM crash because it results in 
{{OperatorCoordinatorHolder#abortCurrentTriggering}} to be called, which fails 
with an NPE since the {{mainThreadExecutor}} was not initialized yet.

{code}
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: java.lang.NullPointerException
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$8(CheckpointCoordinator.java:707)
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
at 
java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
at 
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:910)
at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.CompletionException: 
java.lang.NullPointerException
at 
java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
at 
java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
at 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932)
at 
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
... 7 more
Caused by: java.lang.NullPointerException
at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.abortCurrentTriggering(OperatorCoordinatorHolder.java:388)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at 
java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:985)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:961)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$7(CheckpointCoordinator.java:693)
at 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
... 8 more
{code}
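Stripped of the Flink specifics, the failure mode is the classic "callback scheduled before the component is fully wired" race: a timer fires against an object whose executor field has not been initialized yet. An illustrative stand-alone reproduction (the Holder class is a stand-in, not the actual OperatorCoordinatorHolder code):

```java
import java.util.concurrent.Executor;

public class UninitializedCallbackRace {

    /** Stands in for a coordinator holder whose executor is wired lazily. */
    static class Holder {
        Executor mainThreadExecutor; // still null until initialization completes

        void abortCurrentTriggering() {
            // NPE here if a trigger failure arrives before the executor is set
            mainThreadExecutor.execute(() -> System.out.println("aborted"));
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        try {
            holder.abortCurrentTriggering(); // triggered "too early"
        } catch (NullPointerException e) {
            System.out.println("NPE: executor not initialized yet");
        }
        holder.mainThreadExecutor = Runnable::run; // initialization happens later
        holder.abortCurrentTriggering(); // now succeeds
    }
}
```

The fix direction implied above is to ensure periodic checkpointing cannot start (or the leaked coordinator is shut down) before the holders are initialized.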





[jira] [Created] (FLINK-35158) Error handling in StateFuture's callback

2024-04-18 Thread Yanfei Lei (Jira)
Yanfei Lei created FLINK-35158:
--

 Summary: Error handling in StateFuture's callback
 Key: FLINK-35158
 URL: https://issues.apache.org/jira/browse/FLINK-35158
 Project: Flink
  Issue Type: Sub-task
Reporter: Yanfei Lei








Re: [VOTE] Release flink-connector-gcp-pubsub v3.1.0, release candidate #1

2024-04-18 Thread Ahmed Hamdy
Hi Danny,
+1 (non-binding)

-  verified hashes and checksums
- verified signature
- verified source contains no binaries
- tag exists in github
- reviewed web PR

Best Regards
Ahmed Hamdy


On Thu, 18 Apr 2024 at 11:32, Danny Cranmer  wrote:

> Hi everyone,
>
> Please review and vote on release candidate #1 for
> flink-connector-gcp-pubsub v3.1.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> This release supports Flink 1.18 and 1.19.
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint 125FD8DB [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v3.1.0-rc1 [5],
> * website pull request listing the new release [6].
> * CI build of the tag [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Danny
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353813
> [2]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-gcp-pubsub-3.1.0-rc1
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1720
> [5]
>
> https://github.com/apache/flink-connector-gcp-pubsub/releases/tag/v3.1.0-rc1
> [6] https://github.com/apache/flink-web/pull/736/files
> [7]
>
> https://github.com/apache/flink-connector-gcp-pubsub/actions/runs/8735952883
>


[jira] [Created] (FLINK-35157) Sources with watermark alignment get stuck once some subtasks finish

2024-04-18 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-35157:
--

 Summary: Sources with watermark alignment get stuck once some 
subtasks finish
 Key: FLINK-35157
 URL: https://issues.apache.org/jira/browse/FLINK-35157
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.17.2
Reporter: Gyula Fora


The current watermark alignment logic can easily get stuck if some subtasks 
finish while others are still running.

The reason is that once a source subtask finishes, the subtask is not excluded 
from alignment, effectively blocking the rest of the job from making progress 
beyond the last watermark + alignment time of the finished sources.

This can be easily reproduced by the following simple pipeline:
{noformat}
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);
DataStream<Long> s = env.fromSource(
        new NumberSequenceSource(0, 100),
        WatermarkStrategy.<Long>forMonotonousTimestamps()
                .withTimestampAssigner(
                        (SerializableTimestampAssigner<Long>) (aLong, l) -> aLong)
                .withWatermarkAlignment("g1", Duration.ofMillis(10), Duration.ofSeconds(2)),
        "Sequence Source")
    .filter((FilterFunction<Long>) aLong -> {
        Thread.sleep(200);
        return true;
    });

s.print();
env.execute();
{noformat}
The solution could be to send out a max watermark event once the sources 
finish, or to exclude finished sources from alignment in the source 
coordinator.





Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Ron Liu
Hi, Xia

Thanks for updating, looks good to me.

Best,
Ron

Xia Sun  于2024年4月18日周四 19:11写道:

> Hi Ron,
> Yes, presenting it in a table might be more intuitive. I have already added
> the table in the "Public Interfaces | New Config Option" chapter of FLIP.
> PTAL~
>
> Ron Liu  于2024年4月18日周四 18:10写道:
>
> > Hi, Xia
> >
> > Thanks for your reply.
> >
> > > That means, in terms
> > of priority, `table.exec.hive.infer-source-parallelism` >
> > `table.exec.hive.infer-source-parallelism.mode`.
> >
> > I still have some confusion, if the
> > `table.exec.hive.infer-source-parallelism`
> > >`table.exec.hive.infer-source-parallelism.mode`, currently
> > `table.exec.hive.infer-source-parallelism` default value is true, that
> > means always static parallelism inference work? Or perhaps after this
> FLIP,
> > we changed the default behavior of
> > `table.exec.hive.infer-source-parallelism` to indicate dynamic
> parallelism
> > inference when enabled.
> > I think you should list the various behaviors of these two options that
> > coexist in FLIP by a table, only then users can know how the dynamic and
> > static parallelism inference work.
> >
> > Best,
> > Ron
> >
> > Xia Sun  于2024年4月18日周四 16:33写道:
> >
> > > Hi Ron and Lijie,
> > > Thanks for joining the discussion and sharing your suggestions.
> > >
> > > > the InferMode class should also be introduced in the Public
> Interfaces
> > > > section!
> > >
> > >
> > > Thanks for the reminder, I have now added the InferMode class to the
> > Public
> > > Interfaces section as well.
> > >
> > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> > through
> > > > the code that the default value is 1000?
> > >
> > >
> > > I have checked and the default value of
> > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
> > > been corrected in the FLIP.
> > >
> > > > how are`table.exec.hive.infer-source-parallelism` and
> > > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > >
> > >
> > > This is indeed a critical point. The current plan is to deprecate
> > > `table.exec.hive.infer-source-parallelism` but still utilize it as the
> > main
> > > switch for enabling automatic parallelism inference. That means, in
> terms
> > > of priority, `table.exec.hive.infer-source-parallelism` >
> > > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if
> > > `table.exec.hive.infer-source-parallelism` is removed, this logic will
> > also
> > > need to be revised, leaving only
> > > `table.exec.hive.infer-source-parallelism.mode` as the basis for
> deciding
> > > whether to enable parallelism inference. I have also added this
> > description
> > > to the FLIP.
> > >
> > >
> > > > In FLIP-367 it is supported to be able to set the Source's
> parallelism
> > > > individually, if in the future HiveSource also supports this feature,
> > > > however, the default value of
> > > > `table.exec.hive.infer-source-parallelism.mode` is
> `InferMode.DYNAMIC`,
> > > at
> > > > this point will the parallelism be dynamically derived or will the
> > > manually
> > > > set parallelism take effect, and who has the higher priority?
> > >
> > >
> > > From my understanding, 'manually set parallelism' has the higher
> > priority,
> > > just like one of the preconditions for the effectiveness of dynamic
> > > parallelism inference in the AdaptiveBatchScheduler is that the
> vertex's
> > > parallelism isn't set. I believe whether it's static inference or
> dynamic
> > > inference, the manually set parallelism by the user should be
> respected.
> > >
> > > > The `InferMode.NONE` option.
> > >
> > > Currently, 'adding InferMode.NONE' seems to be the prevailing opinion.
> I
> > > will add InferMode.NONE as one of the Enum options in InferMode class.
> > >
> > > Best,
> > > Xia
> > >
> > > Lijie Wang  于2024年4月18日周四 13:50写道:
> > >
> > > > Thanks for driving the discussion.
> > > >
> > > > +1 for the proposal and +1 for the `InferMode.NONE` option.
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > Ron liu  于2024年4月18日周四 11:36写道:
> > > >
> > > > > Hi, Xia
> > > > >
> > > > > Thanks for driving this FLIP.
> > > > >
> > > > > This proposal looks good to me overall. However, I have the
> following
> > > > minor
> > > > > questions:
> > > > >
> > > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode`
> > as a
> > > > new
> > > > > parameter, and the value is the enum class `InferMode`, I think the
> > > > > InferMode class should also be introduced in the Public Interfaces
> > > > section!
> > > > > 2. You mentioned in FLIP that the default value of
> > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> > > through
> > > > > the code that the default value is 1000?
> > > > > 3. I also agree with Muhammet's idea that there is no need to
> > introduce
> > > > the
> > > > > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > > > > expanding the InferMode values will fulfill the need. There is
> > 

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Xia Sun
Hi Ron,
Yes, presenting it in a table might be more intuitive. I have already added
the table in the "Public Interfaces | New Config Option" chapter of FLIP.
PTAL~

Ron Liu  于2024年4月18日周四 18:10写道:

> Hi, Xia
>
> Thanks for your reply.
>
> > That means, in terms
> of priority, `table.exec.hive.infer-source-parallelism` >
> `table.exec.hive.infer-source-parallelism.mode`.
>
> I still have some confusion, if the
> `table.exec.hive.infer-source-parallelism`
> >`table.exec.hive.infer-source-parallelism.mode`, currently
> `table.exec.hive.infer-source-parallelism` default value is true, that
> means always static parallelism inference work? Or perhaps after this FLIP,
> we changed the default behavior of
> `table.exec.hive.infer-source-parallelism` to indicate dynamic parallelism
> inference when enabled.
> I think you should list the various behaviors of these two options that
> coexist in FLIP by a table, only then users can know how the dynamic and
> static parallelism inference work.
>
> Best,
> Ron
>
> Xia Sun  于2024年4月18日周四 16:33写道:
>
> > Hi Ron and Lijie,
> > Thanks for joining the discussion and sharing your suggestions.
> >
> > > the InferMode class should also be introduced in the Public Interfaces
> > > section!
> >
> >
> > Thanks for the reminder, I have now added the InferMode class to the
> Public
> > Interfaces section as well.
> >
> > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> through
> > > the code that the default value is 1000?
> >
> >
> > I have checked and the default value of
> > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
> > been corrected in the FLIP.
> >
> > > how are`table.exec.hive.infer-source-parallelism` and
> > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> >
> >
> > This is indeed a critical point. The current plan is to deprecate
> > `table.exec.hive.infer-source-parallelism` but still utilize it as the
> main
> > switch for enabling automatic parallelism inference. That means, in terms
> > of priority, `table.exec.hive.infer-source-parallelism` >
> > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if
> > `table.exec.hive.infer-source-parallelism` is removed, this logic will
> also
> > need to be revised, leaving only
> > `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
> > whether to enable parallelism inference. I have also added this
> description
> > to the FLIP.
> >
> >
> > > In FLIP-367 it is supported to be able to set the Source's parallelism
> > > individually, if in the future HiveSource also supports this feature,
> > > however, the default value of
> > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`,
> > at
> > > this point will the parallelism be dynamically derived or will the
> > manually
> > > set parallelism take effect, and who has the higher priority?
> >
> >
> > From my understanding, 'manually set parallelism' has the higher
> priority,
> > just like one of the preconditions for the effectiveness of dynamic
> > parallelism inference in the AdaptiveBatchScheduler is that the vertex's
> > parallelism isn't set. I believe whether it's static inference or dynamic
> > inference, the manually set parallelism by the user should be respected.
> >
> > > The `InferMode.NONE` option.
> >
> > Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
> > will add InferMode.NONE as one of the Enum options in InferMode class.
> >
> > Best,
> > Xia
> >
> > Lijie Wang  于2024年4月18日周四 13:50写道:
> >
> > > Thanks for driving the discussion.
> > >
> > > +1 for the proposal and +1 for the `InferMode.NONE` option.
> > >
> > > Best,
> > > Lijie
> > >
> > > Ron liu  于2024年4月18日周四 11:36写道:
> > >
> > > > Hi, Xia
> > > >
> > > > Thanks for driving this FLIP.
> > > >
> > > > This proposal looks good to me overall. However, I have the following
> > > minor
> > > > questions:
> > > >
> > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode`
> as a
> > > new
> > > > parameter, and the value is the enum class `InferMode`, I think the
> > > > InferMode class should also be introduced in the Public Interfaces
> > > section!
> > > > 2. You mentioned in FLIP that the default value of
> > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> > through
> > > > the code that the default value is 1000?
> > > > 3. I also agree with Muhammet's idea that there is no need to
> introduce
> > > the
> > > > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > > > expanding the InferMode values will fulfill the need. There is
> another
> > > > issue to consider here though, how are
> > > > `table.exec.hive.infer-source-parallelism` and
> > > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > > 4. In FLIP-367 it is supported to be able to set the Source's
> > parallelism
> > > > individually, if in the future HiveSource also supports this feature,
> > > > however, the default value 
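
To make the compatibility rules discussed in this thread concrete, here is a rough sketch of the decision logic (hypothetical class and method names mirroring the FLIP's option values, not the actual Flink implementation):

```java
public class ParallelismInferenceDecision {

    /** Mirrors the proposed table.exec.hive.infer-source-parallelism.mode values. */
    enum InferMode { STATIC, DYNAMIC, NONE }

    /**
     * Decides the effective inference mode. The deprecated boolean option
     * table.exec.hive.infer-source-parallelism stays the main switch: when it
     * is false, no inference happens regardless of the mode option.
     */
    static InferMode effectiveMode(boolean legacyInferEnabled, InferMode mode) {
        if (!legacyInferEnabled) {
            return InferMode.NONE;
        }
        return mode;
    }

    /**
     * A manually set source parallelism always wins over inference, matching
     * the precondition of the AdaptiveBatchScheduler's dynamic inference.
     */
    static int effectiveParallelism(Integer manuallySet, int inferred) {
        return manuallySet != null ? manuallySet : inferred;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMode(false, InferMode.DYNAMIC)); // NONE
        System.out.println(effectiveMode(true, InferMode.DYNAMIC));  // DYNAMIC
        System.out.println(effectiveParallelism(null, 128));         // 128
        System.out.println(effectiveParallelism(8, 128));            // 8
    }
}
```

Once the deprecated boolean option is removed, the first branch of effectiveMode would go away and the mode option alone would decide.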

[VOTE] Release flink-connector-gcp-pubsub v3.1.0, release candidate #1

2024-04-18 Thread Danny Cranmer
Hi everyone,

Please review and vote on release candidate #1 for
flink-connector-gcp-pubsub v3.1.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

This release supports Flink 1.18 and 1.19.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint 125FD8DB [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.1.0-rc1 [5],
* website pull request listing the new release [6].
* CI build of the tag [7].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Danny

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353813
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-gcp-pubsub-3.1.0-rc1
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1720
[5]
https://github.com/apache/flink-connector-gcp-pubsub/releases/tag/v3.1.0-rc1
[6] https://github.com/apache/flink-web/pull/736/files
[7]
https://github.com/apache/flink-connector-gcp-pubsub/actions/runs/8735952883


Re: [VOTE] Release flink-connector-jdbc v3.2.0, release candidate #2

2024-04-18 Thread Danny Cranmer
Sorry for typos:

> Please review and vote on the release candidate #1 for the version 3.2.0,
as follows:
Should be "release candidate #2"

> * source code tag v3.2.0-rc1 [5],
Should be "source code tag v3.2.0-rc2"

Thanks,
Danny

On Thu, Apr 18, 2024 at 11:19 AM Danny Cranmer 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 3.2.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> This release supports Flink 1.18 and 1.19.
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which are signed with the key with fingerprint 125FD8DB [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v3.2.0-rc1 [5],
> * website pull request listing the new release [6].
> * CI run of tag [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Danny
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353143
> [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.2.0-rc2
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1718/
> [5] https://github.com/apache/flink-connector-jdbc/releases/tag/v3.2.0-rc2
> [6] https://github.com/apache/flink-web/pull/734
> [7] https://github.com/apache/flink-connector-jdbc/actions/runs/8736019099
>


[VOTE] Release flink-connector-mongodb v1.2.0, release candidate #2

2024-04-18 Thread Danny Cranmer
Hi everyone,

Please review and vote on the release candidate #2 for v1.2.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

This release supports Flink 1.18 and 1.19.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint 125FD8DB [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v1.2.0-rc2 [5],
* website pull request listing the new release [6].
* CI build of tag [7].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Danny

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354192
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-mongodb-1.2.0-rc2
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1719/
[5]
https://github.com/apache/flink-connector-mongodb/releases/tag/v1.2.0-rc2
[6] https://github.com/apache/flink-web/pull/735
[7]
https://github.com/apache/flink-connector-mongodb/actions/runs/8735987710


[VOTE] Release flink-connector-jdbc v3.2.0, release candidate #2

2024-04-18 Thread Danny Cranmer
Hi everyone,

Please review and vote on the release candidate #1 for the version 3.2.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

This release supports Flink 1.18 and 1.19.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint 125FD8DB [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.2.0-rc1 [5],
* website pull request listing the new release [6].
* CI run of tag [7].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Danny

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353143
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.2.0-rc2
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1718/
[5] https://github.com/apache/flink-connector-jdbc/releases/tag/v3.2.0-rc2
[6] https://github.com/apache/flink-web/pull/734
[7] https://github.com/apache/flink-connector-jdbc/actions/runs/8736019099


[jira] [Created] (FLINK-35156) Wire new operators for async state with DataStream V2

2024-04-18 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35156:
---

 Summary: Wire new operators for async state with DataStream V2
 Key: FLINK-35156
 URL: https://issues.apache.org/jira/browse/FLINK-35156
 Project: Flink
  Issue Type: Sub-task
Reporter: Zakelly Lan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Ron Liu
Hi, Xia

Thanks for your reply.

> That means, in terms
of priority, `table.exec.hive.infer-source-parallelism` >
`table.exec.hive.infer-source-parallelism.mode`.

I still have some confusion: if `table.exec.hive.infer-source-parallelism`
takes priority over `table.exec.hive.infer-source-parallelism.mode`, and
`table.exec.hive.infer-source-parallelism` currently defaults to true, does
that mean static parallelism inference always applies? Or perhaps after this
FLIP we change the default behavior of
`table.exec.hive.infer-source-parallelism` so that, when enabled, it
indicates dynamic parallelism inference.
I think the FLIP should list, in a table, the resulting behavior for each
combination of these two options; only then can users know how dynamic and
static parallelism inference work together.

Best,
Ron

On Thu, Apr 18, 2024 at 16:33 Xia Sun  wrote:

> Hi Ron and Lijie,
> Thanks for joining the discussion and sharing your suggestions.
>
> > the InferMode class should also be introduced in the Public Interfaces
> > section!
>
>
> Thanks for the reminder, I have now added the InferMode class to the Public
> Interfaces section as well.
>
> > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> > the code that the default value is 1000?
>
>
> I have checked and the default value of
> `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
> been corrected in the FLIP.
>
> > how are`table.exec.hive.infer-source-parallelism` and
> > `table.exec.hive.infer-source-parallelism.mode` compatible?
>
>
> This is indeed a critical point. The current plan is to deprecate
> `table.exec.hive.infer-source-parallelism` but still utilize it as the main
> switch for enabling automatic parallelism inference. That means, in terms
> of priority, `table.exec.hive.infer-source-parallelism` >
> `table.exec.hive.infer-source-parallelism.mode`. In future versions, if
> `table.exec.hive.infer-source-parallelism` is removed, this logic will also
> need to be revised, leaving only
> `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
> whether to enable parallelism inference. I have also added this description
> to the FLIP.
>
>
> > In FLIP-367 it is supported to be able to set the Source's parallelism
> > individually, if in the future HiveSource also supports this feature,
> > however, the default value of
> > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`,
> at
> > this point will the parallelism be dynamically derived or will the
> manually
> > set parallelism take effect, and who has the higher priority?
>
>
> From my understanding, 'manually set parallelism' has the higher priority,
> just like one of the preconditions for the effectiveness of dynamic
> parallelism inference in the AdaptiveBatchScheduler is that the vertex's
> parallelism isn't set. I believe whether it's static inference or dynamic
> inference, the manually set parallelism by the user should be respected.
>
> > The `InferMode.NONE` option.
>
> Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
> will add InferMode.NONE as one of the Enum options in InferMode class.
>
> Best,
> Xia
>
> On Thu, Apr 18, 2024 at 13:50 Lijie Wang  wrote:
>
> > Thanks for driving the discussion.
> >
> > +1 for the proposal and +1 for the `InferMode.NONE` option.
> >
> > Best,
> > Lijie
> >
> > On Thu, Apr 18, 2024 at 11:36 Ron liu  wrote:
> >
> > > Hi, Xia
> > >
> > > Thanks for driving this FLIP.
> > >
> > > This proposal looks good to me overall. However, I have the following
> > minor
> > > questions:
> > >
> > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a
> > new
> > > parameter, and the value is the enum class `InferMode`, I think the
> > > InferMode class should also be introduced in the Public Interfaces
> > section!
> > > 2. You mentioned in FLIP that the default value of
> > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> through
> > > the code that the default value is 1000?
> > > 3. I also agree with Muhammet's idea that there is no need to introduce
> > the
> > > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > > expanding the InferMode values will fulfill the need. There is another
> > > issue to consider here though, how are
> > > `table.exec.hive.infer-source-parallelism` and
> > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > 4. In FLIP-367 it is supported to be able to set the Source's
> parallelism
> > > individually, if in the future HiveSource also supports this feature,
> > > however, the default value of
> > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.
> DYNAMIC`,
> > at
> > > this point will the parallelism be dynamically derived or will the
> > manually
> > > set parallelism take effect, and who has the higher priority?
> > >
> > > Best,
> > > Ron
> > >
> > > On Wed, Apr 17, 2024 at 12:08 Xia Sun  wrote:
> > >
> > > > Hi Jeyhun, Muhammet,
> > > > Thanks for all the feedback!
> > > >
> > > > > Could you please mention the default values for 

[jira] [Created] (FLINK-35155) Introduce TableRuntimeException

2024-04-18 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-35155:


 Summary: Introduce TableRuntimeException
 Key: FLINK-35155
 URL: https://issues.apache.org/jira/browse/FLINK-35155
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Runtime
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


The `throwException` internal function throws a {{RuntimeException}}. It would 
be nice to have a specific kind of exception thrown from there, so that it's 
easier to classify those.
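As a language-neutral illustration of the classification benefit (a Python sketch of the idea only, not the actual Java change; the function and class names mirror the ticket): with a dedicated subclass, error-handling code can tell table-runtime failures apart from arbitrary runtime errors by type:

```python
class TableRuntimeException(RuntimeError):
    """Dedicated exception type so table-runtime failures can be classified."""

def throw_exception(message: str) -> None:
    # Previously this would raise a bare RuntimeError; the dedicated
    # subclass makes the failure identifiable by type.
    raise TableRuntimeException(message)

def classify(fn) -> str:
    try:
        fn()
    except TableRuntimeException:
        # Subclass is matched before the generic RuntimeError branch.
        return "table-runtime"
    except RuntimeError:
        return "generic"
    return "ok"

def raise_generic():
    raise RuntimeError("boom")

assert classify(lambda: throw_exception("NULL written to NOT NULL column")) == "table-runtime"
assert classify(raise_generic) == "generic"
```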



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-mongodb v1.2.0, release candidate #1

2024-04-18 Thread Danny Cranmer
-1 (binding)

There is a bug [1] in the connector release utils resulting in the build
tools [2] not being copied into the source archive. As a result the source
archive fails to build due to checkstyle errors.

This vote is now closed, I will follow up with an RC2.

Thanks,
Danny

[1] https://issues.apache.org/jira/browse/FLINK-35124
[2] https://github.com/apache/flink-connector-jdbc/tree/main/tools

On Wed, Apr 17, 2024 at 2:31 PM Jeyhun Karimov  wrote:

> Thanks for driving this.
>
> +1 (non-binding)
>
> - Validated checksum hash
> - Verified signature
> - Tag is present
> - Reviewed web PR
>
> Regards,
> Jeyhun
>
> On Wed, Apr 17, 2024 at 3:26 PM gongzhongqiang 
> wrote:
>
> > +1 (non-binding)
> >
> > - Flink website pr reviewed
> > - Check source code without binary files
> > - Validated checksum hash and signature
> > - CI passed with tag v1.2.0-rc1
> >
> > Best,
> > Zhongqiang Gong
> >
> > On Wed, Apr 17, 2024 at 18:44 Danny Cranmer  wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for v1.2.0, as
> > follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This release supports Flink 1.18 and 1.19.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 125FD8DB [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v1.2.0-rc1 [5],
> > > * website pull request listing the new release [6].
> > > * CI build of tag [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354192
> > > [2]
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-mongodb-1.2.0-rc1
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1715/
> > > [5]
> > >
> >
> https://github.com/apache/flink-connector-mongodb/releases/tag/v1.2.0-rc1
> > > [6] https://github.com/apache/flink-web/pull/735
> > > [7]
> > >
> >
> https://github.com/apache/flink-connector-mongodb/actions/runs/8720057880
> > >
> >
>


Re: [VOTE] Release flink-connector-jdbc v3.2.0, release candidate #1

2024-04-18 Thread Danny Cranmer
-1 (binding)

There is a bug [1] in the connector release utils resulting in the build
tools [2] not being copied into the source archive. As a result the source
archive fails to build due to checkstyle errors.

This vote is now closed, I will follow up with an RC2.

Thanks,
Danny

[1] https://issues.apache.org/jira/browse/FLINK-35124
[2] https://github.com/apache/flink-connector-jdbc/tree/main/tools

On Thu, Apr 18, 2024 at 4:02 AM Yuepeng Pan  wrote:

> +1 (non-binding)
>
> - Checked source-build for source code tag v3.2.0-rc1
> - Did test for some examples with mysql 5.8.
> - Checked release note page.
>
>
> Thanks for driving it!
>
>
> Best,
> Yuepeng Pan
>
>
>
>
> At 2024-04-17 18:02:06, "Danny Cranmer"  wrote:
> >Hi everyone,
> >
> >Please review and vote on the release candidate #1 for the version 3.2.0,
> >as follows:
> >[ ] +1, Approve the release
> >[ ] -1, Do not approve the release (please provide specific comments)
> >
> >This release supports Flink 1.18 and 1.19.
> >
> >The complete staging area is available for your review, which includes:
> >* JIRA release notes [1],
> >* the official Apache source release to be deployed to dist.apache.org
> [2],
> >which are signed with the key with fingerprint 125FD8DB [3],
> >* all artifacts to be deployed to the Maven Central Repository [4],
> >* source code tag v3.2.0-rc1 [5],
> >* website pull request listing the new release [6].
> >* CI run of tag [7].
> >
> >The vote will be open for at least 72 hours. It is adopted by majority
> >approval, with at least 3 PMC affirmative votes.
> >
> >Thanks,
> >Danny
> >
> >[1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353143
> >[2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.2.0-rc1
> >[3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >[4]
> https://repository.apache.org/content/repositories/orgapacheflink-1714/
> >[5]
> https://github.com/apache/flink-connector-jdbc/releases/tag/v3.2.0-rc1
> >[6] https://github.com/apache/flink-web/pull/734
> >[7]
> https://github.com/apache/flink-connector-jdbc/actions/runs/8719743185
>


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Xia Sun
Hi Ron and Lijie,
Thanks for joining the discussion and sharing your suggestions.

> the InferMode class should also be introduced in the Public Interfaces
> section!


Thanks for the reminder, I have now added the InferMode class to the Public
Interfaces section as well.

> `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> the code that the default value is 1000?


I have checked and the default value of
`table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
been corrected in the FLIP.

> how are`table.exec.hive.infer-source-parallelism` and
> `table.exec.hive.infer-source-parallelism.mode` compatible?


This is indeed a critical point. The current plan is to deprecate
`table.exec.hive.infer-source-parallelism` but still utilize it as the main
switch for enabling automatic parallelism inference. That means, in terms
of priority, `table.exec.hive.infer-source-parallelism` >
`table.exec.hive.infer-source-parallelism.mode`. In future versions, if
`table.exec.hive.infer-source-parallelism` is removed, this logic will also
need to be revised, leaving only
`table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
whether to enable parallelism inference. I have also added this description
to the FLIP.
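To make the described priority concrete, here is a small Python sketch of the resolution logic as I read it (an illustration only, not code from the FLIP; the option names come from this thread and `InferMode.NONE` is the value being discussed):

```python
from enum import Enum

class InferMode(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"
    NONE = "none"  # proposed new value that disables inference

def resolve_infer_mode(legacy_infer_enabled: bool, mode: InferMode) -> InferMode:
    # The legacy switch table.exec.hive.infer-source-parallelism takes
    # priority: if it is false, no inference happens regardless of the
    # new mode option.
    if not legacy_infer_enabled:
        return InferMode.NONE
    # Otherwise table.exec.hive.infer-source-parallelism.mode decides
    # between static and dynamic inference.
    return mode

# Legacy switch off wins over a DYNAMIC mode setting.
assert resolve_infer_mode(False, InferMode.DYNAMIC) is InferMode.NONE
# Legacy switch on defers to the mode option.
assert resolve_infer_mode(True, InferMode.STATIC) is InferMode.STATIC
```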


> In FLIP-367 it is supported to be able to set the Source's parallelism
> individually, if in the future HiveSource also supports this feature,
> however, the default value of
> `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, at
> this point will the parallelism be dynamically derived or will the manually
> set parallelism take effect, and who has the higher priority?


From my understanding, 'manually set parallelism' has the higher priority,
just like one of the preconditions for the effectiveness of dynamic
parallelism inference in the AdaptiveBatchScheduler is that the vertex's
parallelism isn't set. I believe whether it's static inference or dynamic
inference, the manually set parallelism by the user should be respected.

> The `InferMode.NONE` option.

Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
will add InferMode.NONE as one of the Enum options in InferMode class.

Best,
Xia

On Thu, Apr 18, 2024 at 13:50 Lijie Wang  wrote:

> Thanks for driving the discussion.
>
> +1 for the proposal and +1 for the `InferMode.NONE` option.
>
> Best,
> Lijie
>
> On Thu, Apr 18, 2024 at 11:36 Ron liu  wrote:
>
> > Hi, Xia
> >
> > Thanks for driving this FLIP.
> >
> > This proposal looks good to me overall. However, I have the following
> minor
> > questions:
> >
> > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a
> new
> > parameter, and the value is the enum class `InferMode`, I think the
> > InferMode class should also be introduced in the Public Interfaces
> section!
> > 2. You mentioned in FLIP that the default value of
> > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> > the code that the default value is 1000?
> > 3. I also agree with Muhammet's idea that there is no need to introduce
> the
> > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > expanding the InferMode values will fulfill the need. There is another
> > issue to consider here though, how are
> > `table.exec.hive.infer-source-parallelism` and
> > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > 4. In FLIP-367 it is supported to be able to set the Source's parallelism
> > individually, if in the future HiveSource also supports this feature,
> > however, the default value of
> > `table.exec.hive.infer-source-parallelism.mode` is `InferMode. DYNAMIC`,
> at
> > this point will the parallelism be dynamically derived or will the
> manually
> > set parallelism take effect, and who has the higher priority?
> >
> > Best,
> > Ron
> >
> > On Wed, Apr 17, 2024 at 12:08 Xia Sun  wrote:
> >
> > > Hi Jeyhun, Muhammet,
> > > Thanks for all the feedback!
> > >
> > > > Could you please mention the default values for the new
> configurations
> > > > (e.g., table.exec.hive.infer-source-parallelism.mode,
> > > > table.exec.hive.infer-source-parallelism.enabled,
> > > > etc) ?
> > >
> > >
> > > Thanks for your suggestion. I have supplemented the explanation
> regarding
> > > the default values.
> > >
> > > > Since we are introducing the mode as a configuration option,
> > > > could it make sense to have `InferMode.NONE` option also?
> > > > The `NONE` option would disable the inference.
> > >
> > >
> > > This is a good idea. Looking ahead, it could eliminate the need for
> > > introducing
> > > a new configuration option. I haven't identified any potential
> > > compatibility issues
> > > as yet. If there are no further ideas from others, I'll go ahead and
> > update
> > > the FLIP to
> > > introducing InferMode.NONE.
> > >
> > > Best,
> > > Xia
> > >
> > > On Wed, Apr 17, 2024 at 10:31 Muhammet Orazov  wrote:
> > >
> > > > Hello Xia,
> > > >
> > > > Thanks for the FLIP!
> > > >
> > > > Since we are introducing 

[jira] [Created] (FLINK-35154) Javadoc aggregate fails

2024-04-18 Thread Dmitriy Linevich (Jira)
Dmitriy Linevich created FLINK-35154:


 Summary: Javadoc aggregate fails
 Key: FLINK-35154
 URL: https://issues.apache.org/jira/browse/FLINK-35154
 Project: Flink
  Issue Type: Bug
Reporter: Dmitriy Linevich


The javadoc plugin fails with a "cannot find symbol" error when running
{code:java}
javadoc:aggregate{code}
ERROR:

[screenshot attached to the Jira issue showing the "cannot find symbol" error]

 

The javadoc plugin version needs to be increased from 2.9.1 to 2.10.4 (the 
current version has a bug similar to 
[this|https://github.com/checkstyle/checkstyle/issues/291]).
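For reference, the corresponding build change would look something like the following pom.xml fragment (illustrative only; the exact location of the plugin declaration in Flink's poms may differ):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <!-- 2.9.1 fails javadoc:aggregate with "cannot find symbol"; 2.10.4 fixes it -->
  <version>2.10.4</version>
</plugin>
```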



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35153) Internal Async State Implementation and StateDescriptor for Map/List State

2024-04-18 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35153:
---

 Summary: Internal Async State Implementation and StateDescriptor 
for Map/List State
 Key: FLINK-35153
 URL: https://issues.apache.org/jira/browse/FLINK-35153
 Project: Flink
  Issue Type: Bug
Reporter: Zakelly Lan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-18 Thread Gyula Fóra
Hi Swathi!

Thank you for creating this proposal. I really like the general idea of
increasing the K8s native observability of Flink job errors.

I took a quick look at your reference PR; the termination-log-related logic
is contained entirely in the ClusterEntrypoint. What types of errors will
this actually cover?

To me this seems to cover only:
 - Job main class errors (ie startup errors)
 - JobManager failures

Would regular job errors (that cause only job failover but not JM errors)
be reported somehow with this plugin?

Thanks
Gyula
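For context on the Kubernetes mechanism involved (my own illustration, independent of the FLIP's plugin design): a container writes its failure reason to /dev/termination-log, and Kubernetes then surfaces that text in the pod status under status.containerStatuses[].lastState.terminated.message. A minimal sketch of the write side:

```python
import os
import tempfile

def report_termination(reason: str, path: str = "/dev/termination-log") -> None:
    """Record a failure reason where Kubernetes reads the termination message."""
    # Kubernetes truncates termination messages at 4096 bytes, so trim here.
    with open(path, "w", encoding="utf-8") as f:
        f.write(reason[:4096])

# Demo against a temp file (a real entrypoint would use the default path).
demo_path = os.path.join(tempfile.mkdtemp(), "termination-log")
report_termination("JobManager startup failed: OutOfMemoryError", path=demo_path)
assert open(demo_path, encoding="utf-8").read().startswith("JobManager")
```

The recorded message is then visible via `kubectl describe pod` or in the pod's YAML status, which is what makes the failure observable K8s-natively.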

On Tue, Apr 16, 2024 at 8:21 AM Swathi C  wrote:

> Hi All,
>
> I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing Flink
> Failure Management in Kubernetes with Dynamic Termination Log Integration.
>
>
> https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
>
>
> This FLIP proposes an improvement plugin and focuses mainly on Flink on
> K8S but can be used as a generic plugin and add further enhancements.
>
> Looking forward to everyone's feedback and suggestions. Thank you !!
>
> Best Regards,
> Swathi Chandrashekar
>


Re: [ANNOUNCE] New Apache Flink Committer - Zakelly Lan

2024-04-18 Thread xiangyu feng
Congratulations, Zakelly!


Regards,
Xiangyu Feng

On Thu, Apr 18, 2024 at 14:27 yh z  wrote:

> Congratulations Zakelly!
>
> Best regards,
> Yunhong (swuferhong)
>
> On Wed, Apr 17, 2024 at 21:26 gongzhongqiang  wrote:
>
> > Congratulations, Zakelly!
> >
> >
> > Best,
> > Zhongqiang Gong
> >
> > On Mon, Apr 15, 2024 at 10:51 Yuan Mei  wrote:
> >
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm happy to let you know that Zakelly Lan has
> > become
> > > a new Flink Committer!
> > >
> > > Zakelly has been continuously contributing to the Flink project since
> > 2020,
> > > with a focus area on Checkpointing, State as well as frocksdb (the
> > default
> > > on-disk state db).
> > >
> > > He leads several FLIPs to improve checkpoints and state APIs, including
> > > File Merging for Checkpoints and configuration/API reorganizations. He
> is
> > > also one of the main contributors to the recent efforts of
> "disaggregated
> > > state management for Flink 2.0" and drives the entire discussion in the
> > > mailing thread, demonstrating outstanding technical depth and breadth
> of
> > > knowledge.
> > >
> > > Beyond his technical contributions, Zakelly is passionate about helping
> > the
> > > community in numerous ways. He spent quite some time setting up the
> Flink
> > > Speed Center and rebuilding the benchmark pipeline after the original
> one
> > > was out of lease. He helps build frocksdb and tests for the upcoming
> > > frocksdb release (bump rocksdb from 6.20.3->8.10).
> > >
> > > Please join me in congratulating Zakelly for becoming an Apache Flink
> > > committer!
> > >
> > > Best,
> > > Yuan (on behalf of the Flink PMC)
> > >
> >
>


[jira] [Created] (FLINK-35152) Flink CDC Doris/Starrocks Sink Auto create table event should support setting auto partition fields for each table

2024-04-18 Thread tumengyao (Jira)
tumengyao created FLINK-35152:
-

 Summary: Flink CDC  Doris/Starrocks Sink Auto create table event 
should support setting auto partition fields for each table
 Key: FLINK-35152
 URL: https://issues.apache.org/jira/browse/FLINK-35152
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: 3.1.0
Reporter: tumengyao


In some scenarios, when creating a physical table in Doris, appropriate 
partition fields need to be chosen to speed up data querying and computation. 
In addition, partitioned tables support more applications, such as hot/cold 
data tiering.
Currently, the Flink CDC Doris Sink's create table event creates tables with 
no partitions set.
The Auto Partition feature supported by Doris 2.1.x simplifies the creation 
and management of partitions. We just need to add some configuration items to 
the Flink CDC job that tell the Doris Sink which fields to use for 
partitioning in the create table event; the result is a partitioned table in 
Doris.

Here's an example:
source: Mysql
source_table:
CREATE TABLE table1 (
col1 INT AUTO_INCREMENT PRIMARY KEY,
col2 DECIMAL(18, 2),
col3 VARCHAR(500),
col4 TEXT,
col5 DATETIME DEFAULT CURRENT_TIMESTAMP
);
If you want to specify the partitioning of table test.table1, you need to add 
the sink-table-partition-* settings to the corresponding route entry in 
mysql_to_doris.yaml:
route:
  - source-table: test.table1
    sink-table: ods.ods_table1
    sink-table-partition-key: col5
    sink-table-partition-func-call-expr: date_trunc(`col5`, 'month')
    sink-table-partition-type: auto range

The auto range partition in Doris 2.1.x does not support null partition 
values, so the partition column needs a null-handling transform along the 
lines of: case when test.table1.col5 is null then '1990-01-01 00:00:00' else 
test.table1.col5 end.

Now, after submitting the mysql_to_doris.yaml Flink CDC job, an 
ods.ods_table1 data table should appear in the Doris database.
The data table DDL is as follows:
CREATE TABLE table1 (
col1 INT ,
col5 DATETIME not null,
col2 DECIMAL(18, 2),
col3 VARCHAR(500),
col4 TEXT
) unique KEY(`col1`,`col5`)
AUTO PARTITION BY RANGE date_trunc(`col5`, 'month')()
DISTRIBUTED BY HASH (`col1`) BUCKETS AUTO
PROPERTIES (
...
);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink Committer - Zakelly Lan

2024-04-18 Thread yh z
Congratulations Zakelly!

Best regards,
Yunhong (swuferhong)

On Wed, Apr 17, 2024 at 21:26 gongzhongqiang  wrote:

> Congratulations, Zakelly!
>
>
> Best,
> Zhongqiang Gong
>
> On Mon, Apr 15, 2024 at 10:51 Yuan Mei  wrote:
>
> > Hi everyone,
> >
> > On behalf of the PMC, I'm happy to let you know that Zakelly Lan has
> become
> > a new Flink Committer!
> >
> > Zakelly has been continuously contributing to the Flink project since
> 2020,
> > with a focus area on Checkpointing, State as well as frocksdb (the
> default
> > on-disk state db).
> >
> > He leads several FLIPs to improve checkpoints and state APIs, including
> > File Merging for Checkpoints and configuration/API reorganizations. He is
> > also one of the main contributors to the recent efforts of "disaggregated
> > state management for Flink 2.0" and drives the entire discussion in the
> > mailing thread, demonstrating outstanding technical depth and breadth of
> > knowledge.
> >
> > Beyond his technical contributions, Zakelly is passionate about helping
> the
> > community in numerous ways. He spent quite some time setting up the Flink
> > Speed Center and rebuilding the benchmark pipeline after the original one
> > was out of lease. He helps build frocksdb and tests for the upcoming
> > frocksdb release (bump rocksdb from 6.20.3->8.10).
> >
> > Please join me in congratulating Zakelly for becoming an Apache Flink
> > committer!
> >
> > Best,
> > Yuan (on behalf of the Flink PMC)
> >
>