Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Posted by Zakelly Lan
Congratulations!


Best,
Zakelly

On Thu, Mar 21, 2024 at 12:05 PM weijie guo wrote:

> Congratulations! Well done.
>
>
> Best regards,
>
> Weijie
>
>
> On Thu, Mar 21, 2024 at 11:40 AM Feng Jin wrote:
>
>> Congratulations!
>>
>>
>> Best,
>> Feng
>>
>>
>> On Thu, Mar 21, 2024 at 11:37 AM Ron liu  wrote:
>>
>> > Congratulations!
>> >
>> > Best,
>> > Ron
>> >
>> > On Thu, Mar 21, 2024 at 10:46 AM Jark Wu wrote:
>> >
>> > > Congratulations and welcome!
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
>> > >
>> > > > Congratulations!
>> > > >
>> > > > Best,
>> > > > Rui
>> > > >
>> > > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan wrote:
>> > > >
>> > > > > Congratulations!
>> > > > >
>> > > > > Best,
>> > > > > Hang
>> > > > >
>> > > > > On Thu, Mar 21, 2024 at 9:54 AM Lincoln Lee wrote:
>> > > > >
>> > > > >>
>> > > > >> Congrats, thanks for the great work!
>> > > > >>
>> > > > >>
>> > > > >> Best,
>> > > > >> Lincoln Lee
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Mar 20, 2024 at 10:48 PM Peter Huang wrote:
>> > > > >>
>> > > > >>> Congratulations
>> > > > >>>
>> > > > >>>
>> > > > >>> Best Regards
>> > > > >>> Peter Huang
>> > > > >>>
>> > > > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang wrote:
>> > > > >>>
>> > > > >>>> Congratulations
>> > > > >>>>
>> > > > >>>> Best,
>> > > > >>>> Huajie Wang
>> > > > >>>>
>> > > > >>>> On Wed, Mar 20, 2024 at 9:36 PM Leonard Xu wrote:
>> > > > >>>>
>> > > > > Hi devs and users,
>> > > > >
>> > > > > We are thrilled to announce that the donation of Flink CDC as a
>> > > > > sub-project of Apache Flink has completed. We invite you to explore
>> > > > > the new resources available:
>> > > > >
>> > > > > - GitHub Repository: https://github.com/apache/flink-cdc
>> > > > > - Flink CDC Documentation:
>> > > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
>> > > > >
>> > > > > After the Flink community accepted this donation[1], we completed
>> > > > > software copyright signing, code repo migration, code cleanup,
>> > > > > website migration, CI migration, GitHub issues migration, etc.
>> > > > > I am particularly grateful to Hang Ruan, Zhongqaing Gong,
>> > > > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
>> > > > > contributors for their contributions and help during this process!
>> > > > >
>> > > > >
>> > > > > For all previous contributors: The contribution process has slightly
>> > > > > changed to align with the main Flink project. To report bugs or
>> > > > > suggest new features, please open tickets in
>> > > > > Apache Jira (https://issues.apache.org/jira). Note that we will no
>> > > > > longer accept GitHub issues for these purposes.
>> > > > >
>> > > > >
>> > > > > You are welcome to explore the new repository and documentation. Your
>> > > > > feedback and contributions are invaluable as we continue to improve
>> > > > > Flink CDC.
>> > > > >
>> > > > > Thanks everyone for your support and happy exploring Flink CDC!
>> > > > >
>> > > > > Best,
>> > > > > Leonard
>> > > > > [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Posted by Zakelly Lan
Congratulations!

Thanks Lincoln, Yun, Martijn and Jing for driving this release.
Thanks everyone involved.


Best,
Zakelly

On Mon, Mar 18, 2024 at 5:05 PM weijie guo wrote:

> Congratulations!
>
> Thanks release managers and all the contributors involved.
>
> Best regards,
>
> Weijie
>
>
> On Mon, Mar 18, 2024 at 4:45 PM Leonard Xu wrote:
>
>> Congratulations, thanks release managers and all involved for the great
>> work!
>>
>>
>> Best,
>> Leonard
>>
>> > On Mar 18, 2024, at 4:32 PM, Jingsong Li wrote:
>> >
>> > Congratulations!
>> >
>> > On Mon, Mar 18, 2024 at 4:30 PM Rui Fan <1996fan...@gmail.com> wrote:
>> >>
>> >> Congratulations, thanks for the great work!
>> >>
>> >> Best,
>> >> Rui
>> >>
>> >>> On Mon, Mar 18, 2024 at 4:26 PM Lincoln Lee wrote:
>> >>>
>> >>> The Apache Flink community is very happy to announce the release of
>> >>> Apache Flink 1.19.0, which is the first release for the Apache Flink 1.19
>> >>> series.
>> >>>
>> >>> Apache Flink® is an open-source stream processing framework for
>> >>> distributed, high-performing, always-available, and accurate data streaming
>> >>> applications.
>> >>>
>> >>> The release is available for download at:
>> >>> https://flink.apache.org/downloads.html
>> >>>
>> >>> Please check out the release blog post for an overview of the
>> >>> improvements for this release:
>> >>> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
>> >>>
>> >>> The full release notes are available in Jira:
>> >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
>> >>>
>> >>> We would like to thank all contributors of the Apache Flink community
>> >>> who made this release possible!
>> >>>
>> >>>
>> >>> Best,
>> >>> Yun, Jing, Martijn and Lincoln
>>
>>


Re: Re:Re: Checkpoint size keeps growing in RocksDB incremental mode

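2024-01-17 Posted by Zakelly Lan
The image is broken and I can't see it. Could you paste the information as plain text instead?
Also, does your ProcessWindowFunction access state? If it does, have you implemented the clear method?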
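
To make the clear question above concrete: in Flink, per-window state reached through `context.windowState()` inside a ProcessWindowFunction is not dropped automatically when the window fires; it is expected to be released in `ProcessWindowFunction#clear(Context)`. The following is a plain-Java toy model of that effect, not Flink code. The class `WindowStateLeakDemo`, its methods, the key `user-1` and the one-minute window size are all hypothetical names chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (plain Java, NOT the Flink API) of per-window state.
 * All names here are hypothetical and exist only for illustration.
 */
class WindowStateLeakDemo {

    /** Stands in for a state backend: one entry per (key, window end) pair. */
    private final Map<String, Long> perWindowState = new HashMap<>();

    /** Whether this variant releases its per-window state when a window is purged. */
    private final boolean clearOnPurge;

    WindowStateLeakDemo(boolean clearOnPurge) {
        this.clearOnPurge = clearOnPurge;
    }

    /** Plays the role of process(): writes state scoped to a single window. */
    void process(String key, long windowEnd, long elementCount) {
        perWindowState.put(key + "@" + windowEnd, elementCount);
    }

    /** Plays the role of clear(): invoked once when the window is purged. */
    void clear(String key, long windowEnd) {
        if (clearOnPurge) {
            perWindowState.remove(key + "@" + windowEnd);
        }
    }

    /** Fires and purges consecutive windows, then reports how many entries remain. */
    static int entriesLeft(int windows, boolean clearOnPurge) {
        WindowStateLeakDemo fn = new WindowStateLeakDemo(clearOnPurge);
        long windowSizeMs = 60_000L; // assumed one-minute tumbling windows
        for (int i = 1; i <= windows; i++) {
            long windowEnd = i * windowSizeMs;
            fn.process("user-1", windowEnd, 42L);
            fn.clear("user-1", windowEnd);
        }
        return fn.perWindowState.size();
    }

    public static void main(String[] args) {
        // 1440 one-minute windows correspond to one day of a single key.
        System.out.println("without clear: " + entriesLeft(1440, false));
        System.out.println("with clear:    " + entriesLeft(1440, true));
    }
}
```

In this model the variant that skips cleanup keeps one entry per fired window, while the variant that cleans up keeps none. Leftover per-window state of this kind is one possible cause of a window operator whose checkpoint size only ever grows, though it is not the only one.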

On Thu, Jan 18, 2024 at 3:01 PM fufu  wrote:

> Looking at HDFS, the files under the shared directory are much larger than those under chk-xxx.
>
>
>
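> At 2024-01-18 14:49:14, "fufu" wrote:
>
> It is a DataStream job. The window operator itself has no TTL configured, while the
> other operators do. In the Flink UI I can see the window operator's size growing
> continuously, by about 600~800 MB per day. Taking the screenshot below as an example:
> the checkpoint with ID 313 is nearly 10 MB larger than the one with ID 304, and it keeps
> growing like this for as long as the job runs. I am still looking into the checkpoint
> files and the RocksDB files.
>
> At 2024-01-18 10:56:51, "Zakelly Lan" wrote:
>
> >Hello, could you provide more details? For example: this is a DataStream job, right?
> >Is State TTL configured? Did you observe the growth through checkpoint monitoring, and
> >what is the order of magnitude of the total size? Which files are the largest among the
> >checkpoint files or in the local RocksDB directory?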
> >
> >On Wed, Jan 17, 2024 at 4:09 PM fufu  wrote:
> >
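> >>
> >> I have a Flink job running on Flink 1.14.6. It contains a window that combines
> >> incremental aggregation (AggregateFunction) with full-window processing
> >> (ProcessWindowFunction). While the job runs, the state of this operator keeps growing,
> >> by a few hundred MB per day. How should I troubleshoot this? The job uses event time,
> >> watermarks are emitted normally, and all other operators behave normally; only this
> >> operator keeps growing, which is very strange. I found an article online describing a
> >> similar case:
> >> https://blog.csdn.net/RL_LEEE/article/details/123864487
> >> I would like to try it, but I don't know how to set the manifest size and could not
> >> find the corresponding parameter.
> >> Could the community advise, or is there another solution? Thanks!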
>


Re: Checkpoint size keeps growing in RocksDB incremental mode

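2024-01-17 Posted by Zakelly Lan
Hello, could you provide more details? For example: this is a DataStream job, right? Is State
TTL configured? Did you observe the growth through checkpoint monitoring, and what is the order
of magnitude of the total size? Which files are the largest among the checkpoint files or in
the local RocksDB directory?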

On Wed, Jan 17, 2024 at 4:09 PM fufu  wrote:

>
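> I have a Flink job running on Flink 1.14.6. It contains a window that combines
> incremental aggregation (AggregateFunction) with full-window processing
> (ProcessWindowFunction). While the job runs, the state of this operator keeps growing,
> by a few hundred MB per day. How should I troubleshoot this? The job uses event time,
> watermarks are emitted normally, and all other operators behave normally; only this
> operator keeps growing, which is very strange. I found an article online describing a
> similar case:
> https://blog.csdn.net/RL_LEEE/article/details/123864487
> I would like to try it, but I don't know how to set the manifest size and could not
> find the corresponding parameter.
> Could the community advise, or is there another solution? Thanks!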


Re: flink-checkpoint problem

2024-01-11 Posted by Zakelly Lan
748)
>
>
> JM log; there is no trigger record for checkpoint 25548:
> 2023-12-31 18:39:10.664 [jobmanager-future-thread-20] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 25546 for job d12f3c6e836f56fb23d96e31737ff0b3 (411347921 bytes in 50128 ms).
> 2023-12-31 18:40:10.681 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 25547 (type=CHECKPOINT) @ 1704019210665 for job d12f3c6e836f56fb23d96e31737ff0b3.
> 2023-12-31 18:50:10.681 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 25547 of job d12f3c6e836f56fb23d96e31737ff0b3 expired before completing.
> 2023-12-31 18:50:10.698 [flink-akka.actor.default-dispatcher-3] INFO org.apache.flink.runtime.jobmaster.JobMaster - Trying to recover from a global failure.
> org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
>  at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:90)
>  at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:65)
>  at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1760)
>  at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1733)
>  at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:93)
>  at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1870)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>
>
>
>
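> Under the checkpoint path:
> 25546: normal
> 25547: missing
> 25548: present, but the directory is empty
>
> Manually restoring the job from 25548 failed with an exception saying the _metadata file could not be found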
>
>
> 吴先生
> 15951914...@163.com
>
> ---- Original message ----
> | From | Xuyang |
> | Date | 2024-01-11 14:55 |
> | To |  |
> | Subject | Re: Reply: flink-checkpoint problem |
> Hi, your images are broken. You can upload them to an image host, or just paste the log directly.
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
> At 2024-01-11 13:40:43, "吴先生" <15951914...@163.com> wrote:
>
> JM log around the time the checkpoint failed; there is no trigger record for 25548:
>
>
> Automatic recovery failed:
>
>
> TM log:
>
>
> Checkpoint file path; the 25548 directory is empty:
>
>
> 吴先生
> 15951914...@163.com
>
> ---- Original message ----
> | From | Zakelly Lan |
> | Date | 2024-01-10 18:20 |
> | To |  |
> | Subject | Re: flink-checkpoint problem |
> Hello,
> If convenient, please paste the jobmanager log; it should contain some clues.
>
>
> On Wed, Jan 10, 2024 at 5:55 PM 吴先生 <15951914...@163.com> wrote:
>
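> Flink version: 1.12
> Checkpoint configuration: hdfs
>
>
> Symptom: for some reason the Nth checkpoint of the job failed, which caused the job to restart, and the restart failed. The Nth chk path does not exist in HDFS, so why is there an (N+1)th chk path, and why is it empty?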
>
>
>


Re: flink-checkpoint problem

2024-01-10 Posted by Zakelly Lan
Hello,
If convenient, please paste the jobmanager log; it should contain some clues.


On Wed, Jan 10, 2024 at 5:55 PM 吴先生 <15951914...@163.com> wrote:

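> Flink version: 1.12
> Checkpoint configuration: hdfs
>
> Symptom: for some reason the Nth checkpoint of the job failed, which caused the job to restart, and the restart failed. The Nth chk path does not exist in HDFS, so why is there an (N+1)th chk path, and why is it empty?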
>
>


Re: Problems with the state.backend.fs.memory-threshold parameter

2023-10-13 Posted by Zakelly Lan
Hi rui,

The 'state.backend.fs.memory-threshold' option configures the threshold below
which state is stored as part of the metadata, rather than in separate
files. As a result, the JM uses its memory to merge small
checkpoint files and write them into one file. FLIP-306 [1][2] has been
proposed to merge small checkpoint files without
consuming JM memory. This feature is currently being worked on and is
targeted for the next minor release (1.19).
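
As a rough illustration of the trade-off described above, here is a plain-Java toy model, not Flink code. It assumes that a chunk of state strictly smaller than the threshold is inlined into the checkpoint metadata (and therefore passes through JM memory), while anything else becomes its own file. The exact boundary handling, the class `MemoryThresholdDemo`, the 20 KB threshold and the chunk sizes are all assumptions made up for this example.

```java
/** Toy model (plain Java, NOT Flink code); all names and sizes are hypothetical. */
class MemoryThresholdDemo {

    /** How a set of state chunks ends up being stored for one checkpoint. */
    static final class Placement {
        final int inlinedChunks;  // chunks embedded in the checkpoint metadata
        final long inlinedBytes;  // bytes that pass through JM memory
        final int separateFiles;  // chunks written as individual files

        Placement(int inlinedChunks, long inlinedBytes, int separateFiles) {
            this.inlinedChunks = inlinedChunks;
            this.inlinedBytes = inlinedBytes;
            this.separateFiles = separateFiles;
        }
    }

    /** Splits chunks by size: below the threshold goes inline, the rest become files. */
    static Placement place(long[] chunkSizesBytes, long thresholdBytes) {
        int inlinedChunks = 0;
        long inlinedBytes = 0L;
        int separateFiles = 0;
        for (long size : chunkSizesBytes) {
            // Assumption: strictly below the threshold means "inline".
            if (size < thresholdBytes) {
                inlinedChunks++;
                inlinedBytes += size;
            } else {
                separateFiles++;
            }
        }
        return new Placement(inlinedChunks, inlinedBytes, separateFiles);
    }

    public static void main(String[] args) {
        long[] chunks = {512L, 4_096L, 1_000_000L};
        // Compare an illustrative 20 KB threshold with a threshold of 0.
        for (long threshold : new long[] {20L * 1024L, 0L}) {
            Placement p = place(chunks, threshold);
            System.out.println("threshold=" + threshold
                    + " inlined=" + p.inlinedChunks
                    + " inlinedBytes=" + p.inlinedBytes
                    + " files=" + p.separateFiles);
        }
    }
}
```

With the threshold at 0 nothing is inlined in this model, which lines up with the observation in the question: JM memory stops growing, but every small chunk turns into a separate file.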


Best,
Zakelly

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints
[2] https://issues.apache.org/jira/browse/FLINK-32070

On Fri, Oct 13, 2023 at 6:28 PM rui chen  wrote:
>
> We found that for some tasks, the JM memory continued to increase. I set
> the parameter of state.backend.fs.memory-threshold to 0, and the JM memory
> would no longer increase, but many small files might be written in this
> way. Does the community have any optimization plan for this area?