Re: Understanding checkpoint/savepoint storage requirements

2024-03-28 Thread Asimansu Bera
To add more details so that it is clear why access to a persistent object store is required for all JVM processes in a Flink job graph for consistent recovery. *JobManager:* Flink's JobManager writes critical metadata during checkpoints for fault tolerance: - Job

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Yu Li
CC the Flink user and dev mailing list. Paimon originated within the Flink community, initially known as Flink Table Store, and all our incubating mentors are members of the Flink Project Management Committee. I am confident that the bonds of enduring friendship and close collaboration will

How to define the termination criteria for iterations in Flink ML?

2024-03-28 Thread Komal M
Hi all, I have another question regarding Flink ML’s iterations. In the documentation it says “The iterative algorithm has an iteration body that is repeatedly invoked until some termination criteria is reached (e.g. after a user-specified number of epochs has been reached).” My question is

Debugging Kryo Fallback

2024-03-28 Thread Salva Alcántara
I wonder what the simplest way is to troubleshoot/debug what causes the Kryo fallback. Detecting it is just a matter of adding this line to your job: ``` env.getConfig().disableGenericTypes(); ``` or in more recent versions: ``` pipeline.generic-types: false ``` But once you detect
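Flink falls back to Kryo when a type fails its POJO analysis (public class, public no-arg constructor, every non-static field either public or exposed through a getter/setter pair). As a rough first pass at narrowing down the offending field, a simplified plain-Java reflection check of those rules can help. This is a sketch, not Flink's actual TypeExtractor logic; it ignores generics and does not recurse into field types:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class PojoRuleCheck {
    /** Returns reasons why Flink would likely NOT treat this class as a POJO (simplified). */
    public static List<String> violations(Class<?> cls) {
        List<String> out = new ArrayList<>();
        if (!Modifier.isPublic(cls.getModifiers())) {
            out.add("class is not public");
        }
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            if (!Modifier.isPublic(ctor.getModifiers())) {
                out.add("no-arg constructor is not public");
            }
        } catch (NoSuchMethodException e) {
            out.add("no public no-arg constructor");
        }
        for (Field f : cls.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue; // static/transient fields are not part of the serialized state
            }
            if (Modifier.isPublic(mods)) {
                continue; // public fields are fine
            }
            String cap = Character.toUpperCase(f.getName().charAt(0)) + f.getName().substring(1);
            boolean hasGetter = hasMethod(cls, "get" + cap) || hasMethod(cls, "is" + cap);
            boolean hasSetter = hasSetter(cls, "set" + cap, f.getType());
            if (!hasGetter || !hasSetter) {
                out.add("field '" + f.getName() + "' is non-public without a getter/setter pair");
            }
        }
        return out;
    }

    private static boolean hasMethod(Class<?> cls, String name) {
        try { cls.getMethod(name); return true; } catch (NoSuchMethodException e) { return false; }
    }

    private static boolean hasSetter(Class<?> cls, String name, Class<?> arg) {
        try { cls.getMethod(name, arg); return true; } catch (NoSuchMethodException e) { return false; }
    }

    // Example type that breaks the rules: private field with a getter but no setter.
    public static class Bad {
        private int id;
        public int getId() { return id; }
    }

    public static void main(String[] args) {
        System.out.println(violations(Bad.class));
    }
}
```

Running this prints the single violation for the `id` field, which is the kind of hint the Kryo fallback error itself often buries.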

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Rui Fan
Congratulations~ Best, Rui On Thu, Mar 28, 2024 at 3:55 PM Yu Li wrote: > CC the Flink user and dev mailing list. > > Paimon originated within the Flink community, initially known as Flink > Table Store, and all our incubating mentors are members of the Flink > Project Management Committee. I

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Zhanghao Chen
Congratulations! Best, Zhanghao Chen From: Yu Li Sent: Thursday, March 28, 2024 15:55 To: d...@paimon.apache.org Cc: dev ; user Subject: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project CC the Flink user and dev mailing list. Paimon originated

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Yanfei Lei
Congratulations! Best, Yanfei Zhanghao Chen wrote on Thu, Mar 28, 2024 at 19:59: > > Congratulations! > > Best, > Zhanghao Chen > > From: Yu Li > Sent: Thursday, March 28, 2024 15:55 > To: d...@paimon.apache.org > Cc: dev ; user > Subject: Re: [ANNOUNCE] Apache Paimon is

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread gongzhongqiang
Congratulations! Best, Zhongqiang Gong Yu Li wrote on Thu, Mar 28, 2024 at 15:57: > CC the Flink user and dev mailing list. > > Paimon originated within the Flink community, initially known as Flink > Table Store, and all our incubating mentors are members of the Flink > Project Management Committee. I am

IcebergSourceReader metrics

2024-03-28 Thread Chetas Joshi
Hello, I am using Flink to read Iceberg (S3). I have enabled all the metrics scopes in my FlinkDeployment as below metrics.scope.jm: flink.jobmanager metrics.scope.jm.job: flink.jobmanager.job metrics.scope.tm: flink.taskmanager metrics.scope.tm.job: flink.taskmanager.job metrics.scope.task:

Flink cache support

2024-03-28 Thread Ganesh Walse
Hi Team, in my project the requirement is to cache data from an Oracle database, where there are many tables and the same data is required by all transactions. Can you please suggest an approach where the cache is first loaded into memory and then stream processing should

One query just for curiosity

2024-03-28 Thread Ganesh Walse
Hi Team, if one record gets processed in 1 second in Flink, what would be the best time to process 1000 records in Flink using maximum parallelism?
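For a back-of-the-envelope answer: if the records are independent, the ideal wall-clock time is the per-record time multiplied by the number of records each parallel instance handles, i.e. ceil(1000 / parallelism) seconds at 1 s per record. This ignores coordination overhead, data skew, and external I/O limits, but it bounds what any parallelism setting can achieve:

```java
public class IdealScaling {
    /** Ideal processing time in seconds, ignoring coordination cost, skew, and I/O limits. */
    static long idealSeconds(long records, long secondsPerRecord, long parallelism) {
        long perInstance = (records + parallelism - 1) / parallelism; // ceiling division
        return perInstance * secondsPerRecord;
    }

    public static void main(String[] args) {
        System.out.println(idealSeconds(1000, 1, 1));    // 1000 s: fully sequential
        System.out.println(idealSeconds(1000, 1, 100));  // 10 s with parallelism 100
        System.out.println(idealSeconds(1000, 1, 1000)); // 1 s: every record gets its own slot
    }
}
```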

Re: One query just for curiosity

2024-03-28 Thread Zhanghao Chen
Flink can be scaled up to a parallelism of 32767 at max. And if your record processing is mostly IO-bound, you can further boost the throughput via Async-IO [1]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/ Best, Zhanghao Chen
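The point of Async I/O is that each task keeps many requests in flight instead of blocking once per record. The core pattern can be sketched in plain Java with `CompletableFuture` (an illustration of the idea only, not Flink's `AsyncFunction` API; `slowLookup` is a hypothetical stand-in for a database or HTTP call):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncLookupSketch {
    // Hypothetical blocking lookup standing in for a database or HTTP call.
    static int slowLookup(int key) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return key * 2;
    }

    /** Fire all lookups concurrently and collect results in input order. */
    static List<Integer> lookupAll(List<Integer> keys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> futures = keys.stream()
                    .map(k -> CompletableFuture.supplyAsync(() -> slowLookup(k), pool))
                    .collect(Collectors.toList());
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 8 lookups of ~50 ms each finish in roughly 50 ms instead of ~400 ms sequentially.
        System.out.println(lookupAll(List.of(1, 2, 3, 4, 5, 6, 7, 8), 8));
    }
}
```

In a real job, Flink's Async I/O operator additionally handles ordering guarantees, timeouts, and checkpointing of in-flight requests, which this sketch does not.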

Re: One query just for curiosity

2024-03-28 Thread Ganesh Walse
Do you mean we can process 32767 records in parallel? And if that is the case, do we need to do anything to enable it? On Fri, 29 Mar 2024 at 8:08 AM, Zhanghao Chen wrote: > Flink can be scaled up to a parallelism of 32767 at max. And if your > record processing is mostly

Re: One query just for curiosity

2024-03-28 Thread Zhanghao Chen
Yes. However, a huge parallelism incurs additional coordination cost, so you might need to set up the JobManager with a decent spec (at least 8C 16G by experience). Also, you'll need to make sure there are no external bottlenecks (e.g. reading/writing data from external storage). Best,

Re: Flink cache support

2024-03-28 Thread Marco Villalobos
Zhanghao is correct. You can use what is called "keyed state". It's like a cache. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/state/ > On Mar 28, 2024, at 7:48 PM, Zhanghao Chen wrote: > > Hi, > > You can maintain a cache manually in

Re: need flink support framework for dependency injection

2024-03-28 Thread Ruibin Xing
Hi Thais, Thanks, that's really detailed and inspiring! I think we can use the same pattern for states too. On Wed, Mar 27, 2024 at 6:40 PM Schwalbe Matthias < matthias.schwa...@viseca.ch> wrote: > Hi Ruibin, > > > > Our code [1] targets a very old version of Flink 1.8, for current >

Re: Flink cache support

2024-03-28 Thread Zhanghao Chen
Hi, You can maintain a cache manually in your operator implementations. You can load the initial cached data on the operator open() method before the processing starts. However, this would set up a cache per task instance. If you'd like to have a cache shared by all processing tasks without
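The "load in open(), read during processing" pattern described above can be sketched in plain Java (the class, the `Supplier` loader, and the names are hypothetical; in a real job this logic would live in a `RichFunction`'s `open()` and `processElement()`, and each task instance gets its own copy of the cache):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class CachedEnricher {
    private final Supplier<Map<String, String>> loader; // stands in for an Oracle query
    private Map<String, String> cache;                  // per-task-instance cache

    public CachedEnricher(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Mirrors RichFunction.open(): load reference data once, before processing starts. */
    public void open() {
        cache = new HashMap<>(loader.get());
    }

    /** Mirrors processElement(): enrich each record from the in-memory cache. */
    public String process(String key) {
        return cache.getOrDefault(key, "<unknown>");
    }

    public static void main(String[] args) {
        CachedEnricher enricher = new CachedEnricher(() -> Map.of("42", "ACME Corp"));
        enricher.open();
        System.out.println(enricher.process("42"));
    }
}
```

Note this cache is static after `open()`; if the Oracle data changes while the job runs, you would need a refresh strategy (periodic reload, TTL, or a broadcast-state side input) on top of this.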

flink version stable

2024-03-28 Thread Fokou Toukam, Thierry
Hi, I just want to know which version of Flink is stable. Thierry FOKOU | IT M.A.Sc Student Département de génie logiciel et TI École de technologie supérieure | Université du Québec 1100, rue Notre-Dame Ouest Montréal (Québec) H3C 1K3 Tél +1 (438) 336-9007

Re: Re: 1.19自定义数据源

2024-03-28 Thread Shawn Huang
Hello, for how to implement the Source interface you can refer to the following resources: [1] FLIP-27: Refactor Source Interface - Apache Flink - Apache Software Foundation [2] How to Integrate with Flink Efficiently: Core Design and Community Progress of the Connector / Catalog API (qq.com)

Re: 1.19自定义数据源

2024-03-28 Thread gongzhongqiang
Hello: the current Flink 1.19 release only marks SourceFunction as deprecated; it will be removed in a future version. So for your application, to stay compatible with future Flink versions, you can reimplement those SourceFunctions with the new Source interface. ha.fen...@aisino.com wrote on Thu, Mar 28, 2024 at 14:18: > > The original approach extended SourceFunction to implement some simple methods for automatically generating data; in 1.19 this has been marked as deprecated. It seems the Source interface should be used instead, which is completely different from the original SourceFunction. How should a custom data source for generating test data be implemented? >

Re: Re: 1.19自定义数据源

2024-03-28 Thread Zhanghao Chen
如果是用的 DataStream API 的话,也可以看下新增的 DataGen Connector [1] 是否能直接满足你的测试数据生成需求。 [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/datagen/ Best, Zhanghao Chen From: ha.fen...@aisino.com Sent: Thursday, March 28, 2024 15:34 To: