Re: Force to commit kafka offset when stop a job.

2024-06-06 Thread Zhanghao Chen
Yes, the exact offset position will also be committed when doing the savepoint. Best, Zhanghao Chen From: Lei Wang Sent: Thursday, June 6, 2024 16:54 To: Zhanghao Chen ; ruanhang1...@gmail.com Cc: user Subject: Re: Force to commit kafka offset when stop a job.

[ANNOUNCE] Apache flink-connector-mongodb 1.2.0 released

2024-06-06 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-mongodb 1.2.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release

Re: Force to commit kafka offset when stop a job.

2024-06-06 Thread Lei Wang
Thanks Zhanghao && Hang. I am familiar with the flink savepoint feature. The exact offset position is stored in the savepoint and the job can be resumed from the savepoint using the offset position stored in it. But I am not sure whether the exact offset position is committed to kafka when

When inserting data with the HBase connector, how can I update only one column in a column family that has multiple columns?

2024-06-06 Thread 谢县东
Hi all: Flink version: 1.13.6. I am using the flink-connector-hbase connector to write data into HBase via Flink SQL. The HBase table is created as follows: CREATE TABLE hbase_test_db_test_table_xxd ( rowkey STRING, cf1 ROW, PRIMARY KEY (rowkey) NOT ENFORCED ) WITH ( 'connector' = 'hbase-2.2', 'table-name' =

Re: Force to commit kafka offset when stop a job.

2024-06-05 Thread Zhanghao Chen
Hi, you could stop the job with a final savepoint [1]. Flink will then trigger a final offset commit on that savepoint. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint Best, Zhanghao Chen
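
Concretely, the graceful stop described in the reply can be issued from the Flink CLI. A sketch only; the job ID and savepoint directory are placeholders, and exact flags depend on your deployment:

```shell
# Stop the job gracefully: drains the pipeline, takes a final savepoint,
# and triggers the final Kafka offset commit as part of that savepoint.
bin/flink stop --savepointPath hdfs:///flink/savepoints <jobId>
```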

Re: Force to commit kafka offset when stop a job.

2024-06-05 Thread Hang Ruan
Hi Lei. I think you could try to use `stop with savepoint` to stop the job. The offset will be committed when the checkpoint finishes. So I think `stop with savepoint` may be helpful. Best, Hang Lei Wang wrote on Thu, Jun 6, 2024 at 01:16: > > When stopping a Flink job that consumes Kafka messages, how to

Re: Flink job Deployment problem

2024-06-05 Thread Hang Ruan
Hi, Fokou Toukam. This error occurs when the schema in the sink mismatches the schema provided from the upstream. You may need to check whether the provided type of the field `features` in the sink is the same as the type provided upstream. Best, Hang Fokou Toukam, Thierry wrote on Thu, Jun 6, 2024

Re: Flink job Deployment problem

2024-06-05 Thread Xiqian YU
Hi Fokou, It seems the `features` column was inferred to be of RAW type, which doesn't carry any specific type information and causes the subsequent type cast to fail. This can happen when Flink can't infer the return type from a lambda expression and no explicit return type information was

Re:Flink job Deployment problem

2024-06-05 Thread Xuyang
Hi, could you provide more details, such as a minimal reproducible SQL statement? -- Best! Xuyang On 2024-06-06 09:03:16, "Fokou Toukam, Thierry" wrote: Hi, I'm trying to deploy a Flink job but I get this error. How can I solve it? | Thierry FOKOU | IT M.A.Sc Student

Re:Re:Re: How can a Flink SQL job rate-limit consumption from a Kafka source table?

2024-06-05 Thread Xuyang
Hi, Flink SQL currently has no way to rate-limit consumption. If you need this, you could open a JIRA [1] and push it forward in the community. [1] https://issues.apache.org/jira/projects/FLINK/issues -- Best! Xuyang On 2024-06-05 15:33:30, "casel.chen" wrote: >How should a Flink SQL job be configured to throttle consumption, to avoid overwhelming the storage system? > >On 2024-06-05 14:46:23, "Alex Ching" wrote:

Flink job Deployment problem

2024-06-05 Thread Fokou Toukam, Thierry
Hi, I'm trying to deploy a Flink job but I get this error. How can I solve it? Thierry FOKOU | IT M.A.Sc Student Département de génie logiciel et TI École de technologie supérieure | Université du Québec 1100, rue Notre-Dame Ouest Montréal

Force to commit kafka offset when stop a job.

2024-06-05 Thread Lei Wang
When stopping a Flink job that consumes Kafka messages, how can I force it to commit the Kafka offset? Thanks, Lei

Re:Re: How can a Flink SQL job rate-limit consumption from a Kafka source table?

2024-06-05 Thread casel.chen
How should a Flink SQL job be configured to throttle consumption, to avoid overwhelming the storage system? On 2024-06-05 14:46:23, "Alex Ching" wrote: >Looking at the code, Flink >internally does have a rate-limiting component, org.apache.flink.api.common.io.ratelimiting.GuavaFlinkConnectorRateLimiter, >but it is not used in any connector. > >casel.chen wrote on Wed, Jun 5, 2024: >> Kafka itself supports consumer rate limiting [1], but these parameters have no effect in the flink kafka

Re: How can a Flink SQL job rate-limit consumption from a Kafka source table?

2024-06-05 Thread Alex Ching
Looking at the code, Flink internally does have a rate-limiting component, org.apache.flink.api.common.io.ratelimiting.GuavaFlinkConnectorRateLimiter, but it is not used in any connector. casel.chen wrote on Wed, Jun 5, 2024: > Kafka itself supports consumer rate limiting [1], but these parameters have no effect in the flink kafka sql > connector. Why is that? How can consumption of a flink kafka source table be throttled? Thanks! > > > [1]

How can a Flink SQL job rate-limit consumption from a Kafka source table?

2024-06-05 Thread casel.chen
Kafka itself supports consumer rate limiting [1], but these rate-limiting parameters have no effect in the flink kafka sql connector. Why is that, and how can consumption of a flink kafka source table be throttled? Thanks! [1] https://blog.csdn.net/qq_37774171/article/details/122816246

Re: Task Manager memory usage

2024-06-04 Thread Zhanghao Chen
Hi Sigalit, Yes. Here, most of your memory is consumed by JVM heap and Flink network memory, both are somewhat like a pre-allocated memory pool managed by JVM/Flink Memory Manager, which typically do not return memory to the OS even if there's some free space internally. Best, Zhanghao Chen

RE: Implementing delete operations with the Flink DataStream API

2024-06-04 Thread Xiqian YU
Hello, The Iceberg connector for Flink supports both the DataStream API and the Table API [1]. Its DataStream API offers three modes: Append (the default), Overwrite, and Upsert. You can implement this with the following Java snippet: first, create a deserializer matching your rows' schema; for example, you can use the builder of RowDataDebeziumDeserializeSchema to construct one quickly: private RowDataDebeziumDeserializeSchema getDeserializer(

Implementing delete operations with the Flink DataStream API

2024-06-04 Thread zapjone
Hi all: With mysql-cdc to Iceberg, automatic updates and deletes work when using the SQL approach. But after processing with the DataStream API and registering the stream as a temporary table, how can I get the same automatic update and delete behavior as with SQL?

Re: Pulsar connector resets existing subscription

2024-06-04 Thread Yufan Sheng
Yes, I think you're right. This bug appeared after we switched from the Pulsar Admin API to the Pulsar Client API. Currently, the connector doesn't check the existing subscription position. I apologize for this regression. We need to add tests and implement a fix. Since this is relatively easy to

Re: TTL issue with large RocksDB keyed state

2024-06-03 Thread Yanfei Lei
Hi, > 1. After multiple full checkpoints and a NATIVE savepoint the size was > unchanged. I'm wondering if RocksDb compaction is because we never update > key values? The state is nearly fully composed of keys' space. Do keys not > get freed using RocksDb compaction filter for TTL? Regarding

Re: [Help] Error when using the keyBy operator inside a Flink ML iteration

2024-06-03 Thread Xiqian YU
Hello! This looks related to FLINK-35066 [1], which describes an unwrapping issue hit when implementing custom RichCoProcessFunction or CoFlatMapFunction operators inside an IterationBody; it can be traced back to a report on this [2] mailing list. It appears to affect the RichCoMapFunction operator you are using as well. The issue was resolved by this pull request [3], which has been merged into the master branch. Building the 2.4-SNAPSHOT version locally following the docs [4] and running your code seems to work. Given that this is a Flink ML 2.3

RE: Implementing Multiple sink

2024-06-03 Thread Colletta, Edward via user
Yes. But the filter is usually a very lightweight operation. From: Mingliang Liu Sent: Monday, June 3, 2024 7:16 PM To: Colletta, Edward Cc: mejri houssem ; user@flink.apache.org Subject: Re: Implementing Multiple sink NOTICE: This email is from an external sender - do not click on links or

Re: Implementing Multiple sink

2024-06-03 Thread Mingliang Liu
Colletta, I think that way, the upstream stream `streamWithMultipleConditions` will get processed twice, instead of once? Thanks, On Mon, Jun 3, 2024 at 10:28 AM Colletta, Edward wrote: > I usually just reuse the stream, sending it to through different filters > and adding different sinks to

Re: Implementing Multiple sink

2024-06-03 Thread mejri houssem
Thank you very much Mingliang and Colletta for the suggestions. I will try them out. I am still open to additional suggestions from others as well. On Mon, Jun 3, 2024 at 18:28, Colletta, Edward wrote: > I usually just reuse the stream, sending it through different filters > and

RE: Implementing Multiple sink

2024-06-03 Thread Colletta, Edward via user
I usually just reuse the stream, sending it through different filters and adding different sinks to the filtered streams. Something like streamWithMultipleConditions.filter(FilterForCondition1) .addSink(SinkforCondtiton1);

Re: Implementing Multiple sink

2024-06-03 Thread Mingliang Liu
Hi Mejri, Have you checked side outputs? https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/side_output/ On Mon, Jun 3, 2024 at 8:28 AM mejri houssem wrote: > Hello community, > > We have a use case in our Flink job that requires the implementation of > multiple
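
Side outputs route each element in a single pass, instead of re-evaluating the whole stream once per filter/sink pair (the trade-off discussed in this thread). A plain-Python sketch of the routing idea, illustrative only and not the actual Flink `OutputTag`/`ProcessFunction` API:

```python
def route_once(stream, side_predicates):
    """Single pass over the stream: each element is inspected once and
    sent to the first matching side output, or to the main output."""
    outputs = {tag: [] for tag in side_predicates}
    main = []
    for element in stream:
        for tag, predicate in side_predicates.items():
            if predicate(element):
                outputs[tag].append(element)
                break
        else:
            # no side predicate matched: keep the element in the main output
            main.append(element)
    return main, outputs

main, sides = route_once(
    [1, 2, 3, 4, 5, 6],
    {"evens": lambda x: x % 2 == 0, "big": lambda x: x > 4},
)
print(main, sides)  # [1, 3] {'evens': [2, 4, 6], 'big': [5]}
```

Each element is tested until its first match, so a record lands in exactly one output; with the filter-per-sink pattern, every record is instead evaluated once per branch.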

Implementing Multiple sink

2024-06-03 Thread mejri houssem
Hello community, We have a use case in our Flink job that requires the implementation of multiple sinks. I need to filter messages based on certain conditions (information in the message) to determine which sink to dispatch them to. To clarify, I would like to implement logic in the operator

State leak in tumbling windows

2024-06-03 Thread Adam Domanski
Dear Flink users, I spotted the ever growing checkpoint size in my Flink application which uses tumble windows. I found such a ticket: https://issues.apache.org/jira/browse/FLINK-33192, but no comments. Can somebody confirm the issue? BR, Adam.

Re: Flink Kubernetes Operator Pod Disruption Budget

2024-06-03 Thread Gyula Fóra
Hey Jeremy! This sounds like a good / fairly simple extension to add. Since this would result in a larger extension of the current FlinkDeployment CRD, it would be good to cover it in a small FLIP. Cheers, Gyula On Wed, May 22, 2024 at 10:20 PM Jeremy Alvis via user < user@flink.apache.org>

TTL issue with large RocksDB keyed state

2024-06-02 Thread Cliff Resnick
Hi everyone, We have a Flink application that has a very large and perhaps unusual state. The basic shape of it is a very large and somewhat random keyed-stream partition space, each with a continuously growing map-state keyed by microsecond time Long values. There are never any overwrites in

Statefun runtime crashes when experiencing high msg/second rate on kubernetes

2024-06-01 Thread Oliver Schmied
Hi everyone, I set up the Flink StateFun runtime on minikube (cpus=4, memory=10240) following the tutorial in statefun-playground https://github.com/apache/flink-statefun-playground/tree/main/deployments/k8s . I developed my own StateFun functions in Java and deployed them the same way as

Re: Pulsar connector resets existing subscription

2024-05-31 Thread Igor Basov
For the record here, I created issue FLINK-35477 regarding this. On Mon, May 27, 2024 at 1:21 PM Igor Basov wrote: > Ok, I believe the breaking changes were introduced in this commit >

Re: DynamoDB Table API Connector Failing On Row Deletion - "The provided key element does not match the schema"

2024-05-31 Thread Ahmed Hamdy
Hi Rob, I agree with the issue here as well as the proposed solution. Thanks a lot for the deep dive and the reproduction steps. I have created a ticket on your behalf: https://issues.apache.org/jira/browse/FLINK-35500; you can comment on it if you intend to work on it and then submit a PR against

Re: Flink 1.18.2 release date

2024-05-30 Thread weijie guo
Hi Yang, IIRC, 1.18.2 has not been kicked off yet. Best regards, Weijie Yang LI wrote on Thu, May 30, 2024 at 22:33: > Dear Flink Community, > > Anyone know about the release date for 1.18.2? > > Thanks very much, > Yang >

DynamoDB Table API Connector Failing On Row Deletion - "The provided key element does not match the schema"

2024-05-30 Thread Rob Goretsky
Hello! I am looking to use the DynamoDB Table API connector to write rows to AWS DynamoDB. I have found what appears to be a bug in the implementation for Row Delete operations. I have an idea on what needs to be fixed as well, and given that I am new to this community, I am looking for

Flink 1.18.2 release date

2024-05-30 Thread Yang LI
Dear Flink Community, Anyone know about the release date for 1.18.2? Thanks very much, Yang

Re: Java 17 incompatibilities with Flink 1.18.1 version

2024-05-30 Thread weijie guo
I believe we ran cron tests also on JDK 17 and JDK 21. Best regards, Weijie Zhanghao Chen wrote on Thu, May 30, 2024 at 19:35: > Hi Rajat, > > There's no need to get Flink libraries' compiled version on jdk17. The > only things you need to do are: > > >1. Configure the JAVA_HOME env to use JDK 17 on both

Re: Slack Invite

2024-05-30 Thread gongzhongqiang
Hi, the invite link: https://join.slack.com/t/apache-flink/shared_invite/zt-2jtsd06wy-31q_aELVkdc4dHsx0GMhOQ Best, Zhongqiang Gong Nelson de Menezes Neto wrote on Thu, May 30, 2024 at 15:01: > Hey guys! > > I want to join the slack community but the invite has expired. > Can you send me a new one? >

Re: Java 17 incompatibilities with Flink 1.18.1 version

2024-05-30 Thread Zhanghao Chen
Hi Rajat, There's no need to get Flink libraries' compiled version on jdk17. The only things you need to do are: 1. Configure the JAVA_HOME env to use JDK 17 on both the client and server side 2. Configure the Flink JVM options properly to cooperate with the JDK modularization. The

Unsubscribe

2024-05-29 Thread jszhouch...@163.com
Unsubscribe

Re: Re: Is there a way to rate-limit Flink SQL consumption of a Kafka topic?

2024-05-28 Thread Zhanghao Chen
That should be possible. Also, an old version of the Kafka connector once implemented rate-limiting logic [1], which you can use as a reference. I think this requirement is fairly common; feel free to open a JIRA. [1] https://issues.apache.org/jira/browse/FLINK-11501 Best, Zhanghao Chen From: casel.chen Sent: Tuesday, May 28, 2024 22:00 To: user-zh@flink.apache.org Subject: Re: Is there a way to rate-limit Flink SQL consumption of a Kafka

Re: Java 17 incompatibilities with Flink 1.18.1 version

2024-05-28 Thread Zhanghao Chen
Hi Rajat, Flink releases are compiled with JDK 8 but it is able to run on JDK 8-17. As long as your Flink runs on JDK17 on both server and client side, you are free to write your Flink jobs with Java 17. Best, Zhanghao Chen From: Rajat Pratap Sent: Tuesday,

Re: Is there a way to rate-limit Flink SQL consumption of a Kafka topic?

2024-05-28 Thread casel.chen
I checked the Flink source code: DataGeneratorSource takes a RateLimiterStrategy parameter, but KafkaSource does not. Could rate limiting be implemented the way DataGeneratorSource does it? public DataGeneratorSource( GeneratorFunction generatorFunction, long count, RateLimiterStrategy rateLimiterStrategy, TypeInformation typeInfo) {...}
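
To illustrate what a `RateLimiterStrategy` does inside a source, here is a minimal token-bucket limiter in plain Python. This is an illustrative sketch only, not Flink's actual implementation, and (as the thread notes) KafkaSource currently exposes no such hook:

```python
import time

class TokenBucketRateLimiter:
    """Minimal token-bucket limiter: the idea behind rate-limited sources.
    A pluggable clock makes the behavior deterministic and testable."""

    def __init__(self, permits_per_second, clock=time.monotonic):
        self.rate = float(permits_per_second)
        self.clock = clock
        self.tokens = self.rate  # start with a full bucket
        self.last = clock()

    def try_acquire(self):
        """Consume one permit if available; otherwise signal the caller
        (e.g. a source reader loop) to back off."""
        now = self.clock()
        # refill proportionally to elapsed time, capped at one second's worth
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# deterministic demo with a manual clock: 2 permits per second
fake_time = [0.0]
limiter = TokenBucketRateLimiter(2, clock=lambda: fake_time[0])
print(limiter.try_acquire(), limiter.try_acquire(), limiter.try_acquire())  # True True False
fake_time[0] = 0.5  # half a second later, one token has refilled
print(limiter.try_acquire())  # True
```

A source reader would call `try_acquire()` before emitting each record and sleep briefly when it returns False, which is roughly what DataGeneratorSource's RateLimiterStrategy arranges internally.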

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-28 Thread Andrew Otto
> Flink CDC [1] now provides full-DB sync and schema evolution ability as a pipeline job. Ah! That is very cool. > Iceberg sink support was suggested before, and we’re trying to implement this in the next few releases. Does it cover the use-cases you mentioned? Yes! That would be fantastic.

Aw: Re: Help with monitoring metrics of StateFun runtime with prometheus

2024-05-28 Thread Oliver Schmied
Dear Biao Geng, thank you for your reply. You are right, the StateFun metrics are tracked along with the "normal" Flink metrics; I just could not find them. If anyone is interested, flink_taskmanager_job_task_operator_functions___ is the way to get them. Thanks again. Best regards, Oliver

Re: Flink Kubernetes Operator 1.8.0 CRDs

2024-05-28 Thread Alexis Sarda-Espinosa
Hello, I've also noticed this in our Argo CD setup. Since priority=0 is the default, Kubernetes accepts it but doesn't store it in the actual resource, I'm guessing it's like a mutating admission hook that comes out of the box. The "priority" property can be safely removed from the CRDs.

How to set the group account when submitting SQL jobs via flink sql-gateway

2024-05-28 Thread 阿华田
When submitting SQL jobs via flink sql-gateway, I found that after the sql-gateway service starts, jobs are by default submitted to the YARN cluster under the tenant identity of the current machine. Since the company's Hadoop cluster enforces tenant permissions, the submitting user information needs to be set. How can the group account be configured when submitting SQL jobs through flink sql-gateway? 阿华田 a15733178...@163.com

Java 17 incompatibilities with Flink 1.18.1 version

2024-05-28 Thread Rajat Pratap
Hi Team, I am writing flink jobs with the latest release version of flink (1.18.1). The jobmanager is also deployed with the same version build. However, we faced issues when we deployed the jobs. On further investigation, I noticed all Flink libraries are built with JDK 1.8. I have the following

Re: Slack Community Invite?

2024-05-27 Thread Alexandre Lemaire
Thank you! > On May 27, 2024, at 9:35 PM, Junrui Lee wrote: > > Hi Alexandre, > > You can try this link: > https://join.slack.com/t/apache-flink/shared_invite/zt-2jn2dlgoi-P4oQBWRJT4I_3HY8ZbLxdg > > Best, > Junrui > > Alexandre Lemaire (alema...@circlical.com) > wrote on Tue, May 28, 2024

Re: Slack Community Invite?

2024-05-27 Thread Junrui Lee
Hi Alexandre, You can try this link: https://join.slack.com/t/apache-flink/shared_invite/zt-2jn2dlgoi-P4oQBWRJT4I_3HY8ZbLxdg Best, Junrui Alexandre Lemaire wrote on Tue, May 28, 2024 at 01:18: > Hello! > > Does the Slack community still exist? The link on the site is expired. > > Thank you! > Alex > > >

Re: Pulsar connector resets existing subscription

2024-05-27 Thread Igor Basov
Ok, I believe the breaking changes were introduced in this commit. Here

Slack Community Invite?

2024-05-27 Thread Alexandre Lemaire
Hello! Does the Slack community still exist? The link on the site is expired. Thank you! Alex

Is there a way to rate-limit Flink SQL consumption of a Kafka topic?

2024-05-27 Thread casel.chen
Is there a way to rate-limit Flink SQL consumption of a Kafka topic? The scenario: the job consumes a Kafka topic and writes to a downstream MongoDB. During business peak hours the write pressure on MongoDB is high, so we would like to throttle Kafka consumption. How can this be achieved?

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-26 Thread Hang Ruan
Hi, all. Flink CDC provides the schema evolution ability to sync the entire database. I think it could satisfy your needs. Flink CDC pipeline sources and sinks are listed in [1]. Iceberg pipeline connector is not provided by now. > What is not is the automatic syncing of entire databases, with

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-26 Thread gongzhongqiang
Flink CDC 3.0 focuses on data integration scenarios, so you don't need to pay attention to the framework implementation; you just need to use the YAML format to describe the data source and target to quickly build a data synchronization task with schema evolution. And it supports rich source and

Unsubscribe

2024-05-26 Thread 95chenjz
Unsubscribe

Re: Help with monitoring metrics of StateFun runtime with prometheus

2024-05-26 Thread Biao Geng
Hi Oliver, I am not experienced in StateFun but its doc says 'Along with the standard metric scopes, Stateful Functions supports Function Scope, which is one level below operator scope.' So, as long as you can collect Flink's metrics via Prometheus, ideally, there should be no difference between using

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-26 Thread Xiqian YU
Hi Otto, Flink CDC [1] now provides full-DB sync and schema evolution ability as a pipeline job. Iceberg sink support was suggested before, and we’re trying to implement this in the next few releases. Does it cover the use-cases you mentioned? [1]

Re: How to start a flink job on a long running yarn cluster from a checkpoint (with arguments)

2024-05-25 Thread Junrui Lee
Hi Sachin, Yes, that's correct. To resume from a savepoint, use the command bin/flink run -s . You can find more details in the Flink documentation on [1]. Additionally, information on how to trigger a savepoint can be found in the section for triggering savepoints [2]. [1]
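
The full cycle sketched as CLI commands, following the original command in the question. A sketch only; the job ID, application ID, and savepoint paths are placeholders, and exact flags depend on your deployment:

```shell
# 1. Stop the job gracefully, writing a savepoint (YARN deployment,
#    as in the original submission):
bin/flink stop --savepointPath hdfs:///flink/savepoints \
    -yid application_1473169569237_0001 <jobId>

# 2. Resubmit the job from that savepoint with the same program arguments:
bin/flink run -s hdfs:///flink/savepoints/<savepointName> \
    -m yarn-cluster -yid application_1473169569237_0001 \
    /usr/lib/flink/examples/streaming/WordCount.jar \
    --input file:///input.txt --output file:///output/
```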

How to start a flink job on a long running yarn cluster from a checkpoint (with arguments)

2024-05-25 Thread Sachin Mittal
Hi, I have a long running yarn cluster and I submit my streaming job using the following command: flink run -m yarn-cluster -yid application_1473169569237_0001 /usr/lib/flink/examples/streaming/WordCount.jar --input file:///input.txt --output file:///output/ Let's say I want to stop this job,

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-25 Thread Péter Váry
> Could the table/database sync with schema evolution (without Flink job restarts!) potentially work with the Iceberg sink? Making this work would be a good addition to the Iceberg-Flink connector. It is definitely doable, but not a single PR sized task. If you want to try your hands on it, I

Unsubscribe

2024-05-24 Thread 蒋少东
Unsubscribe

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Andrew Otto
> What is not is the automatic syncing of entire databases, with schema evolution and detection of new (and dropped?) tables. :) Wait. Is it? > Flink CDC supports synchronizing all tables of source database instance to downstream in one job by configuring the captured database list and table

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Andrew Otto
Indeed, using Flink-CDC to write to Flink Sink Tables, including Iceberg, is supported. What is not is the automatic syncing of entire databases, with schema evolution and detection of new (and dropped?) tables. :) On Fri, May 24, 2024 at 8:58 AM Giannis Polyzos wrote: >

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Giannis Polyzos
https://nightlies.apache.org/flink/flink-cdc-docs-stable/ All these features come from Flink cdc itself. Because Paimon and Flink cdc are projects native to Flink there is a strong integration between them. (I believe it’s on the roadmap to support iceberg as well) On Fri, 24 May 2024 at 3:52 PM,

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Andrew Otto
> I’m curious if there is any reason for choosing Iceberg instead of Paimon No technical reason that I'm aware of. We are using it mostly because of momentum. We looked at Flink Table Store (before it was Paimon), but decided it was too early and the docs were too sparse at the time to really

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Giannis Polyzos
I’m curious if there is any reason for choosing Iceberg instead of Paimon (other than - iceberg is more popular). Especially for a use case like CDC that iceberg struggles to support. On Fri, 24 May 2024 at 3:22 PM, Andrew Otto wrote: > Interesting thank you! > > I asked this in the Paimon

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Andrew Otto
Interesting thank you! I asked this in the Paimon users group: How coupled to Paimon catalogs and tables is the cdc part of Paimon? RichCdcMultiplexRecord

Ways to detect a scaling event within a flink operator at runtime

2024-05-23 Thread Chetas Joshi
Hello, On a k8s cluster, I have the flink-k8s-operator running 1.8 with autoscaler = enabled (in-place) and a flinkDeployment (application mode) running 1.18.1. The flinkDeployment i.e. the flink streaming application has a mock data producer as the source. The source generates data points

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Péter Váry
If I understand correctly, Paimon is sending `CdcRecord`-s [1] on the wire which contain not only the data, but the schema as well. With Iceberg we currently only send the row data, and expect to receive the schema on job start - this is more performant than sending the schema all the time, but

Pulsar connector resets existing subscription

2024-05-23 Thread Igor Basov
Hi everyone, I have a problem with how Flink deals with the existing subscription in a Pulsar topic. - Subscription has some accumulated backlog - Flink job is deployed from a clear state (no checkpoints) - Flink job uses the same subscription name as the existing one; the start

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Andrew Otto
Ah I see, so just auto-restarting to pick up new stuff. I'd love to understand how Paimon does this. They have a database sync action which will sync entire databases, handle schema evolution, and I'm

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Péter Váry
I will ask Marton about the slides. The solution was something like this in a nutshell: - Make sure that on job start the latest Iceberg schema is read from the Iceberg table - Throw a SuppressRestartsException when data arrives with the wrong schema - Use Flink Kubernetes Operator to restart

Re: issues with Flink kinesis connector

2024-05-23 Thread Nick Hecht
Thank you for your help! On Thu, May 23, 2024 at 1:40 PM Aleksandr Pilipenko wrote: > Hi Nick, > > You need to use another method to add sink to your job - sinkTo. > KinesisStreamsSink implements newer Sink interface, while addSink expect > old SinkFunction. You can see this by looking at

Re: issues with Flink kinesis connector

2024-05-23 Thread Aleksandr Pilipenko
Hi Nick, You need to use another method to add sink to your job - sinkTo. KinesisStreamsSink implements newer Sink interface, while addSink expect old SinkFunction. You can see this by looking at method signatures[1] and in usage examples in documentation[2] [1]

Re: Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-23 Thread Zhanghao Chen
Hi John, Based on the Memory config screenshot provided before, each of your TM should have MaxDirectMemory=1GB (network mem) + 128 MB (framework off-heap) = 1152 MB. Nor will taskmanager.memory.flink.size and the total including MaxDirectMemory exceed pod physical mem, you may check the
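
The arithmetic in this reply can be checked directly; the figures below are the ones quoted in the thread (1 GB network memory, 128 MB framework off-heap):

```python
# MaxDirectMemory per TaskManager as described in the reply:
# network memory + framework off-heap memory.
network_mb = 1024            # 1 GB network memory
framework_off_heap_mb = 128  # framework off-heap
max_direct_memory_mb = network_mb + framework_off_heap_mb
print(max_direct_memory_mb)  # 1152
```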

Re: Task Manager memory usage

2024-05-23 Thread Zhanghao Chen
Hi Sigalit, For states stored in memory, they would most probably stay alive for several rounds of GC and end up in the old gen of the heap, and won't get recycled until a Full GC. As for the TM pod memory usage, most probably it will stop increasing at some point. You could try setting a

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Andrew Otto
Wow, I would LOVE to see this talk. If there is no recording, perhaps there are slides somewhere? On Thu, May 23, 2024 at 11:00 AM Carlos Sanabria Miranda < sanabria.miranda.car...@gmail.com> wrote: > Hi everyone! > > I have found in the Flink Forward website the following presentation: >

issues with Flink kinesis connector

2024-05-23 Thread Nick Hecht
Hello, I am currently having issues trying to use the python flink 1.18 Datastream api with the Amazon Kinesis Data Streams Connector. From the documentation https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kinesis/ I have downloaded the

"Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Carlos Sanabria Miranda
Hi everyone! I have found in the Flink Forward website the following presentation: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg " by

Re: Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-23 Thread John Smith
Based on these two settings... taskmanager.memory.flink.size: 16384m taskmanager.memory.jvm-metaspace.size: 3072m Reading the docs here I'm not sure how to calculate the formula. My suspicion is that I may have allocated too much of taskmanager.memory.flink.size and the total including

kirankumarkathe...@gmail.com-unsubscribe

2024-05-23 Thread Kiran Kumar Kathe
Kindly unsubscribe this gmail account kirankumarkathe...@gmail.com

Re: Task Manager memory usage

2024-05-23 Thread Sigalit Eliazov
Hi, thanks for your reply. We are storing the data in memory; since it is short-lived, we thought that adding RocksDB would add overhead. On Thu, May 23, 2024 at 4:38 PM Sachin Mittal wrote: > Hi > Where are you storing the state. > Try rocksdb. > > Thanks > Sachin > > > On Thu, 23 May 2024 at

Re: Task Manager memory usage

2024-05-23 Thread Sachin Mittal
Hi Where are you storing the state. Try rocksdb. Thanks Sachin On Thu, 23 May 2024 at 6:19 PM, Sigalit Eliazov wrote: > Hi, > > I am trying to understand the following behavior in our Flink application > cluster. Any assistance would be appreciated. > > We are running a Flink application

Help with monitoring metrics of StateFun runtime with prometheus

2024-05-23 Thread Oliver Schmied
Dear Apache Flink community, I am setting up an Apache Flink StateFun runtime on Kubernetes, following the flink-playground example: https://github.com/apache/flink-statefun-playground/tree/main/deployments/k8s. This is the manifest I used for creating the StateFun environment: ```---

Task Manager memory usage

2024-05-23 Thread Sigalit Eliazov
Hi, I am trying to understand the following behavior in our Flink application cluster. Any assistance would be appreciated. We are running a Flink application cluster with 5 task managers, each with the following configuration: - jobManagerMemory: 12g - taskManagerMemory: 20g -

Re: The splitVector permission issue with MongoDB

2024-05-23 Thread Jiabao Sun
Hi, splitVector is an internal MongoDB command for computing splits; it can also be used to compute chunk ranges under a replica-set deployment. Without the splitVector permission, the connector automatically falls back to the sample split strategy. Best, Jiabao evio12...@gmail.com wrote on Thu, May 23, 2024 at 16:57: > > hello~ > > > I am using the flink-cdc mongodb connector 2.3.0 >

The splitVector permission issue with MongoDB

2024-05-23 Thread evio12...@gmail.com
hello~ I am using the flink-cdc mongodb connector 2.3.0 (https://github.com/apache/flink-cdc/blob/release-2.3/docs/content/connectors/mongodb-cdc.md). The documentation says the mongo account needs these permissions: 'splitVector', 'listDatabases', 'listCollections', 'collStats', 'find', and 'changeStream'. The MongoDB I am using is a replica set, but I learned that

Re: Flink kinesis connector 4.3.0 release estimated date

2024-05-23 Thread Vararu, Vadim
That’s great news. Thanks. From: Leonard Xu Date: Thursday, 23 May 2024 at 04:42 To: Vararu, Vadim Cc: user , Danny Cranmer Subject: Re: Flink kinesis connector 4.3.0 release estimated date Hey, Vararu The kinesis connector 4.3.0 release is under vote phase and we hope to finalize the

Re: Flink kinesis connector 4.3.0 release estimated date

2024-05-22 Thread Leonard Xu
Hey, Vararu The kinesis connector 4.3.0 release is under vote phase and we hope to finalize the release work in this week if everything goes well. Best, Leonard > 2024年5月22日 下午11:51,Vararu, Vadim 写道: > > Hi guys, > > Any idea when the 4.3.0 kinesis connector is estimated to be released?

Flink Kubernetes Operator Pod Disruption Budget

2024-05-22 Thread Jeremy Alvis via user
Hello, In order to maintain at least one pod for both the Flink Kubernetes Operator and JobManagers through Kubernetes processes that use the Eviction API such as when draining a node, we have deployed Pod

Flink kinesis connector 4.3.0 release estimated date

2024-05-22 Thread Vararu, Vadim
Hi guys, Any idea when the 4.3.0 kinesis connector is estimated to be released? Cheers, Vadim.

[ANNOUNCE] Apache Celeborn 0.4.1 available

2024-05-22 Thread Nicholas Jiang
Hi all, Apache Celeborn community is glad to announce the new release of Apache Celeborn 0.4.1. Celeborn is dedicated to improving the efficiency and elasticity of different map-reduce engines and provides an elastic, high-efficient service for intermediate data including shuffle data, spilled

StateMigrationException while using stateTTL

2024-05-22 Thread irakli.keshel...@sony.com
Hello, I'm using Flink 1.17.1 and I have stateTTL enabled in one of my Flink jobs where I'm using the RocksDB for checkpointing. I have a value state of Pojo class (which is generated from Avro schema). I added a new field to my schema along with the default value to make sure it is backwards

Re: Get access to unmatching events in Apache Flink Cep

2024-05-22 Thread Anton Sidorov
In his answer, Biao said "currently there is no such API to access the middle NFA state". Maybe that API exists in a plan? Or can I create an issue or a pull request that adds the API? On Fri, May 17, 2024 at 12:04, Anton Sidorov wrote: > Ok, thanks for the reply. > > On Fri, May 17, 2024 at 09:22, Biao Geng wrote: > >> Hi

IllegalStateException: invalid BLOB

2024-05-21 Thread Lars Skjærven
Hello, We're facing the bug reported in https://issues.apache.org/jira/browse/FLINK-32212 More specifically, when kubernetes decides to drain a node, a job manager restart (but not the task manager), the job fails with: java.lang.IllegalStateException: The library registration references a

Re: Question about the iterate operation in the Flink 1.19 documentation

2024-05-20 Thread Xuyang
Hi, the Iterate API is deprecated as of 1.19 and no longer supported; see [1][2] for details. The FLIP [1] offers an alternative approach [3]. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-357%3A+Deprecate+Iteration+API+of+DataStream [2] https://issues.apache.org/jira/browse/FLINK-33144 [3]

Re: Flink autoscaler with AWS ASG: checkpoint access issue

2024-05-20 Thread Chetas Joshi
Hello, After digging into the 403 issue a bit, I figured out that after the scale-up event, the flink-s3-fs-presto uses the node-profile instead of IRSA (Iam Role for Service Account) on some of the newly created TM pods. 1. Anyone else experienced this as well? 2. Verified that this is an issue

Question about the iterate operation in the Flink 1.19 documentation

2024-05-20 Thread www
Dear Flink development team: Hello! I am currently learning how to implement iterative algorithms, such as single-source shortest paths on a graph, with Apache Flink's DataStream API. In the Flink 1.18 documentation I noticed an introduction to the iterate operation; see: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/overview/#iterations However, I found that Flink

Re: flinksql 经过优化后,group by字段少了

2024-05-20 Thread Lincoln Lee
Hi, 可以尝试下 1.17 或更新的版本, 这个问题在 flink 1.17.0 中已修复[1]。 批处理中做这个 remove 优化是符合语义的,而在流中不能直接裁剪, 对于相关时间函数的说明文档[2]中也进行了更新 [1] https://issues.apache.org/jira/browse/FLINK-30006 [2]
