Re: Zookeeper HA with Kubernetes: Possible to use the same Zookeeper cluster w/multiple Flink Operators?

2023-09-20 Thread Gyula Fóra
Hi! The cluster-id for each FlinkDeployment is simply the name of the deployment, so they are all different within a given namespace. (In other words, they are not fixed, as your question suggests, but set automatically.) So there should be no problem sharing the ZK cluster. Cheers Gyula On Thu, 21
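To make the sharing concrete, here is a hedged sketch of a FlinkDeployment whose HA settings point at a shared ZooKeeper quorum. The deployment name, quorum hostnames, and storage path are placeholders, not values from the thread. Because the operator derives the cluster-id from the deployment name, two such deployments in one namespace get distinct ZK subtrees under the same shared root.

```yaml
# Sketch only: placeholder names throughout.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: job-a            # the HA cluster-id becomes "job-a" automatically
spec:
  flinkConfiguration:
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-0.zk:2181,zk-1.zk:2181
    high-availability.zookeeper.path.root: /flink   # safe to share across deployments
    high-availability.storageDir: s3://my-bucket/ha # placeholder storage path
```

A second deployment named `job-b` with the identical `flinkConfiguration` block would store its metadata under `/flink/job-b`, never colliding with `/flink/job-a`.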

Flink Gzip Sink with Error

2023-09-20 Thread Yunhui Han
Hi all, I want to write JSON strings with gzip compression from Flink, following the demo on StackOverflow. I encountered a problem: there is an ill-formatted string at the
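One common cause of corrupt or ill-formatted gzip output is a `GZIPOutputStream` that is never closed, so the gzip trailer is never written. Below is a minimal stdlib sketch of the round-trip a gzip bulk writer has to get right; `GzipJsonLines`, `gzip`, and `gunzip` are illustrative names, not Flink API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative helper showing the gzip round-trip a bulk writer wraps.
public class GzipJsonLines {

    // Compress one JSON line (newline-terminated) into gzip bytes.
    static byte[] gzip(String jsonLine) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write((jsonLine + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } // close() writes the gzip trailer; skipping it yields a corrupt file
        return bos.toByteArray();
    }

    // Decompress gzip bytes back to the original string.
    static String gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the round-trip succeeds here but the files written by the sink are still unreadable, the problem is likely in when the wrapping writer flushes and closes the stream, not in the compression itself.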

RE: Re: Re: How to read flinkSQL job state

2023-09-20 Thread Yifan He via user
Hi Hangxiang, I still have one question about this problem: when using the DataStream API I know the key and value types I use in state because I defined the ValueStateDescriptor, but how can I get the ValueStateDescriptor in Flink SQL? Thanks, Yifan On 2023/09/07 06:16:41 Hangxiang Yu wrote: > Hi,

RE: About Flink parquet format

2023-09-20 Thread Kamal Mittal via user
Yes. Due to the below error, the Flink bulk writer never closes the part file and keeps creating new part files continuously. Does Flink not handle exceptions like the one below? From: Feng Jin Sent: 20 September 2023 05:54 PM To: Kamal Mittal Cc: user@flink.apache.org Subject: Re: About Flink parquet

Re: Flink CDC 2.0: how to solve the log backlog caused by very large historical data

2023-09-20 Thread jinzhuguang
Hi, besides these operational measures, does flink cdc itself offer any solution? For example, not having to read the binlog from the beginning in the incremental phase, since much of that data has in fact already been read. > On Sep 20, 2023, 21:00, Jiabao Sun wrote: > > Hi, > For production it is recommended to retain binlogs for at least 7 days, to improve tolerance of recovery time. > Also, you can try increasing the snapshot parallelism and resources to speed up the snapshot; after the snapshot completes you can restore from a savepoint and scale resources back down. > Best, > Jiabao >

Re: Does Flink 1.17 no longer support Hive 2.1?

2023-09-20 Thread yuxia
Revert this PR https://github.com/apache/flink/pull/19352, then rebuild the flink hive connector package. Best regards, Yuxia - Original Message - From: "迎风浪子" <576637...@qq.com.INVALID> To: "user-zh" Sent: Tuesday, September 19, 2023 5:20:58 PM Subject: Re: Does Flink 1.17 no longer support Hive 2.1? We are still using hive1.1.0; what should we do? --- Original Message --- From:

Zookeeper HA with Kubernetes: Possible to use the same Zookeeper cluster w/multiple Flink Operators?

2023-09-20 Thread Brian King
Hello Flink Users! We're attempting to deploy a Flink application cluster on Kubernetes, using the Flink Operator and Zookeeper for HA. We're using Flink 1.16 and I have a question about some of the Zookeeper configuration[0]: "high-availability.zookeeper.path.root" is described as "The root

Re: Using Flink k8s operator on OKD

2023-09-20 Thread Krzysztof Chmielewski
Thank you Zach, our flink-operator and flink deployments are in the same namespace, called "flink". We had executed what is described in [1] before my initial message. We are using OKD 4.6.0, which according to the docs uses k8s 1.19. The very same config works fine on "vanilla" k8s, but for

Test message

2023-09-20 Thread Krzysztof Chmielewski
Community, please forgive me for this message. This is a test, because all day my replies to my other user thread have been rejected by the email server. Sincere apologies, Krzysztof

Extract response stream out of a AsyncSinkBase operator

2023-09-20 Thread Bhupendra Yadav
Hey Everyone, We have a use case where we want to extract a response out of an AsyncSink operator (HTTP in our case) and perform more transformations on top of it. We implemented an HttpSink by following this blog https://flink.apache.org/2022/03/16/the-generic-asynchronous-base-sink/ . Since By
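A sink terminates the dataflow, so its responses are not available downstream; the usual alternative is an async operator stage (in Flink, the `AsyncDataStream` pattern) whose output is the response itself. Here is a hedged stdlib-only sketch of that shape using `CompletableFuture`; `AsyncResponsePipeline` and `fakeHttpCall` are hypothetical stand-ins, not the thread's actual HttpSink code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Illustrative sketch: model each element's async HTTP call as a future whose
// result flows onward for further transformation, instead of ending in a sink.
public class AsyncResponsePipeline {

    // Stand-in for a real async HTTP client call.
    static CompletableFuture<String> fakeHttpCall(String request) {
        return CompletableFuture.supplyAsync(() -> "response:" + request);
    }

    // Operator-style stage: requests in, transformed responses out.
    static List<String> process(List<String> requests) {
        List<CompletableFuture<String>> futures = requests.stream()
                .map(AsyncResponsePipeline::fakeHttpCall)   // fire all calls
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)               // collect each response
                .map(String::toUpperCase)                   // downstream transformation
                .collect(Collectors.toList());
    }
}
```

In a Flink job the equivalent stage would be an async operator in the middle of the pipeline, with the real sink attached after the post-response transformations.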

Flink CDC 2.0: how to solve the log backlog caused by very large historical data

2023-09-20 Thread jinzhuguang
Take mysql cdc as an example: the current overall flow is to first sync the full data, then start incremental sync. From the code, the initial incremental offset chosen is the smallest high watermark across all full-snapshot splits. If the full dataset is very large (TB scale), the full sync may take a long time, yet the binlog cannot be deleted in the meantime, so the backlog takes up a lot of space. Is there a common solution to this problem?

Re: About Flink parquet format

2023-09-20 Thread Feng Jin
Hi, I tested it on my side and also got the same error. This should be a limitation of Parquet. ``` java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1 at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)

Re: Flink CDC 2.0: how to solve the log backlog caused by very large historical data

2023-09-20 Thread Jiabao Sun
Hi, for production it is recommended to retain binlogs for at least 7 days, to improve tolerance of recovery time. Also, you can try increasing the snapshot parallelism and resources to speed up the snapshot; after the snapshot completes you can restore from a savepoint and scale resources back down. Best, Jiabao -- From:jinzhuguang Send Time:Wednesday, September 20, 2023 20:56 To:user-zh Subject:Flink CDC 2.0: how to solve the log backlog caused by very large historical data
