Re: Trouble sinking to Kafka

2022-02-23 Thread Nicolaus Weidner
Hi Marco, The documentation kind of suggestion this is the cause: > https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kafka.html > > However, I think the documentation could benefit with a few examples and > scenarios that can ill-considered configurations. > Matthias

RE: Trouble sinking to Kafka

2022-02-23 Thread Schwalbe Matthias
Good morning Marco, Your fix is pretty plausible: * Kafka transactions get started at the beginning of a checkpoint period and contain all events collected through this period, * At the end of the checkpoint period the associated transaction is committed and concurrently the

?????? Flink????????HDFS

2022-02-23 Thread Tony
---- ??: "wenjie li"

Re: Flink数据写入HDFS

2022-02-23 Thread wenjie li
1. 使用 BucketingSink 写入HDFS 可以配置滚动策略来决定写文件的大小 2. 如果由于写入频率大和输出数据量比较小的情况第一种方案不是很好,可以考虑在后面另外启动一个合并小文件的定时任务。 Tony <1298877...@qq.com.invalid> 于2022年2月24日周四 12:10写道: > Flink数据写入HDFS,如何解决小文件问题? > FlinkSQL有小文件合并策略,Flink dataStream 写入HDFS,如何解决?

Re: Restart from checkpoint - kinesis consumer

2022-02-23 Thread Vijayendra Yadav
Thanks Danny, Let me comeback with results. > > On Feb 23, 2022, at 3:41 AM, Danny Cranmer wrote: > >  > Hello Vijay, > > > Once i do that my flink consumer need to be restarted with changed > > parallelism. > Why is this? The Flink consumer continuously scans for new shards, and will >

[Flink-1.14.3] Restart of pod due to duplicatejob submission

2022-02-23 Thread Parag Somani
Hello, Recently due to log4j vulnerabilities, we have upgraded to Apache Flink 1.14.3. What we observed we are getting following exception, and because of it pod gets in crashloopback. We have seen this issues esp. during the time of upgrade or deployment time when existing pod is already

Flink????????HDFS

2022-02-23 Thread Tony
FlinkHDFS?? FlinkSQL??Flink dataStream HDFS

回复:hive 进行 overwrite 合并数据后文件变大?

2022-02-23 Thread Shuai Xia
emmm,你看下这个能不能帮到你 https://jxeditor.github.io/2020/06/10/Hive%E5%8E%8B%E7%BC%A9%E6%95%88%E6%9E%9C%E4%B8%8D%E6%98%8E%E6%98%BE%E8%B8%A9%E5%9D%91%E8%AE%B0%E5%BD%95/ -- 发件人:RS 发送时间:2022年2月22日(星期二) 09:36 收件人:user-zh 主 题:hive 进行 overwrite

Re: 状态初始化

2022-02-23 Thread Jie Han
可以参考 CheckpointedFunction::initializeState > 2022年2月23日 下午8:21,huangzhi...@iwgame.com 写道: > > > flink是否能够做到程序第一次启动还没有checkpoint的情况下,对状态进行初始化? > > > huangzhi...@iwgame.com

Re: Flink SQL: HOW to define `ANY` subtype in constructured Constructured Data Types(MAP, ARRAY...)

2022-02-23 Thread Jie Han
There is no built-in LogicType for ’ANY’, it’s a invalid token > 2022年2月23日 下午10:29,zhouhaifengmath 写道: > > > When I define a udf paramters like: > public @DataTypeHint("Row") Row > eval(@DataTypeHint("MAP") Map mapData) > > It gives error: > Please check for implementation

状态初始化

2022-02-23 Thread huangzhi...@iwgame.com
flink是否能够做到程序第一次启动还没有checkpoint的情况下,对状态进行初始化? huangzhi...@iwgame.com

状态初始化

2022-02-23 Thread huangzhi...@iwgame.com
大家好,我想请问下flink是否能够做到程序第一次启动还没有checkpoint的情况下,对状态进行初始化? huangzhi...@iwgame.com

Re: 状态初始化

2022-02-23 Thread Jiangang Liu
作业在启动时可以使用 Processor API加载状态,可以参考 https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/libs/state_processor_api/ huangzhi...@iwgame.com 于2022年2月23日周三 20:28写道: > > flink是否能够做到程序第一次启动还没有checkpoint的情况下,对状态进行初始化? > > > huangzhi...@iwgame.com >

Re: Trouble sinking to Kafka

2022-02-23 Thread Marco Villalobos
I fixed this, but I'm not 100% sure why. Here is my theory: My checkpoint interval is one minute, and the minimum pause interval is also one minute. My transaction timeout time is also one minute. I think the checkpoint causes Flink to hold the transaction open for one minute, and thus it times

Re: Flink job recovery after task manager failure

2022-02-23 Thread Zhilong Hong
Hi, Afek! When a TaskManager is killed, JobManager will not be acknowledged until a heartbeat timeout happens. Currently, the default value of heartbeat.timeout is 50 seconds [1]. That's why it takes more than 30 seconds for Flink to trigger a failover. If you'd like to shorten the time a

Re: Trouble sinking to Kafka

2022-02-23 Thread Nicolaus Weidner
Hi Marco, I'm no expert on the Kafka producer, but I will try to help. [1] seems to have a decent explanation of possible error causes for the error you encountered. Which leads me to two questions: if (druidProducerTransactionMaxTimeoutMs > 0) { > >

Flink job recovery after task manager failure

2022-02-23 Thread Afek, Ifat (Nokia - IL/Kfar Sava)
Hi, I am trying to use Flink checkpoints solution in order to support task manager recovery. I’m running flink using beam with filesystem storage and the following parameters: checkpointingInterval=3 checkpointingMode=EXACTLY_ONCE. What I see is that if I kill a task manager pod, it takes

Flink SQL: HOW to define `ANY` subtype in constructured Constructured Data Types(MAP, ARRAY...)

2022-02-23 Thread zhouhaifengmath
When I define a udf paramters like:    public @DataTypeHint("Row") Row eval(@DataTypeHint("MAP")     Map mapData)It gives error:    Please check for implementation mistakes and/or provide a corresponding hint.        at

Re: Task manager errors with Flink ZooKeeper High Availability

2022-02-23 Thread Sigalit Eliazov
Thanks for the response on this issue. with the same configuration defined *high-availability: zookeeper* *high-availability.zookeeper.quorum: zk-noa-edge-infra:2181* *high-availability.zookeeper.path.root: /flink* *high-availability.storageDir: /flink_state* *

状态初始化

2022-02-23 Thread huangzhi...@iwgame.com
flink是否能够做到程序第一次启动还没有checkpoint的情况下,对状态进行初始化? huangzhi...@iwgame.com

Re: Restart from checkpoint - kinesis consumer

2022-02-23 Thread Danny Cranmer
Hello Vijay, > Once i do that my flink consumer need to be restarted with changed parallelism. Why is this? The Flink consumer continuously scans for new shards, and will auto scale up/down the number of shard consumer threads to accommodate Kinesis resharding. Flink job/operator parallelism does

Restart from checkpoint - kinesis consumer

2022-02-23 Thread Vijayendra Yadav
Hi Team, I am running flink 1.11 kinesis consumer with say N kinesis shards, but i want to increase/decrease shards to N+M or N-M. Once i do that my flink consumer need to be restarted with changed parallelism. But i am unable to restart from existing checkpoint because of change in number of

Re: java.lang.Exception: Job leader for job id 0efd8681eda64b072b72baef58722bc0 lost leadership.

2022-02-23 Thread Jai Patel
Hi Nico, Thanks for getting back to us. We are using Flink 1.14.0 and we are using RocksDB. We currently are using the default memory settings. We'll look into increasing our managed memory fraction to 0.6 and see what happens. Do writes to ValueStates/MapStates have a direct on churn of the

退订

2022-02-23 Thread 谭 海棠
退订 获取 Outlook for iOS

Re: java.lang.Exception: Job leader for job id 0efd8681eda64b072b72baef58722bc0 lost leadership.

2022-02-23 Thread Nicolaus Weidner
Hi Jai, On Tue, Feb 22, 2022 at 9:19 PM Jai Patel wrote: > It seems like the errors are similar to those discussed here: > - https://issues.apache.org/jira/browse/FLINK-14316 > - https://cdmana.com/2020/11/20201116104527255b.html > I couldn't find any other existing issue apart from the one

Re: CSV join in batch mode

2022-02-23 Thread Guowei Ma
Hi, Killian Sorry for responding late! I think there is no simple way that could catch csv processing errors. That means that you need to do it yourself.(Correct me if I am missing something). I think you could use RockDB State Backend[1], which would spill data to disk. [1]