Re: Left join query not clearing state after migrating from 1.9.0 to 1.14.3

2022-03-03 Thread Prakhar Mathur
Hi, Can someone kindly help and take a look at this? It's a major blocker for us. Thanks, Prakhar On Wed, Mar 2, 2022 at 2:11 PM Prakhar Mathur wrote: > Hello, > > We recently did a migration of our Flink jobs from version 1.9.0 to > 1.14.3. These jobs consume from Kafka and produce to

Re: [DISCUSS]当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口

2022-03-03 Thread yidan zhao
嗯,问题的确存在。 只是场景有点特别,ts一般是时间戳,ts本身负数或很小的情况这个我没考虑。 邓子琦 于2022年3月4日周五 14:28写道: > 不是的,这是一个现存问题 > > abs(offset)>size的约束并不能让ts-offset+size必然大于0 > > 图中给出的示例是我们用代码验证过的。你可以尝试运行下面的代码 > > 会发现它暴露出来的问题跟我所描述的一样 > > /* output > > *窗口开始时间是-15000 有1个元素 数据是 (a,-17000) > > *窗口开始时间是-1 有1个元素

Re: Example with JSONKeyValueDeserializationSchema?

2022-03-03 Thread HG
Hi Kamil, Aeden and others It was already answered This was the complete solution: KafkaSource source = KafkaSource.builder() .setProperties(kafkaProps) .setProperty("ssl.truststore.type",trustStoreType) .setProperty("ssl.truststore.password",trustStorePassword)

Pyflink1.13 or JavaFlink1.13 + Jpython + Python2.7, which way has better performance?

2022-03-03 Thread vtygoss
Hi, community! I am working on data processing structure optimization from full data pipeline to incremental data pipeline, from PySpark with PythonCode to two optional ways below: 1. PyFlink 1.13 + Python 2.7 2. JavaFlink 1.13 + JPython + Python 2.7 As far as i know, the python APIs

Re: [DISCUSS]当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口

2022-03-03 Thread 邓子琦
不是的,这是一个现存问题 abs(offset)>size的约束并不能让ts-offset+size必然大于0 图中给出的示例是我们用代码验证过的。你可以尝试运行下面的代码 会发现它暴露出来的问题跟我所描述的一样 /* output *窗口开始时间是-15000 有1个元素 数据是 (a,-17000) *窗口开始时间是-1 有1个元素 数据是 (b,-12000) *窗口开始时间是-5000 有2个元素 数据是 (c,-7000) *窗口开始时间是-5000 有2个元素

Re:Re: source code build failure

2022-03-03 Thread Edwin
Hi, yu'an: Many thanks for your reply, it has been fixed :), it turns out to be related to some local environmental settings. Best regards, Edwin At 2022-03-03 15:58:43, "yu'an huang" wrote: Hi Edwin, I suddenly realised that I replied to you directly, so I just sent

Version Upgrade of FlinkSQL (1.10 to 1.12)

2022-03-03 Thread zihao chen
hi, All. I would like to upgrade the Flink version of several FlinkSQLs from 1.10 to 1.12. And I want to restore the state saved by version 1.10 on version 1.12. After I look at the StreamGraph, JobGraph, Checkpoint and some other related informations, I found these points that would cause the

CDC 分表同步快照顺序

2022-03-03 Thread 刘 家锹
Hi, all 我们在使用Flink CDC同步多张表,然后合并slink到一张es表中。但表之间有数据流转关系,比如有table_1, table_2, table2, 一条数据A之前table_1,但后续可能更新到table_2。 想请教下,如果使用正则表达式匹配同步分表,是否可以保证数据有序无误呢? 也就是全部分表同时快照,且等待所有分表快照同步完后才开始处理binlog。 从文档[1]中看到对于单表这种模式是可以保证的,但不确定多表且有数据流转是否也一样。 [1]

Re: flink 反压导致checkpoint超时,从而导致任务失败问题

2022-03-03 Thread yu'an huang
你好,我检查了下关于checkpoint的文档:https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/ tolerable checkpoint failure number: This defines how many

Re: Flink job recovery after task manager failure

2022-03-03 Thread yidan zhao
I think you should use nfs, which is easily to be deployed unlike hdfs. The state is written and read by TM. ZK is used to record some meta data of the checkpoint, such as the ckpt file path. Finally, I don't think your job can be recovered normally if you are not running with a shared storage.

Re: [DISCUSS]当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口

2022-03-03 Thread yidan zhao
1 在flink中调用这个方法的部分是 windowAssigner,以TumblingEventTimeWindows 为例,分配window的时候的逻辑为: long start = TimeWindow.getWindowStartWithOffset( timestamp, (globalOffset + staggerOffset) % size, size); 构造函数中offset逻辑: protected TumblingEventTimeWindows(long size, long offset,

Re: flink 反压导致checkpoint超时,从而导致任务失败问题

2022-03-03 Thread yu'an huang
你好,checkpoint超时默认不会导致作业重启,可以提供下JM log看看作业为什么会重启吗? > On 3 Mar 2022, at 9:15 PM, kong <62...@163.com> wrote: > > hello,我最近遇到一个问题: > 我通过flink消费kafka数据,job 图大概是这样的:Source -> map -> filter -> flatMap -> Map -> > Sink > 在一瞬间kafka的producer端会产生大量的数据,导致flink无法消费完,我的checkpoint设置的是10分钟; > 最后会产生Checkpoint

Flink??????????????????,????????????

2022-03-03 Thread Tony
cpu,??Flink??? 32, 512G, Flink???

Max parallelism and reactive mode

2022-03-03 Thread Alexis Sarda-Espinosa
Hi everyone, I have some questions regarding max parallelism and how interacts with deployment modes. The documentation states that max parallelism should be "set on a per-job and per-operator granularity" but doesn't provide more details. Is it possible to have different values of max

Re: Example with JSONKeyValueDeserializationSchema?

2022-03-03 Thread Aeden Jameson
I believe you can solve this iss with, .setDeserializer(KafkaRecordDeserializationSchema.of(new JSONKeyValueDeserializationSchema(false))) On Thu, Mar 3, 2022 at 8:07 AM Kamil ty wrote: > > Hello, > > Sorry for the late reply. I have checked the issue and it seems to be a type > issue as the

Task Manager shutdown causing jobs to fail

2022-03-03 Thread Puneet Duggal
Hi, Currently in production, i have HA session mode flink cluster with 3 job managers and multiple task managers with more than enough free task slots. But i have seen multiple times that whenever task manager goes down ( e.g. due to heartbeat issue).. so does all the jobs running on it even

Re: Help with pom dependencies for Flink with Table API

2022-03-03 Thread Adesh Dsilva
Hi, Yes, you are right. I was mixing some dependencies without knowing. I did a complete reset of all dependencies and started with a fresh pom and it fixed it. Many Thanks! > On 2 Mar 2022, at 17:37, Adesh Dsilva wrote: > > Hello, > > I think I accidentally posted this question on the

StreamingFileSink bulk formats - small files

2022-03-03 Thread Kamil ty
Hello all, In multiple jobs I'm saving data using the datastream API with StreamingFileSink and various bulk formats (avro, parquet). As bulk formats require a rolling policy that extends the CheckpointRollingPolicy I have created a policy that rolls on file size additionally. Unfortunately for

Re: Example with JSONKeyValueDeserializationSchema?

2022-03-03 Thread Kamil ty
Hello, Sorry for the late reply. I have checked the issue and it seems to be a type issue as the exception suggests. What happens is that the JSONKeyValueDeserializationSchema included in flink implements a KafkaDeserializationSchema. The .setDeserializer method expects a Deserialization schema

Re: Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-03 Thread yu'an huang
Hi Elkhan, I confirm that the FlinkSQL Client is communicating with JM via Rest endpoint. After I changed the “rest.port”, the sql client thrown exception: "[ERROR] Could not execute SQL statement. Reason: java.net.ConnectException: Connection refused”. So for your case, since Flink will

error: cannot find symbol .setDeliverGuarantee in KafkaRecordSerializationSchemaBuilder

2022-03-03 Thread HG
As per the documentation , https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/ A kafka sink can be defined as further below But in fact it fails with * error: cannot find symbol .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE) ^ symbol:

Re: AWS Kinesis Flink vs K8s

2022-03-03 Thread Puneet Duggal
Hi Jeremy, Thank you for this detailed answer and yes this surely helps.. Regards, Puneet > On 16-Feb-2022, at 9:21 PM, Ber, Jeremy wrote: > > Hi Puneet, > Amazon Kinesis Data Analytics for Apache Flink is a managed Apache Flink > offering--it removes the need to setup your own

Re: How to sort Iterable in ProcessWindowFunction?

2022-03-03 Thread HG
Hi, I have need to sort the input of the ProcesWindowFunction by one of the fields of the Tuple4 that is in the Iterator. Any advice as to what the best way is? static class MyProcessWindowFunction extends ProcessWindowFunction, String, String, TimeWindow> { @Override public

How to sort Iterable in ProcessWindowFunction?

2022-03-03 Thread HG
Hi, I have need to sort the input of the ProcesWindowFunction by one of the fields of the Tuple4 that is in the Iterator. static class MyProcessWindowFunction extends ProcessWindowFunction, String, String, TimeWindow> { @Override public void process(String key, Context context,

flink 反压导致checkpoint超时,从而导致任务失败问题

2022-03-03 Thread kong
hello,我最近遇到一个问题: 我通过flink消费kafka数据,job 图大概是这样的:Source -> map -> filter -> flatMap -> Map -> Sink 在一瞬间kafka的producer端会产生大量的数据,导致flink无法消费完,我的checkpoint设置的是10分钟; 最后会产生Checkpoint expired before completing.的错误,导致job重启,从而导致从上一个checkpoint恢复,然后重复消费数据,又导致checkpoint超时,死循环了。 不知道有什么好办法解决该问题。 多谢~

??????????????????kafka??????????????kafka??????????????????????????

2022-03-03 Thread jianjianjianjianjianjianjianjian
kafka??kafkakafkaDebug

Re: [DISCUSS]当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口

2022-03-03 Thread 邓子琦
好滴 谢谢 yu'an huang 于2022年3月3日周四 18:17写道: > > 我想你们可以为Flink贡献代码。只要按照guide > https://flink.apache.org/contributing/contribute-code.html < > https://flink.apache.org/contributing/contribute-code.html%E4%B8%AD%E7%9A%84%E6%AD%A5%E9%AA%A4%E5%85%88%E5%BB%BA%E7%AB%8BJIRA > > > > 建立JIRA

Re: Help with pom dependencies for Flink with Table API

2022-03-03 Thread Francesco Guardiani
Hi, The moving of org.apache.flink.connector.file.table.factories.BulkReaderFormatFactory was done in master a couple of months ago by me, and it should be only on 1.15+. Could it be you're somehow mixing master snapshots with 1.14.x? Are you trying to run the job on a cluster using a Flink

Re: [DISCUSS]当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口

2022-03-03 Thread yu'an huang
我想你们可以为Flink贡献代码。只要按照guide https://flink.apache.org/contributing/contribute-code.html 建立JIRA Ticket然后讨论就可以了,为了社区可以更方便的review你们的代码。 > On 3 Mar 2022, at 5:33 PM, 邓子琦

Re: Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-03 Thread yu'an huang
Hi Elkhan, Except for JM have an external IP address, I think the port 6123 also need to be opened. You may need to set a host port for 6123 in JM pod or expose this port by Kubernetes service. But I am not sure whether the sql-client communicate with JM via Rest endpoint or RPC port. Hopes

[DISCUSS]当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口

2022-03-03 Thread 邓子琦
当timestamp - offset + windowSize < 0时,元素无法被分配到正确的窗口问题 你好! 我们在学习flink源码时,发现它计算窗口开始时间的算法存在问题。当timestamp - offset + windowSize < 0 时,元素会被错误地分配到比自身时间戳大一个WindowSize的窗口里去。 问题在org.apache.flink.streaming.api.windowing.windows.TimeWindow public static long getWindowStartWithOffset(long timestamp,