Failed to cancel a job using the STOP REST API

2021-06-03 Thread Thomas Wang
Hi, Flink community, I'm trying to use the STOP REST API to cancel a job. So far, I'm seeing some inconsistent results: sometimes jobs can be cancelled successfully, while other times they can't. Either way, the POST request is accepted with a status code 202 and a "request-id". From the
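For reference, the stop flow is asynchronous: the 202 response with a request-id only acknowledges that the trigger was accepted, and the actual outcome has to be polled separately. A sketch of the exchange, per the Flink REST API as I understand it (the IDs are placeholders):

```
POST /jobs/<jobid>/stop
  -> 202 Accepted, body: {"request-id": "<trigger-id>"}

GET /jobs/<jobid>/savepoints/<trigger-id>
  -> {"status": {"id": "IN_PROGRESS" | "COMPLETED"}, "operation": {...}}
```

A COMPLETED status with an error in the operation payload would explain an accepted request whose job nonetheless keeps running.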

Re: Question about the starting offset of the Kafka source in Flink SQL

2021-06-03 Thread yidan zhao
That's rather painful. A workaround I had considered was simply disabling checkpointing for SQL jobs, but once checkpointing is disabled the Flink web UI can no longer show the restart count; job restarts can currently only be monitored through checkpoint metrics (number of failed checkpoints, number of restored checkpoints). JasonLee <17610775...@163.com> wrote on Fri, Jun 4, 2021 at 11:49 AM: > > hi > > SQL jobs also resume from the offset saved in the last successful checkpoint. > > > > - > Best Wishes > JasonLee > -- > Sent from:

Re: Question about the starting offset of the Kafka source in Flink SQL

2021-06-03 Thread JasonLee
hi, SQL jobs also resume from the offset saved in the last successful checkpoint. - Best Wishes JasonLee -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: How can a type that cannot be treated as a POJO be converted so it is not handled as a GenericType?

2021-06-03 Thread yidan zhao
This problem is fairly complicated, and in the end it was only half solved, somewhat by muddling through. Roughly: consider using HashMap, and prefer composition over inheritance; for example, use a HashMap as an inner member instead of making the outermost type itself a Map. That may solve part of the problem. Lin Hou wrote on Thu, Apr 1, 2021 at 1:55 PM: > > Hi, may I ask how this problem was solved? > > 赵一旦 wrote on Wed, Feb 3, 2021 at 1:59 PM: > > > As I read it, Flink requires the class to be public and every field to be either public or have a getter/setter. Presumably nested fields are checked recursively as well. > > > > ℡小新的蜡笔不见嘞、 <1515827...@qq.com>
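The POJO rules mentioned in the quoted replies (public class, every field public or exposed via getter/setter, nested fields checked recursively) can be sketched as a minimal class. This is an illustrative example, not code from the thread; Flink additionally requires a public no-argument constructor:

```java
// A minimal sketch of a class Flink can treat as a POJO rather than a
// GenericType: public class, public no-arg constructor, and every field
// either public or reachable through a getter/setter pair.
public class UserEvent {
    public int id;               // public field: allowed directly

    private String name;         // private field: needs getter + setter

    public UserEvent() {}        // public no-arg constructor is required

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

Any field whose type fails these checks (or an inherited member that hides state) pushes the whole class back to GenericType handling, which matches the composition-over-inheritance advice above.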

Re: Flink exported metrics scope configuration

2021-06-03 Thread Mason Chen
Hi Kai, You can use the excluded-variables config for the reporter: metrics.reporter.<name>.scope.variables.excludes: (optional) A semicolon (;) separated list of variables that should be ignored by tag-based reporters (e.g., Prometheus, InfluxDB).
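As a sketch, the reporter section of flink-conf.yaml could then look like this (the reporter name `prom` and the particular excluded variables are assumptions, not from the thread):

```yaml
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
metrics.reporter.prom.scope.variables.excludes: task_attempt_id;task_attempt_num
```

Excluding high-cardinality variables such as attempt ids is a common way to shrink the exported label set that the original question is concerned about.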

Re: Flink/Flink SQL: how to enable checkpointing but keep automatic restarts from restoring the checkpoint

2021-06-03 Thread yidan zhao
Picking this question back up; can anyone answer it? yidan zhao wrote on Mon, May 24, 2021 at 10:25 AM: > > As the subject says: how can Flink/Flink SQL enable checkpointing while having automatic restarts either not restore from the checkpoint, or restore from it but ignore the Kafka source's state? > For the DataStream part I have overridden some of the implementation myself, so a checkpoint-based restart can ignore the KafkaSource offset state. > The remaining part is Flink SQL. At the moment I set a very large restart count, but after an automatic restart the job is often still slow, which causes further checkpoint failures; it is a vicious cycle. So what I want is for the job to ignore the backlog after an automatic restart. >

Question about the starting offset of the Kafka source in Flink SQL

2021-06-03 Thread yidan zhao
As the subject says: the Kafka source section of the official documentation has the following option description. scan.startup.mode: optional, default 'group-offsets', type String. Startup mode for Kafka consumer, valid values are 'earliest-offset', 'latest-offset', 'group-offsets', 'timestamp' and 'specific-offsets'. See the following Start Reading Position for more details. Regarding the Reading
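For context, this option is set in the WITH clause of the source DDL. A hedged sketch (table, topic, and broker names are made up for illustration):

```sql
CREATE TABLE kafka_source (
  id STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic01',
  'properties.bootstrap.servers' = 'broker:9092',
  'properties.group.id' = 'demo-group',
  'scan.startup.mode' = 'group-offsets',
  'format' = 'json'
);
```

With 'group-offsets', a fresh start reads from the committed consumer-group offsets; as discussed elsewhere in this thread, recovery from a checkpoint instead uses the offsets stored in the checkpoint.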

OutOfOrderSequenceException when sending messages to Kafka with checkpointing enabled

2021-06-03 Thread SmileSmile
Dear all: Flink version is 1.12.4, Kafka version is 1.1.1. The topology is very simple (source --> flatmap --> sink) with checkpointing enabled; the job fails after a few hours. The error message is: Caused by: org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to

Re: Troubleshooting very slow Flink checkpoints

2021-06-03 Thread JasonLee
hi, keeping the source parallelism equal to the number of partitions is enough; don't make it larger than the partition count, because that leaves subtasks idle and wastes resources. You only need to increase the parallelism of the map operator. - Best Wishes JasonLee -- Sent from: http://apache-flink.147419.n8.nabble.com/

subscribe

2021-06-03 Thread Boyang Chen

Re: Troubleshooting very slow Flink checkpoints

2021-06-03 Thread Jacob
@JasonLee Thanks for the reply. Job A's backpressure situation is shown in the figure below. I am well aware that the job cannot process data as fast as it is produced, but I cannot think of a reasonable fix yet; the parallelism is already set fairly high (raised from equal to the topic's partition count to greater than the partition count). I set the same parallelism for every task so that they are chained into a single task, optimizing away thread-switching overhead. The Map operator is the most time-consuming one: all the logic and data processing happen in that Map operator, and the two Flat Map

Flink exported metrics scope configuration

2021-06-03 Thread Kai Fu
Hi team, We noticed that the Prometheus metrics exporter exports all of the metrics at the most fine-grained level, which puts a tremendous load on the Prometheus server, especially when the parallelism is high. The metrics volume crawled from a single host (parallelism 8) is around 40MB for us currently. This

Re: Unexpected end of ZLIB input stream

2021-06-03 Thread Chesnay Schepler
What filesystem are you using? Is it possible that the source tries to read a file that is in the process of being written to disk? On 6/3/2021 11:32 PM, Billy Bain wrote: We are getting this exception occasionally after a job runs for a month or more. The files don't seem to be corrupt from

Unexpected end of ZLIB input stream

2021-06-03 Thread Billy Bain
We are getting this exception occasionally after a job runs for a month or more. The files don't seem to be corrupt from our testing, so not sure what this error means. Task resources & network connectivity seem fine. How would you debug this? ink) (1/1)#52423 (595ced3edfe32bb7d826955f1a195a29)

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-03 Thread Alexander Filipchik
On the checkpoints -> what kind of issues should I check for? I was looking for metrics and it looks like they were reporting successful checkpoints. It looks like some files were removed in the shared folder, but I'm not sure how to check for what caused it. Savepoints were failing due to

Re: Flattening of events

2021-06-03 Thread Chesnay Schepler
Have a look at flatMaps: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#datastream-rarr-datastream-1 On 6/3/2021 8:28 PM, Satish Saley wrote: Hi team, I am trying to figure out a way to flatten events in my Flink app. Event that i am

Re: behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-06-03 Thread Jin Yi
just to resolve this thread, i figured out the issue. there's a local watermark strategy we use when running locally for development that didn't work correctly on many events with the same timestamp, which the fake data generation for local runs has a tendency to produce.

Flattening of events

2021-06-03 Thread Satish Saley
Hi team, I am trying to figure out a way to flatten events in my Flink app. The event that I am consuming from Kafka is UpperLevelData { int upperId; List<ModuleData> listOfModules } ModuleData { int moduleId; string info; } After consuming this event, I want to flatten it out in the following format -
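A plain-Java sketch of the flattening step described above; in a Flink job the same loop would sit inside a FlatMapFunction, emitting each record via Collector.collect. Field types are assumed where the mail is truncated, and the output shape (parent id copied onto each module) is one reasonable reading of the request:

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenExample {
    // Input shapes as described in the thread
    public static class ModuleData {
        public int moduleId; public String info;
        public ModuleData(int id, String info) { this.moduleId = id; this.info = info; }
    }
    public static class UpperLevelData {
        public int upperId; public List<ModuleData> listOfModules;
        public UpperLevelData(int upperId, List<ModuleData> modules) {
            this.upperId = upperId; this.listOfModules = modules;
        }
    }
    // One flattened record per module, carrying the parent id along
    public static class FlatModule {
        public int upperId; public int moduleId; public String info;
        public FlatModule(int u, int m, String i) { this.upperId = u; this.moduleId = m; this.info = i; }
    }

    // In Flink this body becomes flatMap(UpperLevelData e, Collector<FlatModule> out)
    public static List<FlatModule> flatten(UpperLevelData e) {
        List<FlatModule> out = new ArrayList<>();
        for (ModuleData m : e.listOfModules) {
            out.add(new FlatModule(e.upperId, m.moduleId, m.info));
        }
        return out;
    }
}
```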

Re: streaming file sink OUT metrics not populating

2021-06-03 Thread Chesnay Schepler
This is a known issue, and cannot be fixed on the user side. The underlying problem is that this needs to be implemented separately for each source/sink and we haven't gotten around to doing that yet, but some progress is being made for 1.14 to that end. On 6/3/2021 6:06 PM, Vijayendra Yadav

Re: Flink stream processing issue

2021-06-03 Thread Qihua Yang
Sorry for the confusion. Yes, I mean multiple parallelism. Really thanks for your help. Thanks, Qihua On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG wrote: > Hi Qihua, > > I’m sorry I didn’t understand what you mean by ‘replica’. Would you please > explain a little more? > If you mean your job

streaming file sink OUT metrics not populating

2021-06-03 Thread Vijayendra Yadav
Hi Team, I am using streaming file sink and sinking data to s3. Also, using Graphite exporter for metrics. I can see correct counters for Source, Map, Filter functions. But for SINK, only *numRecordsIn* is populating; I am looking to get *numRecordsOut* counts also, but it's always staying 0

Re: StreamingFileSink output formatting to CSV

2021-06-03 Thread Chesnay Schepler
This is handled by the StringEncoder; the one you use (SimpleStringEncoder) just calls toString on the input element. I don't think Flink provides a CSV StringEncoder, but if all you want is remove the parenthesis, then you could wrap the SimpleStringEncoder and trim the first and last
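The trimming idea can be sketched as plain string handling, outside any Flink interface; in a real job this logic would live inside a custom Encoder wrapping SimpleStringEncoder. Note it only strips the tuple parentheses and does no CSV quoting or escaping of field values:

```java
public class TupleToCsv {
    // Flink's Tuple.toString() renders e.g. "(a,b,c)"; trimming the first and
    // last characters yields a simple CSV line. Inputs without the surrounding
    // parentheses are returned unchanged.
    public static String toCsvLine(String tupleString) {
        if (tupleString.length() >= 2
                && tupleString.startsWith("(") && tupleString.endsWith(")")) {
            return tupleString.substring(1, tupleString.length() - 1);
        }
        return tupleString;
    }
}
```

If any field can itself contain commas or quotes, a proper CSV writer is needed instead of this trim.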

StreamingFileSink output formatting to CSV

2021-06-03 Thread Robert Cullen
I have a StreamingFileSink that writes to S3: final StreamingFileSink> sink = StreamingFileSink.forRowFormat( new Path("s3://argo-artifacts/files"), new SimpleStringEncoder>("UTF-8"))

ByteSerializationSchema in PyFlink

2021-06-03 Thread Wouter Zorgdrager
Hi all, I have a PyFlink job connected to a KafkaConsumer and Producer. I want to directly work with the bytes from and to Kafka because I want to serialize/deserialize in my Python code rather than the JVM environment. Therefore, I can't use the SimpleStringSchema for (de)serialization (the

Events triggering JobListener notification

2021-06-03 Thread Barak Ben Nathan
Hi all, I am using Flink 1.12.1 I’m building a system that creates/cancels Flink Jobs and monitors them. We thought to use org.apache.flink.core.execution.JobListener as a ‘push’ mechanism for job-status-change events. We based this idea on the documentation that stated that JobListener ‘…is

Re: Best practice for adding support for Kafka variants

2021-06-03 Thread deepthi Sridharan
Makes sense. Thanks for the confirmation. On Thu, Jun 3, 2021, 4:08 AM Arvid Heise wrote: > Just to add, we target that for 1.14. > > However, it's also not too complicated to add a new TableFactory that uses > the new sources (or your source). > > On Thu, Jun 3, 2021 at 10:04 AM Chesnay

MySQL primary/replica failover breaks the binlog position consumed via flink mysql-cdc

2021-06-03 Thread 董建
For various reasons the DBAs performed a primary/replica failover on the database. I am using flink mysql-cdc to capture the binlog, but after the failover the binlog positions no longer match. The Flink program restarts automatically and then dies once the configured restart strategy is exhausted. The log prints: org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at GTIDs 3fa7d5bb-65f3-11eb-9413-b0262879b560:1-730774004 and binlog file

Re: Best practice for adding support for Kafka variants

2021-06-03 Thread Arvid Heise
Just to add, we target that for 1.14. However, it's also not too complicated to add a new TableFactory that uses the new sources (or your source). On Thu, Jun 3, 2021 at 10:04 AM Chesnay Schepler wrote: > The FLIP-27 were primarily aimed at the DataStream API; the integration > into the

Unsubscribe

2021-06-03 Thread 李朋辉
Unsubscribe | | 李朋辉 | | Email: lipengh...@126.com | Signature customized by NetEase Mail Master

Unsubscribe

2021-06-03 Thread 李朋辉
| | 李朋辉 | | Email: lipengh...@126.com | Signature customized by NetEase Mail Master - Forwarded message - From: Fighting Date: Jun 2, 2021 11:00 To: user-zh Cc: Subject: Unsubscribe Unsubscribe

Re: Troubleshooting very slow Flink checkpoints

2021-06-03 Thread HunterXHunter
Usually this means Job A has backpressure: during a checkpoint, a task only processes the barrier after the backlogged data ahead of it has been processed. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Troubleshooting very slow Flink checkpoints

2021-06-03 Thread JasonLee
hi, your understanding may be slightly off. The long checkpoint times should be caused by backpressure or data skew in the job; the backlog on topic01 shows that backpressure has already reached the source. You need to locate where the backpressure originates, find out what exactly causes it, and then fix it accordingly. - Best Wishes JasonLee -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: recover from savepoint

2021-06-03 Thread Till Rohrmann
Thanks for this insight. So the problem might be Flink using an internal Kafka API (the connector uses reflection to get hold of the TransactionManager) which changed between version 2.4.1 and 2.5. I think this is a serious problem because it breaks our end-to-end exactly once story when using new

Re: In native k8s application mode, how can I know whether the job is failed or finished?

2021-06-03 Thread LIU Xiao
Thank you for the timely help! I've tried session mode a little bit; it's better than I thought, the TaskManagers can be allocated and de-allocated dynamically. But it seems the memory size of a TaskManager is fixed when the session starts, and cannot be adjusted per job. I'll try to deploy

Flink 1.11.2: partition error when a filesystem source table reads a filesystem sink table

2021-06-03 Thread 范未太
1. Problem description: Based on the Flink filesystem connector, I create: create table source_test(id string,name string dayno sring,`hour` string) partitioned (dayno ,`hour`) with('connector'='filesystm',path='x/data/') The error stack is as follows: | java.lang.reflect.InvocationTargetException at

Re: Using cumulate windows in Flink 1.13.0

2021-06-03 Thread 邹云鹤
Hello, the cumulate SQL I am using now is: insert into `test_out` select a.uid, 'dt', max(a.age) from TABLE( CUMULATE(TABLE test_in, DESCRIPTOR(proctime), INTERVAL '1' MINUTES, INTERVAL '1' hours)) as a group by uid, window_start, window_end; It runs now, but I noticed that every time the window fires, the writes that the JDBC sink executes against the database are all insert operations,
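If update-in-place rather than appended rows is the goal, one lever worth checking is that Flink's JDBC connector writes in upsert mode when the sink table's DDL declares a primary key, and in append mode otherwise. A hedged sketch with assumed column types and connection details:

```sql
CREATE TABLE test_out (
  uid BIGINT,
  dt STRING,
  max_age INT,
  PRIMARY KEY (uid, dt) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/db',
  'table-name' = 'test_out'
);
```

The key columns should match the grouping keys of the window query so each firing updates the same row.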

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-03 Thread Chesnay Schepler
Is there anything in the Flink logs indicating issues with writing the checkpoint data? When the savepoint could not be created, was anything logged from Flink? How did you shut down the cluster? On 6/3/2021 5:56 AM, Alexander Filipchik wrote: Hi, Trying to figure out what happened with our

Scala 2.12 error: Error: scala/Product$class

2021-06-03 Thread maker_d...@foxmail.com
I built a Scala 2.12 Flink project with Maven. I want to use Flink to consume Kafka, but the project fails at runtime with: scalac: Error: scala/Product$class java.lang.NoClassDefFoundError: scala/Product$class at org.apache.flink.api.scala.codegen.TypeDescriptors$RecursiveDescriptor.<init>(TypeDescriptors.scala:155) at
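For what it's worth, scala/Product$class only exists in Scala 2.11 and earlier, so this error usually indicates a Scala 2.11-compiled artifact on a 2.12 classpath. A sketch of keeping the artifact suffixes aligned in the POM (the version property is an assumption):

```xml
<!-- All Scala-dependent Flink artifacts must share the _2.12 suffix;
     mixing in a _2.11 jar yields NoClassDefFoundError: scala/Product$class -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.12</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.12</artifactId>
  <version>${flink.version}</version>
</dependency>
```

Running `mvn dependency:tree` and checking for any `_2.11` suffix is a quick way to confirm.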

Re: recover from savepoint

2021-06-03 Thread Tianxin Zhao
I encountered the exact same issue before when experimenting in a testing environment. I was not able to spot the bug as mentioned in this thread, the solution I did was to downgrade my own kafka-client version from 2.5 to 2.4.1, matching the version of flink-connector-kafka. In 2.4.1 Kafka,
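The workaround described above, pinning the client to the version flink-connector-kafka was built against, would look like this in a Maven POM (a sketch; the coordinates are the standard Kafka client ones, the version is taken from the message):

```xml
<!-- Pin kafka-clients to 2.4.1 so the connector's reflective access to the
     internal TransactionManager matches the client's implementation -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.4.1</version>
</dependency>
```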

Re: Using cumulate windows in Flink 1.13.0

2021-06-03 Thread 邹云鹤
Hello | | 邹云鹤 | | kevinyu...@163.com | Signature customized by NetEase Mail Master On May 28, 2021 at 11:52, 邹云鹤 wrote: OK, I will look into it further. | | 邹云鹤 | | kevinyu...@163.com | Signature customized by NetEase Mail Master On May 28, 2021 at 11:51, Leonard Xu wrote: Hi, the cumulate window is based on the window TVF syntax and is different from the earlier groupWindowFunction; you can refer to [1]. Window TVFs also support tumble windows, hop windows,

Re: Troubleshooting very slow Flink checkpoints

2021-06-03 Thread Jacob
@lian Thanks for the reply. Checking via the web UI, there is no backpressure. HBase performance is indeed a real problem; our company's HBase has never performed well. In my program, however, I read from the Redis cache first; most data is found there and only a small portion falls through to HBase. Thanks for the idea. I will look at the code level to see whether the related logic can be optimized. - Thanks! Jacob -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Best practice for adding support for Kafka variants

2021-06-03 Thread Chesnay Schepler
The FLIP-27 were primarily aimed at the DataStream API; the integration into the SQL/Table APIs will happen at a later date. On 6/1/2021 5:59 PM, deepthi Sridharan wrote: Thank you, Roman. I should have said our own flavor of Kafka and not version. Thanks for the reference of the new source

Re: DSL for Flink CEP

2021-06-03 Thread Dipanjan Mazumder
Thanks a lot ... On Thursday, June 3, 2021, 12:49:58 PM GMT+5:30, Dawid Wysakowicz wrote: Hi, Just to add on top to what Fabian said. The only community supported CEP library is the one that comes with Flink[1]. It is also used internally to support the MATCH_RECOGNIZE clause in

Re: Troubleshooting very slow Flink checkpoints

2021-06-03 Thread lian
Check whether the job experiences backpressure while running, and how much data piles up in the buffers during a checkpoint. Most likely the HBase access is the part that performs poorly. On Jun 3, 2021 at 15:27, Jacob wrote: Dear all, I have two Flink jobs, A and B. Job A consumes data from Kafka topic01, runs it through a series of processing logic, and finally sinks the data to topic02

Troubleshooting very slow Flink checkpoints

2021-06-03 Thread Jacob
Dear all, I have two Flink jobs, A and B. Job A consumes data from Kafka topic01, runs it through a series of processing logic, and finally sinks the data to topic02. Roughly, the processing works as follows: after consuming a message from topic01, query Redis and HBase based on fields in the data, assemble the business record (a fairly complex process), and write it to topic02. A checkpoint is taken every 30s and the state is only a few tens of KB, yet a checkpoint takes two minutes on average, causing messages to pile up on topic01 and hurting freshness. Job B

Re: DSL for Flink CEP

2021-06-03 Thread Dawid Wysakowicz
Hi, Just to add on top of what Fabian said. The only community-supported CEP library is the one that comes with Flink[1]. It is also used internally to support the MATCH_RECOGNIZE clause in Flink SQL[2]. Therefore there is both a programmatic library and a native DSL for defining patterns. Moreover

Re: In native k8s application mode, how can I know whether the job is failed or finished?

2021-06-03 Thread Xintong Song
There are two ways to access the status of a job after it is finished. 1. You can try native k8s deployment in session mode. When jobs are finished in this mode, TMs will be automatically released after a short period of time, while JM will not be terminated until you explicitly shutdown the

Re: recover from savepoint

2021-06-03 Thread Till Rohrmann
Thanks for the update. Skimming over the code, it does indeed look like we are overwriting the values of the static value ProducerIdAndEpoch.NONE. I am not 100% sure how this causes the observed problem, though. I am also not a Flink Kafka connector or Kafka expert, so I would appreciate it if someone

Re: Flink stream processing issue

2021-06-03 Thread JING ZHANG
Hi Qihua, I’m sorry I didn’t understand what you mean by ‘replica’. Would you please explain a little more? If you mean your job has multiple parallelism, and whether the same data from two different inputs would be sent to the same downstream subtask after `keyedCoProcessFunction`: yes, Flink could

Re: recover from savepoint

2021-06-03 Thread Till Rohrmann
Forwarding 周瑞's message to a duplicate thread: After our analysis, we found a bug in the org.apache.flink.streaming.connectors.kafka.internals.FlinkKafkaInternalProducer.resumeTransaction method The analysis process is as follows:

Re: How to check is Kafka partition "idle" in emitRecordsWithTimestamps

2021-06-03 Thread Till Rohrmann
Hi Alexey, I think the current idleness detection works based on timeouts. You need a special watermark generator that periodically emits the watermarks. If no event has been emitted for so and so long, then it is marked as idle. Yes, I was referring to FLINK-18450. At the moment nobody is

Re: Prometheus Reporter Enhancement

2021-06-03 Thread Chesnay Schepler
Let's move this to the ticket then. :) On 6/2/2021 8:45 PM, Mason Chen wrote: Hi Chesnay, I would like to take on https://issues.apache.org/jira/browse/FLINK-17495  as a contribution to OSS, if that’s alright with the team. We can discuss

In native k8s application mode, how can I know whether the job is failed or finished?

2021-06-03 Thread 刘逍
Hi, We are currently using Flink 1.6 standalone mode, but the lack of isolation is a headache for us. At present, I am trying application mode of Flink 1.13.0 on native K8s. I found that as soon as the job ends, whether it ends normally or abnormally, the jobmanager can no longer be accessed,

(no subject)

2021-06-03 Thread 田磊
Hello: In one application, if I use a custom Flink source to read a table in HBase and, after processing, write updates back to that table through a sink, could a conflict arise? Is it possible that data arriving from the source has already been processed? Another case: one program finishes its logic and updates the data into HBase while, at the same time, another program uses a custom source on the same table and writes updates back into it. Would that conflict? | | totorobabyfans | | Email: totorobabyf...@163.com | Signature customized by NetEase Mail Master

Re: Flink 1.12, yarn-application mode: logs not visible in the Flink web UI

2021-06-03 Thread zilong xiao
1.10 uses log4j 1 by default; 1.12 uses log4j 2. smq <374060...@qq.com> wrote on Wed, Jun 2, 2021 at 3:26 PM: > > Do you mean the configuration in log4j.properties? We configured the log-file format there; it was added on the deployment nodes, but that should not be what the web UI displays. The strange thing is that some other programs in our team are fine, while some show no logs in the web UI. We upgraded from 1.10 to 1.12, and this appeared in 1.12 > > > > > > -- Original message -- From: r pp Sent: