Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread yidan zhao
可以看yuque里边哈,有DAG的。 JasonLee <17610775...@163.com> 于2021年8月26日周四 下午1:35写道: > Hi > > > 可以发一下任务的 DAG 吗 > > > Best > JasonLee > > > 在2021年08月26日 13:09,yidan zhao 写道: > 补充了个附录(https://www.yuque.com/sixhours-gid0m/ls9vqu/rramvh > )正常任务和异常任务的window算子的FlameGraph,不清楚是否有参考价值。 > > yidan zhao 于2021年8月26日周四

?????? ????flink 1.12 ????????Flink SQL ???? ??sink?? ??????????????

2021-08-25 Thread ????
flink-connector-kafka_2.11??flink-connector-jdbc_2.11?? ??mysql ?? ?? ?? kafka??java.sql.BatchUpdateException ??3 sink Kafka ??Kafka?? 'sink.semantic' = 'exactly-once',

Flink sql CLI parsing avro format data error 解析avro数据失败

2021-08-25 Thread Wayne
i have Apache Avro schema 我的avro schema 如下 { "type" : "record", "name" : "KafkaAvroMessage", "namespace" : "xxx", "fields" : [ { "name" : "aaa", "type" : "string" }, { "name" : "bbb", "type" : [ "null", "string" ], "default" : null },{ "name" : "ccc",

Flink sql CLI parsing avro format data error 解析avro数据失败

2021-08-25 Thread Wayne
i have Apache Avro schema 我的avro schema 如下 { "type" : "record", "name" : "KafkaAvroMessage", "namespace" : "xxx", "fields" : [ { "name" : "aaa", "type" : "string" }, { "name" : "bbb", "type" : [ "null", "string" ], "default" : null },{ "name" : "ccc",

Re: Flink SQL: Custom exception handling External

2021-08-25 Thread Caizhi Weng
Hi! You can open a JIRA ticket for this feature. However from my perspective this feature should only be added to some specific connectors (mostly message queues) and formats. You might want to attach a list of proposed connectors and formats in that ticket. Chong Yun Long 于2021年8月25日周三

回复: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread JasonLee
Hi 可以发一下任务的 DAG 吗 Best JasonLee 在2021年08月26日 13:09,yidan zhao 写道: 补充了个附录(https://www.yuque.com/sixhours-gid0m/ls9vqu/rramvh )正常任务和异常任务的window算子的FlameGraph,不清楚是否有参考价值。 yidan zhao 于2021年8月26日周四 下午1:01写道: 目前来看,我运行6小时,window总计才收到200MB数据,这个数据量级相比我很多小到没有一样。所以很难想象反压的原因是啥究竟。

Re: 基于flink 1.12 如何通过Flink SQL 实现 多sink端 在同一个事务中

2021-08-25 Thread Caizhi Weng
Hi! 如果 statement set 已经包含了多个 insert 语句,那么写入 kafka 表和写入 db 应该是在同一个作业里进行的。如果写入 db 失败,那么产生的 exception 应该会让作业失败才对。这里 db 写入失败但 kafka 依旧写入是什么样的现象? 另外其实可以考虑分成两个作业,第一个作业将数据写入 db,第二个作业从 db 读出数据写入 kafka。关于捕获 db 数据的变化,可以看一下 Flink CDC connector[1] [1] https://github.com/ververica/flink-cdc-connectors 悟空

Re: Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread yidan zhao
补充了个附录(https://www.yuque.com/sixhours-gid0m/ls9vqu/rramvh )正常任务和异常任务的window算子的FlameGraph,不清楚是否有参考价值。 yidan zhao 于2021年8月26日周四 下午1:01写道: > 目前来看,我运行6小时,window总计才收到200MB数据,这个数据量级相比我很多小到没有一样。所以很难想象反压的原因是啥究竟。 > > 目前来看反压节点的outPoolUsage是1,看起来合理,因为处于100%反压。 >

Re: Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread yidan zhao
目前来看,我运行6小时,window总计才收到200MB数据,这个数据量级相比我很多小到没有一样。所以很难想象反压的原因是啥究竟。 目前来看反压节点的outPoolUsage是1,看起来合理,因为处于100%反压。 下游节点的inPoolUsage却是0,这个也很奇怪,同时下游buzz和backpress都是0%. Shengkai Fang 于2021年8月26日周四 下午12:33写道: > - 得看一下具体的卡死的节点的栈,分析下具体的工作任务才知道。 > - 日志中有包含错误的信息吗? > > Best, > Shengkai > > yidan zhao

?????? ????flink 1.12 ????????Flink SQL ???? ??sink?? ??????????????

2021-08-25 Thread ????
statement set[1] StatementSet.addInsertSql ??sql execute() ---- ??:

Re: 基于flink 1.12 如何通过Flink SQL 实现 多sink端 在同一个事务中

2021-08-25 Thread Shengkai Fang
说的是 statement set [1] 吗 ? [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sqlclient/#execute-a-set-of-sql-statements 悟空 于2021年8月26日周四 上午11:33写道: > hi all: > 我目前基于flink 1.12 sql 来开发功能, 目前遇到一个问题, 我现在想实现 在一个事务里 先将kafka > 源的数据写入到一张msyql 表中,

Re: Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread Shengkai Fang
- 得看一下具体的卡死的节点的栈,分析下具体的工作任务才知道。 - 日志中有包含错误的信息吗? Best, Shengkai yidan zhao 于2021年8月26日周四 下午12:03写道: > 可能存在机器压力倾斜,但是我是不太清楚这种现象的原因,直接停滞了任务? > > 东东 于2021年8月26日周四 上午11:06写道: > > > 建议检查一下是否有数据倾斜 > > > > > > 在 2021-08-26 10:22:54,"yidan zhao" 写道: > > >问题期间的确ckpt时间较长。 > >

Re: Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread yidan zhao
可能存在机器压力倾斜,但是我是不太清楚这种现象的原因,直接停滞了任务? 东东 于2021年8月26日周四 上午11:06写道: > 建议检查一下是否有数据倾斜 > > > 在 2021-08-26 10:22:54,"yidan zhao" 写道: > >问题期间的确ckpt时间较长。 > >但是,这个任务正常ckpt时间才不到1s,ckpt大小也就21MB,所以也很难说ckpt为啥会超时,我超时设置的2min。 > > > >Caizhi Weng 于2021年8月26日周四 上午10:20写道: > > > >> Hi! > >> > >> 从图中情况来看很可能是因为下游

????flink 1.12 ????????Flink SQL ???? ??sink?? ??????????????

2021-08-25 Thread ????
hi all: ??flink 1.12 sql ?? kafka ??msyql ?? kafkadb??kafka ?? ?? insert into db_table_sinkselect * from kafka_source_table; insert into

Re:Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread 东东
建议检查一下是否有数据倾斜 在 2021-08-26 10:22:54,"yidan zhao" 写道: >问题期间的确ckpt时间较长。 >但是,这个任务正常ckpt时间才不到1s,ckpt大小也就21MB,所以也很难说ckpt为啥会超时,我超时设置的2min。 > >Caizhi Weng 于2021年8月26日周四 上午10:20写道: > >> Hi! >> >> 从图中情况来看很可能是因为下游 checkpoint 时间过长导致反压上游。是否观察过 checkpoint 的情况? >> >> yidan zhao 于2021年8月26日周四 上午10:09写道: >>

Re: 【Announce】Zeppelin 0.10.0 is released, Flink on Zeppelin Improved

2021-08-25 Thread Leonard Xu
Thanks Jeff for the great work ! Best, Leonard > 在 2021年8月25日,22:48,Jeff Zhang 写道: > > Hi Flink users, > > We (Zeppelin community) are very excited to announce Zeppelin 0.10.0 is > officially released. In this version, we made several improvements on Flink > interpreter. Here's the main

Re: 【Announce】Zeppelin 0.10.0 is released, Flink on Zeppelin Improved

2021-08-25 Thread Leonard Xu
Thanks Jeff for the great work ! Best, Leonard > 在 2021年8月25日,22:48,Jeff Zhang 写道: > > Hi Flink users, > > We (Zeppelin community) are very excited to announce Zeppelin 0.10.0 is > officially released. In this version, we made several improvements on Flink > interpreter. Here's the main

Re: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-25 Thread Caizhi Weng
Hi! Why does this ~30% memory reduction happen? I don't know how memory is calculated in Flink 1.9 but this 1.11 memory allocation result is reasonable. This is because managed memory, network memory and JVM overhead memory in 1.11 all has their default sizes or fractions (managed memory 40%,

Identify metrics belonging to the "same" task manager in kubernetes

2021-08-25 Thread gaurav kulkarni
Hi,  We have multiple flink clusters running in kubernetes. We plan to enable prometheus on these clusters. Looks like flink metrics emitted are of the format: "flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time{host="10_244_2_6",tm_id="10_244_2_6:6122_2e3d7a",} 65.0" 1. 

Re: 退订

2021-08-25 Thread Caizhi Weng
Hi! 退订中文邮件列表请发送任意内容的邮件到 user-zh-unsubscr...@flink.apache.org,其他邮件列表退订邮箱参见 https://flink.apache.org/community.html#mailing-lists Fighting <402991...@qq.com.invalid> 于2021年8月26日周四 上午10:11写道: > 退订 > > > > ---原始邮件--- > 发件人: "yidan zhao" 发送时间: 2021年8月26日(周四) 上午10:10 > 收件人: "user-zh" 主题:

Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread yidan zhao
问题期间的确ckpt时间较长。 但是,这个任务正常ckpt时间才不到1s,ckpt大小也就21MB,所以也很难说ckpt为啥会超时,我超时设置的2min。 Caizhi Weng 于2021年8月26日周四 上午10:20写道: > Hi! > > 从图中情况来看很可能是因为下游 checkpoint 时间过长导致反压上游。是否观察过 checkpoint 的情况? > > yidan zhao 于2021年8月26日周四 上午10:09写道: > > > 如题,这个问题以前遇到过,后来发生频率低了,近期又多了几次,下面是具体的话题讨论,email不方便贴图。 > > > >

Re: 退订

2021-08-25 Thread Caizhi Weng
Hi! 退订请发送任意内容的邮件到 user-zh-unsubscr...@flink.apache.org uu 于2021年8月26日周四 上午9:56写道: > 退订

Re: Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread Caizhi Weng
Hi! 从图中情况来看很可能是因为下游 checkpoint 时间过长导致反压上游。是否观察过 checkpoint 的情况? yidan zhao 于2021年8月26日周四 上午10:09写道: > 如题,这个问题以前遇到过,后来发生频率低了,近期又多了几次,下面是具体的话题讨论,email不方便贴图。 > > 语雀:https://www.yuque.com/sixhours-gid0m/ls9vqu/rramvh >

Re: Not able to avoid Dynamic Class Loading

2021-08-25 Thread Caizhi Weng
Hi! What Flink version are you using? In current Flink code base FlinkKafkaConsumer does not contain fields related to Avro. Jars in usrlib has a higher priority to be loaded than jars in lib. So if there is another FlinkKafkaConsumer class in your user jar then it might affect class loading and

退订

2021-08-25 Thread Fighting
退订 ---原始邮件--- 发件人: "yidan zhao"https://www.yuque.com/sixhours-gid0m/ls9vqu/rramvh

Flink任务假死;无限100%反压;但下游节点无压力。

2021-08-25 Thread yidan zhao
如题,这个问题以前遇到过,后来发生频率低了,近期又多了几次,下面是具体的话题讨论,email不方便贴图。 语雀:https://www.yuque.com/sixhours-gid0m/ls9vqu/rramvh

退订

2021-08-25 Thread uu
退订

1.9 to 1.11 Managed Memory Migration Questions

2021-08-25 Thread Hailu, Andreas [Engineering]
Hi folks, We're about half way complete in migrating our YARN batch processing applications from Flink 1.9 to 1.11, and are currently tackling the memory configuration migrations. Our test application's sink failed with the following exception while writing to HDFS: Caused by:

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread Thms Hmm
Can you check what is the output of those commands $ id $ ls -la $FLINK_HOME/plugins/s3-fs-presto/ jonas eyob schrieb am Mi. 25. Aug. 2021 um 16:17: > The exception is showing up both in TM and JM > > This however seemed only to appear when running on my local kubernetes > setup. > > I'd also

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread Rion Williams
Thanks again David, I've spun up a JIRA issue for the ticket while I work on getting things into the proper state. If someone with the appropriate privileges could assign it to me, I'd be appreciative. I'll likely need some assistance at a few

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread David Morávek
AFAIK there are currently no other sources in Flink that can treat "other sources" / "destination" as data. Most complete generic work on this topic that I'm aware of are Splittable DoFn based IOs in Apache Beam. I think the best module for the contribution would be "elasticsearch-base", because

Flink KafkaConsumer metrics in DataDog

2021-08-25 Thread Shilpa Shankar
Hello , We have enabled DataDogHTTPReporter to fetch metrics on flink v1.13.1 running on kubernetes. The metric flink.operator.KafkaConsumer.records_lag_max is not displaying accurate values. It also displays 0 most of the time and when it does fetch a value, it seems to be wrong when I compare

Not able to avoid Dynamic Class Loading

2021-08-25 Thread Kevin Lam
Hi all, I'm trying to avoid dynamic class loading my user code [0] due to a suspected classloading leak, but when I put my application jar into /lib instead of /usrlib, I run into the following error: ``` The main method caused an error: The implementation of the FlinkKafkaConsumer is not

退订

2021-08-25 Thread Fighting
退订 ---Original--- From: "Jeff Zhang"https://zeppelin.apache.org/docs/0.10.0/interpreter/flink.html The quickest way to try Flink on Zeppelin is via its docker image https://zeppelin.apache.org/docs/0.10.0/interpreter/flink.html#play-flink-in-zeppelin-docker Besides these, here’s one blog about

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread Rion Williams
Hi David, That was perfect and it looks like this is working as I'd expected. I put together some larger integration tests for my specific use-case (multiple Elasticsearch clusters running in TestContainers) and verified that messages were being routed dynamically to the appropriate sinks. I

【Announce】Zeppelin 0.10.0 is released, Flink on Zeppelin Improved

2021-08-25 Thread Jeff Zhang
Hi Flink users, We (Zeppelin community) are very excited to announce Zeppelin 0.10.0 is officially released. In this version, we made several improvements on Flink interpreter. Here's the main features of Flink on Zeppelin: - Support multiple versions of Flink - Support multiple versions

【Announce】Zeppelin 0.10.0 is released, Flink on Zeppelin Improved

2021-08-25 Thread Jeff Zhang
Hi Flink users, We (Zeppelin community) are very excited to announce Zeppelin 0.10.0 is officially released. In this version, we made several improvements on Flink interpreter. Here's the main features of Flink on Zeppelin: - Support multiple versions of Flink - Support multiple versions

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
The exception is showing up both in TM and JM This however seemed only to appear when running on my local kubernetes setup. > I'd also recommend setting "kubernetes.namespace" option, unless you're using "default" namespace. Yes, good point - I now see why that was needed. Den ons 25 aug. 2021

Re: [ANNOUNCE] Dropping "CheckpointConfig.setPreferCheckpointForRecovery()"

2021-08-25 Thread Gyula Fóra
Hi Stephan, I do not know if anyone is still relying on this but I think it makes sense to drop this feature. So +1 from me. I think it served a valid purpose originally but if we have a good improvement in the pipeline using the savepoints directly that will solve the problem properly. I would

Flink 从checkpoint恢复时,部分状态没有正确恢复

2021-08-25 Thread dixingxing
Hi Flink 社区: 我们的Flink版本是1.9.2,用的是blink planer,我们今天遇到一个问题,目前没有定位到原因,现象如下: 手动重启任务时,指定了从一个固定的checkpoint恢复时,有一定的概率,一部分状态数据无法正常恢复,启动后Flink任务本身可以正常运行,且日志中没有明显的报错信息。 具体现象是:type=realshow的数据没有从状态恢复,也就是从0开始累加,而type=show和type=click的数据是正常从状态恢复的。 SQL大致如下: createview view1 as select event_id, act_time,

Query on Metrics | Cumulative Metrics via Prometheus Reporter

2021-08-25 Thread bhawana gupta
Hi, We are using prometheus reporter to fetch metrics (flink 1.12.1 on k8s environment). We were facing an issue since the prometheus reporter has various targets corresponding to each job & task manager hence the different scraping for each target causing syncing issue in metrics count for

Disabling autogenerated uid/hash doesn't work when using file source

2021-08-25 Thread Vishal Surana
I set names and uid for all my flink operators and have explicitly disabled auto generation of uid to force developers in my team the same practice. However, when using a file source, there's no option of providing it due to which the job fails to start unless we enable auto generation. Am I

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread David Morávek
Hi Jonas, Where does the exception pop-up? In job driver, TM, JM? You need to make sure that the plugin folder is setup for all of them, because they all may need to access s3 at some point. Best, D. On Wed, Aug 25, 2021 at 11:54 AM jonas eyob wrote: > Hey Thms, > > tried the s3p:// option as

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
Hey Thms, tried the s3p:// option as well - same issue. > Also check if your user that executes the process is able to read the jars. Not exactly sure how to do that? The user "flink" in the docker image is able to read the contents as far I understand. But maybe that's not how I would check it?

Re: Flink SQL: Custom exception handling External

2021-08-25 Thread Chong Yun Long
Hi, Thanks for the quick response. The use case is not specific to JDBC (JDBC is just an example) but more for custom error handling in all connectors. How would we go about proposing such a new feature to be added to Flink? On 2021/08/25 09:02:31, Caizhi Weng wrote: > Hi!> > > As far as I

Re: Do we have date_sub function in flink sql?

2021-08-25 Thread Caizhi Weng
Hi! Try this ts - interval '1' day where ts is your timestamp or date column. 1095193...@qq.com <1095193...@qq.com> 于2021年8月25日周三 下午5:20写道: > Hi >I want to substract 1 day from current date with Flink sql. Do we have > this function like date_sub()? > > -- >

Do we have date_sub function in flink sql?

2021-08-25 Thread 1095193...@qq.com
Hi I want to substract 1 day from current date with Flink sql. Do we have this function like date_sub()? 1095193...@qq.com

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread Thms Hmm
Hey Jonas, you could also try to use the ´s3p://´ scheme to directly specify that presto should be used. Also check if your user that executes the process is able to read the jars. Am Mi., 25. Aug. 2021 um 10:01 Uhr schrieb jonas eyob : > Thanks David for the quick response! > > *face palm* -

Re: Flink SQL: Custom exception handling External

2021-08-25 Thread Caizhi Weng
Hi! As far as I know JDBC does not have this error handling mechanism. Also there are very few connectors / formats which support skipping erroneous records (for example the csv format). Which type of exception are you faced with? As JDBC connectors, unlike message queue connectors, rarely (if

Flink SQL: Custom exception handling External

2021-08-25 Thread Chong Yun Long
Hi, Is there any mechanism for handling of errors produced by Flink SQL? It can be useful for various use cases: 1. Logging exceptions and the erroneous row to a kafka topic 2. Ignoring transient exceptions instead of throwing and failing the entire job If there are no such mechanisms may I

Re: mini-batch 设置后没效果

2021-08-25 Thread Leonard Xu
> 如何退订这个邮件订阅了 如果需要取消订阅 user-zh@flink.apache.org 邮件组,请发送任意内容的邮件到 user-zh-unsubscr...@flink.apache.org 即可 Best, Leonard

Re:mini-batch配置信息没产生效果

2021-08-25 Thread 东东
这里说得很清楚,只是"allow"的最大latency,并不是固定这么长 | table.exec.mini-batch.allow-latency Streaming | 0 ms | Duration | The maximum latency can be used for MiniBatch to buffer input records. MiniBatch is an optimization to buffer input records to reduce state access. MiniBatch is triggered with the

mini-batch配置信息没产生效果

2021-08-25 Thread 李航飞
Configuration conf = new Configuration(); conf.setString("table.exec.mini-batch.enabled","true"); conf.setString("table.exec.mini-batch.allow-latency","15s"); conf.setString("table.exec.mini-batch.size","50"); StreamExecutionEnvironment execEnv =

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
Thanks David for the quick response! *face palm* - Thanks a lot, that seems to have addressed the NullPointerException issue. May I also suggest that this [1] page be updated, since it says the key is " high-availability.cluster-id" This led me to another issue however:

Bulk Scheduler timeout when creating several jobs in flink kubernetes HA deployment

2021-08-25 Thread Gil De Grove
Hello, We are struggling a bit with an error in our kubernetes deployment. The deployment is composed of 2 flink job managers and 58 task managers. When deploying the jobs everything is going fine at first, but after the deployment of several jobs (mix of batch and streaming job using the SQL

回复:mini-batch 设置后没效果

2021-08-25 Thread 牛成
如何退订这个邮件订阅了 -- 发件人:Caizhi Weng 发送时间:2021年8月25日(星期三) 11:12 收件人:user-zh 主 题:Re: mini-batch 设置后没效果 Hi! 所谓的没效果指的是什么现象呢?建议详细描述一下场景与问题。 李航飞 于2021年8月25日周三 上午11:04写道: > 通过Configuration 设置 > table.exec.mini-batch.enabled= true; >

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread David Morávek
Hi Jonas, this exception is raised because "kubernetes.cluster-id" [1] is not set. I'd also recommend setting "kubernetes.namespace" option, unless you're using "default" namespace. I've filled FLINK-23961 [2] so we provide more descriptive warning for this issue next time ;) [1]

NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
Hey, I've been struggling with this problem now for some days - driving me crazy. I have a standalone kubernetes Flink (1.12.5) using an application cluster mode approach. *The problem* I am getting a NullPointerException when specifying the FQN of the Kubernetes HA Service Factory class i.e.

Re: Flink Performance Issue

2021-08-25 Thread Fabian Paul
Hi Mohammed, 200records should definitely be doable. The first you can do is remove the print out Sink because they are increasing the load on your cluster due to the additional IO operation and secondly preventing Flink from fusing operators. I am interested to see the updated job graph after