Re: Flink job writing to Kafka with EOS enabled, RocksDB state backend: job backpressured, source-side log backlog, restore from checkpoint fails

2020-08-13 Thread
Could it be that no uids were assigned to the operators and the DAG then changed, so the state could not be restored? ---Original message--- From: "Yang Peng"
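A frequent cause of failed checkpoint restores is exactly what this reply suggests: operators without explicit uids get auto-generated ids that change when the DAG changes. A minimal sketch of pinning uids in the DataStream API follows (the operator names and functions are illustrative, not from the original job; it assumes the Flink 1.x DataStream API on the classpath):

```java
// Hypothetical pipeline: every stateful operator gets a stable uid so that
// state in a checkpoint/savepoint can be matched back even if the graph changes.
DataStream<String> events = env
    .addSource(kafkaSource)
    .uid("kafka-source");          // ties the source's Kafka offsets to this id

DataStream<String> enriched = events
    .keyBy(e -> e)
    .map(new EnrichFn())
    .uid("enrich-map");            // without this, a changed DAG may orphan the state

enriched
    .addSink(kafkaProducer)
    .uid("kafka-eos-sink");        // EOS sink state (transaction ids) also needs a stable uid
```

If any uid was auto-generated in the checkpointed job, restoring after a graph change can fail the way the original post describes.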

Re: A question about the 'scan.startup.mode' option when using a Kafka source in Flink

2020-08-13 Thread yulu yang
OK, thanks, I'll give it a try! 魏子涵 wrote on Fri, Aug 14, 2020 at 1:35 PM: > I'd suggest consuming without Flink's Kafka connector first — write your own Kafka consumer and check whether the problem persists, as a comparison, to see whether the issue lies in the Kafka interface Flink provides. > > 魏子涵 > Email: wzh1007181...@163.com > Signature customized by NetEase Mail Master > > On Aug 14, 2020 at 13:27, yulu yang wrote: > This Flink job and its consumer group are both newly created; no history is read, the group is new > > 魏子涵 wrote on Fri, Aug 14, 2020

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

2020-08-13 Thread narasimha
Thanks, Till. Currently, the instance is getting timeout error and terminating the TaskManager. Sure, will try native K8s. On Thu, Aug 13, 2020 at 3:12 PM Till Rohrmann wrote: > Hi Narasimha, > > if you are deploying the Flink cluster manually on K8s then there is > no automatic way of

Re: A question about the 'scan.startup.mode' option when using a Kafka source in Flink

2020-08-13 Thread 魏子涵
I'd suggest consuming without Flink's Kafka connector first — write your own Kafka consumer and check whether the problem persists, as a comparison, to see whether the issue lies in the Kafka interface Flink provides. 魏子涵 Email: wzh1007181...@163.com Signature customized by NetEase Mail Master On Aug 14, 2020 at 13:27, yulu yang wrote: This Flink job and its consumer group are both newly created; no history is read, the group is new 魏子涵 wrote on Fri, Aug 14, 2020 at 1:20 PM: > Did the Kafka client's group.id parameter change? > > 魏子涵 >

Re: Tools for Flink Job performance testing

2020-08-13 Thread narasimha
Thanks, Arvid. The guide was helpful in how to start working with Flink. I'm currently exploring SQL/Table API. Will surely come back for queries on it. On Thu, Aug 13, 2020 at 1:25 PM Arvid Heise wrote: > Hi, > > performance testing is quite vague. Usually you start by writing a small >

Re: A question about the 'scan.startup.mode' option when using a Kafka source in Flink

2020-08-13 Thread yulu yang
This Flink job and its consumer group are both newly created; no history is read, the group is new. 魏子涵 wrote on Fri, Aug 14, 2020 at 1:20 PM: > Did the Kafka client's group.id parameter change? > > 魏子涵 > Email: wzh1007181...@163.com > Signature customized by NetEase Mail Master > > On Aug 14, 2020 at 12:44, yulu yang wrote: > In a Flink job reading from a Kafka source, I set 'scan.startup.mode' = 'earliest-offset', >

Re: A question about the 'scan.startup.mode' option when using a Kafka source in Flink

2020-08-13 Thread 魏子涵
Did the Kafka client's group.id parameter change? 魏子涵 Email: wzh1007181...@163.com Signature customized by NetEase Mail Master On Aug 14, 2020 at 12:44, yulu yang wrote: In a Flink job reading from a Kafka source, I set 'scan.startup.mode' = 'earliest-offset', but when checking the job's output I found it only picked up the newest data in Kafka, not the oldest. I'm not sure whether I'm using the 'scan.startup.mode' option incorrectly. Flink version 1.11.1, Kafka version

Re: A question about operators in FINISHED state preventing Flink from taking checkpoints

2020-08-13 Thread yulu yang
By the way, this Flink job and its consumer group are newly created; no historical data is being read. 杨豫鲁 wrote on Thu, Aug 13, 2020 at 3:33 PM: > A problem I ran into recently while setting up a Flink pipeline: > > the Flink job joins against a physical table (a dictionary table). After the job starts, it reads the dictionary table once, and that operator then switches to FINISHED state, which prevents the job from taking checkpoints or savepoints. How do people usually handle this? My job must use the dictionary table for enrichment during processing. > > > >

Flink job writing to Kafka with EOS enabled, RocksDB state backend: job backpressured, source-side log backlog, restore from checkpoint fails

2020-08-13 Thread Yang Peng
Hi, a question for everyone. We have a job with a RocksDB state backend taking incremental checkpoints; Flink reads from Kafka, processes, and writes back to Kafka, with EOS enabled on the producer. Recently we noticed the job was backpressured and the source side had a log backlog, so we planned to change the resource allocation and add more resources (parallelism unchanged, code unchanged) and restore the job from a checkpoint. After the job was cancelled and we tried to restore from the checkpoint, it would not come up — it failed twice in a row. The client-side log retention is too short and we did not get to check the client logs in time, so we have no client logs,

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-13 Thread Vijayendra Yadav
Hi Yangze, I tried the following: maybe I am missing something. https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html -yt,--yarnship Run: /usr/lib/flink/bin/flink run -m yarn-cluster -yt ${app_install_path}/conf my KRB5.conf is in ${app_install_path}/conf on the master node

A question about the 'scan.startup.mode' option when using a Kafka source in Flink

2020-08-13 Thread yulu yang
In a Flink job reading from a Kafka source, I set 'scan.startup.mode' = 'earliest-offset', but when checking the job's output I found it only picked up the newest data in Kafka, not the oldest. I'm not sure whether I'm using the 'scan.startup.mode' option incorrectly. Flink version 1.11.1, Kafka version 2.6.0
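For reference, a minimal DDL using the option in question is sketched below. The topic, group, and two-column schema are borrowed from other messages in this digest and are purely illustrative. With 'scan.startup.mode' = 'earliest-offset', the connector is expected to read from the beginning of each partition regardless of any committed group offsets:

```sql
CREATE TABLE kafka_src (
    id INT,
    location VARCHAR(20)
) WITH (
    'connector' = 'kafka',
    'topic' = 'example',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'testGroup',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);
```

If a table like this still only sees the newest records, it is worth checking, as suggested elsewhere in the thread, whether a plain Kafka consumer against the same topic sees the old data at all.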

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-13 Thread Yangze Guo
Hi, When deploying Flink on Yarn, you could ship krb5.conf with the "--ship" option. Note that this option currently only supports shipping folders. Best, Yangze Guo On Fri, Aug 14, 2020 at 11:22 AM Vijayendra Yadav wrote: > > Any inputs ? > > On Tue, Aug 11, 2020 at 10:34 AM Vijayendra Yadav >

Re: Avro format in pyFlink

2020-08-13 Thread Xingbo Huang
Hi Rodrigo, For the connectors, Pyflink just wraps the java implementation. And I am not an expert on Avro and corresponding connectors, but as far as I know, DataTypes really cannot declare the type of union you mentioned. Regarding the bytes encoding you mentioned, I actually have no good

Re: How to set parallelism for Flink SQL

2020-08-13 Thread 赵一旦
What about checkpoints — for those of you using Flink SQL, are your jobs the kind that can be restarted at any time, rather than needing an uninterrupted correctness guarantee? Xingbo Huang wrote on Fri, Aug 14, 2020 at 12:01 PM: > Hi, > > On the parallelism question: as far as I know, the Table API currently cannot set parallelism per operator > > Best, > Xingbo > > Zhao,Yi(SEC) wrote on Fri, Aug 14, 2020 at 10:49 AM: > > > Could anyone help with the parallelism question? A related follow-up: besides parallelism, can Flink SQL take checkpoints/savepoints, and can a SQL job be restarted from a checkpoint/savepoint? > > > >

Re: Flink 1.11 SQL debugging in IDEA: no data and no errors

2020-08-13 Thread Xingbo Huang
Hi, this is because in Flink 1.11 executeSql is an asynchronous API; when run inside the IDE the process simply exits. You need to grab the TableResult returned by executeSql and then wait for the job to finish: tableResult.getJobClient.get() .getJobExecutionResult(Thread.currentThread().getContextClassLoader) .get() Best, Xingbo DanielGu <610493...@qq.com> wrote on Fri, Aug 14, 2020
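Cleaned up, the wait Xingbo describes looks roughly like this in Java (a sketch assuming a `TableEnvironment` named `tEnv` and the Flink 1.11 Table API on the classpath):

```java
// executeSql submits the job asynchronously; block until it finishes
// so the IDE process does not exit before any results are produced.
TableResult tableResult = tEnv.executeSql(insertStatement);
tableResult.getJobClient().get()
    .getJobExecutionResult(Thread.currentThread().getContextClassLoader())
    .get();
```

On a real cluster this wait is usually unnecessary; it matters mainly for local IDE runs where the JVM would otherwise exit immediately after submission.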

Re: Does Flink SQL TUMBLE window not support offset?

2020-08-13 Thread nobel一旦
So to summarize, it is not only https://issues.apache.org/jira/browse/FLINK-17767 — that issue is exactly the fix I mentioned for correctly dividing day-level windows in the UTC+8 timezone. But the event-time type problem in Flink SQL itself feels even more serious, causing all kinds of confusion. A timestamp is the most precise piece of information; if an imprecise notion like a date is used instead, its timezone should be made explicit. Even if the timezone is hidden, the translation back from date to timestamp should account for it, instead of defaulting to UTC+0 when the date may actually be a UTC+8 representation. nobel一旦 wrote on Fri, Aug 14, 2020 at 11:49 AM: >

Re: How to set parallelism for Flink SQL

2020-08-13 Thread Xingbo Huang
Hi, on the parallelism question: as far as I know, the Table API currently cannot set parallelism per operator. Best, Xingbo Zhao,Yi(SEC) wrote on Fri, Aug 14, 2020 at 10:49 AM: > Could anyone help with the parallelism question? A related follow-up: besides parallelism, can Flink SQL take checkpoints/savepoints, and can a SQL job be restarted from a checkpoint/savepoint? > > From: "Zhao,Yi(SEC)" > Date: Thursday, Aug 13, 2020, 11:44 AM > To: "user-zh@flink.apache.org" > Subject: How to set parallelism for Flink SQL > >

Re: Does Flink SQL TUMBLE window not support offset?

2020-08-13 Thread nobel一旦
The actual requirement is a window from 00:00 to 24:00 on Aug 14 in UTC+8, which corresponds to 16:00 on *Aug 13* to 16:00 on Aug 14 in UTC+0. 1. Why, in the Flink SQL case with the timezone set correctly, is the window not divided incorrectly? *This is the part I cannot figure out and would like explained (namely: why does Flink SQL use a TIMESTAMP(3)-style date as the event-time definition and the basis for watermark computation, instead of a bigint UTC+0 timestamp as event time, consistent with the DataStream API)?*

Flink 1.11 SQL debugging in IDEA: no data and no errors

2020-08-13 Thread DanielGu
I've run into a problem. Environment: 1.11 in IDEA, following wuchong's demo, trying to turn the client into Java. The first example computes hourly transaction volume. The data can be read but there is no output: writing to ES does nothing, and after switching to a print sink there is still nothing. https://github.com/wuchong/flink-sql-demo/tree/v1.11-CN Any help appreciated. The pom, code, and run output are below. // create execution environment StreamExecutionEnvironment bsEnv =

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-13 Thread Vijayendra Yadav
Any inputs ? On Tue, Aug 11, 2020 at 10:34 AM Vijayendra Yadav wrote: > Dawid, I was able to resolve the keytab issue by passing the service name, > but now I am facing the KRB5 issue. > > Caused by: org.apache.kafka.common.errors.SaslAuthenticationException: > Failed to create SaslClient with

Re: About upgrading Flink

2020-08-13 Thread 引领
Thanks, everyone. I'll test the whole environment! 引领 yrx73...@163.com Signature customized by NetEase Mail Master On Aug 14, 2020 at 10:51, caozhen wrote: The main issues I hit upgrading from 1.7 to 1.11, for reference: 1. Building the main jar: Flink API changes, e.g. env creation changed, some classes moved packages. 2. Building the main jar: mark the Flink/Flink SQL runtime dependencies as provided so they are not bundled into the fat jar, otherwise NoClassDefFoundError. 3. Test runs: resource settings for TM/JM changed significantly

Re: k8s job cluster using StatefulSet

2020-08-13 Thread Yang Wang
Hi Alexey, Actually, StatefulSets could also be used to start the JobManager and TaskManager. So why do we suggest to use Deployment in the Flink documentation? * StatefulSets requires the user to have persistent volume in the K8s cluster. However, it is not always true, especially for the

Re: Jar dependency issues when submitting SQL jobs via the Flink SQL CLI

2020-08-13 Thread Zhao,Yi(SEC)
Addendum: I just went through the source, and the Kafka one is now explained — see line 288 of FlinkKafkaConsumer, which requires a ByteArrayDeserializer and references the ByteArrayDeserializer class. This runs while constructing the KafkaConsumer, so the dependency is needed on the submitting side. As explained, jars specified to the SQL client with -j or -l are uploaded, which makes sense — some jobs need specific jars, so the feature is certainly useful. But very common jars like connector, json, and csv feel like they should just be placed in the cluster once; from what I see, that cannot be done.

Re:Re:Re:Re:Re:Re:Flink SQL No Watermark

2020-08-13 Thread forideal
Hi Zhou Zach: "But only the last two operators have watermarks, so with OperatorChaining enabled, since the first three have no watermark, the whole chained operator has none — does that mean watermarks can't be monitored from the Flink UI and we have to rely on third-party monitoring? Because in production OperatorChaining will definitely be enabled." On this question, I also talked offline with 李本超 yesterday; roughly the conclusion was: >1. If you don't look at each operator's metrics directly and only look at the Flink UI graph,

Re: About upgrading Flink

2020-08-13 Thread caozhen
The main issues I hit upgrading from 1.7 to 1.11, for reference: 1. Building the main jar: Flink API changes, e.g. env creation changed, some classes moved packages. 2. Building the main jar: mark the Flink/Flink SQL runtime dependencies as provided so they are not bundled into the fat jar, otherwise NoClassDefFoundError. 3. Test runs: resource settings for TM/JM changed significantly. 4. Test runs: sort out the Flink/Hadoop dependency problem (from 1.11 the hadoop-shaded dependency is no longer provided). --- Zhao,Yi(SEC) wrote >
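Point 2 in the upgrade notes can be sketched as marking the Flink modules `provided` in the pom, so they are available at compile time but the client's own jars win at runtime. The artifact below is only an example; apply the same scope to whichever Flink modules the job depends on:

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.11.1</version>
    <!-- provided: present at compile time, excluded from the fat jar -->
    <scope>provided</scope>
</dependency>
```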

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread caozhen
I'd check which command starts the process, go into the startup scripts, and trace the log settings. For example, I start with standalone-job.sh start-foreground; tracing into flink-console.sh shows the concrete log settings. zilong xiao wrote > I also use a properties config file, but the logs don't seem to be collected — is there a way to check whether the config file took effect? > > caozhen > caozhen1937@ > wrote on Fri, Aug 14, 2020 at 10:23 AM: >>

Re: How to set parallelism for Flink SQL

2020-08-13 Thread Zhao,Yi(SEC)
Could anyone help with the parallelism question? A related follow-up: besides parallelism, can Flink SQL take checkpoints/savepoints, and can a SQL job be restarted from a checkpoint/savepoint? From: "Zhao,Yi(SEC)" Date: Thursday, Aug 13, 2020, 11:44 AM To: "user-zh@flink.apache.org" Subject: How to set parallelism for Flink SQL The config file has execution.parallelism, but that is clearly a global setting. How do I set different parallelism for the source, window, and sink operators generated from SQL?

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-08-13 Thread Yang Wang
Hi kevin, Thanks for sharing more information. You are right. Actually, "too old resource version" is caused by a bug of the fabric8 kubernetes-client[1]. It has been fixed in v4.6.1. And we have bumped the kubernetes-client version to v4.9.2 in Flink release-1.11. Also it has been backported to

Re: Jar dependency issues when submitting SQL jobs via the Flink SQL CLI

2020-08-13 Thread Zhao,Yi(SEC)
Analyzing an error; the message is: [ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in the classpath. Reason: Required context properties mismatch. The

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Leonard Xu
Hi, Weizheng > On Aug 13, 2020, at 19:44, Danny Chan wrote: > > tEnv.executeSql would execute the SQL asynchronously, e.g. submitting a job > to the backend cluster with a builtin job name `tEnv.executeSql` is an asynchronous method which will submit the job immediately. If you're testing in your IDE,

Re: About upgrading Flink

2020-08-13 Thread Zhao,Yi(SEC)
I've been through 1.7→1.8, 1.8→1.9, and 1.9→1.10; the first two were fine, the last had some pitfalls — don't use too old a JDK 8 build, a certain version has a bug in combination with 1.10. On Aug 14, 2020 at 9:25 AM, "蒋佳成(Jiacheng Jiang)" <920334...@qq.com> wrote: 1.10 has a new memory model; before understanding these memory settings your job may not even start! I suggest figuring them out first and experimenting in a test environment. --Original message-- From: 引领

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread zilong xiao
I also use a properties config file, but the logs don't seem to be collected — is there a way to check whether the config file took effect? caozhen wrote on Fri, Aug 14, 2020 at 10:23 AM: > log4j2 config: I directly used the log4j-console.properties shipped with the Flink 1.11.1 client. > > If you use an xml/yaml file, you may need to point to the log file when submitting from the client, or change the log settings in Flink's startup scripts > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Hostname for taskmanagers when running in docker

2020-08-13 Thread Xintong Song
Hi Nikola, I'm not entirely sure about how this happened. Would need some more information to investigate, such as the complete configurations for taskmanagers in your docker compose file, and the taskmanager logs. One quick thing you may try is to explicitly set the configuration option

Re: flink-1.10.1: writing to ES 5.6 with DDL

2020-08-13 Thread Leonard Xu
Hi, the link I posted contains the corresponding PR [1]; you can look at the code in that PR, starting from Elasticsearch6DynamicSink.java — for example if you implement your own Elasticsearch5DynamicSink

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread caozhen
log4j2 config: I directly used the log4j-console.properties shipped with the Flink 1.11.1 client. If you use an xml/yaml file, you may need to point to the log file when submitting from the client, or change the log settings in Flink's startup scripts -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread zilong xiao
May I ask what kind of config you use — xml, yaml, or properties? caozhen wrote on Fri, Aug 14, 2020 at 9:58 AM: > In the end I used log4j2. > > The main jar previously had many log4j dependencies (slf4j-log4j12), while the Flink client's lib has log4j2 dependencies (log4j-slf4j-impl), causing a conflict and no logs being printed. > > Change: marked the log4j dependencies in the main jar as provided and used the log4j2 dependency shipped with the client > > > > -- > Sent from:

Re: Help: Flink 1.11.1 YARN per-job mode with ZooKeeper HA times out on application submission; 1.10 was fine

2020-08-13 Thread Yang Wang
Per-job mode did not change from 1.10 to 1.11; 1.11 only added the application mode, and ZooKeeper HA did not change either. We still need you to share the client-side and JM-side logs from the failed submission so the problem can be investigated. Best, Yang x2009438 wrote on Thu, Aug 13, 2020 at 5:35 PM: > Hi all, > > I upgraded from 1.10.0 to 1.11.1 today, and YARN per-job submission with ZooKeeper HA configured times out — has anyone hit this? > The logs show nothing. > > The config was copied over from 1.10.0, where it works fine. > > > >

Re: flink-1.10.1: writing to ES 5.6 with DDL

2020-08-13 Thread kcz
After checking the URL you referred to in [1], I found nothing related to an ES SQL jar. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Avro format in pyFlink

2020-08-13 Thread rodrigobrochado
The upload of the schema through Avro(avro_schema) worked, but I had to select one type from the union type to put in Schema.field(field_type) inside t_env.connect(). If my dict has long and double values, and I declare Schema.field(DataTypes.Double()), all the int values are cast to double. My

Re: Flink 1.11.1 SQL client: streaming join produces unexpected rows containing null

2020-08-13 Thread godfrey he
You can print the raw results by executing: set execution.result-mode=changelog (if the source has delete messages, null values may appear) LittleFall <1578166...@qq.com> wrote on Thu, Aug 13, 2020 at 3:33 PM: > MySQL DDL > use test; > create table base ( > id int primary key, > location varchar(20) > ); > create table stuff( > id int

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread caozhen
In the end I used log4j2. The main jar previously had many log4j dependencies (slf4j-log4j12), while the Flink client's lib has log4j2 dependencies (log4j-slf4j-impl), causing a conflict and no logs being printed. Change: marked the log4j dependencies in the main jar as provided and used the log4j2 dependency shipped with the client -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Jar dependency issues when submitting SQL jobs via the Flink SQL CLI

2020-08-13 Thread godfrey he
Jars specified to the SQL client with -j or -l are uploaded to the JM together with the job submission. Zhao,Yi(SEC) wrote on Thu, Aug 13, 2020 at 5:11 PM: > A is a 10-machine cluster (HA mode, standalone); B is the submitting machine. > From my experiment, I first start a sql-client CLI with: > ./bin/sql-client.sh embedded -l $(pwd)/libs_sql -l $(pwd)/libs_udf > where libs_sql contains: flink-connector-kafka_2.12-1.10.0.jar >

Re: How should multiple tables created in Flink SQL tell apart canal-json records coming from Kafka?

2020-08-13 Thread Leonard Xu
Hello, currently only a topic containing a single table's changelog is supported. Your case amounts to one topic containing multiple tables' changelogs; it's just that when twocol parses the binlog it cannot find fields a and b, and since you configured ignore-parse-errors it returns (null,null). I suggest pairing each changelog table with its own topic. Best, Leonard > On Aug 13, 2020, at 19:55, LittleFall <1578166...@qq.com> wrote: > > These are the two tables I created in Flink SQL: > create table base ( >id
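Leonard's suggestion — one topic per table's changelog — can be sketched as the following DDL. The topic name is hypothetical; the schema follows the `base` table from the original post:

```sql
-- each table reads its own changelog topic, so canal-json records
-- never mix fields from different tables
CREATE TABLE base (
    id INT,
    location VARCHAR(20)
) WITH (
    'connector' = 'kafka',
    'topic' = 'canal_base',      -- carries only table `base`'s binlog
    'properties.bootstrap.servers' = 'localhost:9092',
    'scan.startup.mode' = 'latest-offset',
    'format' = 'canal-json'
);
```

A second table (e.g. `stuff`) would get its own topic with an analogous definition.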

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread caozhen
Yes, exactly — I marked every log4j dependency in the main jar as provided. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: About upgrading Flink

2020-08-13 Thread USERNAME
The official site has upgrade advice: https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/upgrading.html#compatibility-table On 2020-08-14 09:15:53, "引领" wrote: >Our Flink is on 1.7, so we'd like to upgrade — is it advisable to jump straight to 1.11.1? Has anyone deployed it in production? > > >引领 >yrx73...@163.com >Signature customized by NetEase Mail Master >

About upgrading Flink

2020-08-13 Thread 引领
Our Flink is on 1.7, so we'd like to upgrade — is it advisable to jump straight to 1.11.1? Has anyone deployed it in production? 引领 yrx73...@163.com Signature customized by NetEase Mail Master

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-08-13 Thread Bohinski, Kevin
Might be useful https://stackoverflow.com/a/61437982 Best, kevin From: "Bohinski, Kevin" Date: Thursday, August 13, 2020 at 6:13 PM To: Yang Wang Cc: "user@flink.apache.org" Subject: Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers Hi Got the logs on crash,

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-08-13 Thread Bohinski, Kevin
Hi Got the logs on crash, hopefully they help. 2020-08-13 22:00:40,336 ERROR org.apache.flink.kubernetes.KubernetesResourceManager[] - Fatal error occurred in ResourceManager. io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 8617182 (8633230)

Performance Flink streaming kafka consumer sink to s3

2020-08-13 Thread Vijayendra Yadav
Hi Team, I am trying to increase the throughput of my Flink streaming job, which streams from a Kafka source and sinks to S3. Currently it runs fine for small event records, but records with large payloads run extremely slowly, at a rate of about 2 TPS. Could you provide some best practices for tuning?

Re: What async database library does the asyncio code example use?

2020-08-13 Thread Marco Villalobos
Thank you! This was very helpful. Sincerely, Marco A. Villalobos > On Aug 13, 2020, at 1:24 PM, Arvid Heise wrote: > > Hi Marco, > > you don't need to use an async library; you could simply write your code in > async fashion. > > I'm trying to sketch the basic idea using any JDBC driver

Re: What async database library does the asyncio code example use?

2020-08-13 Thread Arvid Heise
Hi Marco, you don't need to use an async library; you could simply write your code in async fashion. I'm trying to sketch the basic idea using any JDBC driver in the following (it's been a while since I used JDBC, so don't take it too literally). private static class SampleAsyncFunction extends
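The core of the pattern Arvid sketches — wrap a blocking driver call in a future completed on a private thread pool — can be shown without Flink or JDBC at all. Below is a stdlib-only sketch; `blockingLookup` is a stub standing in for the real database query, and all names are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncLookupSketch {
    // Private pool for blocking calls; daemon threads so the JVM can still exit.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Stand-in for a blocking JDBC query.
    static String blockingLookup(int key) {
        return "row-" + key;
    }

    // The async wrapper: the blocking call runs on POOL, not on the caller's thread.
    static CompletableFuture<String> asyncLookup(int key) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(key), POOL);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(asyncLookup(42).get()); // prints row-42
    }
}
```

In an actual Flink `AsyncFunction`, the future's completion would call `ResultFuture.complete(...)` instead of being blocked on with `get()`.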

Re: Flink Parquet Streaming FileSink with scala case class with optional fields error

2020-08-13 Thread Arvid Heise
Hi Vikash, The error is coming from Parquet itself in conjunction with Avro (which is used to infer the schema of your scala class). The inferred schema is { "fields": [ { "name": "level", "type": "string" }, { "name": "time_stamp",

[Announce] Flink Forward Global Program is now Live

2020-08-13 Thread Seth Wiesman
Hi Everyone *The Flink Forward Global 2020 program is now online* and with 2 full days of exciting Apache Flink content, curated by our program committee[1]! Join us on October 21-22 to learn more about the newest technology updates, and hear use cases from Intel, Razorpay, Workday, Microsoft,

Re: Status of a job when a kafka source dies

2020-08-13 Thread Nick Bendtner
Hi Piotr, Sorry for the late reply. So the poll does not throw an exception when a broker goes down. In spring they solve it by generating an event [1] whenever this happens and you can intercept this event; consumer.timeout.ms does help to some extent, but if the source topic does not

Re: Client's documentation for deploy and run remotely.

2020-08-13 Thread Jacek Grzebyta
It seems the documentation might be outdated. Probably I found what I wanted in different request: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Submit-Flink-1-11-job-from-java-td37245.html Cheers, Jacek On Thu, 13 Aug 2020 at 14:23, Jacek Grzebyta wrote: > Hi, > > I have

Re: Flink cluster deployment strategy

2020-08-13 Thread sidhant gupta
Thanks, I will check it out. On Thu, 13 Aug, 2020, 7:55 PM Arvid Heise, wrote: > Hi Sidhant, > > If you are starting fresh with Flink, I strongly recommend to skip ECS and > EMR and directly go to a kubernetes-based solution. Scaling is much easier > on K8s, there will be some kind of

Re: Question about ParameterTool

2020-08-13 Thread Arvid Heise
Since Picocli does not have any dependencies on its own, it's safe to use. It's a bit quirky to use with Scala, but it's imho the best CLI library for java. The only downside as Chesnay mentioned is the increased jar size. Also note that Flink is not graal-ready. Best, Arvid On Wed, Aug 12,

Re: Flink cluster deployment strategy

2020-08-13 Thread Arvid Heise
Hi Sidhant, If you are starting fresh with Flink, I strongly recommend to skip ECS and EMR and directly go to a kubernetes-based solution. Scaling is much easier on K8s, there will be some kind of autoscaling coming in the next release, and the best of it all: you even have the option to go to a

Re: Flink job percentage

2020-08-13 Thread Arvid Heise
Hi Flavio, This is a daunting task to implement properly. There is an easy fix in related workflow systems though. Assuming that it's a rerunning task, then you simply store the run times of the last run, use some kind of low-pass filter (=decaying average) and compare the current runtime with
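Arvid's decaying-average idea can be sketched in a few lines of plain Java. The class and method names below are made up for illustration; the estimator keeps an exponential moving average of past run times and reports the current run's elapsed time against it:

```java
// Sketch: estimate job progress from previous run times using a decaying average.
public class ProgressEstimator {
    private double avgRuntime;   // decaying average of past run times (seconds)
    private final double alpha;  // smoothing factor in (0, 1]; higher = less smoothing

    public ProgressEstimator(double initialRuntime, double alpha) {
        this.avgRuntime = initialRuntime;
        this.alpha = alpha;
    }

    /** Fold a finished run's duration into the decaying average. */
    public void recordRun(double runtimeSeconds) {
        avgRuntime = alpha * runtimeSeconds + (1 - alpha) * avgRuntime;
    }

    /** Rough percentage for the current run, capped at 99% until it actually finishes. */
    public double percentComplete(double elapsedSeconds) {
        return Math.min(99.0, 100.0 * elapsedSeconds / avgRuntime);
    }

    public static void main(String[] args) {
        ProgressEstimator est = new ProgressEstimator(100.0, 0.5);
        est.recordRun(120.0);                          // average becomes 110.0
        System.out.println(est.percentComplete(55.0)); // prints 50.0
    }
}
```

The 99% cap reflects the caveat in the reply: the estimate is only a heuristic, so it should never claim completion before the job actually finishes.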

Re: Flink 1.11 logs not printed properly

2020-08-13 Thread shizk233
The Flink framework uses slf4j; log4j2 is just one concrete binding, so it should be directly replaceable: swap the log4j2-related jars in the Flink distribution for the log4j jars, and of course change the config files to a log4j-supported format. caozhen wrote on Thu, Aug 13, 2020 at 3:39 PM: > Flink 1.11 seems to use log4j2; my main jar uses log4j, the classes conflict, and the JM/TM logs are empty. > > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in >

Client's documentation for deploy and run remotely.

2020-08-13 Thread Jacek Grzebyta
Hi, I have a problem with some examples in the documentation. Particularly I meant about that paragraph: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/parallel.html#client-level In the code there are used classes such as: Client and RemoteExecutor. I found those classes in the

Re: Using managed keyed state with AsynIo

2020-08-13 Thread KristoffSC
Hi Arvid, thank you for the respond. Yeah I tried to run my job shortly after posting my message and I got "State is not supported in rich async function" ;) I came up with a solution that would solve my initial problem - concurrent/Async problem of processing messages with the same key but

How should multiple tables created in Flink SQL tell apart canal-json records coming from Kafka?

2020-08-13 Thread LittleFall
These are the two tables I created in Flink SQL: create table base ( id int, location varchar(20) )WITH ( 'connector' = 'kafka', 'topic' = 'example', 'properties.group.id' = 'testGroup', 'scan.startup.mode' = 'latest-offset', 'properties.bootstrap.servers' = 'localhost:9092',

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Danny Chan
Weizheng ~ tEnv.executeSql would execute the SQL asynchronously, e.g. submitting a job to the backend cluster with a builtin job name; the tEnv.executeSql itself did return a JobResult immediately with a constant affected rows count -1. Best, Danny Chan On Aug 13, 2020 at 3:46 PM +0800, Lu Weizheng

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

2020-08-13 Thread Till Rohrmann
Hi Narasimha, if you are deploying the Flink cluster manually on K8s then there is no automatic way of stopping the TaskExecutor/TaskManager pods. This is something you have to do manually (similar to a standalone deployment). The only clean up mechanism is the automatic termination of the

Re: getting error after upgrade Flink 1.11.1

2020-08-13 Thread Kostas Kloudas
Hi Dasraj, Yes, I would recommend using Public and, if necessary, PublicEvolving APIs as they provide better guarantees for future maintenance. Unfortunately there are no docs about which APIs are Public or PublicEvolving, but you can see the annotations of the classes in the source code. I

Help: Flink 1.11.1 YARN per-job mode with ZooKeeper HA times out on application submission; 1.10 was fine

2020-08-13 Thread x2009438
Hi all, I upgraded from 1.10.0 to 1.11.1 today, and YARN per-job submission with ZooKeeper HA configured times out — has anyone hit this? The logs show nothing. The config was copied over from 1.10.0, where it works fine. Sent from my iPhone

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

2020-08-13 Thread Kostas Kloudas
Hi Narasimha, I am not sure why the TMs are not shutting down, as Yun said, so I am cc'ing Till here as he may be able to shed some light. For the application mode, the page in the documentation that you pointed is the recommended way to deploy an application in application mode. Cheers, Kostas

Re: Re: FLINK 1.11.1: questions about ingesting OGG data into Hive

2020-08-13 Thread USERNAME
Thanks for the reply. The approach you describe is similar to a "pass-through table structure" like the one below: partition by table, demote each table's data portion into a single field, and at query time parse the JSON to extract the table's fields and data from "before" or "after". This is indeed much more extensible and flexible, at the cost of some usability. I see many large companies — Meituan, ByteDance, etc. — have Flink-based real-time warehouses; I wonder how they solve the extensibility, flexibility, and usability of ingesting large numbers of tables. create table TABLENAME ( table STRING, op_type STRING,

Re: Jar dependency issues when submitting SQL jobs via the Flink SQL CLI

2020-08-13 Thread Zhao,Yi(SEC)
A is a 10-machine cluster (HA mode, standalone); B is the submitting machine. From my experiment, I first start a sql-client CLI with: ./bin/sql-client.sh embedded -l $(pwd)/libs_sql -l $(pwd)/libs_udf where libs_sql contains: flink-connector-kafka_2.12-1.10.0.jar flink-connector-kafka-base_2.12-1.10.0.jar flink-jdbc_2.12-1.10.0.jar

Re:Re: Re:Re:Re:Re:Flink SQL No Watermark

2020-08-13 Thread Zhou Zach
Hi, I tried both: setting the parallelism to 2 and to the Kafka partition count of 9. In both cases only one consumer has a watermark, perhaps because I started only one producer. On 2020-08-13 16:57:25, "Shengkai Fang" wrote: >hi, watermarks are generated by the watermark assigner; that is normal behavior. >Have you tried increasing the parallelism to solve this? Data in different partitions may differ in time. > >Zhou Zach wrote on Thu, Aug 13, 2020 at 4:33 PM: > >> >> >> >> Hi

Re: Re:Re:Re:Re:Flink SQL No Watermark

2020-08-13 Thread Shengkai Fang
hi, watermarks are generated by the watermark assigner — that is normal behavior. Have you tried increasing the parallelism to solve this? Data in different partitions may differ in time. Zhou Zach wrote on Thu, Aug 13, 2020 at 4:33 PM: > > > > Hi forideal, Shengkai Fang, > > After adding env.disableOperatorChaining() I see 5 operators, > > > Source: TableSourceScan(table=[[default_catalog, default_database,

Re: user-zh

2020-08-13 Thread Xingbo Huang
Hi, to unsubscribe, please send an email to user-zh-unsubscr...@flink.apache.org; see [1] for details. [1] https://flink.apache.org/zh/community.html#section-1 Best, Xingbo 15037433...@163.com <15037433...@163.com> wrote on Thu, Aug 13, 2020 at 3:40 PM: > > Unsubscribe > > > 15037433...@163.com >

Re: Unsubscribe

2020-08-13 Thread Xingbo Huang
Hi, to unsubscribe, please send an email to user-zh-unsubscr...@flink.apache.org; see [1] for details. [1] https://flink.apache.org/zh/community.html#section-1 Best, Xingbo 李强 wrote on Thu, Aug 13, 2020 at 4:35 PM: > Unsubscribe

Re: k8s job cluster using StatefulSet

2020-08-13 Thread Arvid Heise
Hi Alexey, I don't see any issue in using stateful sets immediately. I'd recommend using one of the K8s operators or Ververica's community edition [1] though if you start with a new setup as they may solve even more issues that you might experience in the future. [1]

Re: FLINK 1.11.1: questions about ingesting OGG data into Hive

2020-08-13 Thread Rui Li
The three difficulties you mention are indeed not supported by the current Hive connector. The first two can perhaps be solved by writing to different partitions of one table instead of different tables. The third can perhaps be handled by checking whether the data matches the target schema to decide whether to sync the new schema from the HMS. On Thu, Aug 13, 2020 at 3:27 PM USERNAME wrote: > > > Pipeline: > OGG->KAFKA->FLINK->HIVE > > > Sample Kafka data: > there can be multiple >

Unsubscribe

2020-08-13 Thread 李强
Unsubscribe

Re:Re:Re:Re:Re:Flink SQL No Watermark

2020-08-13 Thread Zhou Zach
Hi forideal, Shengkai Fang, after adding env.disableOperatorChaining() I see 5 operators: Source: TableSourceScan(table=[[default_catalog, default_database, user]], fields=[uid, sex, age, created_time]) -> Calc(select=[uid, sex, age, created_time, () AS procTime, TO_TIMESTAMP(((created_time / 1000)

Re: Using managed keyed state with AsynIo

2020-08-13 Thread Arvid Heise
Hi KristoffSC, Afaik asyncIO does not support state operations at all because of your mentioned issues (RichAsyncFunction fails if you access state). I'd probably solve it by having a map or process function before and after the asyncIO for the state operations. If you enable object reuse,

Re: Tools for Flink Job performance testing

2020-08-13 Thread Arvid Heise
Hi, performance testing is quite vague. Usually you start by writing a small first version of your pipeline and check how the well computation scales on your data. Flink's web UI [1] already helps quite well for the first time. Usually you'd also add some metric system and look for advanced

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Lu Weizheng
Thanks Timo, so there is no need to call the execute() method if I do everything from source to sink in SQL. Best Regards, Lu > On Aug 13, 2020, at 3:41 PM, Timo Walther wrote: > > Hi Lu, > > `env.execute("table api");` is not necessary after FLIP-84 [1]. Every method > that has `execute` in its name

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Timo Walther
Hi Lu, `env.execute("table api");` is not necessary after FLIP-84 [1]. Every method that has `execute` in its name will immediately execute a job. Therefore your `env.execute` has an empty pipeline. Regards, Timo [1] https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

Re: State Processor API to boot strap keyed state for Stream Application.

2020-08-13 Thread Arvid Heise
For future readers: this thread has been resolved in "Please help, I need to bootstrap keyed state into a stream" on the user mailing list asked by Marco. On Fri, Aug 7, 2020 at 11:52 PM Marco Villalobos wrote: > I have read the documentation and various blogs that state that it is > possible

Re: Re: Problems writing ORC files with Hive streaming

2020-08-13 Thread Rui Li
If you run it in the IDE, tableEnv.executeSql returns immediately and the process exits; you can wait for the job to finish with something like: val tableResult = tEnv.executeSql(insert) // wait to finish tableResult.getJobClient.get .getJobExecutionResult(Thread.currentThread.getContextClassLoader) .get > Why does writing ORC files with Hive streaming require importing the flink-orc_2.11 jar while Parquet does not?

Re:Re:Re:Re:Flink SQL No Watermark

2020-08-13 Thread forideal
Hi Zhou Zach: you can try env.disableOperatorChaining(); then watch each operator's watermark — that makes it easy to see what's going on. > How I set the options: I use the Flink SQL Blink planner, set the same way as you: tableEnv.getConfig().getConfiguration() .setString(key, configs.getString(key, null)); and I also defined in the source table: WATERMARK FOR event_time AS event_time -

user-zh

2020-08-13 Thread 15037433...@163.com
Unsubscribe 15037433...@163.com

Flink 1.11 logs not printed properly

2020-08-13 Thread caozhen
Flink 1.11 seems to use log4j2; my main jar uses log4j, the classes conflict, and the JM/TM logs are empty. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/job.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in

Re: Is there a way to start a timer without ever receiving an event?

2020-08-13 Thread Timo Walther
What you can do is creating an initial control stream e.g. using `StreamExecutionEnivronment.fromElements()` and either use `union(controlStream, actualStream)` or use `actualStream.connect(controlStream)`. Regards, Timo On 12.08.20 18:15, Andrey Zagrebin wrote: I do not think so. Each

Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Lu Weizheng
Hi, I am using Flink 1.11 SQL using java. All my operations are in SQL. I create source tables and insert result into sink tables. No other Java operators. I execute it in Intellij. I can get the final result in the sink tables. However I get the following error. I am not sure it is a bug or

Flink 1.11.1 SQL client: streaming join produces unexpected rows containing null

2020-08-13 Thread LittleFall
MySQL DDL: use test; create table base ( id int primary key, location varchar(20) ); create table stuff( id int primary key, b_id int, name varchar(20) ); Flink SQL client DDL: create table base ( id int primary key, location varchar(20)

Re: Flink CPU load metrics in K8s

2020-08-13 Thread Arvid Heise
Hi Abhinav, according to [1], you need 8u261 for the OperatingSystemMXBean to work as expected. [1] https://bugs.openjdk.java.net/browse/JDK-8242287 On Thu, Aug 13, 2020 at 1:10 AM Bajaj, Abhinav wrote: > Thanks Xintong for your input. > > > > From the information I could find, I understand

A question about operators in FINISHED state preventing Flink from taking checkpoints

2020-08-13 Thread 杨豫鲁
A problem I ran into recently while setting up a Flink pipeline: the Flink job joins against a physical table (a dictionary table). After the job starts, it reads the dictionary table once, and that operator then switches to FINISHED state, which prevents the job from taking checkpoints or savepoints. How do people usually handle this? My job must use the dictionary table for enrichment during processing.

Flink 1.11 SQL Client: streaming join produced rows containing null that should not appear

2020-08-13 Thread LittleFall
DDL on MySQL: use test; create table base ( id int primary key, location varchar(20) ); create table stuff( id int primary key, b_id int, name varchar(20) ); DDL in the Flink SQL client: create table base ( id int primary key, location

Re: Re:Re:Flink SQL No Watermark

2020-08-13 Thread Shengkai Fang
hi, have you tried setting the parallelism to the number of partitions? Zhou Zach wrote on Thu, Aug 13, 2020 at 3:21 PM: > > > > Hi forideal, > I also hit the No Watermark problem, and I also set the table.exec.source.idle-timeout option, as follows: > > > val streamExecutionEnv = > StreamExecutionEnvironment.getExecutionEnvironment > >

FLINK 1.11.1: questions about ingesting OGG data into Hive

2020-08-13 Thread USERNAME
Pipeline: OGG->KAFKA->FLINK->HIVE Sample Kafka data: there are multiple "table"s, so the fields in "before"/"after" are not uniform, and a DDL change on the same table also changes the "before"/"after" fields. { "table": "SCOOT.TABLENAME", "op_type": "U", "op_ts": "2020-08-11 07:53:40.008001", "current_ts": "2020-08-11T15:56:41.233000", "pos":

Re:Re:Re:Flink SQL No Watermark

2020-08-13 Thread Zhou Zach
Hi forideal, I also hit the No Watermark problem, and I also set the table.exec.source.idle-timeout option, as follows: val streamExecutionEnv = StreamExecutionEnvironment.getExecutionEnvironment streamExecutionEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) streamExecutionEnv.setStateBackend(new

Re: Re: Flink 1.11.1 on YARN: job startup error

2020-08-13 Thread 郭华威
OK, thanks! On 2020-08-13 14:08:07, "Congxian Qiu" wrote: >Hi > > This should be a known issue [1], already fixed in 1.11.2 and 1.12 > >[1] https://issues.apache.org/jira/browse/FLINK-18710 >Best, >Congxian > > >郭华威 wrote on Thu, Aug 13, 2020 at 11:05 AM: > >> Hello, a question: >> Flink 1.11.1 on YARN, the job fails on startup with an error: >> >> >> Launch command: >>

Re: Jar dependency issues when submitting SQL jobs via the Flink SQL CLI

2020-08-13 Thread Jeff Zhang
Are your 10 machines a Flink standalone cluster or a YARN cluster? For a YARN cluster, putting the dependencies on machine A is enough; for standalone, the dependencies must be deployed on the cluster as well. You can also submit Flink SQL jobs with Zeppelin, which offers several ways to add third-party dependencies; see the docs https://www.yuque.com/jeffzhangjianfeng/gldg8w/rn6g1s or join the DingTalk group: 32803524 Zhao,Yi(SEC) wrote on Thu, Aug 13, 2020 at 1:02 PM: >

(subject garbled in the archive)

2020-08-13 Thread ??????
(Message garbled in the archive; the recoverable fragments mention Kafka 0.10, Flink 1.10, and the Flink Kafka connector.)

HBase sink error: UpsertStreamTableSink requires that Table has a full primary keys

2020-08-13 Thread xiao cai
Hi All: Writing to an HBase sink with Flink SQL fails with: UpsertStreamTableSink requires that Table has a full primary keys if it is updated. I created 4 tables: 1 Kafka source table, 3 HBase dimension tables, and 1 HBase sink table. The result of left-joining the Kafka source table with the HBase dimension tables is inserted into the HBase sink table. SQL: create table user_click_source( `id` bigint, `name` varchar,
