State migration for sql job

2021-06-07 Thread aitozi
When use flink sql, we encounter a big problem to deal with sql state compatibility.Think we have a group agg sql like ```sqlselect sum(`a`) from source_t group by `uid But if i want to add a new agg column to ```sqlselect sum(`a`), max(`a`) from source_t group by `uidThen sql state will

Re: Re: Add control mode for flink

2021-06-07 Thread Yun Gao
Very thanks Jiangang for bringing this up and very thanks for the discussion! I also agree with the summarization by Xintong and Jing that control flow seems to be a common buidling block for many functionalities and dynamic configuration framework is a representative application that

Re: How to configure column width in Flink SQL client?

2021-06-07 Thread Ingo Bürk
Hi Svend, unfortunately the column width in the SQL client cannot currently be configured. Regards Ingo On Mon, Jun 7, 2021 at 4:19 PM Svend wrote: > > Hi everyone, > > When using the Flink SQL client and displaying results interactively, it > seems the values of any column wider than 24

回复:【问题分析】Fink任务无限反压

2021-06-07 Thread smq
之前遇到过在sink到kudu的时候出现反压很严重,主要原因是测试数据不当的问题,根据我的经验,比较多的是下游io瓶颈,可以到sink组件的日志查看问题 -- 原始邮件 -- 发件人: yidan zhao http://apache-flink.147419.n8.nabble.com/

Re: DataStream API in Batch Execution mode

2021-06-07 Thread Guowei Ma
Hi, Macro I think you could try the `FileSource` and you could find an example from [1]. The `FileSource` would scan the file under the given directory recursively. Would you mind opening an issue for lacking the document? [1]

Re: Add control mode for flink

2021-06-07 Thread kai wang
I'm big +1 for this feature. 1. Limit the input qps. 2. Change log level for debug. in my team, the two examples above are needed JING ZHANG 于2021年6月8日周二 上午11:18写道: > Thanks Jiangang for bringing this up. > As mentioned in Jiangang's email, `dynamic configuration framework` > provides

回复:flink on yarn日志清理

2021-06-07 Thread 王刚
你可以在客户端的log4j.properties或者logback.xml文件上配置下相关清理策略 你先确认下使用的哪个slf4j的实现类 原始邮件 发件人: zjfpla...@hotmail.com 收件人: user-zh 发送时间: 2021年6月7日(周一) 12:17 主题: flink on yarn日志清理 大家好, 请问下如下问题: flink on yarn模式,日志清理机制有没有的? 是不是也是按照log4j/logback/log4j2等的清理机制来的?还是yarn上配置的。 是实时流作业,非离线一次性作业,一直跑着的

Re: Add control mode for flink

2021-06-07 Thread JING ZHANG
Thanks Jiangang for bringing this up. As mentioned in Jiangang's email, `dynamic configuration framework` provides many useful functions in Kuaishou, because it could update job behavior without relaunching the job. The functions are very popular in Kuaishou, we also see similar demands in

Re: 【问题分析】Fink任务无限反压

2021-06-07 Thread yidan zhao
怎么说的,这个应该不是代码层面的问题,所以感觉贴代码意义不大。 我是想看有没有类似的什么idea,什么情况可能呈现类似现象。 就是直接停滞了,我发现的时候所有算子都非反压,只要source处于反压状态。 然后source到下一个算子直接的records sent和records received数量已经彻底不变(眼看好几分钟也不变)。 HunterXHunter <1356469...@qq.com> 于2021年6月8日周二 上午11:13写道: > > 掐头去尾的提问,完全不知道是什么问题,没法回答你,最好是贴出代码,贴出图片等大家才能帮忙分析 > > > > -- > Sent

Re:Re: 【问题分析】Fink任务无限反压

2021-06-07 Thread 东东
反压也可能是下游IO跟不上造成的啊,严重的话上游确实就不工作了呗 在 2021-06-08 11:11:34,"yidan zhao" 写道: >正常的反压遇到过很多,也解决过很多。 但我现在这个问题类似的也是很多次了,很奇怪。就是反压到CPU利用率反而成0,就是不工作了。 > >LakeShen 于2021年6月8日周二 上午10:37写道: >> >> 你可以先结合你的任务逻辑,以及 Flink Web UI 反压监控,看看到底是什么地方引起反压。 >> 一般 source 反压,是下游的算子一直反压到 source。可以看看那个算子引起 >> >> Best, >>

Re: 【问题分析】Fink任务无限反压

2021-06-07 Thread HunterXHunter
掐头去尾的提问,完全不知道是什么问题,没法回答你,最好是贴出代码,贴出图片等大家才能帮忙分析 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 【问题分析】Fink任务无限反压

2021-06-07 Thread yidan zhao
正常的反压遇到过很多,也解决过很多。 但我现在这个问题类似的也是很多次了,很奇怪。就是反压到CPU利用率反而成0,就是不工作了。 LakeShen 于2021年6月8日周二 上午10:37写道: > > 你可以先结合你的任务逻辑,以及 Flink Web UI 反压监控,看看到底是什么地方引起反压。 > 一般 source 反压,是下游的算子一直反压到 source。可以看看那个算子引起 > > Best, > LakeShen > > > yidan zhao 于2021年6月8日周二 上午10:28写道: > > > 该任务有时候正常,偶尔反压。 > >

Re: sql multisink explain result is more expensive than expected

2021-06-07 Thread Luan Cooper
should I post in Dev user list ? On Mon, 7 Jun 2021 at 18:56 Luan Cooper wrote: > Hi > > We're using multi sink in sql with view, the TestCase is > > """java > @Test > def testJoinTemporalTableWithViewWithFilterPushDown(): Unit = { > createLookupTable("LookupTableAsync1", new

Re: 【问题分析】Fink任务无限反压

2021-06-07 Thread LakeShen
你可以先结合你的任务逻辑,以及 Flink Web UI 反压监控,看看到底是什么地方引起反压。 一般 source 反压,是下游的算子一直反压到 source。可以看看那个算子引起 Best, LakeShen yidan zhao 于2021年6月8日周二 上午10:28写道: > 该任务有时候正常,偶尔反压。 > 最近观察发现,反压时,kafkaSouce节点100%反压到停滞,后续算子什么也收不到,任务整体停滞。 > > 这类错误遇到过很多次了,目前我生产中flink有个很大问题就是这些稳定性,压力大不是需要时间去追赶,而是压力一大就整体处于停滞状态。 >

【问题分析】Fink任务无限反压

2021-06-07 Thread yidan zhao
该任务有时候正常,偶尔反压。 最近观察发现,反压时,kafkaSouce节点100%反压到停滞,后续算子什么也收不到,任务整体停滞。 这类错误遇到过很多次了,目前我生产中flink有个很大问题就是这些稳定性,压力大不是需要时间去追赶,而是压力一大就整体处于停滞状态。

Re: Flink checkpoint 速度很慢 问题排查

2021-06-07 Thread yidan zhao
不是的哈,那个方法本身还是同步调用的。就是需要你自己保证逻辑的异步。 Jacob <17691150...@163.com> 于2021年6月8日周二 上午9:31写道: > > @nobleyd > 谢谢大神指导,前两天休息没看邮件,才回复,抱歉 > > 我后面把代码大概改成如下样子,checkpoint时间确实得到了改善,job运行几天正常 > > 我提供的线程池在asyncInvoke方法内部跑,这样是不是不合适呀,asyncInvoke方法本身是不是就是封装好的异步方法,就不用单独启线程池了吧?直接在asyncInvoke方法内部写处理逻辑就好。 > > public class

Re: Add control mode for flink

2021-06-07 Thread 刘建刚
Thanks Xintong Song for the detailed supplement. Since flink is long-running, it is similar to many services. So interacting with it or controlling it is a common desire. This was our initial thought when implementing the feature. In our inner flink, many configs used in yaml can be adjusted by

Re: 如何获取flink sql的血缘关系?

2021-06-07 Thread LakeShen
一种方法就是借助 Flink SQL Parser,解析你的 SQL,然后获取到不同的 SQL node, 然后每个 SQL Node 都有对应的类型,以及 connector 后面的 with 参数,你需要自己在 写代码判定一下即可。本质是通过解析 SQL,来获取血缘关系。 Best, LakeShen casel.chen 于2021年6月8日周二 上午12:05写道: > 如何获取flink sql的血缘关系?如:表A -> 表B。有代码示例吗?谢谢!

flink sql 从savepoint 重启遭遇新旧状态序列化不匹配的问题

2021-06-07 Thread mangguozhi
各位好,我在flink 1.13中使用flink sql 在一次修改代码后的重启任务中,报以下错误: For heap backends, the new state serializer (org.apache.flink.api.common.typeutils.base.MapSerializer@a5b17bdb) must not be incompatible with the old state serializer (org.apache.flink.api.common.typeutils.base.MapSerializer@e5a9c6d8).

??????????????

2021-06-07 Thread happiless
---- ??: "happiless"

Re: Flink checkpoint 速度很慢 问题排查

2021-06-07 Thread Jacob
@nobleyd 谢谢大神指导,前两天休息没看邮件,才回复,抱歉 我后面把代码大概改成如下样子,checkpoint时间确实得到了改善,job运行几天正常 我提供的线程池在asyncInvoke方法内部跑,这样是不是不合适呀,asyncInvoke方法本身是不是就是封装好的异步方法,就不用单独启线程池了吧?直接在asyncInvoke方法内部写处理逻辑就好。 public class AsyncProcessFunction extends RichAsyncFunction, List> { private transient ExecutorService

Re: flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread r pp
可能和配置文件有关吧,我用的都是默认配置 smq <374060...@qq.com> 于2021年6月8日周二 上午7:10写道: > > 图里边可以看到,这个/jobmanager.log /jobmanager.out > /jobmanager.err中的LOG_DIR应该是一样的,也就是说这三个日志应该是放在一个目录下。至于什么原因少了这个. log > 确实是不清楚。 > > > > -- 原始邮件 -- > *发件人:* r pp > *发送时间:* 2021年6月7日 23:56 > *收件人:*

Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-07 Thread Thomas Wang
This is actually a very simple job that reads from Kafka and writes to S3 using the StreamingFileSink w/ Parquet format. I'm all using Flink's API and nothing custom. Thomas On Sun, Jun 6, 2021 at 6:43 PM Yun Gao wrote: > Hi Thoms, > > Very thanks for reporting the exceptions, and it seems to

回复:flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread smq
图里边可以看到,这个http://apache-flink.147419.n8.nabble.com/ -- Best, nbsp; pp -- Best,    pp -- Best,   pp -- Best,  pp -- Best,  pp -- Best,  pp -- Best,  pp

DataStream API in Batch Execution mode

2021-06-07 Thread Marco Villalobos
How do I use a hierarchical directory structure as a file source in S3 when using the DataStream API in Batch Execution mode? I have been trying to find out if the API supports that, because currently our data is organized by years, halves, quarters, months, and but before I launch the job, I

Re: Flink app performance test framework

2021-06-07 Thread luck li
Thanks Yangze. Nextmark is useful to me. Best regards > On Jun 6, 2021, at 8:08 PM, Yangze Guo wrote: > > Hi, Luck, > > I may not fully understand your requirements. If you just want to test > the performance of typical streaming jobs with the Flink, you can > refer to the nexmark[1]. If you

如何获取flink sql的血缘关系?

2021-06-07 Thread casel.chen
如何获取flink sql的血缘关系?如:表A -> 表B。有代码示例吗?谢谢!

Re:回复: Flink 维表延迟join

2021-06-07 Thread casel.chen
双流interval join是否可行呢? 在 2021-06-07 16:35:10,"Jason Lee" 写道: > > >我么生产环境同样遇到这种问题,因为上有流数据到了,但是维表数据未更新导致丢失部分数据,请问大家现在有好的解决方案去解决Flink SQL >维表延迟Join的问题了吗? > > >有解决方案的小伙伴能分享下嘛? >| | >JasonLee >| >| >jasonlee1...@163.com >| >签名由网易邮箱大师定制 > > >在2021年02月25日 14:40,Suhan 写道:

Re: flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread r pp
flink on yarn 模式的 Per-Job Cluster mode 在启动container ,产生jobmanager.log 前 都不会运行 Jar包的东西,所以,两个Jar ,一个正常,一个不正常,只能说明,不正常的要么没有读取到日志配置文件,不产生日志,要么就是有魔法把它原本生产的日志挪走或者 删除,(这是我的个人理解,如果,什么都一样,同样的提交命令、同样的提交环境,就是一个有log,一个没有log,差别就是Jar包不一样,真的无法分析了,没有集群,也干不了远程调试) 但这些 都已经和flink 是否有代码缺陷 没有关系了。附上 yarn

Re:Re:使用mysql-cdc 的scan.startup.mode = specific-offset的读取模式,运行一段时间后,报错

2021-06-07 Thread casel.chen
这个问题很严重啊,生产线上可不敢这么用,丢失部分数据是不能接受的。社区什么时候能支持 GTID 呢?官方网档上有写么? 在 2021-06-07 18:40:50,"董建" <62...@163.com> 写道: > > > >我也遇到了这种情况,可能是你们的db做了主从切换。 >因为binlog每台服务器的pos都不一样。 >mysql5.6以后支持了GTID的同步方式,这个是全局唯一的。但是目前mysql-cdc貌似还不支持。 >我目前的解决方案是出错后从最后的位置开始消费,可能会丢失一部分数据。 > > > > > > > > > > > > > >

Re: Allow setting job name when using StatementSet

2021-06-07 Thread Nicolaus Weidner
Hi Yuval, I am not familiar with the Table API, but in the fragment you posted, the generated job name is only used as default if configuration option pipeline.name is not set. Can't you just set that to the name you want to have? Best wishes, Nico On Mon, Jun 7, 2021 at 10:09 AM Yuval

Re: Stream processing into single sink to multiple DB Schemas

2021-06-07 Thread Maciej Obuchowski
Hey, We had similar problem, but with 1000s of tables. I've created issue [1] and PR with internally used solution [2], but unfortunately, there seems to be no interest in upstreaming this feature. Thanks, Maciej [1] https://issues.apache.org/jira/browse/FLINK-21643 [2]

Re: Stream processing into single sink to multiple DB Schemas

2021-06-07 Thread Nicolaus Weidner
Hi Tamir, I assume you want to use the Jdbc connector? You can use three filters on your input stream to separate it into three separate streams, then add a sink to each of those (see e.g. [1]). Then you can have a different SQL statement for each of the three sinks. If you specify the driver

How to configure column width in Flink SQL client?

2021-06-07 Thread Svend
Hi everyone, When using the Flink SQL client and displaying results interactively, it seems the values of any column wider than 24 characters is truncated, which is indicated by a '~' character, e.g. the "member_user_id" below: ``` SELECT metadata.true_as_of_timestamp_millis,

退订

2021-06-07 Thread Chen Virtual
退订

Re: Is it possible to customize avro schema name when using SQL

2021-06-07 Thread Nicolaus Weidner
Hi Tao, This is currently not possible using Table API, though this will likely change in a future version. Currently, you would have to do that using the Datastream API [1] and then switch to the Table API. Best wishes, Nico [1]

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-07 Thread Chirag Dewan
Hi, I think I got my issue. Would help if someone can confirm it :) I am using a NFS filesystem for storing my checkpoints and my Flink cluster is running on a K8 with 2 TMs and 2 JMs.  All my pods share the NFS PVC with state.checkpoint.dir and we also missed setting the RocksDB local dir.

Re: recover from svaepoint

2021-06-07 Thread Piotr Nowojski
Hi, Thanks Tianxin and 周瑞' for reporting and tracking down the problem. Indeed that could be the reason behind it. Have either of you already created a JIRA ticket for this bug? > Concerning the required changing of the UID of an operator Piotr, is this a known issue and documented somewhere? I

邮件退订

2021-06-07 Thread happiless
邮件退订 发自我的iPhone

flink web ui 卡顿不出结果

2021-06-07 Thread yidan zhao
如题,flink web ui经常卡顿,有时候是慢,有时候一直出不来。 今天具体观察了某一次卡顿的case,发现至少有几次是 /config 接口一直 pengding 状态。请问有人清楚 /config 后端实现是什么情况吗?瓶颈在哪呢?

Re: Add control mode for flink

2021-06-07 Thread Jark Wu
Thanks Xintong for the summary, I'm big +1 for this feature. Xintong's summary for Table/SQL's needs is correct. The "custom (broadcast) event" feature is important to us and even blocks further awesome features and optimizations in Table/SQL. I also discussed offline with @Yun Gao several

Re: Flink application on native k8s如何修改配置文件?

2021-06-07 Thread Yang Wang
Native k8s部署模式下,会自动将$FLINK_CONF_DIR目录下的flink-conf.yaml以及log4j-console.properties放到ConfigMap里面 然后挂载给JM与TM。你只需要修改对应的本地文件就好了 Best, Yang eriendeng 于2021年6月7日周一 下午3:09写道: > Hi all, > 最近再把flink任务迁移到native > k8s,发现flink-conf系列的文件没有办法很好地被修改,比如log4j文件还有一些很通用的写在flink-conf的配置项(e.g. > Prometheus配置)。 >

sql multisink explain result is more expensive than expected

2021-06-07 Thread Luan Cooper
Hi We're using multi sink in sql with view, the TestCase is """java @Test def testJoinTemporalTableWithViewWithFilterPushDown(): Unit = { createLookupTable("LookupTableAsync1", new AsyncTableFunction1) util.addTable( """ |CREATE TEMPORARY VIEW v_vvv AS |SELECT

Re:使用mysql-cdc 的scan.startup.mode = specific-offset的读取模式,运行一段时间后,报错

2021-06-07 Thread 董建
我也遇到了这种情况,可能是你们的db做了主从切换。 因为binlog每台服务器的pos都不一样。 mysql5.6以后支持了GTID的同步方式,这个是全局唯一的。但是目前mysql-cdc貌似还不支持。 我目前的解决方案是出错后从最后的位置开始消费,可能会丢失一部分数据。 在 2021-06-07 16:00:21,"张海深" <18601255...@163.com> 写道: >Hi all: >最近用mysql-cdc的方式,使用Flink-sql整合数据,table的部分配置如下

Re: After configuration checkpoint strategy, Flink Job cannot restart when job failed

2021-06-07 Thread Chesnay Schepler
The default number of restart attempts is 1. You need to explicitly configure it to allow more failures. On 6/7/2021 11:53 AM, 1095193...@qq.com wrote: Hi community, I

退订

2021-06-07 Thread happiless
退订 发自我的iPhone

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-07 Thread Piotr Nowojski
Hi Alex, A quick question. Are you using incremental checkpoints? Best, Piotrek sob., 5 cze 2021 o 21:23 napisał(a): > Small correction, in T4 and T5 I mean Job2, not Job 1 (as job 1 was save > pointed). > > Thank you, > Alex > > On Jun 4, 2021, at 3:07 PM, Alexander Filipchik > wrote: > > 

Flink1.12????????????????Web UI??TaskManager????????????????

2021-06-07 Thread ??????????
slf4j-logback??Flink??resourceslogback.xmlflink??web-ui??taskmanager.logwebuistdout??taskmanager.out

回复:flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread smq
这个配置读到了,可以在webui里看到这个配置internal.yarn.log-configure-file -- 原始邮件 -- 发件人: r pp http://apache-flink.147419.n8.nabble.com/ -- Best, nbsp; pp -- Best,    pp -- Best,   pp -- Best,  pp -- Best,  pp -- Best,  pp

Re: flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread r pp
好的,如果log 文件就没有产生,是真的没有产生,还有一个flink 的原因是,没有读取到日志的配置文件。 [image: image.png] smq <374060...@qq.com> 于2021年6月7日周一 下午3:31写道: > > 判断出确失这个.log是因为从yarn 上的logs中就看不到jobmanager .log > 这个文件,正常的是会有jobmanager.log文件,然后根据这个application id 去搜索containnerid > 也是找不到这个jobmanager .log 。正常的程序这三个文件是在一个containerid >

Re:?????? Flink ????????join

2021-06-07 Thread Michael Ran
join ?? 2021-06-07 16:35:10??"Jason Lee" ?? > > >??Flink > SQL Join > > >?? >| | >JasonLee >| >|

?????? ??????????

2021-06-07 Thread MOBIN
Flink??Flink SQL??set | | MOBIN | ?? ??2021??02??1?? 18:06?L<1039601...@qq.com> ?? ,??1.12?? EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().useBlinkPlanner().build();

?????? Flink ????????join

2021-06-07 Thread Jason Lee
??Flink SQL Join ?? | | JasonLee | | jasonlee1...@163.com | ?? ??2021??02??25?? 14:40??Suhan ??

Allow setting job name when using StatementSet

2021-06-07 Thread Yuval Itzchakov
Hi, Currently when using StatementSet, the name of the job is autogenerated by the runtime: [image: image.png] Is there any reason why there shouldn't be an overload that allows the user to explicitly specify the name of the running job? -- Best Regards, Yuval Itzchakov.

使用mysql-cdc 的scan.startup.mode = specific-offset的读取模式,运行一段时间后,报错

2021-06-07 Thread 张海深
Hi all: 最近用mysql-cdc的方式,使用Flink-sql整合数据,table的部分配置如下 'debezium.min.row.count.to.stream.results'='1000', 'scan.startup.mode'='specific-offset', 'scan.startup.specific-offset.file'='mybinlog.29', 'scan.startup.specific-offset.pos'='542607677' 在上述稳定运行接分钟之后,异常抛出了以下错误:

Re:Re:flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread 周瑞
我这也是,只有这些日志 prelaunch.out0.07 prelaunch.err0 taskmanager.out0 taskmanager.err --Original-- From: "smq"<374060...@qq.com; Date: Mon, Jun 7, 2021 03:49 PM To: "周瑞"http://apache-flink.147419.n8.nabble.com/ amp;gt;amp;gt; amp;gt; amp;gt;amp;gt; amp;gt; amp;gt;amp;gt;

Flink application on native k8s如何修改配置文件?

2021-06-07 Thread eriendeng
Hi all, 最近再把flink任务迁移到native k8s,发现flink-conf系列的文件没有办法很好地被修改,比如log4j文件还有一些很通用的写在flink-conf的配置项(e.g. Prometheus配置)。 flink-conf的配置我还可以在flink run的时候带上,那log4j的配置好像没有太好的办法带上。 大家在这块有什么实践吗?thx. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re:flink1.12版本,yarn-application模式Flink web ui看不到日志

2021-06-07 Thread 周瑞
您好请问这个问题解决了么,我也遇到了同样的问题,在Standalone模式下日志是可以正常输出的,部署到yarn之后只有error日志了 --Original-- From: "smq"<374060...@qq.com; Date: Fri, Jun 4, 2021 07:06 PM To: "r pp"http://apache-flink.147419.n8.nabble.com/ gt;gt; gt; gt;gt; gt; gt;gt; gt; gt;gt; gt; -- gt;gt; gt; Best,

Re: Add control mode for flink

2021-06-07 Thread Xintong Song
Thanks Jiangang for bringing this up, and Steven & Peter for the feedback. I was part of the preliminary offline discussions before this proposal went public. So maybe I can help clarify things a bit. In short, despite the phrase "control mode" might be a bit misleading, what we truly want to do

Stream processing into single sink to multiple DB Schemas

2021-06-07 Thread Tamir Sagi
Hey Community Assuming there are 3 groups, A, B, C Each group represents a set of data about employees and salaries. Group A ( 0-20K $) Group B (20K$ - 50K$) Group C ( > 50K$) Is it possible to process stream data from single source containing information about employees and salaries and split