date:20221013

Flink 的 Taskmanager 间网络连接数、task之间的result sub partition 数对任务性能的影响。

2022-10-13 文章 yidan zhao

任务假设：
   任务从kafka读取数据，经过若干复杂处理（process、window、join、等等），然后sink到kafka。
   并发最高240（kafka分区数），当前采用全部算子相同并发方式部署。
   算子间存在 hash、forward、rebalance 等分区情况。
   此处假设 A 和 B 算子之间是 rebalance。 C 和 D 算子直接是 hash 分区（无数据倾斜）。ABCD都是240并发。 其他算子暂忽略。

TM连接数：
   Flink 的 taskmanager 之间的共享 tcp 连接。因此虽然A到B、C到D之间都是全连接，但实际增加的是result
sub partition数，不会导致tcp连接不断增。
   我们单个TM只提供1个slot的情况下，每个算子都是240并发，所以tm之间的tcp连接是不是可以大概认为是 240 * 239 ？

task间result sub partition：
   task之间的result sub partition太多会有啥影响呢？主要考虑性能影响。是否可能增大 partition not found 的概率呢？

Re: Flink SQL 中同时写入多个 sink 时，是否能够保证先后次序

2022-10-13 文章 Zhiwen Sun

好的，谢谢大家，之前也想过这个方案，复用/继承 JdbcDynamicTableSink 相关代码自定义 connector 。

Zhiwen Sun



On Fri, Oct 14, 2022 at 10:08 AM yidan zhao  wrote:

> 在一个自定义sink中实现先写database，再发消息。
>
> 或者2个都是自定义的，但是不能通过sink，因为sink后就没数据了。通过process，第一个process完成写入database后，后续process发送消息。
>
> Shuo Cheng  于2022年10月12日周三 16:59写道：
> >
> > Flink SQL 自身机制无法保证同一个作业多个 sink 的写入次序。 是否可以考虑从业务逻辑上动手脚，比如写入消息队列 sink 前加个
> udf
> > filter, udf 查询 database，满足条件才写入消息队列，当然这种方式对性能可能有影响。
> >
> > On Wed, Oct 12, 2022 at 2:41 PM Zhiwen Sun  wrote:
> >
> > > hi all：
> > >
> > > 我们有个场景，需要 Flink SQL 同时写入消息和 database， 后续实时任务消费消息，再次读取 database， 如果消息先于
> > > database 写入，这就可能导致读取的数据不正确。
> > >
> > > 是否有办法保证 database 写入后，再发送消息？
> > >
> > > Zhiwen Sun
> > >
>

Re: Re: flink cdc能否同步DDL语句?

2022-10-13 文章 Shengkai Fang

Hi.

可以从这个地方入手看看
https://github.com/ververica/flink-cdc-connectors/blob/master/flink-connector-mysql-cdc/src/main/java/com/ververica/cdc/connectors/mysql/source/reader/MySqlRecordEmitter.java#L95

Best,
Shengkai

casel.chen  于2022年10月11日周二 10:58写道：

> 可以给一些hints吗？看哪些类?
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> 在 2022-10-11 10:22:07，"yuxia"  写道：
> >用 datastream api，自己解析一下 DDL。
> >
> >Best regards,
> >Yuxia
> >
> >- 原始邮件 -
> >发件人: "yh z" 
> >收件人: "user-zh" 
> >发送时间: 星期二, 2022年 10 月 11日 上午 10:23:43
> >主题: Re: flink cdc能否同步DDL语句?
> >
> >目前，社区的 Flink CDC，是可以读取到 binlog 中的 DDL 语句的，但是不会传递给下游，即下游无法感知 schema 的变化。
> >
> >Xuyang  于2022年10月10日周一 16:46写道：
> >
> >> Hi， 目前应该是不行的
> >> 在 2022-09-26 23:27:05，"casel.chen"  写道：
> >> >flink cdc能否同步DDL语句以实现schema同步? 例如create table , create index, truncate
> >> table等
> >>
>

Re: 退订

2022-10-13 文章 Shengkai Fang

你好，可以发送邮件到 user-zh-unsubscr...@flink.apache.org 来退订。

Best,
Shengkai

13341000780 <13341000...@163.com> 于2022年10月10日周一 18:21写道：

>
> 退订
>
>
>
>
> --
> 发自我的网易邮箱手机智能版

Re: [ANNOUNCE] Apache Flink Table Store 0.2.1 released

2022-10-13 文章 Martijn Visser

Congratulations and thanks to all those involved!

On Thu, Oct 13, 2022 at 4:47 AM Jingsong Lee 
wrote:

> The Apache Flink community is very happy to announce the release of
> Apache Flink Table Store 0.2.1.
>
> Apache Flink Table Store is a unified storage to build dynamic tables
> for both streaming and batch processing in Flink, supporting
> high-speed data ingestion and timely data query.
>
> Please check out the release blog post for an overview of the release:
> https://flink.apache.org/news/2022/10/13/release-table-store-0.2.1.html
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Maven artifacts for Flink Table Store can be found at:
> https://search.maven.org/search?q=g:org.apache.flink%20table-store
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352257
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Best,
> Jingsong Lee
>

Flink 的 Taskmanager 间网络连接数、task之间的result sub partition 数对任务性能的影响。

Re: Flink SQL 中同时写入多个 sink 时，是否能够保证先后次序

Re: Re: flink cdc能否同步DDL语句?

Re: 退订

Re: [ANNOUNCE] Apache Flink Table Store 0.2.1 released

5 matches

Site Navigation

Mail list logo

Footer information