另外,join 节点的并发可以再增加一些,提升 join 的处理性能。

On Wed, 18 Nov 2020 at 11:34, Jark Wu <[email protected]> wrote:

> 估计是你的全量数据太大,但是 join 节点处理性能太慢,导致半小时内全量数据还没有处理完,而 checkpoint 超时了。
> 注意全量数据阶段,是做不了 checkpoint 的。 具体可以看下这篇文章的第四点。
> https://mp.weixin.qq.com/s/Mfn-fFegb5wzI8BIHhNGvQ
>
> 解决办法文中也有提及:
>
> 解决办法:在 flink-conf.yaml 配置 failed checkpoint 容忍次数,以及失败重启策略,如下:
>
> execution.checkpointing.interval: 10min   # checkpoint间隔时间
> execution.checkpointing.tolerable-failed-checkpoints: 100  # checkpoint
> 失败容忍次数
> restart-strategy: fixed-delay  # 重试策略
> restart-strategy.fixed-delay.attempts: 2147483647   # 重试次数
>
> Best,
> Jark
>
> On Wed, 18 Nov 2020 at 11:28, 丁浩浩 <[email protected]> wrote:
>
>> 即使我将not
>> exists改成了join,join节点的checkpoint也无法完成,是我设置的checkpint时间太短了嘛,我设置的是每隔半小时发起checkpoint一次,超时时间也是半小时。
>> 下面是截图,(我上传图片每次都看不了啥情况)
>> https://imgchr.com/i/DeqixU
>> https://imgchr.com/i/DeqP2T
>>
>> > 在 2020年11月16日,上午10:29,Jark Wu <[email protected]> 写道:
>> >
>> > 瓶颈应该在两个 not exists 上面,not exists 目前只能单并发来做,所以无法水平扩展性能。
>> > 可以考虑把 not exists 替换成其他方案,比如 udf,维表 join。
>> >
>> > Best,
>> > Jark
>> >
>> > On Mon, 16 Nov 2020 at 10:05, 丁浩浩 <[email protected]> wrote:
>> >
>> >> select
>> >>    ri.sub_clazz_number,
>> >>    prcrs.rounds,
>> >>    count(*) as num
>> >> from
>> >>    subclazz gs
>> >> JOIN
>> >>    (SELECT gce.number, min( gce.extension_value ) AS grade FROM
>> >> course_extension gce WHERE gce.isdel = 0 AND gce.extension_type = 4
>> GROUP
>> >> BY gce.number) AS temp
>> >> ON
>> >>    temp.number = gs.course_number AND temp.grade>30
>> >> JOIN
>> >>    right_info ri
>> >> ON
>> >>    gs.number = ri.sub_clazz_number
>> >> join
>> >>    wide_subclazz ws
>> >> on
>> >>    ws.number = ri.sub_clazz_number
>> >> join
>> >>    course gc
>> >> on
>> >>    gc.number = ws.course_number and gc.course_category_id in (30,40)
>> >> left join
>> >>    performance_regular_can_renewal_sign prcrs
>> >> on prcrs.order_number = ri.order_number and    prcrs.rounds in (1,2)
>> >> where ri.is_del = 0 and ri.end_idx = -1 and prcrs.rounds is not null
>> >> and not exists (select 1 from internal_staff gis where gis.user_id =
>> >> ri.user_id)
>> >> and not exists (select 1 from clazz_extension ce where ws.clazz_number
>> =
>> >> ce.number
>> >>    and ce.extension_type = 3 and ce.isdel = 0
>> >>    and ce.extension_value in (1,3,4,7,8,11))
>> >> group by ri.sub_clazz_number, prcrs.rounds
>> >> Sql代码是这样的。
>> >> 瓶颈在所有的join节点上,每次的checkpoint无法完成的节点都是join节点。
>> >>
>> >>> 在 2020年11月14日,下午5:53,Jark Wu <[email protected]> 写道:
>> >>>
>> >>> 能展示下你的代码吗?是用的维表关联的语法 (FOR SYSTEM TIME AS OF)?
>> >>> 需要明确下,到底是什么节点慢了。
>> >>>
>> >>> On Fri, 13 Nov 2020 at 19:02, 丁浩浩 <[email protected]> wrote:
>> >>>
>> >>>> 我用flink cdc 对mysql的表进行关联查询,发现flink只能两张表关联之后再跟下一张表关联,导致最后落库的延迟非常大。
>> >>>> 有没有比较好的优化方案能缓解这样的问题?
>> >>
>> >>
>> >>
>>
>>
>>

回复