即使我将not exists改成了join,join节点的checkpoint也无法完成,是我设置的checkpint时间太短了嘛,我设置的是每隔半小时发起checkpoint一次,超时时间也是半小时。 下面是截图,(我上传图片每次都看不了啥情况) https://imgchr.com/i/DeqixU https://imgchr.com/i/DeqP2T
> 在 2020年11月16日,上午10:29,Jark Wu <[email protected]> 写道: > > 瓶颈应该在两个 not exists 上面,not exists 目前只能单并发来做,所以无法水平扩展性能。 > 可以考虑把 not exists 替换成其他方案,比如 udf,维表 join。 > > Best, > Jark > > On Mon, 16 Nov 2020 at 10:05, 丁浩浩 <[email protected]> wrote: > >> select >> ri.sub_clazz_number, >> prcrs.rounds, >> count(*) as num >> from >> subclazz gs >> JOIN >> (SELECT gce.number, min( gce.extension_value ) AS grade FROM >> course_extension gce WHERE gce.isdel = 0 AND gce.extension_type = 4 GROUP >> BY gce.number) AS temp >> ON >> temp.number = gs.course_number AND temp.grade>30 >> JOIN >> right_info ri >> ON >> gs.number = ri.sub_clazz_number >> join >> wide_subclazz ws >> on >> ws.number = ri.sub_clazz_number >> join >> course gc >> on >> gc.number = ws.course_number and gc.course_category_id in (30,40) >> left join >> performance_regular_can_renewal_sign prcrs >> on prcrs.order_number = ri.order_number and prcrs.rounds in (1,2) >> where ri.is_del = 0 and ri.end_idx = -1 and prcrs.rounds is not null >> and not exists (select 1 from internal_staff gis where gis.user_id = >> ri.user_id) >> and not exists (select 1 from clazz_extension ce where ws.clazz_number = >> ce.number >> and ce.extension_type = 3 and ce.isdel = 0 >> and ce.extension_value in (1,3,4,7,8,11)) >> group by ri.sub_clazz_number, prcrs.rounds >> Sql代码是这样的。 >> 瓶颈在所有的join节点上,每次的checkpoint无法完成的节点都是join节点。 >> >>> 在 2020年11月14日,下午5:53,Jark Wu <[email protected]> 写道: >>> >>> 能展示下你的代码吗?是用的维表关联的语法 (FOR SYSTEM TIME AS OF)? >>> 需要明确下,到底是什么节点慢了。 >>> >>> On Fri, 13 Nov 2020 at 19:02, 丁浩浩 <[email protected]> wrote: >>> >>>> 我用flink cdc 对mysql的表进行关联查询,发现flink只能两张表关联之后再跟下一张表关联,导致最后落库的延迟非常大。 >>>> 有没有比较好的优化方案能缓解这样的问题? >> >> >>
