Re: Re: FlinkSQL 1.12 Temporal Joins multi-table join question

Shengkai Fang Mon, 15 Nov 2021 20:17:17 -0800

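1. I'm not sure whether a UDF would work here: implement your own UDF and query the external table manually inside it.
2. If you implement it yourself, you should also be able to control how batches are accumulated.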
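
A minimal sketch of the batching idea in point 2, assuming plain Java with no Flink dependency. The class name InQueryBatcher, its methods, and the batch size are hypothetical; the table and column names come from the original question. It only buffers distinct keys and builds a parameterized IN statement; actually executing the statement (for example through JDBC inside a UDF) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: buffers lookup keys and hands them back in batches for one IN query. */
class InQueryBatcher {
    private final String table;
    private final String column;
    private final int batchSize;
    // LinkedHashSet drops duplicate keys while keeping arrival order.
    private final Set<String> pending = new LinkedHashSet<>();

    InQueryBatcher(String table, String column, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.table = table;
        this.column = column;
        this.batchSize = batchSize;
    }

    /** Buffers a key; returns the batch once batchSize distinct keys are pending, else an empty list. */
    List<String> add(String key) {
        pending.add(key);
        return pending.size() >= batchSize ? flush() : Collections.<String>emptyList();
    }

    /** Returns and clears whatever is pending; a caller could also invoke this on a timer. */
    List<String> flush() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    /** Builds a statement with one placeholder per key, suitable for a PreparedStatement. */
    String buildQuery(int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("keyCount must be positive");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM " + table + " WHERE " + column + " IN (" + placeholders + ")";
    }
}
```

A caller would feed each incoming key to add(), and whenever a non-empty batch comes back, bind its keys to the statement from buildQuery(batch.size()). Flushing on a timer as well as on size would keep slow-arriving keys from waiting indefinitely.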


悟空 <[email protected]> wrote on Fri, Nov 12, 2021 at 11:53 AM:

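> Hi:
>    First: I now understand that the cache does not really fit my scenario, because my
> tables each hold several billion rows and I need to query the database by certain key
> columns. So I first aggregate some primary keys in the job and then query with an IN
> condition.
>    Second: this seems to have been a misunderstanding on my part. My original idea was to
> use Flink SQL to push the whole logic down to the database, because some OLAP engines
> offer acceptable query performance.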
>
>
>
>  ---
> Best,
> WuKong
>
>
> From: Caizhi Weng
> Sent: 2021-11-12 11:32
> To: flink中文邮件组 (Flink Chinese mailing list)
> Subject: Re: FlinkSQL 1.12 Temporal Joins multi-table join question
>
>
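> Hi!
>
>  Do you mean that every time a record arrives on the main stream, a lookup is issued
>  against the dimension table? And that you would like to accumulate a batch of records
>  and query once per batch to improve performance?
>
>  If so, some dimension tables (such as jdbc and hbase) support a cache feature [1]. Each
>  time the cache is refreshed, it loads data into task manager memory, so when records
>  arrive on the main stream the matching data only needs to be looked up in task manager
>  memory, without querying the external system.
>
>  Also, what exactly do you mean by pushing the query logic down to the database? Could
>  you explain in more detail?
>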
>  [1] https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/jdbc/#lookup-cache
>
>  WuKong <[email protected]> wrote on Thu, Nov 11, 2021 at 5:42 PM:
>
>  > Hi:
>  >    Here is the scenario: I have a Kafka table, and I need to use this Kafka stream
>  > table as the event trigger to join several DB tables and widen the data. For example:
>  >    select *
>  >    from kafkaTableA AS A
>  >    join DBTableB FOR SYSTEM_TIME AS OF A.`PROCTIME` AS B ON valueB = B.columnB
>  >    join DBTableC FOR SYSTEM_TIME AS OF A.`PROCTIME` AS C ON valueC = C.columnC
>  >   I currently have two questions:
>  >    1. On the database side, I see that the data is queried one table at a time:
>  >         select * from DBTableB where B.columnB = valueB
>  >         select * from DBTableC where C.columnC = valueC
>  >       Can I configure this so that the whole query logic is pushed down to the database?
>  >    2. I would like to accumulate some data from Kafka and then query in micro-batches
>  >       using IN. Is there any way to do this?
>  >
>  >
>  >
>  > ---
>  > Best,
>  > WuKong
>  >
