回复： Flink SQL1.10 大表join如何优化？

111 Fri, 20 Mar 2020 22:30:51 -0700

Hi:
看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
Hybrid hash join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
目前看磁盘上的那部分join应该是整个任务的瓶颈。
具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
在2020年03月21日 11:01，111<[email protected]> 写道：
Hi, wu:
好的，我这边观察下gc情况。
另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
| select



  wte.external_user_id,

  wte.union_id,

  mr.fk_member_id as member_id

from a wte

left join b mr

 on wte.union_id = mr.relation_code

where wte.ods_date = '${today}'

limit 10;

|
我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。


目前不太清楚性能的瓶颈点和优化的方向：
1 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin 
in和out缓慢变化，其他的都没有什么变化。
2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
|
2020-03-21 09:23:14,732 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,738 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,744 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,750 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,756 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,762 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,772 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:14,779 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 4 ms for 32768 segments
2020-03-21 09:23:16,357 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 14 ms for 65536 segments
2020-03-21 09:23:16,453 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 10 ms for 65536 segments
2020-03-21 09:23:16,478 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 9 ms for 65536 segments
2020-03-21 09:23:16,494 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 9 ms for 65536 segments
2020-03-21 09:23:16,509 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 10 ms for 65536 segments
2020-03-21 09:23:16,522 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 9 ms for 65536 segments
2020-03-21 09:23:16,539 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 9 ms for 65536 segments
2020-03-21 09:23:16,554 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 10 ms for 65536 segments
2020-03-21 09:23:16,574 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 9 ms for 65536 segments
2020-03-21 09:23:16,598 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 9 ms for 65536 segments
2020-03-21 09:23:16,611 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 10 ms for 65536 segments
2020-03-21 09:23:20,157 INFO  
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash 
take 213 ms for 131072 segments
2020-03-21 09:23:21,579 INFO  
org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish build 
phase.
|




在2020年03月21日 10:31，Jark Wu<[email protected]> 写道：
Hi,

看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc 导致运行缓慢。
关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
<[email protected]>

Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <[email protected]> wrote:



图片好像挂了：



https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750


在2020年03月20日 17:52，111<[email protected]> 写道：
您好：
我有两张表数据量都是1000多万条，需要针对两张表做join。
提交任务后，发现join十分缓慢，请问有什么调优的思路？
需要调整managed memory吗？

目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
| {
"id":"container_e40_1555496777286_675191_01_000107",
"path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
"dataPort":39423,
"timeSinceLastHeartbeat":1584697728127,
"slotsNumber":4,
"freeSlots":3,
"hardware":{
"cpuCores":32,
"physicalMemory":135355260928,
"freeMemory":749731840,
"managedMemory":732828804
},
"metrics":{
"heapUsed":261623760,
"heapCommitted":781189120,
"heapMax":781189120,
"nonHeapUsed":100441328,
"nonHeapCommitted":102957056,
"nonHeapMax":1426063360,
"directCount":5662,
"directUsed":191911352,
"directMax":191911350,
"mappedCount":0,
"mappedUsed":0,
"mappedMax":0,
"memorySegmentsAvailable":5582,
"memorySegmentsTotal":5591,
"garbageCollectors":[
{
"name":"PS_Scavenge",
"count":5734,
"time":19767
},
{
"name":"PS_MarkSweep",
"count":7,
"time":893
}
]
}
} |