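Hi,

It looks like your join has no equi-join condition, so it can only run with a parallelism of 1. You could check the GC activity of that join operator to see whether full GCs are what is slowing it down.

As for batch joins, Jingsong knows the tuning options better than I do and may be able to offer some ideas. cc @Jingsong Li <[email protected]>
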
Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <[email protected]> wrote:

> The image seems to be broken:
>
> https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750
>
> On 20 Mar 2020 at 17:52, 111 <[email protected]> wrote:
> Hello,
> I have two tables with more than 10 million rows each, and I need to join them.
> After submitting the job, I found that the join is very slow. Are there any suggestions for tuning it?
> Do I need to adjust managed memory?
>
> Currently each TaskManager requests 2 GB of total memory, and each TaskManager has 4 slots. The TaskManager metrics are as follows:
>
> {
>   "id": "container_e40_1555496777286_675191_01_000107",
>   "path": "akka.tcp://flink@hnode9:33156/user/taskmanager_0",
>   "dataPort": 39423,
>   "timeSinceLastHeartbeat": 1584697728127,
>   "slotsNumber": 4,
>   "freeSlots": 3,
>   "hardware": {
>     "cpuCores": 32,
>     "physicalMemory": 135355260928,
>     "freeMemory": 749731840,
>     "managedMemory": 732828804
>   },
>   "metrics": {
>     "heapUsed": 261623760,
>     "heapCommitted": 781189120,
>     "heapMax": 781189120,
>     "nonHeapUsed": 100441328,
>     "nonHeapCommitted": 102957056,
>     "nonHeapMax": 1426063360,
>     "directCount": 5662,
>     "directUsed": 191911352,
>     "directMax": 191911350,
>     "mappedCount": 0,
>     "mappedUsed": 0,
>     "mappedMax": 0,
>     "memorySegmentsAvailable": 5582,
>     "memorySegmentsTotal": 5591,
>     "garbageCollectors": [
>       {
>         "name": "PS_Scavenge",
>         "count": 5734,
>         "time": 19767
>       },
>       {
>         "name": "PS_MarkSweep",
>         "count": 7,
>         "time": 893
>       }
>     ]
>   }
> }
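
The GC check suggested in the reply can be sketched against the metrics quoted above. The following is a minimal illustration, not a Flink tool: the helper name `summarize_gc` is hypothetical, the JSON is an excerpt of the quoted TaskManager payload, and it assumes the `time` field is the cumulative collection time in milliseconds.

```python
import json

# Excerpt of the TaskManager metrics quoted in this thread (values copied as-is).
TM_METRICS_JSON = """
{
  "metrics": {
    "heapUsed": 261623760,
    "heapMax": 781189120,
    "garbageCollectors": [
      {"name": "PS_Scavenge", "count": 5734, "time": 19767},
      {"name": "PS_MarkSweep", "count": 7, "time": 893}
    ]
  }
}
"""


def summarize_gc(tm):
    """Hypothetical helper: per-collector totals and average pause, plus heap usage.

    Assumes "time" is cumulative collection time in milliseconds.
    """
    metrics = tm["metrics"]
    collectors = {}
    for gc in metrics["garbageCollectors"]:
        count, total_ms = gc["count"], gc["time"]
        collectors[gc["name"]] = {
            "count": count,
            "total_ms": total_ms,
            # Guard against division by zero for collectors that never ran.
            "avg_ms": total_ms / count if count else 0.0,
        }
    return {
        "heap_utilization": metrics["heapUsed"] / metrics["heapMax"],
        "collectors": collectors,
    }


summary = summarize_gc(json.loads(TM_METRICS_JSON))

if __name__ == "__main__":
    print(f"heap utilization: {summary['heap_utilization']:.1%}")
    for name, stats in summary["collectors"].items():
        print(f"{name}: {stats['count']} collections, "
              f"{stats['total_ms']} ms total, {stats['avg_ms']:.1f} ms average")
```

PS_MarkSweep is the old-generation (full GC) collector of the JVM's parallel collector, so its count and total time are the numbers to watch. Note that the quoted snapshot reports 3 of 4 slots free, so it may not come from the TaskManager that hosts the join; sampling the TaskManager running the join task several times during the job would give a more reliable picture.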
