Hi! Reusing common sub-plans are an optimization of Flink. Flink is really reusing them in runtime and the results of the reused tasks are calculated only once.
Vasily Melnik <vasily.mel...@glowbyteconsulting.com> 于2021年9月2日周四 下午6:32写道: > > Hi all. > > Using SQL with blink planner for batch calculations, i see *Reused* > nodes in Optimized Execution Plan while making self join operations: > > > == Optimized Execution Plan == > Union(all=[true], union=[id, v, v0, w0$o0]) > :- OverAggregate(orderBy=[id DESC], window#0=[ROW_NUMBER(*) AS w0$o0 ROWS > BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], select=[id, v, v0, > w0$o0])(reuse_id=[2]) > : +- Sort(orderBy=[id DESC]) > : +- Exchange(distribution=[single]) > : +- Calc(select=[id, v, v0]) > : +- HashJoin(joinType=[LeftOuterJoin], where=[($f2 = id0)], > select=[id, v, $f2, id0, v0], build=[right]) > : :- Exchange(distribution=[hash[$f2]]) > : : +- Calc(select=[id, v, (id + 1) AS $f2]) > : : +- TableSourceScan(table=[[default_catalog, > default_database, t1]], fields=[id, v])(reuse_id=[1]) > : +- Exchange(distribution=[hash[id]]) > : +- *Reused*(reference_id=[1]) > +- *Reused*(reference_id=[2]) > > > Question is: do these steps (scans, intermediate calculations) really be > calculated once or it is just a print shortcut? >