Hi,

The optimizations you mentioned are only available in the product provided by Alibaba Cloud. In open-source Apache Flink there isn't a unified caching abstraction for all lookup tables, and each connector has its own cache implementation. For example, JDBC uses a Guava cache and FileSystem uses an in-memory HashMap, and neither of them loads all records of the dim table into the cache.
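To illustrate the difference from a cache = 'ALL' style preload, here is a minimal Python sketch of a bounded cache that is filled lazily, one row per lookup miss. It is not Flink code: `LazyLookupCache`, `fetch_row`, `max_rows`, and the `dim_table` dict are hypothetical names made up for the example, and the eviction policy (least recently used) is an assumption chosen for simplicity.

```python
from collections import OrderedDict


class LazyLookupCache:
    """Hypothetical bounded cache: rows are loaded on first lookup, never in bulk."""

    def __init__(self, fetch_row, max_rows):
        self._fetch_row = fetch_row  # stand-in for a query against the dim table
        self._max_rows = max_rows    # upper bound on cached rows
        self._rows = OrderedDict()   # insertion order doubles as recency order
        self.misses = 0              # number of lookups that hit the backing store

    def lookup(self, key):
        if key in self._rows:
            # Cache hit: mark the row as most recently used.
            self._rows.move_to_end(key)
            return self._rows[key]
        # Cache miss: fetch only this one row from the backing store.
        self.misses += 1
        row = self._fetch_row(key)
        self._rows[key] = row
        if len(self._rows) > self._max_rows:
            # Evict the least recently used row to respect the bound.
            self._rows.popitem(last=False)
        return row

    def __len__(self):
        return len(self._rows)


# Hypothetical dimension table, standing in for an external database.
dim_table = {1: "alice", 2: "bob", 3: "carol"}
cache = LazyLookupCache(dim_table.get, max_rows=2)

for key in (1, 1, 2, 3):
    cache.lookup(key)
```

With this design the cache never holds more than `max_rows` rows, so the memory footprint does not depend on the size of the dim table, at the cost of one backing-store query per miss.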
Best,
Qingsheng

> On Mar 28, 2022, at 12:26, dz902 <dz9...@gmail.com> wrote:
>
> Hi,
>
> I've read some docs
> (https://help.aliyun.com/document_detail/182011.html) stating Flink
> optimization technique using:
>
> - partitionedJoin = 'true'
> - cache = 'ALL'
> - blink.partialAgg.enabled=true
>
> However I could not find any official doc references. Are these
> supported at all?
>
> Also "partitionedJoin" seemed to have the effect of shuffling input by
> joining key so they can fit into memory. I read this
> (https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html)
> and believe this is already a default behavior of Flink.
>
> Is this optimization not needed even for huge input tables?
>
> Thanks,
> Dai