Hi Aitozi,

I think the FLIP should add the following description to make the difference between a user-defined asynchronous table function and AsyncTableFunction explicit:

User-defined asynchronous table functions allow complex parameters (e.g., Row type) to be passed to the function, which is important for RPC use cases, rather than relying on 'join ... on ...'.

Thanks,
Awake.

On 2023/06/26 02:31:59 Aitozi wrote:
> Hi Lincoln,
>
> Thanks for your confirmation. I have updated the FLIP doc with the consensus.
> If there are no other comments, I'd like to restart the vote process in [1] today.
>
> [1] https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
>
> Thanks,
> Aitozi.
>
> Lincoln Lee <li...@gmail.com> wrote on Wed, Jun 21, 2023 at 22:29:
>
> > Hi Aitozi,
> >
> > Thanks for your updates!
> >
> > By the design of hints, the hints after the SELECT clause belong to the query hints category, and this new hint is also a kind of join hint [1]. Joining a table function is one of the join types defined by Flink SQL joins [2], and all existing join hints [1] omit the 'join' keyword, so I would prefer 'ASYNC_TABLE_FUNC' (which is effectively short for 'ASYNC_TABLE_FUNC_JOIN').
> >
> > Since a short Chinese holiday is coming, I suggest waiting for other people's responses before continuing to vote (next Monday?).
> >
> > Btw, I discussed PyFlink support with @fudian offline; there should be no known issues, so you can create a subtask for PyFlink support after the vote has passed.
> >
> > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> >
> > Best,
> > Lincoln Lee
> >
> > Aitozi <gj...@gmail.com> wrote on Sun, Jun 18, 2023 at 21:18:
> >
> > > Hi all,
> > >
> > > Sorry for the late reply. I had an offline discussion with Lincoln, mainly about the naming of the hint option. Thanks, Lincoln, for the valuable suggestions.
> > >
> > > Let me answer the last email inline.
> > >
> > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call as an example?
> > >
> > > Sure, I will give an example when adding the doc for the async UDTF and will update the FLIP at the same time.
> > >
> > > >For the name of this query hint, 'LATERAL' (include its internal options) don't show any relevance to async, but I haven't thought of a suitable name at the moment,
> > >
> > > After some discussion with Lincoln, we prefer to choose between `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`. Besides, in my opinion the keyword `lateral` covers a wider range of scenarios than the table function join, but in this case we only want to configure the async table function, so I lean a bit more toward `ASYNC_TABLE_FUNC`. I'm looking forward to your input if you have better suggestions on the naming.
> > >
> > > For the usage of the hint's config options, I have updated the ConfigOption section; you can refer to the FLIP for more details.
> > >
> > > >Also, the terms 'correlate join' and 'lateral join' are not the same as in the current joins page[1], so maybe it would be better if we unified them into 'join table function'
> > >
> > > Yes, we should unify them as 'join table function'. Updated.
> > >
> > > Best,
> > > Aitozi
> > >
> > > Lincoln Lee <li...@gmail.com> wrote on Thu, Jun 15, 2023 at 09:15:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for your reply! Giving SQL users more flexibility to get asynchronous processing capabilities via the lateral join table function: +1 for this.
> > > >
> > > > For `JavaAsyncTableFunc0` in the FLIP, can you use a scenario like an RPC call as an example?
> > > >
> > > > For the name of this query hint, 'LATERAL' (including its internal options) doesn't show any relevance to async, but I haven't thought of a suitable name at the moment. Maybe we need to highlight the async keyword directly; we can also see if others have better candidates.
> > > >
> > > > The hint option "timeout = '180s'" should be "'timeout' = '180s'"; it seems to be a typo in the FLIP. Also, please use upper case for all keywords in the SQL examples.
> > > >
> > > > Also, the terms 'correlate join' and 'lateral join' are not the same as in the current joins page [1], so maybe it would be better if we unified them into 'join table function'.
> > > >
> > > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > > Aitozi <gj...@gmail.com> wrote on Wed, Jun 14, 2023 at 16:11:
> > > >
> > > > > Hi Lincoln,
> > > > >
> > > > > Many thanks for your valuable questions. I will try to answer them inline.
> > > > >
> > > > > >Does the async udtf bring any additional benefits besides a lighter implementation?
> > > > >
> > > > > IMO, the async UDTF is more than a lighter implementation. It can act as a general way for SQL users to use the async operator. They don't have to bind the async function to a table (a LookupTable), they are not forced to join on an equality condition, and they can use it for more than data enrichment.
> > > > >
> > > > > The async lookup join is more like a subset, or a specific usage, of the async UDTF. It is acceptable that the specific version has more opportunities for optimization (like push-down). The async table function should be categorized as a user-defined function.
> > > > >
> > > > > >Should users migrate to the lookup source when they encounter similar requirements or problems, or should we develop an additional set of similar mechanisms?
> > > > >
> > > > > As I clarified above, the lookup join is a specific usage of the async UDTF, so it deserves more refined optimizations such as caching and retry. But these may not all be suitable for the async UDTF. As a function, it can be deterministic or non-deterministic, so caching is not suitable, and we also do not have a common cache for UDFs now. So I think optimizations like caching/retry should be handed over to the function implementer.
> > > > >
> > > > > >the newly added query hint need a different name that can be easier related to the lateral operation as the current join hints[5] do.
> > > > >
> > > > > What about using LATERAL, as below?
> > > > >
> > > > > SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200', timeout = '180s') */ a, c1, c2
> > > > > FROM T1
> > > > > LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true
> > > > >
> > > > > >For the async func example, since the target scenario is an external io operation, it's better to add the `close` method to actively release resources as a good example for users
> > > > >
> > > > > Makes sense to me, I will update the FLIP.
> > > > >
> > > > > Best,
> > > > > Aitozi.
> > > > >
> > > > > Lincoln Lee <li...@gmail.com> wrote on Wed, Jun 14, 2023 at 14:24:
> > > > >
> > > > > > Hi Aitozi,
> > > > > >
> > > > > > Sorry for the late reply here! Supporting async UDTF (`AsyncTableFunction`) directly in SQL seems like an attractive feature, but there are two issues that need to be addressed before we can be sure to add it:
> > > > > > 1. As mentioned in the FLIP [1], the current lookup function can already implement the requirements, but it requires implementing an extra `LookupTableSource` and explicitly declaring the table schema (which can help implementers with the various push-down optimizations supported by the planner). Does the async UDTF bring any additional benefits besides a lighter implementation?
> > > > > > 2. FLIP-221 [2] abstracts a reusable cache and metric infrastructure for lookup sources, which is important for improving performance and observability in high-overhead external I/O scenarios. How do we integrate and reuse these capabilities after introducing the async UDTF? Should users migrate to the lookup source when they encounter similar requirements or problems, or should we develop an additional set of similar mechanisms? (A similar case: FLIP-234 [3] introduced the retryable capability for lookup join.)
> > > > > >
> > > > > > For the FLIP itself:
> > > > > > 1. Considering that 'options' is already used for dynamic table options [4] in Flink, the newly added query hint needs a different name that can be more easily related to the lateral operation, as the current join hints [5] do.
> > > > > > 2. For the async func example, since the target scenario is an external I/O operation, it's better to add the `close` method to actively release resources as a good example for users. Also, in terms of the determinism of a function, it is important to remind users that unless the behavior of the function is deterministic, it needs to be explicitly declared as non-deterministic.
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction?src=contextnavpagetreemode
> > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric?src=contextnavpagetreemode
> > > > > > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > > > > [4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL?src=contextnavpagetreemode
> > > > > > [5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job?src=contextnavpagetreemode
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > > Aitozi <gj...@gmail.com> wrote on Tue, Jun 13, 2023 at 11:30:
> > > > > >
> > > > > > > Got your meaning now, thanks :)
> > > > > > >
> > > > > > > Best,
> > > > > > > Aitozi.
> > > > > > >
> > > > > > > Feng Jin <ji...@gmail.com> wrote on Tue, Jun 13, 2023 at 11:16:
> > > > > > >
> > > > > > > > Hi Aitozi,
> > > > > > > >
> > > > > > > > Sorry for the confusing description.
> > > > > > > >
> > > > > > > > What I meant was that if we need to remind users about thread safety issues, we should introduce the new UDTF interface instead of executing the original UDTF asynchronously. Therefore, I agree with introducing the AsyncTableFunction.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Feng
> > > > > > > >
> > > > > > > > On Tue, Jun 13, 2023 at 10:42 AM Aitozi <gj...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Feng,
> > > > > > > > >
> > > > > > > > > Thanks for your question. We do not provide a way to switch a UDTF between sync and async execution, so there should be no thread safety problem here.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Aitozi
> > > > > > > > >
> > > > > > > > > Feng Jin <ji...@gmail.com> wrote on Tue, Jun 13, 2023 at 10:31:
> > > > > > > > >
> > > > > > > > > > Hi Aitozi,
> > > > > > > > > >
> > > > > > > > > > We do need to remind users about thread safety issues. Thank you for your efforts on this FLIP. I have no further questions.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Feng
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge <j...@ververica.com.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for taking care of that part. I have no other concerns.
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Jing
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > BTW, if there are no other blocking issues or comments, I would like to start a VOTE in another thread this Wednesday, 6.14.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Aitozi.
> > > > > > > > > > > >
> > > > > > > > > > > > Aitozi <gj...@gmail.com> wrote on Mon, Jun 12, 2023 at 23:34:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Jing,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for your explanation. I get your point now.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For the performance part, I think it's a good idea to run a test with the case of returning a big table; the memory consumption is a point to take care of, because in the ordered mode the head element in the buffer may affect the total memory consumption.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Aitozi.
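
A note on the ordered-mode remark just above: one way to picture why the head element of the buffer can matter for memory is the self-contained Java sketch below. It is not Flink code and does not claim to mirror the actual operator; `OrderedBufferSketch`, `submit`, `drain`, and `bufferedRows` are hypothetical names invented for illustration, and the queue is only assumed to behave roughly like an ordered async buffer. Calls that finish early still have to wait behind an unfinished head, so their rows stay in memory until the head completes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch (not Flink code): an ordered buffer of in-flight async calls. */
public class OrderedBufferSketch {
    // In-flight calls in arrival order; the head must finish before anything is emitted.
    private final Deque<CompletableFuture<List<String>>> inFlight = new ArrayDeque<>();

    /** Registers a new in-flight call and returns the future the caller completes later. */
    public CompletableFuture<List<String>> submit() {
        CompletableFuture<List<String>> future = new CompletableFuture<>();
        inFlight.addLast(future);
        return future;
    }

    /** Emits results from the head of the queue for as long as the head is complete. */
    public List<String> drain() {
        List<String> out = new ArrayList<>();
        while (!inFlight.isEmpty() && inFlight.peekFirst().isDone()) {
            out.addAll(inFlight.pollFirst().join());
        }
        return out;
    }

    /** Counts rows that are finished but still held back behind a pending earlier call. */
    public int bufferedRows() {
        int rows = 0;
        for (CompletableFuture<List<String>> future : inFlight) {
            if (future.isDone()) {
                rows += future.join().size();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        OrderedBufferSketch buffer = new OrderedBufferSketch();
        CompletableFuture<List<String>> first = buffer.submit();
        CompletableFuture<List<String>> second = buffer.submit();

        // The second call finishes first and returns several rows.
        second.complete(List.of("b1", "b2", "b3"));
        System.out.println("emitted while head pending: " + buffer.drain());
        System.out.println("rows held in buffer: " + buffer.bufferedRows());

        // Once the head completes, everything is released in arrival order.
        first.complete(List.of("a1"));
        System.out.println("emitted after head completes: " + buffer.drain());
    }
}
```

Under these assumptions, a table function that returns a large collection per call would make the held-back rows correspondingly larger, which seems to be the scenario the suggested performance test is meant to cover.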
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jing Ge <ji...@ververica.com.invalid> wrote on Mon, Jun 12, 2023 at 20:28:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Which key will be used for lookup is not an issue; only one row will be required for each key in order to enrich it. True, it depends on the implementation whether multiple rows or a single row will be returned for each key. However, for the lookup & enrichment scenario, one row per key is recommended; otherwise, as I mentioned previously, enrichment won't work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am a little bit concerned about returning a big table for each key, since the async call will take longer to return and need more memory. The performance tests should cover this scenario. This is not a blocking issue for this FLIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > Jing
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Jun 10, 2023 at 4:11 AM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Jing,
> > > > > > > > > > > > > > > I mean the join key is not necessary [message truncated...]