Hi Aitozi,

I think it is necessary to add the following description to the FLIP to express
the difference between a user-defined asynchronous table function and
AsyncTableFunction:

User-defined asynchronous table functions allow complex parameters (e.g., Row
type) to be passed to the function, which is important in RPC scenarios, rather
than using 'join … on ...'.
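
As a rough illustration of this difference (a sketch only, not text from the
FLIP: `rpc_call`, the tables `Orders` and `Customers`, and all column names are
hypothetical):

```sql
-- Lookup join: the function is bound to a lookup table (Customers here),
-- and arguments reach it only through the equality condition in ON.
SELECT o.order_id, c.name
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.id;

-- User-defined async table function: a structured argument (a ROW) is
-- passed directly in the call. rpc_call is an assumed async UDTF.
SELECT o.order_id, T.status, T.message
FROM Orders AS o
LEFT JOIN LATERAL TABLE (rpc_call(ROW(o.customer_id, o.amount)))
  AS T(status, message) ON TRUE;
```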

Thanks,
Awake.


On 2023/06/26 02:31:59 Aitozi wrote:
> Hi Lincoln,
>     Thanks for your confirmation. I have updated the FLIP doc to reflect the
> consensus.
> If there are no other comments, I'd like to restart the vote process in [1]
> today.
> 
> [1] https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
> 
> Thanks,
> Aitozi.
> 
> On Wed, Jun 21, 2023 at 22:29 Lincoln Lee <li...@gmail.com> wrote:
> 
> > Hi Aitozi,
> >
> > Thanks for your updates!
> >
> > By the design of hints, the hints after the SELECT clause belong to the
> > query hints category, and this new hint is also a kind of join hint[1].
> > Join table function is one of the join types defined by Flink SQL joins[2],
> > and all existing join hints[1] omit the 'join' keyword,
> > so I would prefer 'ASYNC_TABLE_FUNC' (which actually stands for
> > 'ASYNC_TABLE_FUNC_JOIN').
> >
> > Since a short Chinese holiday is coming, I suggest waiting for other
> > people's responses before continuing to vote (next Monday?)
> >
> > Btw, I discussed PyFlink support with @fudian offline. There should be no
> > known issues, so you can create a subtask for PyFlink support after the
> > vote passes.
> >
> > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> >
> > Best,
> > Lincoln Lee
> >
> >
> > On Sun, Jun 18, 2023 at 21:18 Aitozi <gj...@gmail.com> wrote:
> >
> > > Hi all,
> > >     Sorry for the late reply. I had a discussion with Lincoln offline,
> > > mainly about the naming of the hint option. Thanks, Lincoln, for the
> > > valuable suggestions.
> > >
> > > Let me answer the last email inline.
> > >
> > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > > as an example?
> > >
> > > Sure, I will give an example when adding the docs for the async UDTF and
> > > will update the FLIP at the same time.
> > >
> > > >For the name of this query hint, 'LATERAL' (include its internal options)
> > > don't show any relevance to async, but I haven't thought of a suitable
> > > name at the moment,
> > >
> > > After some discussion with Lincoln, we prefer to choose one of
> > > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > > Besides, in my opinion the keyword `lateral` covers a wider range of
> > > scenarios than the table function join, but in this case we only want to
> > > configure the async table function, so I lean a bit more toward
> > > `ASYNC_TABLE_FUNC`.
> > > I look forward to your input if you have better suggestions on the naming.
> > >
> > > For the usage of the hint config options, I have updated the ConfigOption
> > > section; you can refer to the FLIP for more details.
> > >
> > > >Also, the terms 'correlate join' and 'lateral join' are not the same as in
> > > the current joins page[1], so maybe it would be better if we unified them
> > > into 'join table function'
> > >
> > > Yes, we should unify them into 'join table function'. Updated.
> > >
> > > Best,
> > > Aitozi
> > >
> > > On Thu, Jun 15, 2023 at 09:15 Lincoln Lee <li...@gmail.com> wrote:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for your reply! +1 for giving SQL users more flexibility to get
> > > > asynchronous processing capabilities via lateral join table function.
> > > >
> > > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC call
> > > > as an example?
> > > >
> > > > For the name of this query hint, 'LATERAL' (include its internal options)
> > > > don't show any relevance to async, but I haven't thought of a suitable
> > > > name at the moment,
> > > > maybe we need to highlight the async keyword directly, we can also see if
> > > > others have better candidates
> > > >
> > > > The hint option "timeout = '180s'" should be "'timeout' = '180s'"; this
> > > > seems to be a typo in the FLIP. Also, please use upper case for all
> > > > keywords in the SQL examples.
> > > > Also, the terms 'correlate join' and 'lateral join' are not the same as in
> > > > the current joins page[1], so maybe it would be better if we unified them
> > > > into 'join table function'
> > > >
> > > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > On Wed, Jun 14, 2023 at 16:11 Aitozi <gj...@gmail.com> wrote:
> > > >
> > > > > Hi Lincoln
> > > > >
> > > > >     Many thanks for your valuable questions. I will try to answer them
> > > > > inline.
> > > > >
> > > > > >Does the async udtf bring any additional benefits besides a
> > > > > lighter implementation?
> > > > >
> > > > > IMO, async udtf is more than a lighter implementation. It can act as a
> > > > > general way for SQL users to use the async operator. They don't have to
> > > > > bind the async function to a table (a LookupTable), they are not forced
> > > > > to join on an equality join condition, and they can use it to do more
> > > > > than enrich data.
> > > > >
> > > > > The async lookup join is more like a subset/specific usage of async
> > > > > udtf. It is acceptable that the specific version has more opportunities
> > > > > to be optimized (like push down). The async table function should be
> > > > > categorized as a user-defined function.
> > > > >
> > > > > >Should users migrate to the lookup source when they encounter similar
> > > > > requirements or problems, or should we develop an additional set of
> > > > > similar mechanisms?
> > > > >
> > > > > As I clarified above, the lookup join is a specific usage of async
> > > > > udtf, so it deserves more refined optimizations like caching and retry.
> > > > > But these may not all be suitable for the async udtf. As a function, it
> > > > > can be deterministic or non-deterministic, so caching is not suitable,
> > > > > and we also do not have a common cache for UDFs now. So I think
> > > > > optimizations like caching/retry should be handed over to the function
> > > > > implementor.
> > > > >
> > > > > > the newly added query hint need a different name that
> > > > > can be easier related to the lateral operation as the current join
> > > > > hints[5] do.
> > > > >
> > > > >
> > > > > What about using LATERAL?
> > > > >
> > > > > as below
> > > > >
> > > > > SELECT /*+ LATERAL('output-mode' = 'ordered', 'capacity' = '200',
> > > > >     timeout = '180s') */ a, c1, c2
> > > > > FROM T1
> > > > > LEFT JOIN lateral TABLE (async_split(b)) AS T(c1, c2) ON true
> > > > >
> > > > > >For the async func example, since the target scenario is an external io
> > > > > operation, it's better to add the `close` method to actively release
> > > > > resources as a good example for users
> > > > >
> > > > > Makes sense to me; I will update the FLIP.
> > > > >
> > > > > Best,
> > > > >
> > > > > Aitozi.
> > > > >
> > > > > On Wed, Jun 14, 2023 at 14:24 Lincoln Lee <li...@gmail.com> wrote:
> > > > >
> > > > > > Hi Aitozi,
> > > > > >
> > > > > > Sorry for the late reply here! Supporting async udtf
> > > > > > (`AsyncTableFunction`) directly in SQL seems like an attractive
> > > > > > feature, but there are two issues that need to be addressed before
> > > > > > we can be sure to add it:
> > > > > > 1. As mentioned in the flip[1], the current lookup function can
> > > > > > already implement the requirements, but it requires implementing an
> > > > > > extra `LookupTableSource` and explicitly declaring the table schema
> > > > > > (which can help implementers with the various push-down optimizations
> > > > > > supported by the planner). Does the async udtf bring any additional
> > > > > > benefits besides a lighter implementation?
> > > > > > 2. FLIP-221[2] abstracts a reusable cache and metric infrastructure
> > > > > > for lookup sources, which are important to improve performance and
> > > > > > observability for high-overhead external IO scenarios. How do we
> > > > > > integrate and reuse these capabilities after introducing async udtf?
> > > > > > Should users migrate to the lookup source when they encounter similar
> > > > > > requirements or problems, or should we develop an additional set of
> > > > > > similar mechanisms? (A similar case: FLIP-234[3] introduced the
> > > > > > retryable capability for lookup join.)
> > > > > >
> > > > > > For the flip itself,
> > > > > > 1. Considering the 'options' is already used as the dynamic table
> > > > > > options[4] in flink, the newly added query hint need a different name
> > > > > > that can be easier related to the lateral operation as the current
> > > > > > join hints[5] do.
> > > > > > 2. For the async func example, since the target scenario is an
> > > > > > external io operation, it's better to add the `close` method to
> > > > > > actively release resources as a good example for users. Also, in terms
> > > > > > of the determinism of a function, it is important to remind users that
> > > > > > unless the behavior of the function is deterministic, it needs to be
> > > > > > explicitly declared as non-deterministic.
> > > > > >
> > > > > > [1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction?src=contextnavpagetreemode
> > > > > > [2]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric?src=contextnavpagetreemode
> > > > > > [3]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > > > > [4]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL?src=contextnavpagetreemode
> > > > > > [5]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job?src=contextnavpagetreemode
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 13, 2023 at 11:30 Aitozi <gj...@gmail.com> wrote:
> > > > > >
> > > > > > > Get your meaning now, thanks :)
> > > > > > >
> > > > > > > Best,
> > > > > > > Aitozi.
> > > > > > >
> > > > > > > On Tue, Jun 13, 2023 at 11:16 Feng Jin <ji...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Aitozi,
> > > > > > > >
> > > > > > > > Sorry for the confusing description.
> > > > > > > >
> > > > > > > > What I meant was that if we need to remind users about thread
> > > > > > > > safety issues, we should introduce the new UDTF interface instead
> > > > > > > > of executing the original UDTF asynchronously. Therefore, I agree
> > > > > > > > with introducing the AsyncTableFunction.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Feng
> > > > > > > >
> > > > > > > > On Tue, Jun 13, 2023 at 10:42 AM Aitozi <gj...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Feng,
> > > > > > > > >     Thanks for your question. We do not provide a way to switch
> > > > > > > > > the UDTF between sync and async mode, so there should be no
> > > > > > > > > thread safety problem here.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Aitozi
> > > > > > > > >
> > > > > > > > > On Tue, Jun 13, 2023 at 10:31 Feng Jin <ji...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Aitozi, We do need to remind users about thread safety
> > > > > > > > > > issues. Thank you for your efforts on this FLIP. I have no
> > > > > > > > > > further questions.
> > > > > > > > > > Best, Feng
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 13, 2023 at 6:05 AM Jing Ge <j...@ververica.com.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Aitozi,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for taking care of that part. I have no other
> > > > > > > > > > > concerns.
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Jing
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 12, 2023 at 5:38 PM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > BTW, if there are no other blocking issues / comments, I
> > > > > > > > > > > > would like to start a VOTE in another thread this
> > > > > > > > > > > > Wednesday, 6.14.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Aitozi.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 12, 2023 at 23:34 Aitozi <gj...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Jing,
> > > > > > > > > > > > >     Thanks for your explanation. I get your point now.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For the performance part, I think it's a good idea to run
> > > > > > > > > > > > > tests with the case of returning a big table; memory
> > > > > > > > > > > > > consumption is a point to take care of, because in ordered
> > > > > > > > > > > > > mode the head element in the buffer may affect the total
> > > > > > > > > > > > > memory consumption.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Aitozi.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 12, 2023 at 20:28 Jing Ge <ji...@ververica.com.invalid> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Hi Aitozi,
> > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Which key will be used for lookup is not an issue; only one
> > > > > > > > > > > > > >> row will be required for each key in order to enrich it.
> > > > > > > > > > > > > >> True, it depends on the implementation whether multiple rows
> > > > > > > > > > > > > >> or a single row will be returned for each key. However, for
> > > > > > > > > > > > > >> the lookup & enrichment scenario, one row per key is
> > > > > > > > > > > > > >> recommended; otherwise, like I mentioned previously,
> > > > > > > > > > > > > >> enrichment won't work.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I am a little bit concerned about returning a big table for
> > > > > > > > > > > > > >> each key, since the async call will take longer to return
> > > > > > > > > > > > > >> and need more memory. The performance tests should cover
> > > > > > > > > > > > > >> this scenario. This is not a blocking issue for this FLIP.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Best regards,
> > > > > > > > > > > > >> Jing
> > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Sat, Jun 10, 2023 at 4:11 AM Aitozi <gjying1...@gmail.com> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Hi Jing,
> > > > > > > > > > > > >> >     I means the join key is not necessary 
[message truncated...]
