Re: how to query the output of the scalar table function

Fabian Hueske Wed, 04 Apr 2018 04:17:52 -0700

Hi Darshan,

What you observe is the result of what's supposed to be an optimization. By
fusing the two select() calls, we reduce the number of operators in the
resulting plan (one MapFunction less).
This optimization is only applied for ScalarFunctions but not for
TableFunctions.
With a better cost-modeling that increases the cost of user-defined
ScalarFunctions it should be possible to "convince" the optimizer to not
fuse the operators.


For now, you could either use the TableFunction approach or convert the
result of the first select() into a DataStream (or DataSet depending on
your setup) and register that again as a table.
That would split the plan into two parts which are independently optimized
and hence the select() operators would not be merged.

Best, Fabian



2018-03-30 22:07 GMT+02:00 Darshan Singh <darshan.m...@gmail.com>:

> Hi,
>
> I am not able to find what is best way to query the output of a scalar
> table function.
>
> Suppose I have table which has column col1 which is string.
>
> I have a scalar function and returns a POJO
> {col1V1 String, col1V2 String , col1V3 String}.
>
>
> I am using following.
>
> so table.select("sf(col1) as sfc")
> .select("sfc.get('col1V1') as v1, sfc.get('col1V2') as v2 ,
> sfc.get('col1V13') as v3 ")
>
> It is working fine but strangely enough it calls the sf 3 times(each per
> get) for same row. However, If I use table function it calls just once.
>
> So to me it seems that scalar function is very expensive so if I have 10
> columns and the computation is expensive it will do 10 times. Thats why I
> thought maybe there is a better way to return pojo from scalar function
> rather than what I have been using.
>
> If this is the best way then I wonder if scalar functions should be used
> to return single value only.
>
> Thanks
>

Re: how to query the output of the scalar table function

Reply via email to