Hi,
I will do my best to provide PyFlink-related content; I hope it helps you.

>>>  each udf function is a separate process, that is managed by Beam (but
I'm not sure I got it right).

Strictly speaking, it is not true that every UDF runs in a separate Python
process. For example, two chained Python UDFs such as udf1(udf2(a)) run in
the same Python process; you can think of the chain as a single wrapped
function whose return value is udf1(udf2(a)). In fact, in most cases we
fuse multiple Python UDFs together to improve performance.
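To illustrate the fusion idea, here is a minimal sketch in plain Python. This is not the actual PyFlink/Beam internals; the function names and the `fuse` helper are hypothetical, but it shows how a chain like udf1(udf2(a)) can be collapsed into one callable that a single Python worker process evaluates per row:

```python
def udf2(a):
    # example inner UDF: double the input
    return a * 2

def udf1(x):
    # example outer UDF: add one
    return x + 1

def fuse(outer, inner):
    """Compose two UDFs into one callable, mimicking operator fusion
    (hypothetical helper, not a real PyFlink API)."""
    def fused(a):
        return outer(inner(a))
    return fused

# the whole chain now runs as one function call in one process
fused_udf = fuse(udf1, udf2)
print(fused_udf(10))  # udf1(udf2(10)) = (10 * 2) + 1 = 21
```

Because the fused chain is a single call, no serialization round-trip is needed between udf2's output and udf1's input, which is where the performance win comes from.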

>>> Does it mean that I can register multiple udf functions with different
versions of the same library or what would be even better with different
python environments and they won't clash

Currently, all nodes of a PyFlink job use the same Python environment path,
so there is no way to make each UDF use a different Python execution
environment. You may need to split the work across multiple jobs to achieve
that effect.
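As a sketch of the multiple-jobs approach: the Flink CLI lets each job pin its own Python environment at submit time via the -pyarch (Python archives) and -pyexec (Python executable) options. The job files, archive names, and paths below are hypothetical examples:

```shell
# Job A runs against its own packed virtual environment:
flink run -py job_a.py \
    -pyarch venv_a.zip \
    -pyexec venv_a.zip/venv_a/bin/python

# Job B uses a different environment, so its library versions
# never clash with Job A's:
flink run -py job_b.py \
    -pyarch venv_b.zip \
    -pyexec venv_b.zip/venv_b/bin/python
```

Each archive is shipped to the workers and extracted, and every Python UDF in that job is evaluated with the interpreter named by -pyexec.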

Best,
Xingbo

Sharipov, Rinat <r.shari...@cleverdata.ru> 于2020年10月10日周六 上午1:18写道:

> Hi mates !
>
> I've just read an amazing article
> <https://medium.com/@Alibaba_Cloud/the-flink-ecosystem-a-quick-start-to-pyflink-6ad09560bf50>
> about PyFlink and I'm absolutely delighted.
> I got some questions about udf registration, and it seems that it's
> possible to specify the list of libraries that should be used to evaluate
> udf functions.
>
> As far as I understand, each udf function is a separate process, that is
> managed by Beam (but I'm not sure I got it right).
> Does it mean that I can register multiple udf functions with different
> versions of the same library or what would be even better with different
> python environments and they won't clash ?
>
> A few words about the task I'm trying to solve: I would like to build a
> recommendation pipeline that accumulates features into a table and makes
> recommendations using models from the MLflow registry. Since I don't want
> to limit data analysts in the libraries they want to use, the best
> solution for me is to assemble the environment from a conda descriptor
> and register a UDF function.
>
> Kubernetes and Kubeflow are not an option for us yet, so we are trying to
> include models into existing pipelines.
>
> thx !
>
