Hi Xingbo! Thanks a lot for such a detailed reply, it is very useful.

Mon, 12 Oct 2020 at 09:32, Xingbo Huang <hxbks...@gmail.com>:
> Hi,
> I will do my best to provide PyFlink-related content; I hope it helps you.
>
> >>> each udf function is a separate process, that is managed by Beam (but
> >>> I'm not sure I got it right).
>
> Strictly speaking, it is not true that every UDF runs in a separate Python
> process. For example, the two Python functions in a chain such as
> udf1(udf2(a)) run in the same Python process; you can think of the chain
> as a single wrapped Python function whose return value is udf1(udf2(a)).
> In fact, in most cases we put multiple Python UDFs together to improve
> performance.
>
> >>> Does it mean that I can register multiple udf functions with different
> >>> versions of the same library or, what would be even better, with
> >>> different python environments, and they won't clash
>
> All nodes of a PyFlink job currently use the same Python environment
> path, so there is no way to make each UDF use a different Python
> execution environment. Maybe you need to use multiple jobs to achieve
> this effect.
>
> Best,
> Xingbo
>
> Sharipov, Rinat <r.shari...@cleverdata.ru> wrote on Sat, 10 Oct 2020 at
> 1:18 AM:
>
>> Hi mates!
>>
>> I've just read an amazing article
>> <https://medium.com/@Alibaba_Cloud/the-flink-ecosystem-a-quick-start-to-pyflink-6ad09560bf50>
>> about PyFlink and I'm absolutely delighted.
>> I have some questions about UDF registration; it seems that it's
>> possible to specify the list of libraries that should be used to
>> evaluate UDF functions.
>>
>> As far as I understand, each UDF function is a separate process that is
>> managed by Beam (but I'm not sure I got it right).
>> Does it mean that I can register multiple UDF functions with different
>> versions of the same library or, what would be even better, with
>> different Python environments, and they won't clash?
>>
>> A few words about the task that I'm trying to solve: I would like to
>> build a recommendation pipeline that will accumulate features as a table
>> and make recommendations using models from the MLflow registry. Since I
>> don't want to limit data analysts in which libraries they want to use,
>> the best solution for me is to assemble the environment from a conda
>> descriptor and register a UDF function.
>>
>> Kubernetes and Kubeflow are not an option for us yet, so we are trying
>> to include models into existing pipelines.
>>
>> thx!
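Xingbo's point about chained UDFs can be sketched in plain Python (no PyFlink needed): rather than giving each function its own worker process, the runtime effectively builds one composite callable for a chain like udf1(udf2(a)) and evaluates it in a single Python process. The function bodies below are made up purely for illustration; only the udf1/udf2 names come from the thread.

```python
# Conceptual sketch of UDF fusion: a chain udf1(udf2(a)) is executed as one
# wrapped function in a single Python worker, not as two separate processes.

def udf2(a):
    # inner UDF (illustrative body: double the input)
    return a * 2

def udf1(b):
    # outer UDF (illustrative body: add one)
    return b + 1

def fused(a):
    # the "wrap func" from Xingbo's explanation: one call, one process
    return udf1(udf2(a))

print(fused(3))  # udf2(3) -> 6, then udf1(6) -> 7
```

The practical consequence is that intermediate results never cross a process boundary, which is why fusing chained Python UDFs improves performance.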
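The "one Python environment per job" constraint Xingbo describes is configured roughly as below. This is an untested configuration sketch assuming PyFlink 1.11; the archive name `venv.zip` and its internal layout are illustrative, and the archive would typically be produced from a conda/virtualenv environment ahead of time.

```python
# Configuration sketch (PyFlink 1.11 assumed): one packed Python environment
# is shipped to all nodes of the job, so every UDF in this job shares it.
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = StreamTableEnvironment.create(environment_settings=env_settings)

# Ship the packed environment archive (illustrative name) to every node.
t_env.add_python_archive("venv.zip")

# Point all Python UDF workers of this job at the interpreter inside it.
t_env.get_config().set_python_executable("venv.zip/venv/bin/python")
```

Because these settings are job-wide, running UDFs against conflicting library versions would, as Xingbo suggests, require splitting them across separate jobs, each with its own archive.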