Hi, I will do my best to answer your PyFlink-related questions; I hope it helps.
>>> each udf function is a separate process, that is managed by Beam (but I'm not sure I got it right).

Strictly speaking, it is not true that every UDF runs in a separate Python process. For example, in an expression such as udf1(udf2(a)), both Python functions run in the same Python process; you can even think of it as a single wrapped Python function udf1(udf2(a)). In fact, in most cases we chain multiple Python UDFs together like this to improve performance.

>>> Does it mean that I can register multiple udf functions with different versions of the same library or what would be even better with different python environments and they won't clash

Currently, all nodes of a PyFlink job use the same Python environment path, so there is no way to give each UDF its own Python execution environment. You may need to split the work across multiple jobs to achieve that effect.

Best,
Xingbo

Sharipov, Rinat <r.shari...@cleverdata.ru> wrote on Sat, Oct 10, 2020 at 1:18 AM:

> Hi mates!
>
> I've just read an amazing article
> <https://medium.com/@Alibaba_Cloud/the-flink-ecosystem-a-quick-start-to-pyflink-6ad09560bf50>
> about PyFlink and I'm absolutely delighted.
> I have some questions about UDF registration; it seems that it's
> possible to specify the list of libraries that should be used to
> evaluate UDF functions.
>
> As far as I understand, each UDF function is a separate process that is
> managed by Beam (but I'm not sure I got it right).
> Does it mean that I can register multiple UDF functions with different
> versions of the same library, or, even better, with different Python
> environments, so that they won't clash?
>
> A few words about the task that I'm trying to solve: I would like to
> build a recommendation pipeline that will accumulate features as a table
> and make recommendations using models from the MLflow registry.
> Since I don't want to limit data analysts in the libraries that they
> want to use, the best solution for me is to assemble the environment
> using a conda descriptor and register a UDF function.
>
> Kubernetes and Kubeflow are not an option for us yet, so we are trying
> to include models into existing pipelines.
>
> thx!
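To illustrate the point about chained UDFs above, here is a plain-Python sketch (hypothetical function names, not the actual PyFlink API): when two functions are composed as udf1(udf2(a)), both bodies execute in the same Python process, which os.getpid() confirms.

```python
import os

# Plain-Python sketch of UDF chaining: udf1 and udf2 are stand-ins for
# registered Python UDFs, composed as udf1(udf2(a)).
pids = []

def udf2(a):
    pids.append(os.getpid())  # record which process runs udf2
    return a + 1

def udf1(a):
    pids.append(os.getpid())  # record which process runs udf1
    return a * 2

result = udf1(udf2(10))  # chained call, analogous to udf1(udf2(a)) in a query

print(result)               # 22
print(len(set(pids)) == 1)  # True: both functions ran in one process
```

Since the whole job shares one Python environment, a per-job environment (e.g. a packed conda env) can be supplied when submitting the job; the `-pyexec` and `-pyarch` options of the `flink run` CLI are the usual way to do this, if I recall correctly.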