Hi Xingbo! Thanks a lot for such a detailed reply, it is very useful.

Mon, 12 Oct 2020 at 09:32, Xingbo Huang <hxbks...@gmail.com>:
> Hi,
> I will do my best to provide PyFlink-related content; I hope it helps you.
>
> >>> each udf function is a separate process, that is managed by Beam (but
> >>> I'm not sure I got it right).
>
> Strictly speaking, it is not true that every UDF runs in a separate Python
> process. For example, the two Python functions in a chain such as
> udf1(udf2(a)) run in the same Python process; you can think of the chain
> as a single wrapped Python function whose return value is udf1(udf2(a)).
> In fact, in most cases we put multiple Python UDFs together to improve
> performance.
>
> >>> Does it mean that I can register multiple udf functions with different
> >>> versions of the same library or, what would be even better, with
> >>> different python environments, and they won't clash
>
> All nodes of a PyFlink job currently use the same Python environment
> path, so there is no way to make each UDF use a different Python
> execution environment. Maybe you need to use multiple jobs to achieve
> this effect.
>
> Best,
> Xingbo
>
> Sharipov, Rinat <r.shari...@cleverdata.ru> wrote on Sat, 10 Oct 2020 at
> 1:18 AM:
>
>> Hi mates!
>>
>> I've just read an amazing article
>> <https://medium.com/@Alibaba_Cloud/the-flink-ecosystem-a-quick-start-to-pyflink-6ad09560bf50>
>> about PyFlink and I'm absolutely delighted.
>> I have some questions about UDF registration; it seems that it's
>> possible to specify the list of libraries that should be used to
>> evaluate UDF functions.
>>
>> As far as I understand, each UDF function is a separate process that is
>> managed by Beam (but I'm not sure I got it right).
>> Does it mean that I can register multiple UDF functions with different
>> versions of the same library or, what would be even better, with
>> different Python environments, and they won't clash?
>>
>> A few words about the task that I'm trying to solve: I would like to
>> build a recommendation pipeline that will accumulate features as a table
>> and make recommendations using models from the MLflow registry. Since I
>> don't want to limit data analysts in which libraries they want to use,
>> the best solution for me is to assemble the environment from a conda
>> descriptor and register a UDF function.
>>
>> Kubernetes and Kubeflow are not an option for us yet, so we are trying
>> to include models into existing pipelines.
>>
>> thx!
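Xingbo's point about chained UDFs can be sketched in plain Python (no PyFlink needed): rather than giving each function its own worker process, the runtime effectively builds one composite callable for a chain like udf1(udf2(a)) and evaluates it in a single Python process. The function bodies below are made up purely for illustration; only the udf1/udf2 names come from the thread.

```python
# Conceptual sketch of UDF fusion: a chain udf1(udf2(a)) is executed as one
# wrapped function in a single Python worker, not as two separate processes.

def udf2(a):
    # inner UDF (illustrative body: double the input)
    return a * 2

def udf1(b):
    # outer UDF (illustrative body: add one)
    return b + 1

def fused(a):
    # the "wrap func" from Xingbo's explanation: one call, one process
    return udf1(udf2(a))

print(fused(3))  # udf2(3) -> 6, then udf1(6) -> 7
```

The practical consequence is that intermediate results never cross a process boundary, which is why fusing chained Python UDFs improves performance.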
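The "one Python environment per job" constraint Xingbo describes is configured roughly as below. This is an untested configuration sketch assuming PyFlink 1.11; the archive name `venv.zip` and its internal layout are illustrative, and the archive would typically be produced from a conda/virtualenv environment ahead of time.

```python
# Configuration sketch (PyFlink 1.11 assumed): one packed Python environment
# is shipped to all nodes of the job, so every UDF in this job shares it.
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = StreamTableEnvironment.create(environment_settings=env_settings)

# Ship the packed environment archive (illustrative name) to every node.
t_env.add_python_archive("venv.zip")

# Point all Python UDF workers of this job at the interpreter inside it.
t_env.get_config().set_python_executable("venv.zip/venv/bin/python")
```

Because these settings are job-wide, running UDFs against conflicting library versions would, as Xingbo suggests, require splitting them across separate jobs, each with its own archive.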