Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-12 Thread Xingbo Huang
Hi Thomas, Thanks for the confirmation. I will now start a vote. Best, Xingbo Thomas Weise wrote on Wed, Jan 12, 2022 at 02:20: > Hi Xingbo, > > +1 from my side > > Thanks for the clarification. For your use case the parameter size and > therefore serialization overhead was the limiting factor. I have seen

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-11 Thread Thomas Weise
Hi Xingbo, +1 from my side Thanks for the clarification. For your use case the parameter size and therefore serialization overhead was the limiting factor. I have seen use cases where that is not the concern, because the Python logic itself is heavy and dwarfs the protocol overhead (for example

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-11 Thread Xingbo Huang
Hi everyone, Thanks to all of you for the discussion. If there are no objections, I would like to start a vote thread tomorrow. Best, Xingbo Xingbo Huang wrote on Fri, Jan 7, 2022 at 16:18: > Hi Till, > > I have written a more complicated PyFlink job. Compared with the previous > single python udf job,

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-07 Thread Xingbo Huang
Hi Till, I have written a more complicated PyFlink job. Compared with the previous single Python UDF job, there is an extra stage of converting between Table and DataStream. Besides, I added a Python map function to the job. Because the Python DataStream API has not yet implemented Thread Mode, the

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-05 Thread Till Rohrmann
Thanks for the detailed answer Xingbo. Quick question on the last figure in the FLIP. You said that this is a real world Flink stream SQL job. The title of the graph says UDF(String Upper). So do I understand correctly that string upper is the real world use case you have measured? What I wanted

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-05 Thread Xingbo Huang
Hi Till and Thomas, Thanks a lot for joining the discussion. For Till: >>> Is the slower performance currently the biggest pain point for our Python users? What else are our Python users mainly complaining about? PyFlink users are most concerned about two parts, one is better usability, the

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-03 Thread Thomas Weise
Interesting discussion. It caught my attention because I was also interested in the Beam fn execution overhead a few years ago. We found back then that while in theory the fn protocol overhead is very significant, for realistic function workloads that overhead was negligible. And of course it all

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-03 Thread Till Rohrmann
One more question that came to my mind: How much performance improvement do we gain on a real-world Python use case? Were the measurements more like micro benchmarks where the Python UDF was called w/o the overhead of Flink? I would just be curious how much the Python component contributes to the

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-03 Thread Till Rohrmann
Hi Xingbo, Thanks for creating this FLIP. I have two general questions about the motivation for this FLIP because I have only very little exposure to our Python users: Is the slower performance currently the biggest pain point for our Python users? What else are our Python users mainly

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2021-12-31 Thread Xingbo Huang
Hi Wei, Thanks a lot for your feedback. Very good questions! >>> 1. It seems that we dynamically load an embedded Python and user dependencies in the TM process. Can they be uninstalled cleanly after the task finished? i.e. Can we use the Thread Mode in session mode and Pyflink shell? I

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2021-12-30 Thread Wei Zhong
Hi Xingbo, Thanks for creating this FLIP. Big +1 for it! I have some questions about the Thread Mode: 1. It seems that we dynamically load an embedded Python and user dependencies in the TM process. Can they be uninstalled cleanly after the task finished? i.e. Can we use the Thread Mode in

[DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2021-12-28 Thread Xingbo Huang
Hi everyone, I would like to start a discussion thread on "Support PyFlink Runtime Execution in Thread Mode". We have provided the PyFlink Runtime framework to support Python user-defined functions since Flink 1.10. The PyFlink Runtime framework is called Process Mode, which depends on an