[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-541303687

> > @JingsongLi I noticed that the "getScalarFunction" method in "ScalarSqlFunction" of blink planner was replaced by "makeFunction" method. I think using "getScalarFunction" to get the initial ScalarFunction object and check its language type makes more sense here so I re-add the method. Please take a look at this PR to make sure it does not cause any side effects to blink planner :).
>
> LGTM, it is reasonable, can you just use scala val to scalarFunction just like `TableSqlFunction`?

Thanks for your reply! That makes sense to me. I have updated the code in the latest commit.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540973875 @hequn8128 Thanks for your review! I have addressed your comments in the latest commit.
[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540956430

@hequn8128 Thanks for your feedback! Dian Fu and I have discussed these two approaches and agreed to skip the optimization in `ExpressionReducer` and optimize the UDFs at runtime instead. Our thoughts are as follows:

1. How to optimize Python UDFs at runtime?

   Once constant parameters are supported in Python UDFs (see [this PR](https://github.com/apache/flink/pull/9858)), we can do this optimization when evaluating the chained Python UDFs in the Python worker: if a Python UDF is deterministic and either has no arguments or has only constant arguments, it will always return the same value, so we replace it with its constant result. This rule can be applied recursively until all deterministic UDFs with constant inputs have been replaced. If the root nodes of the chained Python UDFs become constant values, we can transmit them only once and replace them with None in subsequent transmissions to save IO. The Java operator also knows which fields of the evaluated result should be constant values rather than None, because the same reduce rule can be applied on the Java side. No additional interaction between the Java operator and the Python worker is needed in this design.

2. Why not optimize Python UDFs during the optimization phase?

   Reducing Python UDFs in the optimization phase is not easy in the current design. It would require that the generated Java wrappers of Python UDFs can be evaluated and return the correct result. In other words, we would need to run Python UDFs on the client side, but Python UDFs are designed to run on the cluster, whose Python environment may differ from the client's once we introduce environment and dependency management in the future. To solve the environment problem, we would need to prepare a Python environment identical to the one on the cluster before reducing Python UDFs. To evaluate the Python UDFs, we would need to implement a new UDF runner that does not depend on Apache Beam (the client side of the Flink Python API does not depend on Apache Beam). Completing all of this would be a perfect solution to the problem, but it is too expensive compared to runtime optimization.
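The recursive reduce rule described in point 1 can be illustrated with a minimal sketch. This is not the Flink or Beam implementation; the `Constant`, `InputRef`, and `UdfCall` classes and the `reduce_constants` function are hypothetical names introduced here only to show the idea of folding deterministic UDFs whose inputs are all constant.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple, Union


@dataclass(frozen=True)
class Constant:
    """A literal value (hypothetical node type, not a Flink class)."""
    value: Any


@dataclass(frozen=True)
class InputRef:
    """A reference to a column of the input row (hypothetical)."""
    index: int


@dataclass(frozen=True)
class UdfCall:
    """A call to a Python UDF with zero or more argument nodes (hypothetical)."""
    func: Callable[..., Any]
    args: Tuple["Node", ...] = ()
    deterministic: bool = True


Node = Union[Constant, InputRef, UdfCall]


def reduce_constants(node: Node) -> Node:
    """Replace every deterministic UDF whose inputs are all constant
    (including no-argument UDFs) with its constant result, bottom-up."""
    if not isinstance(node, UdfCall):
        return node
    # Reduce the arguments first so that nested constant calls collapse.
    args = tuple(reduce_constants(arg) for arg in node.args)
    if node.deterministic and all(isinstance(arg, Constant) for arg in args):
        # all() is True for an empty tuple, so no-argument UDFs qualify too.
        return Constant(node.func(*(arg.value for arg in args)))
    return UdfCall(node.func, args, node.deterministic)
```

In this sketch a root node that reduces to a `Constant` is the case where the result could be transmitted once and then replaced with None. A non-deterministic UDF is never folded, even when it takes no arguments.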
[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540463353 @dianfu Thanks for your review! I have addressed the comment in the latest commit.
[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540452708 @dianfu Thanks for your review again! I have addressed your comments in the latest commit.
[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540423623 @JingsongLi I noticed that the "getScalarFunction" method in "ScalarSqlFunction" of the blink planner was replaced by the "makeFunction" method. I think using "getScalarFunction" to get the initial ScalarFunction object and check its language type makes more sense here, so I re-added the method. Please take a look at this PR to make sure it does not cause any side effects in the blink planner :).
[GitHub] [flink] WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs.
WeiZhong94 commented on issue #9865: [FLINK-14212][python] Support no-argument Python UDFs. URL: https://github.com/apache/flink/pull/9865#issuecomment-540416014 @dianfu Thanks for your comments! I have addressed them in the latest commit, please take a look.