[GitHub] [spark] zhengruifeng opened a new pull request, #39619: [SPARK-42089][CONNECT][PYTHON] Fix variable name issues in nested lambda functions

GitBox Mon, 16 Jan 2023 19:55:47 -0800


zhengruifeng opened a new pull request, #39619:
URL: https://github.com/apache/spark/pull/39619


   ### What changes were proposed in this pull request?
   1, https://github.com/apache/spark/pull/39068 reused 
`UnresolvedAttribute` to represent `UnresolvedNamedLambdaVariable`. As a 
result, `Column('x')` and `UnresolvedNamedLambdaVariable('x')` cannot be told 
apart in `lambda x: x + cdf.x` (since we use `x/y/z` as argument names). This 
PR adds `UnresolvedNamedLambdaVariable` back to distinguish between 
`Column('x')` and `UnresolvedNamedLambdaVariable('x')`;
   
   2, the `refreshVarName` logic in PySpark was added in 
https://github.com/apache/spark/pull/32523 to address a similar issue in 
PySpark's lambda functions. This PR adds a similar function to the Python 
Client, which avoids rewriting the function expression on the server side; 
such rewriting is unnecessary and error-prone.
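
   The two points above can be illustrated with a small, self-contained toy model. This is only a sketch, not the Spark Connect implementation: the classes `Attr`, `LambdaVar`, `Add`, `Transform`, `Lambda` and the helpers `fresh_name` and `build_lambda` are hypothetical names invented for this example, and the `x_0`-style suffix is an assumed naming scheme. It shows (1) that a separate node type keeps a column named `x` distinct from a lambda variable named `x`, and (2) that generating fresh variable names on the client keeps an inner `x` from shadowing an outer `x` in nested lambdas.

```python
import inspect
import itertools
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a column reference such as cdf.x."""
    name: str


@dataclass(frozen=True)
class LambdaVar:
    """Hypothetical stand-in for a variable bound by a lambda function."""
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Lambda:
    body: object
    params: Tuple[LambdaVar, ...]


@dataclass(frozen=True)
class Transform:
    """Hypothetical higher-order call: apply `fn` to each element of `source`."""
    source: object
    fn: Lambda


_counter = itertools.count()


def fresh_name(base: str) -> str:
    # Assumed scheme: append a process-wide counter so every binding is unique.
    return f"{base}_{next(_counter)}"


def build_lambda(f: Callable) -> Lambda:
    # Read the Python argument names (x, y, z, ...) and bind each one to a
    # freshly named lambda variable before evaluating the function body.
    arg_names = list(inspect.signature(f).parameters)
    params = tuple(LambdaVar(fresh_name(n)) for n in arg_names)
    return Lambda(f(*params), params)


# Point 1: `lambda x: x + cdf.x` keeps the variable and the column apart,
# because they are different node types even before renaming.
simple = build_lambda(lambda x: Add(x, Attr("x")))

# Point 2: the inner and outer `x` receive different names, so the server
# does not need to rewrite the expression to disambiguate them.
nested = build_lambda(
    lambda x: Transform(x, build_lambda(lambda x: Add(x, Attr("x"))))
)
```

   In `nested`, the outer parameter is named before the body is evaluated, so it always gets a smaller counter value than the inner parameter; the column reference `Attr("x")` is never renamed.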
   
   ### Why are the changes needed?
   Before this PR, nested lambda functions did not work properly.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Enabled existing unit tests and added new unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

