[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-839430240

> BTW, it has the same problem in Python and R too. I and @ueshin are working on them as followups.

Oh, I missed that. Thank you, @HyukjinKwon @ueshin

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-839430081

> Why it's a problem only in scala API? how about SQL API?

In SQL, user-specified parameter names are used as they are, so the same issue cannot happen:
```
scala> val df = Seq((Seq(1,2,3), Seq("a", "b", "c"))).toDF("numbers", "letters")
scala> df.selectExpr("""
     |   FLATTEN(
     |     TRANSFORM(
     |       numbers,
     |       number -> TRANSFORM(
     |         letters,
     |         letter -> (number AS number, letter AS letter)
     |       )
     |     )
     |   ) AS zipped
     | """).explain(true)
== Analyzed Logical Plan ==
zipped: array>
Project [flatten(transform(numbers#7, lambdafunction(transform(letters#8, lambdafunction(named_struct(number, lambda number#14, letter, lambda letter#15), lambda letter#15, false)), lambda number#14, false))) AS zipped#13]
^^ ^^^
+- Project [_1#2 AS numbers#7, _2#3 AS letters#8]
   +- LocalRelation [_1#2, _2#3]
```

On the other hand, in the DataFrame API, the same parameter names were used in lambda functions, so a name conflict could happen:
```
scala> df.select(
     |   flatten(
     |     transform(
     |       $"numbers",
     |       (number: Column) => { transform(
     |         $"letters",
     |         (letter: Column) => { struct(
     |           number.as("number"),
     |           letter.as("letter")
     |         ) }
     |       ) }
     |     )
     |   ).as("zipped")
     | ).explain(true)
== Analyzed Logical Plan ==
zipped: array>
Project [flatten(transform(numbers#7, lambdafunction(transform(letters#8, lambdafunction(struct(number, lambda x_0#20, letter, lambda x_1#21), lambda x_1#21, false)), lambda x_0#20, false))) AS zipped#19]
^^ ^^^
+- Project [_1#2 AS numbers#7, _2#3 AS letters#8]
   +- LocalRelation [_1#2, _2#3]
```
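To make the name conflict described above concrete, here is a minimal, Spark-free sketch. It is not Spark's implementation: `Expr`, `Var`, `Pair`, `Lambda`, `fixedLambda`, and `freshLambda` are hypothetical names invented for this illustration. It assumes only what the plans above suggest, namely that a DSL which always reuses one generated parameter name lets an inner lambda shadow the outer one, while numbered names such as `x_0` and `x_1` keep them apart.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical miniature expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Var(name: String) extends Expr
final case class Pair(left: Expr, right: Expr) extends Expr
final case class Lambda(param: String, body: Expr) extends Expr

object LambdaNaming {
  // Fixed naming: every one-argument lambda calls its parameter "x",
  // so a nested lambda reuses the name of the enclosing one.
  def fixedLambda(f: Expr => Expr): Lambda = Lambda("x", f(Var("x")))

  // Unique naming: each lambda draws a fresh numeric suffix from a counter.
  private val counter = new AtomicInteger(0)
  def freshLambda(f: Expr => Expr): Lambda = {
    val name = s"x_${counter.getAndIncrement()}"
    Lambda(name, f(Var(name)))
  }

  // Render an expression in an arrow notation similar to the SQL lambdas above.
  def render(e: Expr): String = e match {
    case Var(n)       => n
    case Pair(l, r)   => s"(${render(l)}, ${render(r)})"
    case Lambda(p, b) => s"$p -> ${render(b)}"
  }

  def main(args: Array[String]): Unit = {
    // Both parameters are named "x": the outer variable can no longer be told apart.
    val clashing = fixedLambda(number => fixedLambda(letter => Pair(number, letter)))
    println(render(clashing)) // prints: x -> x -> (x, x)

    // Each lambda gets its own name, so the pair refers to both variables.
    val distinct = freshLambda(number => freshLambda(letter => Pair(number, letter)))
    println(render(distinct)) // prints: x_0 -> x_1 -> (x_0, x_1)
  }
}
```

In the SQL case the user writes `number` and `letter` directly, which plays the same role as the distinct names in the second call.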
[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-832393349

GA passed. Merged to master/3.1/3.0. Thank you for the review, @ueshin ~
[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-831601631

cc: @HyukjinKwon @ueshin