[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

2021-05-11 Thread GitBox


maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-839430240


   > BTW, it has the same problem in Python and R too. @ueshin and I are working on them as follow-ups.
   
   Ur, I missed that. Thank you, @HyukjinKwon @ueshin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

2021-05-11 Thread GitBox


maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-839430081


   > Why is it a problem only in the Scala API? How about the SQL API?
   
   In SQL, user-specified parameter names are used as-is, so the same issue cannot happen:
   ```
   scala> val df = Seq((Seq(1,2,3), Seq("a", "b", "c"))).toDF("numbers", "letters")
   scala> df.selectExpr("""
        |   FLATTEN(
        |     TRANSFORM(
        |       numbers,
        |       number -> TRANSFORM(
        |         letters,
        |         letter -> (number AS number, letter AS letter)
        |       )
        |     )
        |   ) AS zipped
        | """).explain(true)

   == Analyzed Logical Plan ==
   zipped: array<struct<number:int,letter:string>>
   Project [flatten(transform(numbers#7, lambdafunction(transform(letters#8, lambdafunction(named_struct(number, lambda number#14, letter, lambda letter#15), lambda letter#15, false)), lambda number#14, false))) AS zipped#13]
   +- Project [_1#2 AS numbers#7, _2#3 AS letters#8]
      +- LocalRelation [_1#2, _2#3]
   ```
   On the other hand, in the DataFrame API, the same parameter names were used in every lambda function, so name conflicts could happen:
   ```
   scala> df.select(
        |   flatten(
        |     transform(
        |       $"numbers",
        |       (number: Column) => { transform(
        |         $"letters",
        |         (letter: Column) => { struct(
        |           number.as("number"),
        |           letter.as("letter")
        |         ) }
        |       ) }
        |     )
        |   ).as("zipped")
        | ).explain(true)

   == Analyzed Logical Plan ==
   zipped: array<struct<number:int,letter:string>>
   Project [flatten(transform(numbers#7, lambdafunction(transform(letters#8, lambdafunction(struct(number, lambda x_0#20, letter, lambda x_1#21), lambda x_1#21, false)), lambda x_0#20, false))) AS zipped#19]
   +- Project [_1#2 AS numbers#7, _2#3 AS letters#8]
      +- LocalRelation [_1#2, _2#3]
   ```
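   To make the name conflict concrete, here is a toy model in plain Scala with no Spark dependency. It is a minimal sketch under stated assumptions: `LambdaNamingSketch`, `Expr`, `Transform`, `Var`, `transformFixed`, `transformFresh`, and `freshName` are all hypothetical names and are not Spark's Catalyst API. Lambda variables are resolved purely by name, standing in for how the analyzer binds them. `transformFixed` reuses one parameter name for every lambda, as the DataFrame API did; `transformFresh` appends a counter, similar to the `x_0`/`x_1` names visible in the analyzed plan above (the shared counter is an assumption of this sketch).

   ```scala
   import java.util.concurrent.atomic.AtomicInteger

   // Toy model of lambda-variable binding. Every name here is hypothetical;
   // none of these classes are Spark's Catalyst expressions.
   object LambdaNamingSketch {
     // Runtime values: strings, pairs, and arrays.
     sealed trait Value
     final case class Str(s: String) extends Value
     final case class PairV(left: Value, right: Value) extends Value
     final case class Arr(items: Vector[Value]) extends Value

     // Expressions: an array literal, a lambda variable reference,
     // a pair constructor, and transform(input, param -> body).
     sealed trait Expr
     final case class ArrLit(items: Vector[String]) extends Expr
     final case class Var(name: String) extends Expr
     final case class MkPair(left: Expr, right: Expr) extends Expr
     final case class Transform(input: Expr, param: String, body: Expr) extends Expr

     def eval(e: Expr, env: Map[String, Value]): Value = e match {
       case ArrLit(items) => Arr(items.map(s => Str(s)))
       case Var(name) =>
         // Variables are resolved purely by name; `env + (param -> item)` below
         // overwrites any outer binding that has the same name.
         env.getOrElse(name, sys.error(s"unbound lambda variable: $name"))
       case MkPair(l, r) => PairV(eval(l, env), eval(r, env))
       case Transform(input, param, body) =>
         eval(input, env) match {
           case Arr(items) => Arr(items.map(item => eval(body, env + (param -> item))))
           case other      => sys.error(s"transform expects an array, got $other")
         }
     }

     // Builder that reuses one fixed parameter name for every lambda.
     def transformFixed(input: Expr)(f: Expr => Expr): Expr =
       Transform(input, "x", f(Var("x")))

     // Builder that generates a fresh name (x_0, x_1, ...) per lambda,
     // using a shared counter (an assumption of this sketch).
     private val counter = new AtomicInteger(0)
     def freshName(prefix: String): String = s"${prefix}_${counter.getAndIncrement()}"

     def transformFresh(input: Expr)(f: Expr => Expr): Expr = {
       // Allocated before `f` runs, so the outer lambda gets the smaller index.
       val name = freshName("x")
       Transform(input, name, f(Var(name)))
     }

     def main(args: Array[String]): Unit = {
       val numbers = ArrLit(Vector("1", "2"))
       val letters = ArrLit(Vector("a", "b"))
       // Nested transform that should pair every number with every letter.
       val buggy = transformFixed(numbers)(n => transformFixed(letters)(l => MkPair(n, l)))
       val fixed = transformFresh(numbers)(n => transformFresh(letters)(l => MkPair(n, l)))
       println(eval(buggy, Map.empty)) // inner "x" shadows the outer one
       println(eval(fixed, Map.empty)) // both bindings stay reachable
     }
   }
   ```

   In this toy model, the fixed-name version pairs each letter with itself and the numbers never appear, because both `Var("x")` references resolve to the inner binding. With fresh names, the outer and inner variables stay distinct, so each number is paired with each letter.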



[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

2021-05-04 Thread GitBox


maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-832393349


   GA passed. Merged to master/3.1/3.0. Thank you for the review, @ueshin ~



[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

2021-05-03 Thread GitBox


maropu commented on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-831601631


   cc: @HyukjinKwon @ueshin 

