Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10133#discussion_r46947151
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala ---
    @@ -359,10 +359,27 @@ case class LambdaVariable(value: String, isNull: String, dataType: DataType) ext
     case class MapObjects(
         function: AttributeReference => Expression,
         inputData: Expression,
    -    elementType: DataType) extends Expression {
    +    elementType: DataType)(
    +    completeFunctionCopy: Option[Expression] = None) extends Expression {
     
    -  private lazy val loopAttribute = AttributeReference("loopVar", elementType)()
    -  private lazy val completeFunction = function(loopAttribute)
    +  lazy val completeFunction = completeFunctionCopy.getOrElse {
    +    function(AttributeReference("loopVar", elementType)())
    +  }
    +
    +  lazy val loopAttribute = completeFunction.collectFirst {
    --- End diff --
    
    Here is the tricky part:
    
    * For `TreeNode.transform`, we go through the `productIterator` to match the children, because we need to make a new copy if the transformation changed anything.
    * For `TreeNode.collect`, we go through the children directly, so here we may collect the `loopVar` of a nested `MapObjects` inside `completeFunction`, which is wrong.
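
    The second bullet can be sketched as follows. This is a minimal, self-contained illustration with hypothetical `Node`/`LoopVar`/`MapObj` types standing in for Spark's real `TreeNode`/`AttributeReference`/`MapObjects`; it only shows why a `collect`-style traversal that recurses into children can return the loop variable of a *nested* expression rather than one belonging to the outer node:
    
    ```scala
    // Minimal sketch (hypothetical types, NOT Spark's real TreeNode/MapObjects)
    // of the pitfall: collect descends into children, so the first LoopVar it
    // finds may belong to a nested MapObj, not the outer one.
    sealed trait Node {
      def children: Seq[Node]
    
      // Like TreeNode.collect: visit this node, then recurse into children.
      def collect[T](pf: PartialFunction[Node, T]): Seq[T] = {
        val here = if (pf.isDefinedAt(this)) Seq(pf(this)) else Seq.empty
        here ++ children.flatMap(_.collect(pf))
      }
    }
    
    // Stand-in for AttributeReference("loopVar", ...).
    final case class LoopVar(name: String) extends Node {
      def children: Seq[Node] = Seq.empty
    }
    
    // Stand-in for MapObjects: its function body is its only child.
    final case class MapObj(body: Node) extends Node {
      def children: Seq[Node] = Seq(body)
    }
    
    object CollectPitfallDemo {
      // An outer MapObj whose body contains a nested MapObj with its own loopVar.
      val outer: Node = MapObj(MapObj(LoopVar("innerLoopVar")))
    
      // A collectFirst-style lookup over the tree finds the nested loopVar,
      // not an attribute belonging to the outer MapObj:
      val found: Option[String] =
        outer.collect { case v: LoopVar => v.name }.headOption
    
      def main(args: Array[String]): Unit = println(found)  // Some(innerLoopVar)
    }
    ```
    
    `transform` avoids this in the copy-making path because it matches against `productIterator` fields one level at a time, rather than blindly recursing through all descendants the way `collect` does.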

