imback82 commented on issue #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
URL: https://github.com/apache/spark/pull/26441#issuecomment-553230414
 
 
   Thanks @cloud-fan! Your suggested solution of updating `Expand` works as 
expected. However, I do not think the following 
   ```Scala
   def output = child.output ++ additionalOutput
   ```
   is always true.
   
   For example,
   ```
   Expand [List(nums#3, nums#37, 0), List(nums#3, null, 1)], [nums#3, nums#38, spark_grouping_id#36]
     +- Project [nums#3, nums#3 AS nums#37]
   ```
   `nums#37` is an output of the child `Project`, but not an output of `Expand`, whose output lists `nums#38` in that position.
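
   As a rough, Spark-free illustration of the plan above: the `Attr` case class and the `ExpandOutputSketch` object are hypothetical stand-ins, not Catalyst classes. If `output` were `child.output ++ additionalOutput`, the child's output would have to be a prefix of the `Expand` output, and here it is not:

   ```Scala
   object ExpandOutputSketch {
     // Hypothetical stand-in for a Catalyst attribute, written name#exprId in plans.
     final case class Attr(name: String, exprId: Long)

     // Output of the child Project in the plan above.
     val childOutput: Seq[Attr] = Seq(Attr("nums", 3), Attr("nums", 37))

     // Output of Expand in the plan above.
     val expandOutput: Seq[Attr] =
       Seq(Attr("nums", 3), Attr("nums", 38), Attr("spark_grouping_id", 36))

     // child.output ++ additionalOutput would always start with child.output.
     val childIsPrefix: Boolean = expandOutput.startsWith(childOutput) // false

     // Child attributes that Expand does not pass through: only nums#37.
     val droppedByExpand: Seq[Attr] = childOutput.diff(expandOutput)
   }
   ```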
   
   So instead of adding `additionalOutput` to `Expand`, I just did the 
following:
   ```Scala
   case oldVersion: Expand
       if oldVersion.producedAttributes.intersect(conflictingAttributes).nonEmpty =>
     val producedAttributes = oldVersion.producedAttributes
     // Give fresh expression IDs only to the attributes that Expand itself produces.
     val newOutput = oldVersion.output.map { attr =>
       if (producedAttributes.contains(attr)) attr.newInstance() else attr
     }
     (oldVersion, oldVersion.copy(output = newOutput))
   ```
   where `Expand.producedAttributes` is updated as:
   ```Scala
   override def producedAttributes: AttributeSet =
     AttributeSet(output diff child.output)
   ```
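
   To make the intended effect concrete, here is a minimal sketch that runs without Spark. `Attr`, `dedup`, and the `freshId` generator are hypothetical names that only mimic `Attribute` and `newInstance()`. The sketch applies the same "output minus child output" rule to the example plan and re-IDs only the produced attributes:

   ```Scala
   object ExpandDedupSketch {
     // Hypothetical stand-in for a Catalyst attribute (name#exprId); not Spark's API.
     final case class Attr(name: String, exprId: Long)

     // Attributes that Expand introduces itself: its output minus what the child supplies.
     def producedAttributes(output: Seq[Attr], childOutput: Seq[Attr]): Set[Attr] =
       output.diff(childOutput).toSet

     // Give every produced attribute a fresh ID; pass-through attributes keep theirs.
     // `freshId` is an assumed ID generator standing in for `newInstance()`.
     def dedup(output: Seq[Attr], childOutput: Seq[Attr], freshId: () => Long): Seq[Attr] = {
       val produced = producedAttributes(output, childOutput)
       output.map(a => if (produced.contains(a)) a.copy(exprId = freshId()) else a)
     }

     val childOutput: Seq[Attr] = Seq(Attr("nums", 3), Attr("nums", 37))
     val expandOutput: Seq[Attr] =
       Seq(Attr("nums", 3), Attr("nums", 38), Attr("spark_grouping_id", 36))

     // Simple counter used only for this illustration.
     private var nextId = 100L

     // nums#3 is kept; nums#38 and spark_grouping_id#36 receive IDs 101 and 102.
     val newOutput: Seq[Attr] =
       dedup(expandOutput, childOutput, () => { nextId += 1; nextId })
   }
   ```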
   
   Let me know if this approach is acceptable instead of adding `additionalOutput` to `Expand`.
