GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/16267

    [SPARK-18841][SQL]fix PushProjectionThroughUnion throw exception when there are same column name

    ## What changes were proposed in this pull request?
    For a union query whose branches use the same column name, applying the
    rules RemoveAliasOnlyProject and PushProjectionThroughUnion repeatedly
    causes the optimizer to throw an exception.
    
    The reason is that the RemoveAliasOnlyProject rule removes the left Project
    child of the Union (an alias-only project) and replaces the attributes of
    the Union and of the Union's right Project child. The
    PushProjectionThroughUnion rule is then applied. Because the output
    attributes of a Union are always equal to the left child's output, it
    throws an exception reporting that the left child does not contain an
    attribute of the Union.
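    
    The failure mode described above can be illustrated with a minimal,
    self-contained sketch. It is a toy model and not Catalyst code: `Attr`,
    `UnionRewriteSketch`, `buildRewrites`, and the ids 10, 20 and 16 are
    hypothetical names and values chosen for illustration; only `pushToRight`
    borrows its name from the optimizer. It assumes the rewrite map is keyed by
    the left child's output attributes, so a reference that is not part of that
    output fails on lookup.
    
    ```scala
    // Toy model only: hypothetical stand-ins, not Catalyst classes.
    final case class Attr(name: String, id: Int) {
      override def toString: String = s"$name#$id"
    }
    
    object UnionRewriteSketch {
      // Map each attribute id of the union's output (taken from the left child)
      // to the attribute at the same position in the right child.
      def buildRewrites(leftOutput: Seq[Attr], rightOutput: Seq[Attr]): Map[Int, Attr] = {
        require(leftOutput.size == rightOutput.size, "union children must have the same arity")
        leftOutput.map(_.id).zip(rightOutput).toMap
      }
    
      // Rewrite a projection list so that it refers to the right child's attributes.
      // Map.apply throws NoSuchElementException when an id is not in the map.
      def pushToRight(projectList: Seq[Attr], rewrites: Map[Int, Attr]): Seq[Attr] =
        projectList.map(a => rewrites(a.id))
    
      def main(args: Array[String]): Unit = {
        val leftOutput = Seq(Attr("col", 10))
        val rightOutput = Seq(Attr("col", 20))
        val rewrites = buildRewrites(leftOutput, rightOutput)
    
        // Consistent plan: the projection refers to the left child's attribute.
        println(pushToRight(Seq(Attr("col", 10)), rewrites))
    
        // Inconsistent plan: the projection refers to an attribute id that the
        // left child no longer produces, so the lookup fails.
        println(scala.util.Try(pushToRight(Seq(Attr("col", 16)), rewrites)))
      }
    }
    ```
    
    Running `main` prints the rewritten attribute for the consistent reference
    and a `Failure` wrapping a `NoSuchElementException` for the stale one.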
    
    For example:
    ```
    >spark.sql("DROP TABLE IF EXISTS p1")
    >spark.sql("DROP TABLE IF EXISTS p2")
    >spark.sql("DROP TABLE IF EXISTS p3")
    
    >spark.sql("CREATE TABLE p1 (col int)")
    >spark.sql("CREATE TABLE p2 (col int)")
    >spark.sql("CREATE TABLE p3 (col int)")
    >spark.sql("set spark.sql.crossJoin.enabled = true")
    >spark.sql("SELECT 1 as cste,col FROM (SELECT col as col FROM (SELECT p1.col as col FROM p1 LEFT JOIN p2 UNION ALL SELECT col FROM p3 ) T1) T2").show
    ```
    
    Exception:
    ```
    key not found: col#16
    java.util.NoSuchElementException: key not found: col#16
            at scala.collection.MapLike$class.default(MapLike.scala:228)
            at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:31)
            at scala.collection.MapLike$class.apply(MapLike.scala:141)
            at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:31)
            at org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion$$anonfun$2.applyOrElse(Optimizer.scala:346)
            at org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion$$anonfun$2.applyOrElse(Optimizer.scala:345)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
            at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:292)
            at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
            at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:291)
            at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:281)
            at org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion$.org$apache$spark$sql$catalyst$optimizer$PushProjectionThroughUnion$$pushToRight(Optimizer.scala:345)
            at org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion$$anonfun$apply$4$$anonfun$8$$anonfun$apply$31.apply(Optimizer.scala:378)
            at org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion$$anonfun$apply$4$$anonfun$8$$anonfun$apply$31.apply(Optimizer.scala:378)
            at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
            at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
            at scala.collection.immutable.List.foreach(List.scala:381)
            at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
            at scala.collection.immutable.List.map(List.scala:285)
            at org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion$$anonfun$apply$4$$anonfun$8.apply(Optimizer.scala:378)
    ```
    
    ## How was this patch tested?
    A unit test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark FixPushDownUnionProjWithRemoveOnlyAliasProj

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16267
    
----
commit b7b27af579b7f3f1df31834575b9b3a994c2d806
Author: windpiger <[email protected]>
Date:   2016-12-13T14:34:22Z

    [SPARK-18841][SQL]fix PushProjectionThroughUnion throw exception when there are same column name

----

