GitHub user watermen opened a pull request:

    https://github.com/apache/spark/pull/10169

    [SPARK-12167][SQL] Invoke the right sameResult function when plan is wrapped with SubQueries

    ### Bug
    I found this bug when using `cache table`:
    ```
    spark-sql> create table src_p(key int, value int) stored as parquet;
    OK
    Time taken: 3.144 seconds
    spark-sql> cache table src_p;
    Time taken: 1.452 seconds
    spark-sql> explain extended select count(*) from src_p;
    ```
    I got the wrong physical plan, which scans the Parquet relation directly instead of the cached in-memory relation:
    ```
    == Physical Plan ==
    TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#28L])
     TungstenExchange SinglePartition
      TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#33L])
       Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]
    ```
    The correct physical plan is:
    ```
    == Physical Plan ==
    TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#47L])
     TungstenExchange SinglePartition
      TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#62L])
       InMemoryColumnarTableScan (InMemoryRelation [key#45,value#46], true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][key#9,value#10]), Some(src_p))
    ```
    ### Reason
    When implementations of `MultiInstanceRelation` (e.g. `LogicalRelation`, `LocalRelation`) are wrapped with SubQueries, the `sameResult` function from their own implementation is not invoked; the one in `LogicalPlan` is used instead. So we need to eliminate SubQueries first and then invoke the `sameResult` function of the underlying implementation.
    For example, when the plan is `Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1))))`, we first eliminate SubQueries, and the `sameResult` function in `LogicalRelation` is then invoked instead of the one in `LogicalPlan`.
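
The dispatch problem described above can be sketched with a toy model. This is not Spark's Catalyst code: `Plan`, `Relation`, `Alias`, `stripAliases` and `sameResultUnwrapped` are hypothetical stand-ins for `LogicalPlan`, `LogicalRelation`, `Subquery` and the subquery elimination step, the path is made up, and the default comparison is simplified to plain structural equality.

```scala
// Toy model only: simplified stand-ins, not Spark's real Catalyst classes.
sealed trait Plan {
  // Generic comparison (assumed here to be plain structural equality).
  def sameResult(other: Plan): Boolean = this == other
}

// A leaf relation whose own sameResult ignores the output attribute ids.
final case class Relation(path: String, outputIds: Seq[Int]) extends Plan {
  override def sameResult(other: Plan): Boolean = other match {
    case Relation(otherPath, _) => path == otherPath
    case _                      => false
  }
}

// A wrapper node that does not override sameResult, playing the role of Subquery.
final case class Alias(name: String, child: Plan) extends Plan

object SameResultSketch {
  // Remove every wrapper so the comparison dispatches on the underlying plan.
  @annotation.tailrec
  def stripAliases(plan: Plan): Plan = plan match {
    case Alias(_, child) => stripAliases(child)
    case other           => other
  }

  def sameResultUnwrapped(a: Plan, b: Plan): Boolean =
    stripAliases(a).sameResult(stripAliases(b))

  def main(args: Array[String]): Unit = {
    // Same table, but each occurrence carries fresh attribute ids.
    val cached = Alias("src_p", Relation("/warehouse/src_p", Seq(9, 10)))
    val query  = Alias("src_p", Relation("/warehouse/src_p", Seq(45, 46)))

    println(cached.sameResult(query))           // false: the wrapper's generic comparison runs
    println(sameResultUnwrapped(cached, query)) // true: the relation's own comparison runs
  }
}
```

In this sketch the wrapped comparison fails only because the attribute ids differ, which mirrors how the cached plan is missed in the `explain` output above; stripping the wrappers first lets the relation-specific comparison decide.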

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/watermen/spark patch-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10169
    
----
commit f1ef856ee4fa45bcd2143bf164dab61f6f17ce63
Author: Yadong Qi <[email protected]>
Date:   2015-12-07T03:47:05Z

    Invoke the right sameResult function when plan is wrapped with SubQueries

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
