Yadong Qi created SPARK-12167:
---------------------------------

             Summary: Invoke the right sameResult function when plan is wrapped with SubQueries
                 Key: SPARK-12167
                 URL: https://issues.apache.org/jira/browse/SPARK-12167
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.2
            Reporter: Yadong Qi


I found this bug when using `cache table`:
```
spark-sql> create table src_p(key int, value int) stored as parquet;
OK
Time taken: 3.144 seconds
spark-sql> cache table src_p;
Time taken: 1.452 seconds
spark-sql> explain extended select count(*) from src_p;
```
I got the wrong physical plan, which scans the Parquet files directly instead of the cached table:
```
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#28L])
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#33L])
   Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]
```
The right physical plan, which reads from the in-memory cache, is:
```
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#47L])
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#62L])
   InMemoryColumnarTableScan (InMemoryRelation [key#45,value#46], true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][key#9,value#10]), Some(src_p))
```

When implementations of `MultiInstanceRelation` (e.g. `LogicalRelation`, 
`LocalRelation`) are wrapped in SubQueries, the `sameResult` function they 
override is never invoked; the generic `LogicalPlan` implementation runs 
instead. So we need to eliminate SubQueries first and then invoke the 
`sameResult` function of the underlying plan.
For example, when the plan is 
`Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1))))`, 
we first eliminate SubQueries and then invoke the `sameResult` function in 
`LogicalRelation` instead of the one in `LogicalPlan`.
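
The dispatch problem described above can be illustrated with a small, 
self-contained toy model. This is only a sketch: `Plan`, `Relation`, 
`Subquery`, `eliminateSubqueries`, `buggySameResult` and `fixedSameResult` 
are hypothetical stand-ins written for this example, not Spark's Catalyst 
classes, and the generic comparison is assumed to be plain structural 
equality. The path and attribute IDs are illustrative values.
```scala
object SameResultSketch {
  // Toy stand-ins for plan nodes; NOT Spark's real classes.
  sealed trait Plan {
    // Generic comparison (assumed): strict structural equality,
    // which includes attribute IDs.
    def sameResult(other: Plan): Boolean = this == other
  }

  // A leaf relation; each instance may carry different attribute IDs.
  final case class Relation(path: String, attrIds: Seq[Int]) extends Plan {
    // Relation-specific comparison: ignore attribute IDs, compare the source.
    override def sameResult(other: Plan): Boolean = other match {
      case Relation(p, _) => p == path
      case _              => false
    }
  }

  // Alias wrapper, playing the role of a Subquery node.
  final case class Subquery(alias: String, child: Plan) extends Plan

  // Strip every Subquery wrapper from the top of the plan.
  @annotation.tailrec
  def eliminateSubqueries(plan: Plan): Plan = plan match {
    case Subquery(_, child) => eliminateSubqueries(child)
    case other              => other
  }

  // Reported behaviour: the wrapper's generic sameResult is used.
  def buggySameResult(a: Plan, b: Plan): Boolean = a.sameResult(b)

  // Proposed behaviour: unwrap first, then dispatch to the node's override.
  def fixedSameResult(a: Plan, b: Plan): Boolean =
    eliminateSubqueries(a).sameResult(eliminateSubqueries(b))

  def main(args: Array[String]): Unit = {
    val cached = Subquery("src_p", Relation("/user/hive/warehouse/src_p", Seq(9, 10)))
    val query  = Subquery("src_p", Relation("/user/hive/warehouse/src_p", Seq(45, 46)))
    println(buggySameResult(cached, query)) // prints false
    println(fixedSameResult(cached, query)) // prints true
  }
}
```
In this model the wrapped plans differ only in attribute IDs, so the generic 
comparison reports a mismatch and a cached plan would not be reused, while 
unwrapping first lets the relation's own comparison succeed.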


