GitHub user rxin commented on the pull request:
https://github.com/apache/spark/pull/5821#issuecomment-110970960
cc @marmbrus: actually, I think we need a rule like this. @mengxr just
constructed a case:
```
== Parsed Logical Plan ==
Aggregate [brand#1902], [brand#1902,COUNT(1) AS count#2769L]
 Limit 10
  Sort [-num_reviews#2161L ASC], true
   input

== Analyzed Logical Plan ==
brand: string, count: bigint
Aggregate [brand#1902], [brand#1902,COUNT(1) AS count#2769L]
 Limit 10
  Sort [-num_reviews#2161L ASC], true
   Aggregate [brand#1902], [brand#1902,COUNT(1) AS num_reviews#2161L]
    Project [asin#1892,helpful#1893,overall#1894,reviewText#1895,reviewTime#1896,reviewerID#1897,reviewerName#1898,summary#1899,unixReviewTime#1900L,brand#1902,categories#1903,imUrl#1904,price#1905,related#1906,salesRank#1907,title#1908]
     Join Inner, Some((asin#1892 = asin#1901))
      Relation[asin#1892,helpful#1893,overall#1894,reviewText#1895,reviewTime#1896,reviewerID#1897,reviewerName#1898,summary#1899,unixReviewTime#1900L] org.apache.spark.sql.parquet.ParquetRelation2@61c77b27
      Relation[asin#1901,brand#1902,categories#1903,imUrl#1904,price#1905,related#1906,salesRank#1907,title#1908] org.apache.spark.sql.parquet.ParquetRelation2@ab4decb4

== Optimized Logical Plan ==
Aggregate [brand#1902], [brand#1902,COUNT(1) AS count#2769L]
 Limit 10
  Project [brand#1902]
   Sort [-num_reviews#2161L ASC], true
    Aggregate [brand#1902], [brand#1902,COUNT(1) AS num_reviews#2161L]
     Project [brand#1902]
      Join Inner, Some((asin#1892 = asin#1901))
       Project [asin#1892]
        Relation[asin#1892,helpful#1893,overall#1894,reviewText#1895,reviewTime#1896,reviewerID#1897,reviewerName#1898,summary#1899,unixReviewTime#1900L] org.apache.spark.sql.parquet.ParquetRelation2@61c77b27
       Project [brand#1902,asin#1901]
        Relation[asin#1901,brand#1902,categories#1903,imUrl#1904,price#1905,related#1906,salesRank#1907,title#1908] org.apache.spark.sql.parquet.ParquetRelation2@ab4decb4

== Physical Plan ==
Aggregate false, [brand#1902], [brand#1902,Coalesce(SUM(PartialCount#2772L),0) AS count#2769L]
 Aggregate true, [brand#1902], [brand#1902,COUNT(1) AS PartialCount#2772L]
  Limit 10
   Project [brand#1902]
    Sort [-num_reviews#2161L ASC], true
     Exchange (RangePartitioning 200)
      Aggregate false, [brand#1902], [brand#1902,Coalesce(SUM(PartialCount#2774L),0) AS num_reviews#2161L]
       Exchange (HashPartitioning 200)
        Aggregate true, [brand#1902], [brand#1902,COUNT(1) AS PartialCount#2774L]
         Project [brand#1902]
          ShuffledHashJoin [asin#1892], [asin#1901], BuildRight
           Exchange (HashPartitioning 200)
            PhysicalRDD [asin#1892], parquet MapPartitionsRDD[775] at
           Exchange (HashPartitioning 200)
            PhysicalRDD [brand#1902,asin#1901], parquet MapPartitionsRDD[777] at
```
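For readers skimming the plans: in the optimized plan a `Project [brand#1902]` sits between `Limit 10` and `Sort`, and the physical plan then does a full range-partitioned sort before the limit. The toy sketch below is only an illustration of that shape, not Catalyst code and not necessarily the rule this PR adds. The `Node` class and both helper functions are hypothetical names, and the idea that a planner looks for a `Limit` directly over a `Sort` is my assumption for the sake of the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical single-child plan node (not a Spark class)."""
    op: str                          # operator name, e.g. "Limit", "Project", "Sort"
    child: Optional["Node"] = None   # None marks a leaf
    args: Tuple = ()                 # operator arguments, kept opaque here


def is_limit_over_sort(plan: Node) -> bool:
    """True only when a Limit sits directly on top of a Sort."""
    return plan.op == "Limit" and plan.child is not None and plan.child.op == "Sort"


def pull_project_above_limit(plan: Optional[Node]) -> Optional[Node]:
    """Rewrite Limit(Project(x)) into Project(Limit(x)), top-down.

    Project works row by row, so swapping it with Limit keeps the same rows.
    """
    if plan is None:
        return None
    if plan.op == "Limit" and plan.child is not None and plan.child.op == "Project":
        project = plan.child
        new_limit = Node("Limit", pull_project_above_limit(project.child), plan.args)
        return Node("Project", new_limit, project.args)
    return Node(plan.op, pull_project_above_limit(plan.child), plan.args)


# Shape of the optimized plan above, below the outer Aggregate:
# Limit 10 -> Project [brand] -> Sort -> Aggregate
optimized = Node(
    "Limit",
    Node("Project", Node("Sort", Node("Aggregate")), ("brand",)),
    (10,),
)
rewritten = pull_project_above_limit(optimized)
```

In this sketch, `is_limit_over_sort(optimized)` is false because of the intervening `Project`, while `is_limit_over_sort(rewritten.child)` is true once the `Project` has been moved above the `Limit`.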