Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5821#issuecomment-110970960
  
    cc @marmbrus actually I think we need a rule like this. @mengxr just constructed a case:
    
    ```
    == Parsed Logical Plan ==
    Aggregate [brand#1902], [brand#1902,COUNT(1) AS count#2769L]
     Limit 10
      Sort [-num_reviews#2161L ASC], true
       input

    == Analyzed Logical Plan ==
    brand: string, count: bigint
    Aggregate [brand#1902], [brand#1902,COUNT(1) AS count#2769L]
     Limit 10
      Sort [-num_reviews#2161L ASC], true
       Aggregate [brand#1902], [brand#1902,COUNT(1) AS num_reviews#2161L]
        Project [asin#1892,helpful#1893,overall#1894,reviewText#1895,reviewTime#1896,reviewerID#1897,reviewerName#1898,summary#1899,unixReviewTime#1900L,brand#1902,categories#1903,imUrl#1904,price#1905,related#1906,salesRank#1907,title#1908]
         Join Inner, Some((asin#1892 = asin#1901))
          Relation[asin#1892,helpful#1893,overall#1894,reviewText#1895,reviewTime#1896,reviewerID#1897,reviewerName#1898,summary#1899,unixReviewTime#1900L] org.apache.spark.sql.parquet.ParquetRelation2@61c77b27
          Relation[asin#1901,brand#1902,categories#1903,imUrl#1904,price#1905,related#1906,salesRank#1907,title#1908] org.apache.spark.sql.parquet.ParquetRelation2@ab4decb4

    == Optimized Logical Plan ==
    Aggregate [brand#1902], [brand#1902,COUNT(1) AS count#2769L]
     Limit 10
      Project [brand#1902]
       Sort [-num_reviews#2161L ASC], true
        Aggregate [brand#1902], [brand#1902,COUNT(1) AS num_reviews#2161L]
         Project [brand#1902]
          Join Inner, Some((asin#1892 = asin#1901))
           Project [asin#1892]
            Relation[asin#1892,helpful#1893,overall#1894,reviewText#1895,reviewTime#1896,reviewerID#1897,reviewerName#1898,summary#1899,unixReviewTime#1900L] org.apache.spark.sql.parquet.ParquetRelation2@61c77b27
           Project [brand#1902,asin#1901]
            Relation[asin#1901,brand#1902,categories#1903,imUrl#1904,price#1905,related#1906,salesRank#1907,title#1908] org.apache.spark.sql.parquet.ParquetRelation2@ab4decb4

    == Physical Plan ==
    Aggregate false, [brand#1902], [brand#1902,Coalesce(SUM(PartialCount#2772L),0) AS count#2769L]
     Aggregate true, [brand#1902], [brand#1902,COUNT(1) AS PartialCount#2772L]
      Limit 10
       Project [brand#1902]
        Sort [-num_reviews#2161L ASC], true
         Exchange (RangePartitioning 200)
          Aggregate false, [brand#1902], [brand#1902,Coalesce(SUM(PartialCount#2774L),0) AS num_reviews#2161L]
           Exchange (HashPartitioning 200)
            Aggregate true, [brand#1902], [brand#1902,COUNT(1) AS PartialCount#2774L]
             Project [brand#1902]
              ShuffledHashJoin [asin#1892], [asin#1901], BuildRight
               Exchange (HashPartitioning 200)
                PhysicalRDD [asin#1892], parquet MapPartitionsRDD[775] at
               Exchange (HashPartitioning 200)
                PhysicalRDD [brand#1902,asin#1901], parquet MapPartitionsRDD[777] at

    ```
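
    For context, a DataFrame query shaped roughly like the sketch below would produce a plan of this shape. The table names, parquet paths, and the `sqlContext` in scope are assumptions for illustration; only the column names and the aggregate-over-limit-over-sort structure are taken from the plan above, so this is a sketch rather than the exact query @mengxr ran.

    ```scala
    import org.apache.spark.sql.functions._

    // Hypothetical inputs; paths are placeholders, column names follow the plan above.
    val reviews  = sqlContext.read.parquet("/path/to/reviews")   // asin, helpful, overall, reviewText, ...
    val products = sqlContext.read.parquet("/path/to/products")  // asin, brand, categories, imUrl, ...

    // Top 10 brands by number of reviews: join on asin, aggregate, sort descending, limit.
    val top10 = reviews.join(products, "asin")
      .groupBy("brand")
      .agg(count(lit(1)).as("num_reviews"))
      .orderBy(desc("num_reviews"))
      .limit(10)

    // Aggregating again over the limited result is what stacks the outer
    // Aggregate on top of Limit/Sort, as in the plans above.
    top10.groupBy("brand").count().explain(true)
    ```

    Note how the optimizer ends up with `Project [brand#1902]` sandwiched between `Limit 10` and `Sort [-num_reviews#2161L ASC]` in the optimized plan.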

