[GitHub] spark issue #15668: [SPARK-18137][SQL]Fix RewriteDistinctAggregates Unresolv...

windpiger Mon, 07 Nov 2016 02:01:33 -0800

Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/15668
  
    @cloud-fan I rewrote the expand logic:
    1) If the aggregate function has non-foldable children, only the non-foldable children are expanded.
    2) If the aggregate function has only foldable children, only the first child is expanded. Aggregate functions generally do not run a foldable type check on their first child, so choosing the first child avoids tripping that check.
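
The two rules above can be sketched as a small selection function. This is a simplified Python model, not the Catalyst code from the pull request: `Child`, `children_to_expand`, and the `key` column in the mixed case are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Child:
    """Hypothetical stand-in for one child expression of an aggregate function."""
    name: str
    foldable: bool  # True for constants such as the literal 1


def children_to_expand(children: List[Child]) -> List[Child]:
    """Select which children are routed through Expand, per rules 1) and 2)."""
    non_foldable = [c for c in children if not c.foldable]
    if non_foldable:
        # Rule 1: some child is non-foldable, so expand only those children.
        return non_foldable
    # Rule 2: every child is foldable, so expand only the first child.
    return children[:1]


# count(distinct 1,2,3,1): all children are literals, so only the first is selected.
all_literals = [Child("1", True), Child("2", True), Child("3", True), Child("1", True)]
# Hypothetical mixed case such as count(distinct key, 1): only `key` is selected.
mixed = [Child("key", False), Child("1", True)]

print([c.name for c in children_to_expand(all_literals)])
print([c.name for c in children_to_expand(mixed)])
```

The all-literal case corresponds to the `count(distinct 1,2,3,1)` call in the example query, where only the leading `1` becomes an Expand output column.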
    
    For example:
    select percentile_approx(2,0.99999),sum(distinct 1),count(distinct 1,2,3,1) from src limit 1
    
    The explain output is:
    `
    == Physical Plan ==
    CollectLimit 1
    +- *HashAggregate(keys=[], functions=[first(if ((gid#177 = 0)) percentile_approx(CAST(2 AS DOUBLE), CAST(0.99999BD AS DOUBLE), 10000)#180 else null, true), sum(if ((gid#177 = 1)) CAST(1 AS BIGINT)#178L else null), count(if ((gid#177 = 2)) 1#179 else null, 2, 3, if ((gid#177 = 2)) 1#179 else null)])
       +- Exchange SinglePartition
          +- *HashAggregate(keys=[], functions=[partial_first(if ((gid#177 = 0)) percentile_approx(CAST(2 AS DOUBLE), CAST(0.99999BD AS DOUBLE), 10000)#180 else null, true), partial_sum(if ((gid#177 = 1)) CAST(1 AS BIGINT)#178L else null), partial_count(if ((gid#177 = 2)) 1#179 else null, 2, 3, if ((gid#177 = 2)) 1#179 else null)])
             +- SortAggregate(key=[CAST(1 AS BIGINT)#178L, 1#179, gid#177], functions=[percentile_approx(2.0, 0.99999, 10000, 0, 0)])
                +- *Sort [CAST(1 AS BIGINT)#178L ASC NULLS FIRST, 1#179 ASC NULLS FIRST, gid#177 ASC NULLS FIRST], false, 0
                   +- Exchange hashpartitioning(CAST(1 AS BIGINT)#178L, 1#179, gid#177, 200)
                      +- SortAggregate(key=[CAST(1 AS BIGINT)#178L, 1#179, gid#177], functions=[partial_percentile_approx(2.0, 0.99999, 10000, 0, 0)])
                         +- *Sort [CAST(1 AS BIGINT)#178L ASC NULLS FIRST, 1#179 ASC NULLS FIRST, gid#177 ASC NULLS FIRST], false, 0
                            +- *Expand [List(null, null, 0), List(1, null, 1), List(null, 1, 2)], [CAST(1 AS BIGINT)#178L, 1#179, gid#177]
                               +- HiveTableScan MetastoreRelation default, src
    `
    and the result is:
    `
    
    +--------------------------------------------------------------------+---------------+--------------------------+
    |percentile_approx(CAST(2 AS DOUBLE), CAST(0.99999 AS DOUBLE), 10000)|sum(DISTINCT 1)|count(DISTINCT 1, 2, 3, 1)|
    +--------------------------------------------------------------------+---------------+--------------------------+
    |                                                                 2.0|              1|                         1|
    +--------------------------------------------------------------------+---------------+--------------------------+
    `
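
To see why the two distinct columns in the result are both 1, the `Expand` node in the plan above can be modeled in a few lines. This is a rough Python sketch under stated assumptions, not Spark code: the projections are copied from the plan, `None` stands in for SQL NULL, and the input size of 5 rows is an arbitrary, hypothetical value because the size of `src` is not given.

```python
# Projections copied from the Expand node. Output columns are
# (CAST(1 AS BIGINT)#178L, 1#179, gid#177); None models SQL NULL.
PROJECTIONS = [(None, None, 0), (1, None, 1), (None, 1, 2)]


def expand(num_input_rows, projections):
    """Emit every projection once per input row (simplified model of Expand)."""
    return [p for _ in range(num_input_rows) for p in projections]


def distinct_groups(rows):
    """Model the inner aggregate, which groups by all three Expand columns."""
    return sorted(set(rows), key=lambda r: r[2])


rows = expand(5, PROJECTIONS)    # 5 is an arbitrary, hypothetical table size
groups = distinct_groups(rows)   # duplicates collapse to one row per gid

# Outer aggregate: each function only looks at rows carrying its own gid.
sum_distinct = sum(r[0] for r in groups if r[2] == 1)
count_distinct = sum(1 for r in groups if r[2] == 2 and r[1] is not None)

print(len(rows), len(groups), sum_distinct, count_distinct)
```

In this model the grouping on the Expand columns is what removes duplicates, and the `gid` filter keeps each outer function on its own group, which agrees with the `sum(DISTINCT 1)` and `count(DISTINCT 1, 2, 3, 1)` values shown in the result.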


