My guess is that in the second query the dataset is smaller once partition
pruning takes effect, and this makes sorting cheap enough that Sort +
StreamAgg costs less than the HashAgg.
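
To illustrate that guess, here is a toy sketch of how a cost-based choice
between the two plans could flip with input size. The constants and function
names are made up for illustration and are not Drill's actual cost model; the
only assumption is that hashing costs a fixed amount per row while sorting
costs roughly n * log2(n) comparisons.

```python
import math

# Illustrative constants only; these are NOT Drill's real cost factors.
HASH_COST_PER_ROW = 8.0     # assumed cost to hash and probe one row
COMPARE_COST = 1.0          # assumed cost of one sort comparison
STREAM_COST_PER_ROW = 1.0   # assumed cost to aggregate one pre-sorted row


def hash_agg_cost(rows):
    """Hash aggregation: one hash/probe per input row."""
    return rows * HASH_COST_PER_ROW


def sort_stream_agg_cost(rows):
    """Sort (about n * log2 n comparisons) followed by a streaming pass."""
    sort_cost = rows * math.log2(max(rows, 2)) * COMPARE_COST
    return sort_cost + rows * STREAM_COST_PER_ROW


def cheaper_plan(rows):
    """Return the name of the plan with the lower estimated cost."""
    if sort_stream_agg_cost(rows) < hash_agg_cost(rows):
        return "StreamAgg"
    return "HashAgg"


if __name__ == "__main__":
    for rows in (50, 60000):
        print(rows, cheaper_plan(rows))
```

With these made-up numbers the sort-based plan wins for small inputs and the
hash-based plan wins once log2(n) outgrows the per-row hashing cost, which is
the kind of crossover I suspect is happening here.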

On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli <
[email protected]> wrote:

> Hi,
>
> Info about the data: it is auto-partitioned TPC-H 0.01 data. The second
> filter is on a non-partitioned column, so in the first case the 'OR'
> predicate results in a full-table scan, while in the second case partition
> pruning takes effect.
>
> The first case results in a hash agg and the second case in a streaming
> agg. Any idea why?
>
> 1. explain plan for select distinct l_modline, l_moddate from
> `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
> '1992-01-01' or l_shipdate=date'1992-01-01';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(l_modline=[$0], l_moddate=[$1])
> 00-02        Project(l_modline=[$0], l_moddate=[$1])
> 00-03          HashAgg(group=[{0, 1}])
> 00-04            Project(l_modline=[$2], l_moddate=[$0])
> 00-05              SelectionVectorRemover
> 00-06                Filter(condition=[OR(=($0, 1992-01-01), =($1, 1992-01-01))])
> 00-07                  Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
> 00-08                    Scan..........
>
> 2. explain plan for select distinct l_modline, l_moddate from
> `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
> '1992-01-01' and l_shipdate=date'1992-01-01';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(l_modline=[$0], l_moddate=[$1])
> 00-02        Project(l_modline=[$0], l_moddate=[$1])
> 00-03          StreamAgg(group=[{0, 1}])
> 00-04            Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-05              Project(l_modline=[$2], l_moddate=[$0])
> 00-06                SelectionVectorRemover
> 00-07                  Filter(condition=[AND(=($0, 1992-01-01), =($1, 1992-01-01))])
> 00-08                    Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
> 00-09                      Scan.....................
>
> - Rahul
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com