>
>
> Hi all,
>
> I ran TPC-H on a HAWQ cluster. The cluster consists of one master node
> and three segment nodes, and the HDFS cluster has the same layout. I
> also built a GPDB cluster on the same nodes.
>
> Each node (each VM, to be exact) has the same configuration: 4 GB of
> memory, 1 vCPU, and a shared host network. I used dbgen to generate
> 3 GB of data, then loaded it into both HAWQ and GPDB.
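>
> For reference, the load went roughly like this (a sketch from memory;
> the file paths and COPY options here are illustrative, and dbgen's
> trailing '|' field terminator may need to be stripped first):
>
> -- generated with: dbgen -s 3   (scale factor 3 => ~3 GB of *.tbl files)
> COPY orders   FROM '/data/tpch/orders.tbl'   WITH DELIMITER '|';
> COPY lineitem FROM '/data/tpch/lineitem.tbl' WITH DELIMITER '|';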
>
> I ran TPC-H Query 4. GPDB finished it in a few seconds, but HAWQ was
> still stuck after tens of minutes or more.
>
> Query 4 is as follows:
>
> select
>         o_orderpriority,
>         count(*) as order_count
> from
>         orders
> where
>         o_orderdate >= date '1996-03-01'
>         and o_orderdate < date '1996-03-01' + interval '3 month'
>         and exists (
>                 select
>                         *
>                 from
>                         lineitem
>                 where
>                         l_orderkey = o_orderkey
>                         and l_commitdate < l_receiptdate
>         )
> group by
>         o_orderpriority
> order by
>         o_orderpriority
> LIMIT 1;
>
> Moreover, I retrieved both query plans. HAWQ produced a plan with a huge
> cost, while GPDB's cost was small. The obvious difference between them
> is that HAWQ chose a nested loop join while GPDB chose a hash join, even
> though I checked that the parameter 'enable_nestloop' is off by default
> on both.
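>
> (For completeness: the plans below come from prefixing Query 4 with
> EXPLAIN on each cluster, and the planner setting was checked with SHOW.
> A minimal session sketch:)
>
> EXPLAIN
> select o_orderpriority, count(*) as order_count
> from orders
> where o_orderdate >= date '1996-03-01'
>   and o_orderdate < date '1996-03-01' + interval '3 month'
>   and exists (
>         select * from lineitem
>         where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
>   )
> group by o_orderpriority
> order by o_orderpriority
> LIMIT 1;
>
> SHOW enable_nestloop;   -- reports 'off' on both clusters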
>
> The GPDB plan is as follows:
>
>                                                                  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=407633.93..407633.96 rows=1 width=72)
>    ->  Gather Motion 12:1  (slice3; segments: 12)  (cost=407633.93..407633.96 rows=1 width=72)
>          Merge Key: partial_aggregation.o_orderpriority
>          ->  Limit  (cost=407633.93..407633.94 rows=1 width=72)
>                ->  Sort  (cost=407633.93..407633.94 rows=1 width=72)
>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>                      ->  HashAggregate  (cost=407633.91..407633.92 rows=1 width=72)
>                            Group By: orders.o_orderpriority
>                            ->  Redistribute Motion 12:12  (slice2; segments: 12)  (cost=407633.88..407633.90 rows=1 width=72)
>                                  Hash Key: orders.o_orderpriority
>                                  ->  HashAggregate  (cost=407633.88..407633.88 rows=1 width=72)
>                                        Group By: orders.o_orderpriority
>                                        ->  HashAggregate  (cost=404041.00..406436.25 rows=19961 width=74)
>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>                                              ->  Hash Join  (cost=89634.53..402843.37 rows=19961 width=26)
>                                                    Hash Cond: lineitem.l_orderkey = orders.o_orderkey
>                                                    ->  Append-only Columnar Scan on lineitem  (cost=0.00..295217.60 rows=499906 width=4)
>                                                          Filter: l_commitdate < l_receiptdate
>                                                    ->  Hash  (cost=87388.56..87388.56 rows=14974 width=34)
>                                                          ->  Redistribute Motion 12:12  (slice1; segments: 12)  (cost=0.00..87388.56 rows=14974 width=34)
>                                                                Hash Key: orders.o_orderkey
>                                                                ->  Append-only Columnar Scan on orders  (cost=0.00..83795.00 rows=14974 width=34)
>                                                                      Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
> (23 rows)
>
> The HAWQ plan is as follows:
>
>                                                                  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>    ->  Gather Motion 18:1  (slice4; segments: 18)  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>          Merge Key: partial_aggregation.o_orderpriority
>          ->  Limit  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>                ->  Sort  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>                      ->  HashAggregate  (cost=424555364732.84..424555364732.85 rows=1 width=72)
>                            Group By: orders.o_orderpriority
>                            ->  Redistribute Motion 18:18  (slice3; segments: 18)  (cost=424555364732.80..424555364732.82 rows=1 width=72)
>                                  Hash Key: orders.o_orderpriority
>                                  ->  HashAggregate  (cost=424555364732.80..424555364732.80 rows=1 width=72)
>                                        Group By: orders.o_orderpriority
>                                        ->  HashAggregate  (cost=424555361238.53..424555363568.04 rows=12942 width=74)
>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>                                              ->  Result  (cost=122726.35..424555360073.77 rows=12942 width=26)
>                                                    ->  Redistribute Motion 18:18  (slice2; segments: 18)  (cost=122726.35..424555360073.77 rows=12942 width=26)
>                                                          Hash Key: orders.ctid
>                                                          ->  Nested Loop  (cost=122726.35..424555355414.73 rows=12942 width=26)
>                                                                Join Filter: lineitem.l_orderkey = orders.o_orderkey
>                                                                ->  Parquet table Scan on lineitem  (cost=0.00..318758.60 rows=333271 width=4)
>                                                                      Filter: l_commitdate < l_receiptdate
>                                                                ->  Materialize  (cost=122726.35..154180.79 rows=174747 width=34)
>                                                                      ->  Broadcast Motion 18:18  (slice1; segments: 18)  (cost=0.00..119580.91 rows=174747 width=34)
>                                                                            ->  Parquet table Scan on orders  (cost=0.00..86379.00 rows=9709 width=34)
>                                                                                  Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
>  Settings:  default_hash_table_bucket_number=18
> (26 rows)
>
> Could anyone suggest an explanation for why this happens? Thank you!
>
>
