I use the default (planner) optimizer, not ORCA. The resource manager setting is 'none', not YARN.
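For reference, both settings can be confirmed from a psql session. This is a sketch: `optimizer` is the GUC that toggles ORCA in GPDB/HAWQ, and `hawq_global_rm_type` is, to my understanding, the HAWQ resource manager mode GUC; the names are worth double-checking against your version's documentation.

```sql
-- 'off' means the default (planner) optimizer; 'on' means ORCA
SHOW optimizer;

-- 'none' means standalone resource management; 'yarn' means YARN mode
SHOW hawq_global_rm_type;
```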
After analyzing all tables, the problem persists.

2017-01-16 21:50 GMT+08:00 Ruilong Huo <[email protected]>:

> Which optimizer do you use in them respectively? Are the tables analyzed
> before you run the query?
>
> Best regards,
> Ruilong Huo
>
> On Mon, Jan 16, 2017 at 9:32 PM, Guo Kai <[email protected]> wrote:
>
>>> Hi, All.
>>>
>>> I ran TPC-H on a HAWQ cluster. The cluster consists of 1 master node and
>>> 3 segment nodes, co-located with the HDFS cluster. I also built a GPDB
>>> cluster on the same nodes.
>>>
>>> Each node (each VM, to be exact) has the same configuration -- 4 GB
>>> memory, 1 vCPU, and shared host networking. I used dbgen to generate
>>> 3 GB of data, then loaded it into both HAWQ and GPDB.
>>>
>>> I ran Query 4 of TPC-H. GPDB finished it in a few seconds, but HAWQ got
>>> stuck for tens of minutes or more.
>>>
>>> Query 4 is as follows:
>>>
>>> select
>>>     o_orderpriority,
>>>     count(*) as order_count
>>> from
>>>     orders
>>> where
>>>     o_orderdate >= date '1996-03-01'
>>>     and o_orderdate < date '1996-03-01' + interval '3 month'
>>>     and exists (
>>>         select
>>>             *
>>>         from
>>>             lineitem
>>>         where
>>>             l_orderkey = o_orderkey
>>>             and l_commitdate < l_receiptdate
>>>     )
>>> group by
>>>     o_orderpriority
>>> order by
>>>     o_orderpriority
>>> LIMIT 1;
>>>
>>> Moreover, I retrieved both query plans. I found that HAWQ produced a
>>> plan with a huge cost while GPDB produced a cheap one. The obvious
>>> difference between them is that HAWQ chose a nested loop join while
>>> GPDB chose a hash join. I checked that the parameter 'enable_nestloop'
>>> is off by default.
>>>
>>> The GPDB plan is as follows:
>>>
>>>                                    QUERY PLAN
>>> ----------------------------------------------------------------------------------
>>>  Limit  (cost=407633.93..407633.96 rows=1 width=72)
>>>    ->  Gather Motion 12:1  (slice3; segments: 12)  (cost=407633.93..407633.96 rows=1 width=72)
>>>          Merge Key: partial_aggregation.o_orderpriority
>>>          ->  Limit  (cost=407633.93..407633.94 rows=1 width=72)
>>>                ->  Sort  (cost=407633.93..407633.94 rows=1 width=72)
>>>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>>>                      ->  HashAggregate  (cost=407633.91..407633.92 rows=1 width=72)
>>>                            Group By: orders.o_orderpriority
>>>                            ->  Redistribute Motion 12:12  (slice2; segments: 12)  (cost=407633.88..407633.90 rows=1 width=72)
>>>                                  Hash Key: orders.o_orderpriority
>>>                                  ->  HashAggregate  (cost=407633.88..407633.88 rows=1 width=72)
>>>                                        Group By: orders.o_orderpriority
>>>                                        ->  HashAggregate  (cost=404041.00..406436.25 rows=19961 width=74)
>>>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>>>                                              ->  Hash Join  (cost=89634.53..402843.37 rows=19961 width=26)
>>>                                                    Hash Cond: lineitem.l_orderkey = orders.o_orderkey
>>>                                                    ->  Append-only Columnar Scan on lineitem  (cost=0.00..295217.60 rows=499906 width=4)
>>>                                                          Filter: l_commitdate < l_receiptdate
>>>                                                    ->  Hash  (cost=87388.56..87388.56 rows=14974 width=34)
>>>                                                          ->  Redistribute Motion 12:12  (slice1; segments: 12)  (cost=0.00..87388.56 rows=14974 width=34)
>>>                                                                Hash Key: orders.o_orderkey
>>>                                                                ->  Append-only Columnar Scan on orders  (cost=0.00..83795.00 rows=14974 width=34)
>>>                                                                      Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
>>> (23 rows)
>>>
>>> The HAWQ plan is as follows:
>>>
>>>                                    QUERY PLAN
>>> ----------------------------------------------------------------------------------
>>>  Limit  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>>>    ->  Gather Motion 18:1  (slice4; segments: 18)  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>>>          Merge Key: partial_aggregation.o_orderpriority
>>>          ->  Limit  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>>>                ->  Sort  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>>>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>>>                      ->  HashAggregate  (cost=424555364732.84..424555364732.85 rows=1 width=72)
>>>                            Group By: orders.o_orderpriority
>>>                            ->  Redistribute Motion 18:18  (slice3; segments: 18)  (cost=424555364732.80..424555364732.82 rows=1 width=72)
>>>                                  Hash Key: orders.o_orderpriority
>>>                                  ->  HashAggregate  (cost=424555364732.80..424555364732.80 rows=1 width=72)
>>>                                        Group By: orders.o_orderpriority
>>>                                        ->  HashAggregate  (cost=424555361238.53..424555363568.04 rows=12942 width=74)
>>>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>>>                                              ->  Result  (cost=122726.35..424555360073.77 rows=12942 width=26)
>>>                                                    ->  Redistribute Motion 18:18  (slice2; segments: 18)  (cost=122726.35..424555360073.77 rows=12942 width=26)
>>>                                                          Hash Key: orders.ctid
>>>                                                          ->  Nested Loop  (cost=122726.35..424555355414.73 rows=12942 width=26)
>>>                                                                Join Filter: lineitem.l_orderkey = orders.o_orderkey
>>>                                                                ->  Parquet table Scan on lineitem  (cost=0.00..318758.60 rows=333271 width=4)
>>>                                                                      Filter: l_commitdate < l_receiptdate
>>>                                                                ->  Materialize  (cost=122726.35..154180.79 rows=174747 width=34)
>>>                                                                      ->  Broadcast Motion 18:18  (slice1; segments: 18)  (cost=0.00..119580.91 rows=174747 width=34)
>>>                                                                            ->  Parquet table Scan on orders  (cost=0.00..86379.00 rows=9709 width=34)
>>>                                                                                  Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
>>>  Settings:  default_hash_table_bucket_number=18
>>> (26 rows)
>>>
>>> Please suggest explanations for why this happens. Thank you!
>>>
>

-- 
Guo Kai, [email protected]
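One diagnostic for the nested-loop plan above: Query 4's correlated EXISTS can be rewritten as an uncorrelated IN subquery, which sometimes lets the planner consider a hash semi-join where it otherwise picks a nested loop. This is only a sketch for comparison; for this query the two forms return the same rows, since the test is a positive membership check on `l_orderkey`.

```sql
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1996-03-01'
    and o_orderdate < date '1996-03-01' + interval '3 month'
    and o_orderkey in (
        select l_orderkey
        from lineitem
        where l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority
LIMIT 1;
```

Comparing EXPLAIN output for this variant against the original would show whether the plan shape, rather than costing of the tables themselves, drives the difference.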

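Since a nested-loop choice at this scale usually points to missing or stale statistics, it may also help to confirm that ANALYZE actually populated the catalogs. A sketch using the standard PostgreSQL catalogs that HAWQ and GPDB inherit; `reltuples` remains 0 for a table that has never been analyzed:

```sql
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('orders', 'lineitem');
```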